First up is the news that there are more than 700 members. Nice.
An important date for your diaries is 25 September 2017, when another round of #corpusmooc launches. This time new sections are promised, and the most notable addition is a new version of LancsBox. Check out the following two cute vids being used to promote #corpusmooc 2017:
Also, if you use Twitter you can follow the bot corpusmoocRT.
Next up are some great plenary videos from this year’s Corpus Linguistics 2017 knees-up in Birmingham, plus related notes from the conference by John Williams.
Checking the distribution of the pair on the one hand/on the other hand in BYU-COCA sections.
A graphic trying to depict keywords as calculated in AntConc.
A possible way to find collocations suitable for various proficiency levels.
And finally, for a bit o’ fun: is this the longest term in ELT? And The Banbury Corpus Revisited by Michael Swan.
Thanks for reading and, for those coming off a summer break, much energy to you for the new teaching year.
If you follow events in the UK you can say, without much risk of being accused of hyperbole, that these are indeed strange times.
So why not turn to the relative sanity of corpus linguistics community news 7?
First up is an example of searching BYU-COCA for use of a preposition of place.
Next a post on one way to explore some recent audio-video corpora.
A couple of posts related to history of CL.
A top tip when using BootCat.
My recommended link is to a mini (or maybe it’s a micro) CL course by Oxford Dictionaries.
A tool that uses TF-IDF scores to extract n-grams, using as an example Prime Minister’s Questions exchanges between ex-prime minister David Cameron and the still leader of the opposition Jeremy Corbyn.
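For anyone curious about what a TF-IDF score for an n-gram involves, here is a minimal Python sketch. It is not the linked tool's code, just one common textbook formulation (relative frequency in a document multiplied by the log of inverse document frequency). The function names `ngrams` and `tfidf` and the two toy texts are my own inventions for illustration, not real transcripts.

```python
import math
import re
from collections import Counter


def ngrams(text, n=2):
    """Lowercase the text, split it into word tokens and return its n-grams."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n=2):
    """Return one {ngram: score} dict per document.

    tf  = count of the n-gram in the document / total n-grams in the document
    idf = log(number of documents / number of documents containing the n-gram)
    """
    counts = [Counter(ngrams(doc, n)) for doc in docs]
    # Iterating a Counter yields each distinct n-gram once per document.
    doc_freq = Counter(gram for c in counts for gram in c)
    scores = []
    for c in counts:
        total = sum(c.values())
        scores.append({
            gram: (freq / total) * math.log(len(docs) / doc_freq[gram])
            for gram, freq in c.items()
        })
    return scores


# Invented toy texts (assumption), standing in for two speakers.
speaker_a = "long term economic plan and the long term economic plan is working"
speaker_b = "a question from a member of the public about the economic plan"

for doc_scores in tfidf([speaker_a, speaker_b]):
    top = sorted(doc_scores.items(), key=lambda kv: kv[1], reverse=True)[:3]
    print(top)
```

An n-gram that both speakers use gets a score of zero, which is the point of the measure: it pushes up what is distinctive to one speaker rather than what is merely frequent.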
Do check previous corpus linguistics community posts if you haven’t yet.
Thanks for reading and have a good summer/winter.
Although some say G+ is a dying forum, the CL G+ group now has 482 members. Nice. Admittedly not many interact, but I do (like to) think a lot appreciate the resources put on there. I had the pleasure of being a mentor on the Lancaster University FutureLearn Corpus Linguistics MOOC last September, which is going to have another round next September, so look out for that.
First up for this installment of news, I check the claim that “elementary, my dear Watson” was never used in the Sherlock Holmes stories.
Next I challenged readers to check how many business idioms in a list can be accounted for in relevant corpora.
There is a useful table of search syntax for the COCA interface.
My search for publicly available spoken corpora.
A list of phrasal verbs Jeremy Corbyn used in his final rally speech before becoming Labour leader.
Using the SKELL interface to get some good examples for a review quiz.
Some AntConc alternatives.
Bypassing limits of spreadsheet rows.
Finally a description of using a scraping tool to get biology and medical abstracts in simpler language that might be suitable for EAP students.
Thanks for reading. And don’t forget to check out the previous corpus linguistics community news if you haven’t already.
It’s been a long time, I shouldn’t have left you
Without some corpus news to read through
(I know you got corpus soul)
First off, if you are a user of the BYU suite of corpus tools, do consider helping to correct their corpus of soap operas. Should you get to 500 words, your name will be in the acknowledgements. Nice.
Next up are some interviews with Alex Boulton on some issues in DDL, Ivor Timmis on his new corpus book for ELT and Andrew Caines on a spoken corpus project.
For those interested in XML tagging there is something on using UK 2015 election forewords to follow a tutorial using rhetorical tagging.
For those interested in multi-word tagging, there are some descriptions, part 1 and part 2, of using a program called AMALGrAM 2.0.
Finally, an FYI to check out the latest version of the TED Corpus Search Engine, which now has translations and synced transcriptions.
Till next time.
Thanks for reading.
Another installment of the Google+ Corpus Linguistics Community news. In addition I include links to some visual aids that were designed to answer questions people in the second round of #corpusmooc posed.
Bigger is not necessarily better – Here I talk about some minor aspects of the paper rather than the more interesting ones, namely that the paper compiled a list of approximately 14,000 pairs of collocations that are worth teaching. The list is not available as of yet. The other main finding is that frequency is more important than dispersion or chronological data when identifying collocations, with human judgement remaining a key factor in deciding on useful collocations.
Google as a corpus with students – Some interesting recent developments on using Google as a corpus with a short list of relevant online reading.
How to develop effective concordance materials using online corpus – My interpretation of a slideshow by a Korean researcher on using data-driven learning materials.
#corpusmooc Visual Aids:
Tokens, Types, Lemmas, Word families
Collocations and Colligations
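To make the token/type/lemma distinction concrete, here is a minimal Python sketch that counts each for a short invented sentence. The `LEMMAS` lookup is a hypothetical, hand-made table for illustration only; a real project would use a proper lemmatiser. Word families, collocations and colligations are not covered here.

```python
import re

# Hypothetical, hand-made lemma lookup (assumption, illustration only).
LEMMAS = {"cats": "cat", "sat": "sit", "sits": "sit", "mats": "mat"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def profile(text):
    """Count tokens (running words), types (distinct forms) and lemmas."""
    tokens = tokenize(text)
    types = set(tokens)
    # Map each type to its lemma, falling back to the form itself.
    lemmas = {LEMMAS.get(t, t) for t in types}
    return {
        "tokens": len(tokens),
        "types": len(types),
        "lemmas": len(lemmas),
        "type_token_ratio": len(types) / len(tokens) if tokens else 0.0,
    }


# Invented sample sentence (assumption).
sample = "The cat sat on the mat. The cats sit on mats."
print(profile(sample))
```

Here "the" counts three times as a token but once as a type, while "cat" and "cats" are two types sharing one lemma.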
Do check out the other corpus linguistics community news if you haven’t already.
I have realised it’s been a while since I have reported any potentially useful posts I have done over at the G+ CL community. So here is bulletin number 3.
Some pointers when re-writing text for graded readers – this is my interpretation of a Japanese researcher’s slide presentation so I may be talking out me backside!
AntWordProfiler and specialised vocabulary profiling – this arose out of a question that a participant on the iTDi ELT Reading materials design course had.
Videogrep, a tool to make concordances of video – very neat tool
Building your own corpus: TagAnt – continuation of my series showing how to use TagAnt POS tagging to mine your DIY corpus.
Semantic tagging Sugata Mitra – oh noes he’s back! Maybe of interest to upcoming round 2 #corpusmooc folk as I try to make sense of semantic tagging.
Seeding BootCat – notes on how best to seed BootCat when building your own corpus from the web.
Don’t forget CL community news 1 and 2 if you have not checked them already.
This will be an occasional thing on postings I’ve been making to the Corpus Linguistics community on Page&Brin’s site which people visiting this blog for such info may find useful (unfortunately you do need to have a chocolate factory account).
Using regex and the command line
Working with concordances
Example corpus exercises
Making a graded reader corpus
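As a taste of what regex plus concordancing looks like in practice, here is a minimal Python sketch of a keyword-in-context (KWIC) display. It is not taken from the posts listed above; the function name `kwic`, the `width` parameter and the sample text are my own assumptions for illustration.

```python
import re


def kwic(text, pattern, width=30):
    """Return keyword-in-context lines for every regex match in the text."""
    flat = " ".join(text.split())  # collapse line breaks and repeated spaces
    lines = []
    for m in re.finditer(pattern, flat, flags=re.IGNORECASE):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        # Right-align the left context so the matches line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines


# Invented sample text (assumption).
sample_text = """On the one hand the corpus is small. On the other hand it is
clean and easy to search."""

for line in kwic(sample_text, r"on the (?:one|other) hand"):
    print(line)
```

The alternation `(?:one|other)` picks up both halves of the pair in one search, which is the sort of thing a plain word search cannot do.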
Further corpus related postings will probably end up at the G+ site so do check (and join) that if you are interested in that sort of thing. 🙂
You can read CL community news 2 if you have not already.