Another installment of the Google+ Corpus Linguistics Community news. In addition, I include links to some visual aids designed to answer questions posed by people in the second round of #corpusmooc.
Bigger is not necessarily better – Here I talk about some minor aspects of the paper rather than its more interesting ones. The main one is that the paper compiled a list of approximately 14,000 pairs of collocations that are worth teaching. The list is not yet available. The other main finding is that frequency is more important than dispersion or chronological data when identifying collocations, with human judgement remaining a key factor in deciding on useful collocations.
Google as a corpus with students – Some interesting recent developments on using Google as a corpus with a short list of relevant online reading.
How to develop effective concordance materials using online corpus – My interpretation of a slideshow by a Korean researcher on using data-driven learning materials.
#corpusmooc Visual Aids:
Tokens, Types, Lemmas, Word families
Collocations and Colligations
Do check out the other corpus linguistics community news if you haven’t already.
I have realised it’s been a while since I reported any potentially useful posts I have made over at the G+ CL community. So here is bulletin number 3.
Some pointers when re-writing text for graded readers – this is my interpretation of a Japanese researcher’s slide presentation so I may be talking out me backside!
AntWordProfiler and specialised vocabulary profiling – this arose out of a question that a participant on the iTDi ELT Reading materials design course had.
Videogrep, a tool to make concordances of video – very neat tool, see my first comment on the post for examples using The Big Bang Theory.
Building your own corpus - TagAnt – continuation of my series showing how to use TagAnt POS tagging to mine your DIY corpus.
Semantic tagging Sugata Mitra – oh noes he’s back! Maybe of interest to upcoming round 2 #corpusmooc folk as I try to make sense of semantic tagging.
Seeding BootCat – notes on how best to seed BootCat when building your own corpus from the web.
Don’t forget CL community news 1 and 2 if you have not checked them already.
This will be an occasional thing on postings I’ve been making to the Corpus Linguistics community on Page&Brin’s site, which people visiting this blog for such info may find useful (unfortunately you do need to have a chocolate factory account).
Using regex and the command line
Working with concordances
Example corpus exercises
Making a graded reader corpus
Further corpus related postings will probably end up at the G+ site so do check (and join) that if you are interested in that sort of thing. 🙂
You can read CL community news 2 if you have not already.