Corpus linguistics community news 4

Another installment of the Google+ Corpus Linguistics Community news. In addition, I include links to some visual aids that were designed to answer questions posed by people in the second round of #corpusmooc.

Bigger is not necessarily better – Here I talk about some minor aspects of the paper rather than the more interesting ones; that is, the paper compiled a list of approximately 14,000 pairs of collocations that are worth teaching. The list is not available as of yet. The other main finding is that frequency is more important than dispersion or chronological data when identifying collocations, with human judgement remaining a key factor in deciding on useful collocations.

Google as a corpus with students – Some interesting recent developments on using Google as a corpus with a short list of relevant online reading.

How to develop effective concordance materials using online corpus – My interpretation of a slideshow by a Korean researcher on using data-driven learning materials.

#corpusmooc Visual Aids:

Tokens, Types, Lemmas, Word families

Genre Register

Collocations and Colligations

Multi-word expressions

Do check out the other corpus linguistics community news if you haven’t already.


Corpus linguistics community news 3

I have realised it’s been a while since I have reported any potentially useful posts I have done over at the G+ CL community. So here is bulletin number 3.

Some pointers when re-writing text for graded readers – this is my interpretation of a Japanese researcher’s slide presentation so I may be talking out me backside!

AntWordProfiler and specialised vocabulary profiling – this arose out of a question that a participant on the iTDi ELT Reading materials design course had.

Videogrep, a tool to make concordances of video – very neat tool, see my first comment to post to see examples using The Big Bang Theory.

Building your own corpus -TagAnt – continuation of my series showing how to use TagAnt POS tagging to mine your DIY corpus.

Semantic tagging Sugata Mitra – oh noes he’s back! Maybe of interest to upcoming round 2 #corpusmooc folk as I try to make sense of semantic tagging.

Seeding BootCat – notes on how best to seed BootCat when building your own corpus from the web.

Don’t forget CL community news 1 and 2 if you have not checked them already.

Corpus linguistics community news 2

Another round of links (here’s the first round) to some posts on the G+ corpus linguistics community to notify peeps interested in this sort of thing:

Using BAWE corpus in SketchEngine as a learner corpus (see 1st comment to post)

The Desolation of Smaug and the BYU COCA

Dogme concordancing

The Disabled Access Friendly Campaign, corpus use literacy and

Corpus linguistics community news

This will be an occasional thing on postings I’ve been making to the Corpus Linguistics community on Page&Brin’s site which people visiting this blog for such info may find useful (unfortunately you do need to have a chocolate factory account).

Using regex and the command line

Working with concordances

Example corpus exercises

Making a graded reader corpus

Further corpus related postings will probably end up at the G+ site so do check (and join) that if you are interested in that sort of thing. 🙂

You can read CL community news 2 if you have not already.