Monco, the web news monitor corpus (which means it is continuously updated), has a tremendous collocation feature. I first saw a reference to it in a tweet by Diane Nicholls @lexicoloco, but when I tried it the server was acting up. I was reminded to try again by a tweet from Dr. Michael Riccioli @curvedway, and whoa, it is impressive.
For example, let’s see what the collocates of the famous make and do verbs are.
For make, here is a screenshot of the search settings for collocation (to get to the collocation function, look under the tools menu on the main Monco page). Note that I am looking for nouns that come after the verb make. Also, the double asterisk is a shortcut to look for all forms of make (try it without the asterisks and see what you get).
The top 10 collocates I get (for all forms of make) are the following:
click on image for full results
Interesting collocations include make sense, make way, make debut. The results can show you at a glance the types of constructions involved:
Or you can open another window for more details:
The top 10 collocates for do are:
click on image for full results
Interesting collocates here are do thing, do anything, do something and do nothing, which make a change from do shopping, do cooking etc. : )
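For readers curious about what a search like the ones above for make and do involves, here is a minimal sketch in Python. It assumes a tiny hand-tagged sample: the sentences, the tag labels and the function name noun_collocates are all invented for illustration. It simply counts nouns that follow any form of a verb within a small window. Monco's own implementation and ranking statistic are not described in this post, so this should not be read as how Monco works.

```python
from collections import Counter

# Hypothetical hand-tagged sample: (word, lemma, part of speech).
# A real corpus would be tagged automatically; these tokens are invented.
TAGGED = [
    ("That", "that", "PRON"), ("makes", "make", "VERB"), ("sense", "sense", "NOUN"),
    ("She", "she", "PRON"), ("made", "make", "VERB"), ("her", "her", "DET"),
    ("debut", "debut", "NOUN"),
    ("It", "it", "PRON"), ("made", "make", "VERB"), ("no", "no", "DET"),
    ("sense", "sense", "NOUN"),
    ("They", "they", "PRON"), ("make", "make", "VERB"), ("way", "way", "NOUN"),
]

def noun_collocates(tokens, verb_lemma, window=2):
    """Count noun lemmas found up to `window` tokens after any form of the verb.

    Matching on the lemma plays the role of the double asterisk search:
    make, makes and made all count. Sentence boundaries are ignored here
    to keep the sketch short.
    """
    counts = Counter()
    for i, (_, lemma, pos) in enumerate(tokens):
        if lemma == verb_lemma and pos == "VERB":
            for _, next_lemma, next_pos in tokens[i + 1 : i + 1 + window]:
                if next_pos == "NOUN":
                    counts[next_lemma] += 1
    return counts

print(noun_collocates(TAGGED, "make").most_common(3))
```

Raw counts like these favour frequent nouns, so corpus tools usually rank collocates with an association score instead. Which score Monco uses is something I have not checked.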
This is an interesting claim. It has been shown that passives are more common in abstract, technical and formal writing (Biber, 1988 as cited by McEnery & Xiao, 2005). Here the claim is about specific written texts on natural processes and man-made processes.
Well, we can simplify this by asking: are more passives used when writing about man-made processes than when writing about natural processes? If you use a passive clause you are not using an active clause, so we can reach a conclusion about actives by deduction.
The BYU-Wikipedia corpus can be used to get approximations of natural process writing and man-made process writing. The keywords I used (for the title word) were ecology and manufacturing. Filtering out unwanted texts took longer than expected, especially for the manufacturing corpus. In the end I had an ecology corpus of 77 articles and 153,621 words and a manufacturing corpus of 116 articles and 98,195 words.
The search term I used to look for passives was are|were [v?n*]. This gave me a total of 293 passives for ecology and 304 passives for manufacturing. According to the Lancaster LL calculator, this showed a significant overuse of passives in manufacturing compared to ecology. According to the log ratio score, passives are between one and a half and two times as common in manufacturing (if I understand this statistic correctly). Now this does not mean much, as a lot of the texts in the Wikipedia corpora won’t be specifically about processes, but still it is interesting.
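The two statistics can be checked by hand. The sketch below is my own reconstruction, not the Lancaster calculator's code. It assumes the two-corpus log-likelihood formula as I understand that calculator to use it, and takes the log ratio to be the base 2 logarithm of the ratio of relative frequencies. The function names are made up. The counts are the ones quoted above.

```python
import math

def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Two-corpus log-likelihood: 2 * sum(observed * ln(observed / expected))."""
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

def log_ratio(freq_a, size_a, freq_b, size_b):
    """Binary log of the ratio of relative frequencies (corpus A over corpus B)."""
    return math.log2((freq_a / size_a) / (freq_b / size_b))

# Figures quoted in the post
eco_hits, eco_words = 293, 153_621
man_hits, man_words = 304, 98_195

ll = log_likelihood(man_hits, man_words, eco_hits, eco_words)
lr = log_ratio(man_hits, man_words, eco_hits, eco_words)
print(f"LL = {ll:.2f}, log ratio = {lr:.2f}, ratio = {2 ** lr:.2f}")
```

On these figures the log-likelihood is well above 15.13, the cut-off usually quoted for p < 0.0001. The log ratio is about 0.7, so passives are roughly 1.6 times as frequent in the manufacturing corpus. A log ratio of 1 would mean exactly twice as frequent.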
More interesting are the types of verbs used in passives in ecology and manufacturing. The top ten in each case:
There’s a regular module I do at one school on writing about processes coming up soon. So a focus here is on use of passive clauses in such contexts. For years I was happily ignorant, induced by inaccurate instruction from books, about this grammar area. So it was a blessing to read and watch noted linguist Geoffrey Pullum pull apart such advice.
It was by chance as these things tend to happen on the net that I read about High Variability Pronunciation Training (HVPT). What are the odds language teachers know about HVPT?
My extremely representative and valid polling on Twitter and G+ gave me a big fat 2 out of 24 teachers who knew the acronym. Of the two who said yes one had looked up the acronym and the other is an expert in pronunciation.
I would put good odds that most language teachers have heard of and use minimal pairs, i.e. pairs of words which differ by one sound, the famous ship/sheep for example.
HVPT can be seen as a souped-up version of minimal pairs, where different speakers are used and sounds are presented in different contexts. Learners are then required to categorize the sound by picking a label for it. Feedback is then given on whether they are correct.
Pronunciation research has shown that providing a variety of input in terms of speakers and phonetic contexts helps learners categorize sounds. That is the V (for variability) in the acronym. Furthermore, such training focuses learners on the phonetic form and thus reduces any effect of semantic meaning, since it has been shown that attending to both meaning and form reduces performance.
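To make the procedure concrete, here is a rough sketch of one training session in Python. Everything in it is an assumption for illustration: the speaker names, the word frames, the labels and the functions build_trials and run_session. It is not taken from any of the programs mentioned in this post. A real program would play a recording at each trial, whereas here a stand-in learner just returns a label.

```python
import itertools
import random

def build_trials(labels, speakers, contexts, seed=0):
    """Cross every sound label with every speaker and context, then shuffle.

    Varying speaker and context is the 'high variability' part.
    """
    trials = [
        {"answer": label, "speaker": speaker, "context": context}
        for label, speaker, context in itertools.product(labels, speakers, contexts)
    ]
    random.Random(seed).shuffle(trials)
    return trials

def run_session(trials, respond):
    """Ask for a label on each trial, give feedback, and return the accuracy."""
    correct = 0
    for trial in trials:
        # The learner only gets the stimulus, never the answer.
        choice = respond(trial["speaker"], trial["context"])
        if choice == trial["answer"]:
            correct += 1
            feedback = "correct"
        else:
            feedback = "no, it was the vowel in " + trial["answer"]
        print(f"{trial['speaker']}, {trial['context']}: {feedback}")
    return correct / len(trials)

trials = build_trials(
    labels=("ship", "sheep"),                      # the two vowel categories
    speakers=["speaker A", "speaker B", "speaker C"],
    contexts=["sh_p", "b_t", "l_ve"],              # hypothetical word frames
)
# Stand-in learner who always picks the same label
accuracy = run_session(trials, respond=lambda speaker, context: "ship")
print(f"accuracy: {accuracy:.0%}")
```

Note that the learner is only ever asked to pick a category label, not to say what a word means, which matches the focus on form described above.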
Currently there is one free (with registration) program that helps with Canadian pronunciation, called EnglishAccentCoach. This web and iOS program was developed by Ron Thomson, a notable researcher in this field. It is claimed that it can significantly help learners in only 8 short training sessions and that the effects last for up to a month. There is also a paid program called UCL Vowel Trainer, which claims learners improved from 65% accuracy to 85% accuracy over 5 sessions.
Another (open source) program in development is Minimal Bears, which is based on PyPhon. Minimal Bears aims to build up a crowdsourcing feature so that many languages can be accommodated. Interested readers may like to see a talk about HVPT from the developers.
So it is quite amazing, as Mark Liberman from Language Log pointed out, how little is known by language educators about HVPT. One of the commenters on the Language Log post suggested that association with drill and kill stereotypes of language learning may have tainted it. No doubt more research is required to test the limits of HVPT. Hopefully this post will pique readers’ interest in investigating these minimal pairs on steroids.
Many thanks to Guy Emerson for additional information and to the poll respondents.
This post reports on a way to learn vocabulary using your favorite film or TV show. You need two programs: subs2srs and Anki. I first saw the reference to subs2srs via a post by Olya Sergeeva, a great read by the way.
subs2srs allows you to cut up your video file by its subtitles. You can then import the resulting files into Anki. I won’t go into detail about doing this, as the user guide for subs2srs does this well. I will just post some screen recordings to demonstrate how it appears as you use it. In my case I am using it to learn more conversational and idiomatic French via the TV show Les Revenants.
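To give an idea of what cutting up by subtitles means, here is a small Python sketch. It is not how subs2srs itself is implemented. It only assumes that the subtitles are in the common SRT layout and that each cue becomes one row of a tab-separated file, which is a format Anki can import. The sample timings, the second subtitle line, the clip file names and the function names are all invented. The sketch does not cut any audio or video.

```python
import re

# Invented two-cue sample in SRT layout (index, timing line, text).
SRT_SAMPLE = """\
1
00:00:01,000 --> 00:00:03,500
Je peux pas aller plus vite que la musique.

2
00:00:04,000 --> 00:00:05,200
On y va ?
"""

def parse_srt(text):
    """Split SRT text into cues with an index, start time, end time and text."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip anything that does not look like a cue
        start, end = (part.strip() for part in lines[1].split("-->"))
        cues.append({
            "index": int(lines[0]),
            "start": start,
            "end": end,
            "text": " ".join(lines[2:]),
        })
    return cues

def to_anki_tsv(cues, show):
    """One tab-separated row per cue: text, sound tag, start time, end time."""
    rows = []
    for cue in cues:
        clip = f"{show}_{cue['index']:04d}.mp3"  # hypothetical clip file name
        rows.append("\t".join([cue["text"], f"[sound:{clip}]", cue["start"], cue["end"]]))
    return "\n".join(rows)

cues = parse_srt(SRT_SAMPLE)
tsv = to_anki_tsv(cues, show="les_revenants")
print(tsv)
```

The start and end times are what a tool would need in order to cut a matching audio clip for each card. The [sound:...] field is how Anki refers to an audio file in a note.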
The first recording shows what happens as you use Anki with your subs2srs cut-up file. Near the end of the recording I demonstrate one of the features of Anki which allows you to hide/bury cards you don’t want to use:
The second recording shows how to browse cards in a deck and tag them for use in a custom deck:
The third video shows the use of a custom deck made from a particular tag:
A post by polyglot Judith Meyer shows how she used it to study Japanese vocabulary. Most of the instructions for subs2srs in that post are dated but further down she has some nice advice on how to use any Anki decks you may make from subs2srs.
I am not sure how efficient this method is since after about a month of occasional use I have only really learned one expression – je peux pas aller plus vite que la musique/I haven’t got wings! But I feel being able to have the audio is helping.
One thing to be aware of: make backups of the Anki collections you use on your phone. Otherwise you risk resetting all the cards you’ve been studying when you add, say, a new film or episode converted by subs2srs to your mobile version of Anki.
Thanks for reading and feel free to ask any questions you may have.
Recycling vocabulary in a variety of contexts is recommended by the vocabulary literature. Simply going back to texts one has used in a coursebook is an option but it misses the variety of context.
I need to recycle vocabulary from Unit 1 of my TOEIC book, so I take the topics from the table of contents as input to create a wiki corpus.
The main title of Unit 1 in my book is careers, with sub topics of professions, recruitment, training. I could also add in job interview, job fair, temp agency.
Note: for more details on the various features of the BYU WIKI corpus, do see the videos by Mark Davies. For the rest of this post I assume you have some familiarity with these.
So when creating a corpus in BYU WIKI corpus in my Title word(s) search I enter career* to find all titles with career and careers.
Then in the Words in pages box I enter professions, profession, recruitment, training. Note that I search for the plural as well as the singular, and set 300 as the number of pages:
After pressing submit, a list of wiki pages is presented. You can scroll through this to find pages that may be irrelevant to you:
After unticking any irrelevant pages, press submit. I won’t talk a lot about filtering your corpus build here. As mentioned, do make sure to watch Mark Davies’s series of videos to get more details.
Now you will see your newly created corpus:
Tick the Specific radio button:
and then click the nouns keywords. Skill is the top keyword here which also appears in the wordlist in my book:
What I am more interested in is verbs so I click that:
The noun requirement (which, by the way, does not come from the careers unit) appears in the book wordlist, but the verb require does not. So now I can look at some example uses of the verb require that I could use in class.
One step is to see what collocates with require:
Clicking on the top 5 collocates brings up some potential language.
Another interesting use: once you have a number of corpora, you can see which words appear most in each corpus. The following screenshots show corpora related to the first 3 units of my book, i.e. Careers, Workplaces, Communications:
The greyed lines mean those corpora are omitted from my search. This could be a nice exercise where you take a word and get students to see how it is distributed. So, for example, you may show the distribution of the verb fill:
We see that it appears most in the recruit* corpus. One option now is to get students to predict how the verb is used in that corpus and then click the bar to see some examples.
After this demonstration you can now ask students to guess what words will appear most in the various corpora and do the search for the students to see the resulting graphs.
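If you want to reproduce this kind of comparison outside the corpus interface, the idea can be sketched in a few lines of Python. The corpus sizes and hit counts below are invented, and only the recruit* name comes from this post. I am also assuming that a fair comparison should normalise by corpus size. I have not checked exactly how the BYU charts are scaled.

```python
def per_million(hits, corpus_size):
    """Normalised frequency, so corpora of different sizes can be compared."""
    return hits / corpus_size * 1_000_000

# Invented sizes and hit counts for the verb "fill"; apart from recruit*,
# the corpus names are placeholders.
CORPORA = {
    "career*": {"size": 400_000, "hits": 30},
    "recruit*": {"size": 150_000, "hits": 24},
    "workplace*": {"size": 200_000, "hits": 10},
    "communication*": {"size": 300_000, "hits": 12},
}

def rank_corpora(corpora):
    """Return (name, per-million frequency) pairs, most frequent first."""
    scored = [(name, per_million(c["hits"], c["size"])) for name, c in corpora.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

ranking = rank_corpora(CORPORA)
for name, freq in ranking:
    bar = "#" * round(freq / 10)  # crude text version of the bar chart
    print(f"{name:<15} {freq:6.1f} per million  {bar}")
```

In this made-up data the career* corpus has the most raw hits, but recruit* comes out on top once size is taken into account. That makes a useful point to raise with students before they start guessing.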
Hope this has shown how we can use BYU WIKI corpus to recycle vocabulary in different contexts.
Do shoot me any questions as this post may indeed be confusing.
I also plan to use the opportunity of the students getting to know the DokuWiki interface to practice some prepositions of place, such as those shown in the following screenshot, i.e. on the top right, next to, below:
If there is any interest in detailing how to get this set up on your Android phone, do leave a comment.
Thanks to Dan for helping me test the set up and thanks to you for reading.