A FLAIR, VIEW of a couple of interesting language learning apps

FLAIR (Form-focused Linguistically Aware Information Retrieval) is a neat search engine that retrieves web texts and lets you filter them by 87 grammar items (e.g. to-infinitives, simple prepositions, copular verbs, auxiliary verbs).

The screenshot below shows the results window after a search using the terms “grenfell fire”.

I have marked the four areas that attracted my attention most as A, B, C and D. There are other features, which I will leave for you to explore.

A – Here you can filter the results of your search by CEFR level. The numbers in faint grey show how many documents fall at each level; this particular search returned 20 in total.

B – Filter by the Academic Word List; the first icon to the right lets you add your own wordlist.

C – The main filter of 87 grammar items. Note that some grammar items are detected more accurately than others.

D – You can upload your own text for FLAIR to analyze.

Another feature worth highlighting is that you can use the “site:” operator to search within particular websites, nice. A paper on FLAIR (note 1) suggests trying the following: https://www.gutenberg.org; http://www.timeforkids.com/news; http://www.bbc.co.uk/bitesize; https://newsela.com; http://onestopenglish.com.

The following screenshot shows an article filtered by C1-C2 level, the Academic Word List and phrasal verbs:

VIEW (Visual Input Enhancement of the Web) is a related tool that learners of English, German, Spanish and Russian can use to highlight articles, determiners, prepositions, gerunds, noun countability and phrasal verbs in web texts (the full set is currently available only for English). In addition, users can do activities such as clicking, multiple choice and practice (i.e. filling in a blank) to identify grammar items. The developers call VIEW an intelligent automatic workbook.

VIEW comes as a browser add-on for Firefox, Chrome and Opera, as well as a web app. The following screenshot shows the menu of the Firefox add-on:

VIEW draws on the idea of input enhancement as the research rationale for its approach (note 2).


1. Chinkina, M., & Meurers, D. (2016). Linguistically aware information retrieval: providing input enrichment for second language learners. In Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications, San Diego, CA. PDF available [http://anthology.aclweb.org/W/W16/W16-0521.pdf]


2. Meurers, D., Ziai, R., Amaral, L., Boyd, A., Dimitrov, A., Metcalf, V., & Ott, N. (2010, June). Enhancing authentic web pages for language learners. In Proceedings of the NAACL HLT 2010 Fifth Workshop on Innovative Use of NLP for Building Educational Applications (pp. 10-18). Association for Computational Linguistics. PDF available [http://www.sfs.uni-tuebingen.de/~dm/papers/meurers-ziai-et-al-10.pdf]

Monco and “fresh” make & do collocations

Monco, the web news monitor corpus (a monitor corpus is one that is continuously updated), has a tremendous collocation feature. I first saw the collocation feature mentioned in a tweet by Diane Nicholls @lexicoloco, but when I tried it the server was acting up. A tweet from Dr. Michael Riccioli @curvedway reminded me to try again, and whoa, it is impressive.
For example, let’s see what the collocates of the famous make and do verbs are.

For make, here is a screenshot of the search settings for collocation (to get to the collocation function, look under the tools menu on the main Monco page). Note that I am looking for nouns that come after the verb make. Also, the double asterisk is a shortcut for searching all forms of make (try it without the asterisks and see what you get).


These are my results for the top 10 collocates (for all forms of make):

Top 10 collocates-make

Interesting collocations include make sense, make way and make debut. The results show you at a glance the types of construction involved:


Or you can open another window for more details:


The top 10 collocates for do are:

Top 10 collocates-do

Interesting collocates here are do thing, do anything, do something and do nothing, which make a change from do shopping, do cooking etc. : )
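For anyone curious about what a collocation search like my make search does, here is a minimal Python sketch of the idea: find every verb form of make and count the nouns that follow it within a short window to the right. The tagged sentences, the tag names and the three-word window are all made up for illustration. Monco does its own tagging, and real collocation tools usually rank collocates with an association measure (such as MI) rather than by raw counts as this sketch does.

```python
from collections import Counter

# All forms of MAKE, standing in for Monco's double-asterisk search (assumed list)
MAKE_FORMS = {"make", "makes", "made", "making"}

# Hypothetical toy data: (word, tag) pairs with made-up tags
tokens = [
    ("That", "PRON"), ("makes", "VERB"), ("sense", "NOUN"), (".", "PUNCT"),
    ("She", "PRON"), ("made", "VERB"), ("her", "DET"), ("debut", "NOUN"), (".", "PUNCT"),
    ("Make", "VERB"), ("way", "NOUN"), ("for", "ADP"), ("the", "DET"), ("band", "NOUN"),
    ("It", "PRON"), ("made", "VERB"), ("sense", "NOUN"), (".", "PUNCT"),
]


def noun_collocates(tokens, node_forms, window=3):
    """Count nouns up to `window` tokens to the right of a verb whose form is in node_forms."""
    counts = Counter()
    for i, (word, tag) in enumerate(tokens):
        if tag == "VERB" and word.lower() in node_forms:
            for next_word, next_tag in tokens[i + 1 : i + 1 + window]:
                if next_tag == "NOUN":
                    counts[next_word.lower()] += 1
    return counts


collocates = noun_collocates(tokens, MAKE_FORMS)
for noun, freq in collocates.most_common():
    print(noun, freq)
```

Swapping in {"do", "does", "did", "doing", "done"} as the node forms gives the equivalent of the do search. Note that "band" is not counted because it lies outside the three-word window after "Make".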

Thanks for reading.

Using BYU-Wikipedia corpus to answer genre related questions

Someone recently posted a link on Twitter to an IELTS site about describing processes and graphs.
The following caught my eye:

…natural processes are often described using the active voice, whereas man-made or manufacturing processes are usually described using the passive.

Online, the claim seems to go back to at least 2011 (http://ielts-simon.com/ielts-help-and-english-pr/2011/02/ielts-writing-task-1-describe-a-process-1.html).

This is an interesting claim. It has been shown that passives are more common in abstract, technical and formal writing (Biber, 1988, as cited in McEnery & Xiao, 2005). The claim here, however, is more specific: it concerns written texts about natural processes versus man-made processes.

We can simplify this by asking: are more passives used when writing about man-made processes than when writing about natural processes? A clause that is passive is not active, so a higher rate of passives implies a correspondingly lower rate of actives, and we can reach a conclusion about the active voice by deduction.

The BYU-Wikipedia corpus can be used to approximate natural process writing and man-made process writing. The keywords I used (for the title word) were ecology and manufacturing. Filtering out unwanted texts took longer than expected, especially for the manufacturing corpus. In the end I had an ecology corpus of 77 articles and 153,621 words and a manufacturing corpus of 116 articles and 98,195 words.

The search term I used to look for passives was are|were [v?n*] (are or were followed by a past participle). This gave me a total of 293 passives for ecology and 304 for manufacturing. According to the Lancaster LL calculator, this is a significant overuse of passives in manufacturing compared to ecology. According to the log ratio score, passives are about 1.6 times as common in manufacturing: log ratio is the binary logarithm of the ratio of the two relative frequencies, and 304 per 98,195 words against 293 per 153,621 words works out at about 1.6. Now this does not mean much, as a lot of the texts in the Wikipedia corpora won’t be specifically about processes, but it is still interesting.
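If, like me, you are not sure what the calculator is doing, both statistics are easy to reproduce. Below is a minimal Python sketch. For log-likelihood it uses the standard two-corpus formula: expected frequencies come from the pooled rate, and G2 = 2 × Σ O ln(O/E). As far as I know, this is what the Lancaster calculator computes. Log Ratio is computed as the binary log of the ratio of relative frequencies. The function names are my own, and the sketch skips the small correction that real tools apply when a count is zero.

```python
import math


def log_likelihood(freq1, size1, freq2, size2):
    """Two-corpus log-likelihood (G2) for one item, e.g. a passive pattern."""
    pooled_rate = (freq1 + freq2) / (size1 + size2)
    expected1 = size1 * pooled_rate
    expected2 = size2 * pooled_rate
    g2 = 0.0
    for observed, expected in ((freq1, expected1), (freq2, expected2)):
        if observed > 0:  # the term for a zero count is taken as 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def log_ratio(freq1, size1, freq2, size2):
    """Binary log of the ratio of relative frequencies: +1 = twice as common in corpus 1."""
    return math.log2((freq1 / size1) / (freq2 / size2))


# Counts from this post: manufacturing (corpus 1) vs ecology (corpus 2)
ll = log_likelihood(304, 98_195, 293, 153_621)
lr = log_ratio(304, 98_195, 293, 153_621)
print(f"LL = {ll:.2f} (above 3.84 means p < 0.05, above 15.13 means p < 0.0001)")
print(f"Log Ratio = {lr:.2f}, i.e. {2 ** lr:.2f} times as common in manufacturing")
```

Running it gives an LL of about 35, well above the 15.13 threshold, and a Log Ratio of about 0.7. That means passives are roughly 1.6 times as common in the manufacturing corpus, since a whole point of Log Ratio corresponds to a doubling.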

More interesting are the types of verbs used in passives in ecology and manufacturing. Here are the top ten in each case:

[Tables: top ten verbs used in passives in the ecology corpus and in the manufacturing corpus]

Thanks for reading.


Biber, D. (1988). Variation Across Speech and Writing. Cambridge: Cambridge University Press.

McEnery, A. M., & Xiao, R. Z. (2005). Passive constructions in English and Chinese: A corpus-based contrastive study. Proceedings from the Corpus Linguistics Conference Series, 1(1). ISSN 1747-9398. Retrieved from http://eprints.lancs.ac.uk/63/1/CL2005_(22)_%2D_passive_paper_%2D_McEnery_and_Xiao.pdf

Impassive Pullum on Passives

A module on writing about processes that I teach regularly at one school is coming up soon, so I am focusing here on the use of passive clauses in such contexts. For years I was happily ignorant about this grammar area, thanks to inaccurate instruction in books. So it was a blessing to read and watch the noted linguist Geoffrey Pullum pull such advice apart.

To help myself remember his counsel, I knocked up three infographics; some work better than others. The information in these graphics comes from Fear and Loathing of the English Passive (html), the six-part video series Pullum on Passives, and On the myths that passives are wordy (pdf).

Types of Passives


Real rules for Passives


Allegations against Passives


Note that Pullum is not really impassive, more impassioned, but that would make the title of this post less groovy : )

I hope these are of use to you. Thanks for reading.

HVPT or minimal pairs on steroids

It was by chance, as these things tend to happen on the net, that I read about High Variability Pronunciation Training (HVPT). What are the odds that language teachers know about HVPT?

My extremely representative and valid polling on Twitter and G+ gave me a big fat 2 out of 24 teachers who knew the acronym. Of the two who said yes, one had looked up the acronym and the other is an expert in pronunciation.

I would lay good odds that most language teachers have heard of and use minimal pairs, i.e. pairs of words that differ by one sound, such as the famous ship/sheep.

HVPT can be seen as souped-up minimal pairs: recordings from many different speakers are used, and the target sounds are presented in different phonetic contexts. Learners categorize each sound they hear by picking a label for it and are then told whether they are correct.

Pronunciation research has shown that providing varied input, in terms of both speakers and phonetic contexts, helps learners categorize sounds. That is the V (variability) in the acronym. Furthermore, such training focuses learners on phonetic form and so removes the distraction of meaning, since it has been shown that attending to both meaning and form at once reduces performance.
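To make the procedure concrete, here is a minimal Python sketch of one HVPT identification session as described above. It uses several speakers and several phonetic contexts for the same vowel contrast, presents the items in random order, has the learner pick a label, and gives feedback after every trial. The speakers, the words and the respond function standing in for the learner are all hypothetical. This is not how EnglishAccentCoach or any other program is actually built, and a real trainer would play recordings rather than print text.

```python
import random
from dataclasses import dataclass


@dataclass
class Stimulus:
    speaker: str  # many talkers supply the variability
    word: str     # many phonetic contexts for the same contrast
    label: str    # the category the learner has to pick


# Hypothetical stimulus set: 3 speakers x 4 words for the /ɪ/ vs /iː/ contrast
SPEAKERS = ["speaker_a", "speaker_b", "speaker_c"]
WORDS = {"ship": "/ɪ/", "sheep": "/iː/", "bit": "/ɪ/", "beat": "/iː/"}
STIMULI = [Stimulus(s, w, lab) for s in SPEAKERS for w, lab in WORDS.items()]


def run_session(stimuli, respond, seed=None):
    """Present every stimulus once in random order with feedback; return accuracy."""
    rng = random.Random(seed)
    order = list(stimuli)
    rng.shuffle(order)
    correct = 0
    for stim in order:
        answer = respond(stim)  # a real trainer plays the audio, then waits for a click
        if answer == stim.label:
            correct += 1
            print(f"{stim.speaker}: {answer} - correct")
        else:
            print(f"{stim.speaker}: {answer} - no, that was {stim.label}")
    return correct / len(order)


# A hypothetical learner who hears every vowel as /iː/ scores 50%
accuracy = run_session(STIMULI, lambda stim: "/iː/", seed=1)
print(f"accuracy: {accuracy:.0%}")
```

Tracking accuracy across sessions like this is the kind of measure behind claims such as improving from 65% to 85%.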

Currently there is one free program (registration required) that helps with Canadian English pronunciation, called EnglishAccentCoach (note 1). This web and iOS program was developed by Ron Thomson, a notable researcher in this field. It is claimed that it can significantly help learners in only eight short training sessions and that the effects last for up to a month. There is also a paid program, UCL Vowel Trainer (note 2), whose makers claim that learners improved from 65% to 85% accuracy over five sessions.

Another program, the open-source Minimal Bears, is in development and is based on PyPhon (note 3). Minimal Bears aims to build up a crowdsourcing feature so that many languages can be accommodated. Interested readers may like to watch a talk about HVPT by the developers (note 4).

So it is quite amazing, as Mark Liberman of Language Log pointed out, how little language educators know about HVPT. One commenter on the Language Log post suggested that its association with drill-and-kill stereotypes of language learning may have tainted it. No doubt more research is needed to test the limits of HVPT. Hopefully this post will pique readers’ interest in investigating these minimal pairs on steroids.

Many thanks to Guy Emerson for additional information and to the poll respondents.


1. EnglishAccentCoach
2. UCL Vowel Trainer
3. PyPhon (I have not yet been able to get this working)
4. (video) High Variability and Phonetic Training – Guy Emerson and Stanisław Pstrokoński

Further reading:

Thomson, R. I. (2011). Computer assisted pronunciation training: Targeting second language vowel perception improves pronunciation. CALICO Journal, 28(3), 744-765. Retrieved from http://www.equinoxpub.com/journals/index.php/CALICO/article/viewPDFInterstitial/22985/18991
Liberman, M. (2008, July 6) HVPT [Blog post]. Retrieved from http://languagelog.ldc.upenn.edu/nll/?p=328

Learning vocabulary through subs2srs and Anki

This post reports on a way to learn vocabulary using your favorite film or TV show. You need two programs: subs2srs and Anki. I first came across subs2srs in a post by Olya Sergeeva, a great read by the way.

subs2srs lets you cut up a video file according to its subtitles; you can then import the resulting files into Anki. I won’t go into detail about how to do this, as the subs2srs user guide covers it well. I will just post some screen recordings to show what it looks like in use. In my case I am using it to learn more conversational and idiomatic French via the TV show Les Revenants.
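The core idea behind cutting a video up by its subtitles is simple. Every subtitle cue carries a start and an end time, so each cue can become one card holding its line of text and the matching audio clip. Here is a minimal Python sketch of that idea. It is not subs2srs’s actual code. It reads SRT subtitles and builds tab-separated rows of the kind Anki can import, using Anki’s [sound:...] tag to point at a clip file. The sample timings, the clip file names and the episode prefix are hypothetical, and the sketch leaves out the step that actually extracts the audio, which subs2srs does for you.

```python
import re

# One SRT timing line, e.g. "00:01:02,500 --> 00:01:04,000"
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def to_seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt_text):
    """Return a list of (start_seconds, end_seconds, subtitle_text) cues."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                parts = match.groups()
                subtitle = " ".join(lines[i + 1:])
                cues.append((to_seconds(*parts[:4]), to_seconds(*parts[4:]), subtitle))
                break
    return cues


def anki_rows(cues, prefix="revenants_s01e01"):
    """One tab-separated row per cue: subtitle text, then an Anki sound field."""
    rows = []
    for n, (start, end, line) in enumerate(cues, 1):
        clip = f"{prefix}_{n:03d}_{start:.2f}-{end:.2f}.mp3"  # hypothetical file name
        rows.append(f"{line}\t[sound:{clip}]")
    return rows


SAMPLE = """1
00:01:02,500 --> 00:01:04,000
Je peux pas aller plus vite que la musique.

2
00:01:05,250 --> 00:01:07,100
On verra.
"""

for row in anki_rows(parse_srt(SAMPLE)):
    print(row)
```

Each printed row becomes one card, with the front showing the subtitle line and the back playing the clip. That is roughly the kind of deck the recordings below show in use.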

The first recording shows what happens as you use Anki with your subs2srs cut-up file. Near the end of the recording I demonstrate one of the features of Anki which allows you to hide/bury cards you don’t want to use:

The second recording shows how to browse cards in a deck and tag them for use in a custom deck:

The third video shows the use of a custom deck made from a particular tag:

A post by polyglot Judith Meyer shows how she used it to study Japanese vocabulary. Most of the instructions for subs2srs in that post are dated, but further down she has some nice advice on how to use any Anki decks you make with subs2srs.

I am not sure how efficient this method is, since after about a month of occasional use I have only really learned one expression: je peux pas aller plus vite que la musique (I haven’t got wings!). But I feel that having the audio is helping.

One thing to be aware of: make backups of the Anki collections you use on your phone. Otherwise, when you add, say, a new film or episode converted by subs2srs to the mobile version of Anki, you risk resetting all the cards you’ve been studying.

Thanks for reading, and feel free to ask any questions you may have.

Using BYU Wiki corpus to recycle coursebook vocabulary in a variety of contexts

The vocabulary literature recommends recycling vocabulary in a variety of contexts. Simply going back to texts already used in a coursebook is an option, but it lacks that variety of context.

I need to recycle vocabulary from Unit 1 of my TOEIC book, so I take the topics from the table of contents as input to create a wiki corpus.

The main title of Unit 1 in my book is Careers, with the subtopics professions, recruitment and training. I could also add job interview, job fair and temp agency.

For more details on the various features of the BYU WIKI corpus, do see the videos by Mark Davies; for the rest of this post I assume you have some familiarity with them.

So when creating a corpus in the BYU WIKI corpus, in the Title word(s) box I enter career* to find all titles containing career or careers.

Then in the Words in pages box I enter professions, profession, recruitment, training. Note that I search for the plural as well, and that I set 300 as the number of pages:

Screenshot 1: corpus search terms

After you press Submit, a list of wiki pages is presented; you can scroll through it to find any pages that are irrelevant to you:

Screenshot 2: wiki pages

After unticking any irrelevant pages, press Submit. I won’t say much about filtering your corpus build here; as mentioned, do make sure to watch Mark Davies’s series of videos for more details.

Now you will see your newly created corpus:

Screenshot 3: my virtual corpora

Tick the Specific radio button:

Screenshot 4: specific key word radio button

and then click the noun keywords. Skill is the top keyword here, and it also appears in the wordlist in my book:

Screenshot 5: noun keywords

What I am more interested in is verbs, so I click that:

Screenshot 6: verb keywords

The noun requirement (which, by the way, does not come from the careers unit) appears in the book wordlist, but the verb require does not. So now I can look at some example uses of the verb require that I could bring into class.
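I don’t know exactly how the BYU interface scores its keywords. The general idea behind a keyword list, though, is to compare how often a word occurs in your virtual corpus with how often it occurs in the whole corpus, after adjusting for the difference in size. Here is a minimal Python sketch that uses a simple ratio of per-million frequencies with a minimum-frequency cut-off. The counts, the corpus sizes and the add-one adjustment are all assumptions for illustration; none of them is BYU’s method or data.

```python
def keywords(sub_counts, ref_counts, sub_size, ref_size, min_freq=5, top=10):
    """Rank words by per-million frequency in a sub-corpus relative to a reference corpus."""
    scored = []
    for word, freq in sub_counts.items():
        if freq < min_freq:  # skip rare words, whose ratios are unreliable
            continue
        sub_pm = freq / sub_size * 1_000_000
        # add one to the reference count so words absent from it don't divide by zero
        ref_pm = (ref_counts.get(word, 0) + 1) / ref_size * 1_000_000
        scored.append((sub_pm / ref_pm, word))
    scored.sort(reverse=True)
    return [word for _, word in scored[:top]]


# Hypothetical verb counts in a "career*" virtual corpus and in the whole corpus
career_verbs = {"require": 90, "hire": 25, "train": 30, "go": 50, "say": 45, "apply": 3}
all_verbs = {"require": 20_000, "hire": 8_000, "train": 15_000,
             "go": 500_000, "say": 900_000, "apply": 12_000}

top_verbs = keywords(career_verbs, all_verbs, sub_size=300_000, ref_size=1_000_000_000, top=3)
print(top_verbs)
```

Frequent general verbs such as go and say drop down the list because they are just as common everywhere. That is why topic verbs like require rise to the top.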

One step is to see what collocates with require:

Screenshot 7: collocates of require

Clicking on the top five collocates brings up some potentially useful language.

Another interesting use: once you have a number of corpora, you can compare how often words appear in each of them. The following screenshot shows corpora related to the first three units of my book, i.e. Careers, Workplaces, Communications:

Screenshot 8: my virtual corpora

The greyed-out lines are corpora omitted from my search. This could make a nice exercise: take some words and get students to see how they are distributed. For example, you could show the distribution of the verb fill:

Screenshot 9: distribution of verb fill

We see that it appears most in the recruit* corpus. One option now is to get students to predict how the verb is used in that corpus and then click the bar to see some examples.
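One thing worth keeping in mind when reading these charts is that the virtual corpora differ in size. Raw counts are therefore not comparable, and the bars need to reflect frequency per million words, which, as far as I can tell, is what the BYU charts show. Here is a minimal Python sketch of that normalization. The counts and corpus sizes are invented.

```python
def per_million(freq, corpus_size):
    """Normalize a raw count to occurrences per million words."""
    return freq / corpus_size * 1_000_000


# Hypothetical counts of the verb "fill" and sizes (in words) of three virtual corpora
corpora = {
    "career*": (12, 410_000),
    "recruit*": (35, 260_000),
    "workplace*": (8, 350_000),
}

rates = {name: per_million(freq, size) for name, (freq, size) in corpora.items()}
for name, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<11} {rate:6.1f} per million  {'#' * round(rate / 10)}")
```

With equal raw counts, the smallest corpus would still get the tallest bar. This is why students should look at the per-million figure rather than the raw number of hits when comparing corpora.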

After this demonstration, you can ask students to guess which words will appear most in the various corpora, then run the searches so they can see the resulting graphs.

I hope this has shown how we can use the BYU WIKI corpus to recycle vocabulary in different contexts.

Do shoot me any questions as this post may indeed be confusing.