A #corpusmooc participant, answering a discussion question on what they would like to use corpora for, replied that they wanted a reference book showing common structures in various genres such as “letters of condolence, public service announcements, obituaries”.

The CORE (Corpus of Online Registers of English) corpus at BYU, together with its virtual corpora feature, offers a way to work towards this.

For example, the screenshot below shows the keywords of verbs & adjectives in the Reviews genre:

Before I briefly show how to make a virtual corpus, do note that the standard interface allows you to do a lot of things with the various registers. The CORE interface shows you examples of this. For example, the following shows the distribution of the present perfect across the genres:

Create virtual corpora

To create a virtual corpus first go to the CORE start page:

Then click on Texts/Virtual and get this screen:

Next press Create corpus to get this screen:

We want the Reviews Genre so choose it from the drop down box:

Then press Submit to get the following screen:

Here you can either accept these texts or, if say you want to build a film review corpus only, manually look through the links and keep just the film reviews. Give your corpus a name or add it to an already existing corpus. Here we give it the name “review”:

Then after submitting you will be taken to the following screen, which shows your collection of virtual corpora. We can see the corpus we just created at number 5:

Now you can list keywords.

Do note that the virtual corpora feature is available in most of the BYU collection, so if genre is not your thing, one of the other corpora might be useful.

Thanks for reading and do let me know if anything appears unclear.



Affix knowledge test and word part technique


There is a new online test, the CAT-WPLT (computerized adaptive testing of Word Part Levels Test), to assess students' word part knowledge, i.e. prefixes, suffixes and stems (though the test only uses affixes for receptive use). The (diagnostic) test is composed of three parts – form, meaning and use. The form part presents 1 real affix and 4 distractor affixes for the test user to choose from. The meaning part presents 1 correct meaning and 3 distractor meanings, and the use part presents 4 parts of speech, one of which has to be matched correctly to the affix.

Try out the test – CAT-WPLT.

The online test takes about 10-15 minutes to complete and results in a nice feedback screen showing how the test taker did on the form, meaning and use of the affixes. There are advanced, intermediate and beginner profiles for comparison.

Figure from Mizumoto, Sasao, & Webb (2017) pg. 14

So say you have a profile of a student who shows weakness in form and meaning. What now? Mizumoto, Sasao, & Webb (2017) suggest giving learners their pdf list of 118 affixes (assuming you don’t need to use the test again). So if your learner is at level 1 for recognizing the form of an affix, the affixes listed as level 2 can be focused on.

Another possibility is a memory technique called the word part technique.

Word part technique
Very simply it is using an already known word which contains the same word stem/root as the new word to be remembered.

More specifically the system Wei and Nation (2013) describe lists very frequent stems i.e. stems which appear in words in the most frequent 2000 words of the BNC. These are then used to learn stems appearing in the remaining 8000 mid-frequency words in the BNC wordlist. For example a high frequency word like visit has the root -vis- which appears in mid-frequency words such as visible, envisage, revise.

Once a form connection is seen between a known high frequency word and a mid-frequency word a meaning connection needs to be made i.e. explaining the form connection. So to explain the word visible we can say visible is something that you can see. Here the explanation uses the meaning of -vis- i.e. see.

(high freq. word) visit   -> go to see someone
(stem)            vis     -> see
(mid-freq. word)  visible -> something that you can see
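
The three-step link above can be sketched in a few lines of Python. This is only an illustration: the tiny `STEMS` table and the function name `link_words` are made up for this post, and only the -vis- entry comes from the example above.

```python
# Hypothetical mini stem table: stem -> (meaning, known high-frequency word).
# Only the -vis- entry is taken from the example in this post.
STEMS = {
    "vis": ("see", "visit"),
}


def link_words(new_word, stems=STEMS):
    """Return (stem, meaning, known_word) for the first stem found in new_word, else None."""
    word = new_word.lower()
    for stem, (meaning, known_word) in stems.items():
        # Skip the known word itself: there is nothing new to link it to.
        if stem in word and word != known_word:
            return stem, meaning, known_word
    return None


for mid_freq_word in ["visible", "envisage", "revise"]:
    match = link_words(mid_freq_word)
    if match:
        stem, meaning, known = match
        print(f"{mid_freq_word}: shares -{stem}- ('{meaning}') with known word '{known}'")
```

Plain substring matching like this will also catch words where the letters match by accident, so it only automates seeing a possible form connection. Explaining the meaning connection is still down to the teacher or learner.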

According to Wei & Nation (2013) the most difficult step is explaining the connection. Though I think the most difficult is the first step – seeing the connection i.e. the stem/root word. Wei & Nation (2013) encouragingly state that making the connection and explaining it can develop with practice.


Click here to see top 25 word stems taken from Wei & Nation (2013)

They go on to recommend that once students have worked with this technique with the teacher they can go on to use it themselves as a strategy.

The technique’s efficacy is on par with the keyword technique and learners' own methods or self-strategies (Wei, 2015). The word part technique has the added benefits that come with the nature of etymology and the history of words.

Thanks for reading.


A FLAIR, VIEW of a couple of interesting language learning apps

FLAIR (Form-focused Linguistically Aware Information Retrieval) is a neat search engine that can get web texts filtered through 87 grammar items (e.g. to- infinitives, simple prepositions, copular verbs, auxiliary verbs).

The screenshot below shows the results window after a search using the terms “grenfell fire”.

There are 4 areas I have marked A, B, C and D which attracted my attention the most. There are other features which I will leave for you to explore.

A – Here you can filter the results of your search by CEFR levels. The numbers in faint grey show how many documents fall under each level, out of this particular search's total of 20.

B – Filter by Academic Word List; the first icon to the right lets you add your own wordlist.

C – The main filter of 87 grammar items. Note that some grammar items are more accurate than others.

D – You can upload your own text for FLAIR to analyze.

Another feature to highlight is that you can use the “site:” command to search within websites, nice. A paper on FLAIR 1 gives some example queries to try.

The following screenshot shows an article filtered by C1-C2 level, Academic Word List and Phrasal Verbs:

VIEW (Visual Input Enhancement of the Web) is a related tool that learners of English, German, Spanish and Russian can use to highlight web texts for articles, determiners, prepositions, gerunds, noun countability and phrasal verbs (the full set is currently available only for English). In addition, users can do some activities, such as clicking, multiple-choice and practice (i.e. fill in a blank), to identify grammar items. The developers call VIEW an intelligent automatic workbook.

VIEW comes as a browser add-on for Firefox, Chrome and Opera as well as a web app. The following screenshot shows the add-on menu in Firefox:

VIEW draws on the ideas of input enhancement as the research rationale behind its approach. 2


Monco and “fresh” make & do collocations

Monco, the web news monitor corpus (which means it is continuously updated), has a tremendous collocation feature. I first saw a reference to the collocation feature in a tweet by Diane Nicholls @lexicoloco, but when I tried it the server was acting up. I was reminded to try again by a tweet from Dr. Michael Riccioli @curvedway. Whoa, it is impressive.
For example, let’s see what the collocates of the famous make and do verbs are.

For make, here is a screenshot of the search settings for collocation (to get to the collocation function, look under the Tools menu on the main Monco page). Note I am looking for nouns that come after the verb make. Also, the double asterisk is a shortcut to look for all forms of make (try it without the asterisks and see what you get).


I get as results for the top 10 collocates (for all forms of make) the following:

Top 10 collocates-make

Interesting collocations include make sense, make way, make debut. The results can show you at a glance the types of constructions involved:


Or you can open another window for more details:


The top 10 collocates for do are:

Top 10 collocates-do

Interesting collocates here are do thing, do anything, do something and do nothing, which makes a change from do shopping, cooking etc : )
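
If you want to try the same kind of search on your own tagged text, here is a rough Python sketch of the idea: count the nouns that turn up shortly after any form of a verb. Everything in it is an assumption for illustration. The hand-listed `FORMS`, the function name `noun_collocates`, the window size and the "tags starting with N are nouns" rule are mine, and this is not how Monco itself computes collocates.

```python
from collections import Counter

# Hypothetical hand-listed verb forms; Monco's ** shortcut does this expansion for you.
FORMS = {
    "make": {"make", "makes", "made", "making"},
    "do": {"do", "does", "did", "done", "doing"},
}


def noun_collocates(tagged_tokens, lemma, window=3):
    """Count nouns occurring up to `window` tokens after any form of `lemma`.

    tagged_tokens is a list of (word, tag) pairs. Tags starting with "N" are
    treated as nouns, which is an assumption about the tagset in use.
    """
    forms = FORMS[lemma]
    counts = Counter()
    for i, (word, _tag) in enumerate(tagged_tokens):
        if word.lower() in forms:
            for next_word, next_tag in tagged_tokens[i + 1 : i + 1 + window]:
                if next_tag.startswith("N"):
                    counts[next_word.lower()] += 1
    return counts


# Toy input, invented for illustration only.
sample = [
    ("That", "DT"), ("makes", "VBZ"), ("sense", "NN"), (".", "."),
    ("She", "PRP"), ("made", "VBD"), ("her", "PRP$"), ("debut", "NN"), (".", "."),
    ("It", "PRP"), ("does", "VBZ"), ("not", "RB"), ("make", "VB"), ("sense", "NN"), (".", "."),
]
print(noun_collocates(sample, "make").most_common())
```

Note the window here happily runs across sentence boundaries and other verbs, so a real version would want sentence splitting and a much bigger corpus before the counts mean anything.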

Thanks for reading.

Using BYU-Wikipedia corpus to answer genre related questions

A link was posted recently on Twitter to an IELTS site looking at writing processes and describing graphs.
The following caught my eye:

…natural processes are often described using the active voice, whereas man-made or manufacturing processes are usually described using the passive.

The claim seems to go back to 2011 online (

This is an interesting claim. It has been shown that passives are more common in abstract, technical and formal writing (Biber, 1988 as cited by McEnery & Xiao, 2005). Here the claim is about specific written texts on natural processes and man-made processes.

Well, we can simplify this by asking: are more passives used when writing about man-made processes than when writing about natural processes? Since a clause that is passive is not active, a higher rate of passives implies a correspondingly lower rate of actives, so answering this one question covers both halves of the claim.

The BYU-Wikipedia corpus can be used to get approximations of natural process writing and man-made process writing. The keywords I used (for the title word) were ecology and manufacturing. Filtering out unwanted texts took longer than expected, especially for the manufacturing corpus. In the end I had an ecology corpus of 77 articles and 153,621 words and a manufacturing corpus of 116 articles and 98,195 words.

The search term I used to look for passives was are|were [v?n*]. This gave me a total of 293 passives for ecology and 304 passives for manufacturing. According to the Lancaster LL calculator, this is a significant overuse of passives in manufacturing compared to ecology. On the counts above the log ratio works out at about 0.7, i.e. passives are roughly 1.6 times as frequent in manufacturing (if I understand this statistic correctly; each whole point of log ratio means a doubling). Now this does not mean much, as a lot of the texts in the Wikipedia corpora won’t be specifically about processes, but still it is interesting.
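
For anyone who wants to check the numbers without the online calculator, here is a small Python sketch using the counts reported above. The formulas are the two-corpus log-likelihood and the log ratio (binary log of the ratio of relative frequencies) as I understand them from the calculator's description; the function names are my own.

```python
import math


def log_likelihood(freq1, size1, freq2, size2):
    """Two-corpus log-likelihood (G2) for one item."""
    total_freq = freq1 + freq2
    total_size = size1 + size2
    # Expected frequencies if the item were spread evenly across both corpora.
    expected1 = size1 * total_freq / total_size
    expected2 = size2 * total_freq / total_size
    ll = 0.0
    for observed, expected in ((freq1, expected1), (freq2, expected2)):
        if observed > 0:
            ll += observed * math.log(observed / expected)
    return 2 * ll


def log_ratio(freq1, size1, freq2, size2):
    """Binary log of the ratio of relative frequencies (corpus 1 over corpus 2)."""
    return math.log2((freq1 / size1) / (freq2 / size2))


# Counts reported above: manufacturing compared against ecology.
manu_hits, manu_words = 304, 98_195
eco_hits, eco_words = 293, 153_621

ll = log_likelihood(manu_hits, manu_words, eco_hits, eco_words)
lr = log_ratio(manu_hits, manu_words, eco_hits, eco_words)
print(f"LL = {ll:.2f}, log ratio = {lr:.2f}, times as frequent = {2 ** lr:.2f}")
```

With these counts the log-likelihood comes out at roughly 35, well above 15.13, the cut-off usually quoted for p < 0.0001, and the log ratio at about 0.7, which corresponds to passives being around 1.6 times as frequent per word in the manufacturing corpus.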

What is more interesting are the types of verbs used in passives in ecology and manufacturing. The top ten in each case:

Thanks for reading.


Impassive Pullum on Passives

There’s a regular module I do at one school on writing about processes coming up soon. So a focus here is on use of passive clauses in such contexts. For years I was happily ignorant, induced by inaccurate instruction from books, about this grammar area. So it was a blessing to read and watch noted linguist Geoffrey Pullum pull apart such advice.

As an exercise for me to try to remember his counsel I knocked up three infographics; some work better than others. The information for these graphics comes from Fear and Loathing of the English Passive (html), the 6 part video series Pullum on Passives, and On the myths that passives are wordy (pdf).

Types of Passives


Real rules for Passives


Allegations against Passives


Note that Pullum is not really impassive, more impassioned, but that makes the title of this post less groovy : )

Hope these are of use to you, thanks for reading.

HVPT or minimal pairs on steroids

It was by chance as these things tend to happen on the net that I read about High Variability Pronunciation Training (HVPT). What are the odds language teachers know about HVPT?

My extremely representative and valid polling on Twitter and G+ gave me a big fat 2 out of 24 teachers who knew the acronym. Of the two who said yes one had looked up the acronym and the other is an expert in pronunciation.

I would put good odds that most language teachers have heard of and use minimal pairs, i.e. pairs of words which differ by one sound, the famous ship/sheep for example.

HVPT can be seen as a souped-up version of minimal pairs where different speakers are used and sounds are presented in different contexts. Learners are then required to categorize the sound by picking a label for it. Feedback is then given on whether they are correct.

Pronunciation research has shown that providing a variety of input in terms of speakers and phonetic contexts helps learners categorize sounds. That is the V of variability in the acronym. Furthermore such training focuses learners on the phonetic form and thus reduces any effect of semantic meaning since it has been shown that attending to both meaning and form reduces performance.
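
To make the procedure concrete, here is a minimal Python sketch of one training session: pick a random stimulus from several speakers and contexts, ask for a label, give feedback. The stimulus list, file names and the `run_session` function are all invented for illustration and are not taken from any of the programs mentioned in this post.

```python
import random

# Hypothetical stimuli: (audio file, speaker, word, correct label).
# File names and speakers are invented; a real set would have many more of each.
STIMULI = [
    ("s1_ship.wav", "speaker1", "ship", "ɪ"),
    ("s1_sheep.wav", "speaker1", "sheep", "iː"),
    ("s2_bit.wav", "speaker2", "bit", "ɪ"),
    ("s2_beat.wav", "speaker2", "beat", "iː"),
    ("s3_fill.wav", "speaker3", "fill", "ɪ"),
    ("s3_feel.wav", "speaker3", "feel", "iː"),
]


def run_session(stimuli, get_answer, n_trials, seed=None):
    """Run n_trials categorisation trials and return the proportion correct.

    get_answer(audio_file) stands in for playing the sound and collecting
    the label the learner picks.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        # Variability: each trial draws from different speakers and words.
        audio, _speaker, _word, label = rng.choice(stimuli)
        answer = get_answer(audio)
        is_correct = answer == label
        correct += is_correct
        # Immediate feedback after every response.
        feedback = "correct" if is_correct else f"no, it was {label!r}"
        print(f"{audio}: you chose {answer!r} -> {feedback}")
    return correct / n_trials


# A simulated learner who always hears the short vowel.
score = run_session(STIMULI, lambda audio: "ɪ", n_trials=10, seed=1)
print(f"Proportion correct: {score:.2f}")
```

Notice the learner only ever picks a sound label and never has to think about what the word means, which matches the focus on phonetic form described above.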

Currently there is one free (with registration) program that helps with Canadian pronunciation, called EnglishAccentCoach.1 This web and iOS program is developed by Ron Thomson, a notable researcher in this field. It is claimed that it can significantly help learners in only 8 short training sessions and that effects last for up to a month. There is a paid program called UCL Vowel Trainer2 which claims learners improved from 65% accuracy to 85% accuracy over 5 sessions.

Another (open source) program in development is Minimal Bears, which is based on PyPhon.3 Minimal Bears aims to build up a crowdsourcing feature so that many languages can be accommodated. Interested readers may like to see a talk about HVPT from the developers.4

So it is quite amazing, as Mark Liberman from Language Log pointed out, how little is known by language educators about HVPT. One of the commenters to the Language Log post suggested that association with drill and kill stereotypes of language learning may have tainted it. No doubt more research is required to test the limits of HVPT. Hopefully this post will pique interest in readers to investigate these minimal pairs on steroids.

Many thanks to Guy Emerson for additional information and to the poll respondents.


1. EnglishAccentCoach
2. UCL Vowel Trainer
3. PyPhon (I have yet to be able to get this working)
4. (video) High Variability and Phonetic Training – Guy Emerson and Stanisław Pstrokoński

Further reading:

