HVPT or minimal pairs on steroids

It was by chance, as these things tend to happen on the net, that I read about High Variability Pronunciation Training (HVPT). What are the odds that language teachers know about HVPT?

My extremely representative and valid polling on Twitter and G+ gave me a big fat 2 out of 24 teachers who knew the acronym. Of the two who said yes, one had looked up the acronym and the other is an expert in pronunciation.

I would put good odds that most language teachers have heard of and use minimal pairs, i.e. pairs of words that differ by one sound (the famous ship/sheep, for example).

HVPT can be seen as souped-up minimal pairs: different speakers are used and sounds are presented in different phonetic contexts. Learners are required to categorize each sound by picking a label for it, and feedback is then given on whether they are correct.

Pronunciation research has shown that providing a variety of input, in terms of speakers and phonetic contexts, helps learners categorize sounds. That is the V of variability in the acronym. Furthermore, such training focuses learners on the phonetic form and thus reduces any effect of semantic meaning, since it has been shown that attending to both meaning and form reduces performance.
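To make the trial procedure concrete, here is a minimal sketch of an HVPT-style session in Python. The stimulus files, speaker names and vowel labels are invented for illustration, and no audio is actually played; a real program would present the sound file before asking for the label.

```python
import random

# Hypothetical stimulus bank: each recording is tagged with the vowel
# category it contains and the speaker who recorded it. Variety of
# speakers is the "high variability" part of HVPT.
STIMULI = [
    {"file": "ship_spk1.wav", "speaker": "spk1", "label": "/ɪ/"},
    {"file": "sheep_spk1.wav", "speaker": "spk1", "label": "/iː/"},
    {"file": "ship_spk2.wav", "speaker": "spk2", "label": "/ɪ/"},
    {"file": "sheep_spk3.wav", "speaker": "spk3", "label": "/iː/"},
]

def run_trial(stimulus, answer):
    """Score one trial: the learner hears the file and picks a label.
    Feedback (correct or not, plus the right answer) is returned
    immediately, as in HVPT."""
    return {"correct": answer == stimulus["label"],
            "expected": stimulus["label"]}

def make_session(stimuli, n_trials, seed=0):
    """Draw a session of trials, mixing speakers and contexts."""
    rng = random.Random(seed)
    return [rng.choice(stimuli) for _ in range(n_trials)]

session = make_session(STIMULI, 8)
# A perfect learner answers with the true label every time.
score = sum(run_trial(s, s["label"])["correct"] for s in session)
```

The feedback loop is the important part: the learner gets an immediate correct/incorrect signal for every categorization, across many voices.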

Currently there is one free (with registration) program that helps with Canadian pronunciation, called EnglishAccentCoach.1 This web and iOS program is developed by Ron Thomson, a notable researcher in this field. It is claimed that it can significantly help learners in only 8 short training sessions, with effects lasting for up to a month. There is also a paid program called UCL Vowel Trainer,2 which claims learners improved from 65% accuracy to 85% accuracy over 5 sessions.

Another (open source) program in development is Minimal Bears, which is based on PyPhon.3 MinimalBears aims to build up a crowdsourcing feature so that many languages can be accommodated. Interested readers may like to see a talk about HVPT from the developers.4

So it is quite amazing, as Mark Liberman of Language Log pointed out, how little is known by language educators about HVPT. One of the commenters on the Language Log post suggested that association with drill-and-kill stereotypes of language learning may have tainted it. No doubt more research is required to test the limits of HVPT. Hopefully this post will pique readers' interest in investigating these minimal pairs on steroids.

Many thanks to Guy Emerson for additional information and to the poll respondents.


1. EnglishAccentCoach
2. UCL Vowel Trainer
3. PyPhon (I have yet to get this working)
4. (video) High Variability and Phonetic Training – Guy Emerson and Stanisław Pstrokoński

Further reading:

Thomson, R. I. (2011). Computer assisted pronunciation training: Targeting second language vowel perception improves pronunciation. Calico Journal, 28(3), 744-765. Retrieved from http://www.equinoxpub.com/journals/index.php/CALICO/article/viewPDFInterstitial/22985/18991
Liberman, M. (2008, July 6) HVPT [Blog post]. Retrieved from http://languagelog.ldc.upenn.edu/nll/?p=328

Lesson Kit 1: Star Wars puns

This post provides some items that one could use to construct a lesson or activity based on the following video:

h/t John Spencer @spencerideas.

The kit includes a bingo pdf (generated from http://print-bingo.com/) and a docx of the relevant puns.

One idea is to test listening skills via the bingo card (columns L and A are the winning columns); further work could be done on explaining the first three or four puns in the text file.
Another option is to use phonetically transcribed words in the bingo card.

Do let me know of other things that could be done.

Hope this is the kind of lesson kit you are looking for : )


Video: https://www.youtube.com/watch?v=qslJQUMc9yA



Learning vocabulary through subs2srs and Anki

This post reports on a way to learn vocabulary using your favorite film or TV show. You need two programs: subs2srs and Anki. I first saw the reference to subs2srs in a post by Olya Sergeeva, a great read by the way.

subs2srs allows you to cut up your video file by its subtitles. You can then import the resulting files into Anki. I won't go into detail about doing this, as the user guide for subs2srs covers it well. I will just post some screen recordings to demonstrate how it appears as you use it. In my case I am using it to learn more conversational and idiomatic French via the TV show Les Revenants.
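For the curious, the core of what subs2srs does, recovering the start and end time of each subtitle so the video can be cut into matching clips, can be sketched roughly as follows. The parser is deliberately simplified and the sample subtitles are invented; subs2srs itself handles many more formats and options.

```python
import re

# Simplified .srt parser: subs2srs uses the start/end times of each
# subtitle cue to cut the video into clips; this sketch just recovers
# the (start, end, text) triples from the subtitle file.
TIME = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+) --> (\d+):(\d+):(\d+)[,.](\d+)")

def to_seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

def parse_srt(text):
    """Return a list of (start, end, line) tuples from .srt content."""
    clips = []
    for block in text.strip().split("\n\n"):
        lines = block.splitlines()
        m = TIME.match(lines[1])  # line 0 is the cue number
        if m:
            start = to_seconds(*m.groups()[:4])
            end = to_seconds(*m.groups()[4:])
            clips.append((start, end, " ".join(lines[2:])))
    return clips

sample = """1
00:00:01,500 --> 00:00:03,000
Je peux pas aller plus vite.

2
00:00:04,000 --> 00:00:06,250
On verra bien."""

clips = parse_srt(sample)
```

Each recovered time span becomes one flashcard's audio/video snippet, with the subtitle line as the card text.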

The first recording shows what happens as you use Anki with your subs2srs cut-up file. Near the end of the recording I demonstrate one of the features of Anki which allows you to hide/bury cards you don’t want to use:

The second recording shows how to browse cards in a deck and tag them for use in a custom deck:

The third video shows the use of a custom deck made from a particular tag:

A post by polyglot Judith Meyer shows how she used it to study Japanese vocabulary. Most of the instructions for subs2srs in that post are dated, but further down she has some nice advice on how to use any Anki decks you may make from subs2srs.

I am not sure how efficient this method is since after about a month of occasional use I have only really learned one expression – je peux pas aller plus vite que la musique/I haven’t got wings! But I feel being able to have the audio is helping.

One thing to be aware of: make backups of the Anki collections you use on your phone, otherwise you risk resetting all the cards you've been studying when you add, say, a new film or episode converted by subs2srs to your mobile version of Anki.

Thanks for reading, and feel free to send any questions you may have.

Using BYU Wiki corpus to recycle coursebook vocabulary in a variety of contexts

Recycling vocabulary in a variety of contexts is recommended by the vocabulary literature. Simply going back to texts one has used in a coursebook is an option, but it misses the variety of contexts.

I need to recycle vocabulary from Unit 1 of my TOEIC book, so I take the topics from the table of contents as input to create a wiki corpus.

The main title of Unit 1 in my book is careers, with sub-topics of professions, recruitment and training. I could also add in job interview, job fair and temp agency.

Note: for more details on the various features of the BYU WIKI corpus do see the videos by Mark Davies; for the rest of this post I assume you have some familiarity with these.

So when creating a corpus in the BYU WIKI corpus, in my Title word(s) search I enter career* to find all titles with career and careers.

Then in the Words in pages box I enter professions, profession, recruitment, training. Note the search for plurals and 300 as the number of pages:

Screenshot 1: corpus search terms

After pressing submit, a list of wiki pages is presented; you can scroll through this to find pages that may be irrelevant to you:

Screenshot 2: wiki pages

After unticking any irrelevant pages, press submit. I won't talk a lot about filtering your corpus build here; as mentioned, do make sure to watch Mark Davies's series of videos to get more details.

Now you will see your newly created corpus:

Screenshot 3: my virtual corpora

Tick the Specific radio button:

Screenshot 4: specific key word radio button

and then click the noun keywords. Skill is the top keyword here, which also appears in the wordlist in my book:

Screenshot 5: noun keywords

What I am more interested in is verbs, so I click that:

Screenshot 6: verb keywords

The noun requirement, which by the way does not come from the careers unit, appears in the book wordlist, but the verb does not. So now I can look at some example uses of the verb require that I could use in class.
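As an aside, keyword lists of this sort are typically produced by comparing a word's relative frequency in the target corpus with its relative frequency in a reference corpus. Here is a toy sketch of that idea, a simple smoothed frequency ratio with invented counts, not BYU's actual algorithm:

```python
def keywords(target_counts, ref_counts, min_freq=2):
    """Rank words by how much more frequent they are in the target
    corpus than in the reference corpus (smoothed frequency ratio)."""
    t_total = sum(target_counts.values())
    r_total = sum(ref_counts.values())
    scores = {}
    for word, freq in target_counts.items():
        if freq < min_freq:
            continue  # ignore words too rare to be reliable
        t_rel = freq / t_total
        # +0.5 smoothing so words absent from the reference don't divide by zero
        r_rel = (ref_counts.get(word, 0) + 0.5) / (r_total + 0.5)
        scores[word] = t_rel / r_rel
    return sorted(scores, key=scores.get, reverse=True)

# Invented counts: a small careers corpus vs a general reference corpus.
target = {"skill": 40, "require": 20, "the": 300, "career": 25}
reference = {"skill": 40, "require": 50, "the": 90000, "career": 30}

ranked = keywords(target, reference)
```

Common function words like the score low because they are frequent everywhere, while topic words like skill rise to the top, which is exactly the behaviour the Specific keywords view gives you.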

One step is to see what collocates with require:

Screenshot 7: collocates of require

Clicking on the top 5 collocates brings up some potential language.

Another interesting use, once you have a number of corpora, is to see what words appear most in each corpus. The following screenshots show corpora related to the first three units of my book, i.e. Careers, Workplaces, Communications:

Screenshot 8: my virtual corpora

The greyed lines mean those corpora are omitted from my search. This could be a nice exercise where you take some word and get students to see how it is distributed. For example, you might show the distribution of the verb fill:

Screenshot 9: distribution of verb fill

We see that it appears most in the recruit* corpus. One option now is to get students to predict how the verb is used in that corpus and then click the bar to see some examples.

After this demonstration you can ask students to guess which words will appear most in the various corpora, and do the search for the students to see the resulting graphs.

Hope this has shown how we can use BYU WIKI corpus to recycle vocabulary in different contexts.

Do shoot me any questions as this post may indeed be confusing.

Offline (doku) wiki writing

Here are some screenshots of my next little experiment using PirateBox:


This time with another program (Server for PHP) that serves up a wiki (DokuWiki).

The main task has been described in a previous post.

I also plan to use the opportunity of the students getting to know the DokuWiki interface to practice some prepositions of place such as those shown in the following screenshot, i.e. on the top right, next to, below:


If there is any interest in detailing how to get this set up on your android phone do leave a comment.

Thanks to Dan for helping me test the set up and thanks to you for reading.

Quick cup of COCA – compare

How are the words rotate and revolve used?

Have a think, then read this description.

Now using COCA-BYU’s compare feature let’s see if the description is supported and whether we can add anything to the picture painted by the Grammarist.

The screen recording above shows:
1. clicking on the compare radio button
2. entering the word rotate in the first box and the word revolve in the second box
3. looking through the results and seeing a rough pattern (of rotate being used with concrete (body) words and revolve with more abstract words – http://corpus.byu.edu/coca/?c=coca&q=41029926)
4. entering noun part of speech (POS) into the collocates box (from the POS drop down menu)
5. now the results are much clearer – http://corpus.byu.edu/coca/?c=coca&q=41030162
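For readers who like to peek behind the interface, the kind of collocate comparison done above can be sketched in a few lines: count the words that fall within a window around each occurrence of the node word. The toy corpus below is invented to echo the rotate/revolve pattern; it is not COCA data.

```python
from collections import Counter

def collocates(tokens, node, window=4):
    """Count words within +/-window tokens of each occurrence of node,
    mimicking (very roughly) a corpus collocate search."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo, hi = max(0, i - window), i + window + 1
            counts.update(t for t in tokens[lo:hi] if t != node)
    return counts

# Invented mini-corpus echoing the pattern found in COCA.
text = ("rotate your hips slowly . rotate the shoulders . "
        "planets revolve around the sun . their lives revolve around work")
tokens = text.split()

rot = collocates(tokens, "rotate")
rev = collocates(tokens, "revolve")
```

COCA's compare feature does essentially this over hundreds of millions of words, with part-of-speech filtering on top, which is why restricting collocates to nouns (step 4) cleans up the results so much.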

The description from the Grammarist article, that revolve is used with astronomical objects such as planets, is supported. We also see stronger collocations of more abstract nouns with revolve, such as questions and lives. COCA-BYU also supports the Grammarist description that the primary meaning of rotate, to spin on an axis, seems to be the most frequent. And the Grammarist description could be extended by noting that rotate is used a lot with body parts.

Do sip the other cups of COCA if you have not already.

Thanks for reading.

Corpus Linguistics for Grammar – Christian Jones & Daniel Waller interview

Following on from James Thomas's Discovering English with SketchEngine and Ivor Timmis's Corpus Linguistics for ELT: Research & Practice, I am delighted to add an interview with Christian Jones and Daniel Waller, authors of Corpus Linguistics for Grammar: A guide for research.

An added bonus is the open access articles listed at the end of the interview. I am very grateful to Christian and Daniel for taking the time to answer my questions.

1. Can you relate some of your background(s)?

We’ve both been involved in ELT for over twenty years and we both worked as teachers and trainers abroad for around a decade; Chris in Japan, Thailand and the UK and Daniel in Turkey. We are now both senior lecturers at the University of Central Lancashire (UCLan, Preston, UK),  where we’ve been involved in a number of programmes including MA and BA TESOL as well as EAP courses.

We both supervise research students and undertake research. Chris’s research is in the areas of spoken language, corpus-informed language teaching and lexis while Daniel focuses on written language, language testing (and the use of corpora in this area) and discourse. We’ve published a number of research papers in these areas and have listed some of these below. We’ve indicated which ones are open-access.

2. The focus in your book is on grammar could you give us a quick (or not so quick) description of how you define grammar in your book?

We could start by saying what grammar isn’t. It isn’t a set of prescriptive rules or the opinion of a self-appointed expert, which is what the popular press tend to bang on about when they consider grammar! Such approaches are inadequate in the definition of grammar and are frequently contradictory and unhelpful (we discuss some of these shortcomings in the book).  Grammar is defined in our book as being (a) descriptive rather than prescriptive (b) the analysis of form and function (c) linked at different levels (d) different in spoken and written contexts (e) a system which operates in contexts to make meaning (f) difficult to separate from vocabulary (g) open to choice.

The use of corpora has revolutionised the ways in which we are now able to explore language and grammar and provides opportunities to explore different modes of text (spoken or written) and different types of text. Any description of grammar must take these into account and part of what we wanted to do was to give readers the tools to carry out their own research into language. When someone is looking at a corpus of a particular type of text, they need to keep in mind the communicative purpose of the text and how the grammar is used to achieve this.

For example, a written text might have a number of complex sentences containing both main and subordinate clauses. It may do so in order to develop an argument but it can also be more complex because the expectation is that a reader has time to process the text, even though it is dense, unlike in spoken language. If we look at a corpus we can discover if there is a general tendency to use a particular pattern such as complex sentences across a number of texts and how it functions within these texts.

3. What corpora do you use in the book?

We have only used open-access corpora in the book including BYU-BNC, COCA, GloWbe, the Hong Kong Corpus of Spoken English. The reason for using open-access corpora was to enable readers to carry out their own examinations of grammar. We really want the book to be a tool for research.

4. Do you have any opinions on the public availability of corpora and whether wider access is something to push for?

Short answer: yes. Longer answer: We would say it’s essential for the development of good language teaching courses, materials and assessments as well as democratising the area of language research. To be fair to many of the big corpora, some like the BNC have allowed limited access for a long time.

5. The book is aimed at research so what can Language Teachers get out of it?

By using the book teachers can undertake small-scale investigations into a piece of language they are about to teach even if it is as simple as finding out which of two forms is the more frequent. We’ve all had situations in our teaching where we’ve come across a particular piece of language and wondered if a form is as frequent as it is made to appear in a text-book, or had a student come up and say ‘can I say X in this text’ and struggled with the answer. Corpora can help us with such questions. We hope the book might make teachers think again about what grammar is and what it is for.

For example, when we consider three forms of marry (marry, marries and married) we find that married is the most common form in both the BYU-BNC newspaper corpus and the COCA spoken corpus. But in the written corpus, the most common pattern is in non-defining relative clauses (Mark, who is married with two children, has been working for two years…). In the spoken corpus, the most common pattern is going to get married e.g. When are they going to get married?

We think that this shows that separating vocabulary and grammar is not always helpful because if a word is presented without its common grammatical patterns then students are left trying to fit the word into a structure and in fact words are patterned in particular ways. In the case of teachers, there is no reason why an initially small piece of research couldn’t become larger and ultimately a publication, so we hope the book will inspire teachers to become interested in investigating language.
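A teacher could replicate this kind of form-frequency check on any electronic text in a few lines of code. The sketch below uses invented sample sentences, not the BYU corpora, and a hand-listed set of forms rather than a proper lemmatizer:

```python
import re
from collections import Counter

def form_frequencies(text, forms):
    """Count how often each listed inflected form of a lemma appears."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t in forms)

# Invented sentences for illustration.
sample = ("Mark, who is married with two children, has worked here for years. "
          "When are they going to get married? She marries ideas to action.")

counts = form_frequencies(sample, {"marry", "marries", "married"})
```

Scaled up to a real corpus, this is exactly the "which of two forms is more frequent" question the authors suggest teachers can answer for themselves.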

6. Anything else you would like to add?

One of the things that got us interested in writing the book was the need for a book pitched at undergraduate students in their final year of their programme and those starting an MA, CELTA or DELTA programme who may not have had much exposure to corpus linguistics previously. We wanted to provide tools and examples to help these readers carry out their own investigations.

Sample Publications

Jones, C., & Waller, D. (2015). Corpus Linguistics for Grammar: A guide for Research. London: Routledge.

Jones, C. (2015).  In defence of teaching and acquiring formulaic sequences. ELT Journal, 69 (3), pp 319-322.

Golebiewska, P., & Jones, C. (2014). The Teaching and Learning of Lexical Chunks: A Comparison of Observe Hypothesise Experiment and Presentation Practice Production. Journal of Linguistics and Language Teaching, 5 (1), pp.99–115. OPEN ACCESS

Jones, C., & Carter, R. (2014). Teaching spoken discourse markers explicitly: A comparison of III and PPP. International Journal of English Studies, 14 (1), pp.37–54. OPEN ACCESS

Jones, C., & Halenko, N. (2014). What makes a successful spoken request? Using corpus tools to analyse learner language in a UK EAP context. Journal of Applied Language Studies, 8(2), pp. 23–41. OPEN ACCESS

Jones, C., & Horak, T. (2014). Leave it out! The use of soap operas as models of spoken discourse in the ELT classroom. The Journal of Language Teaching and Learning, 4(1), pp.1–14. OPEN ACCESS

Jones, C., Waller, D., & Golebiewska, P. (2013). Defining successful spoken language at B2 Level: Findings from a corpus of learner test data. European Journal of Applied Linguistics and TEFL, 2(2), pp.29–45.

Waller, D., & Jones, C. (2012). Equipping TESOL trainees to teach through discourse. UCLan Journal of Pedagogic Research, 3, pp. 5–11. OPEN ACCESS