#IATEFL 2016 – Corpus Tweets 2

This is a storify of tweets by Sandy Millin, Dan Ruelle and Leo Selivan on the talk Answering language questions from corpora by James Thomas. Hats off to the tweeters; I know it’s not an easy task!

IATEFL 2016 Corpus Tweets 2

Answering language questions from corpora by James Thomas as reported by Sandy Millin, Dan Ruelle & Leo Selivan

  1. James Thomas on answering language questions from corpora. Did not know Masaryk uni was home of Sketch Engine!
  2. JT has written a book about discovering English through SketchEngine with lots of ways you can search and use the corpus
  3. JT trains his trainees how to use SketchEngine, so they can teach learners how to learn language from language
  4. JT Need to ensure that tasks and texts have a lot of affordances
  5. We live in an era of collocation, multi-word units, pragmatic competence, fuzziness and multiple affordances – James Thomas
  6. JT Why do SS have language questions? Are the rules inadequate? It’s about hierarchy of choice…
  7. JT Not much choice in terms of letters or morphemes, but lots of choice at text level
  8. JT Patterns are visible in corpora. They are regular features and cover a lot of core English
  9. JT What counts as a language pattern? Collocation, word grammar, language chunks, colligation (and more I didn’t get!)
  10. JT Students have questions about lexical cohesion, spelling mistakes, collocations: at every level of hierarchy
  11. JT Examples of q’s: Does whose refer only to people? Can women be described as handsome? Any patterns with tense/aspect clauses?
  12. JT q’s: Does the truth lie? What is friendly fire? What are the collocations of rule?
  13. JT introduces SKELL: Sketch Engine for Language Learning http://skell. (don’t know!)
  14. “Rules don’t tell whole story” – James Thomas making an analogy w/ Einstein who said same about both the wave & the particle theory
  15. JT SKELL selects useful sentences only, excludes proper nouns, obscure words etc. 40 sentences
  16.  http://skell.sketchengine.co.uk 

    Nice simple interface – need to play with it more. #iatefl

  17. JT searched for mansplain in SKELL and it already has 7 or 8 examples in there
  18. JT Algorithm to reduce the number of sentences only works when there are a lot of examples. With a few, sentences often longer
  19. Sketch Engine is a pretty hardcore linguistic tool, but I can see the use of Skell for language learners. #iatefl
  20. JT Corpora can also teach you more about grammar patterns too, for example periphrasis (didn’t get definition fast enough!)
  21. JT Can search for present perfect continuous for example: have been .*ing
  22. JT You can search for ‘could of’ in SKELL – appears fairly often, but relatively insignificant compared to ‘could have’
  23. Can use frequency in corpus search results to gauge which is “more correct” / “the norm”. #iatefl
  24. JT SKELL can sort collocations by whether a noun is the object or subject of a word for example. Can use ‘word sketch’ function
  25. Unclear whether collocation results in Skell are sorted according to “significance” / frequency or randomly #iatefl
  26. JT See @versatilepub for discounts on book about SKELL
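The pattern search in tweet 21 above (‘have been .*ing’ for the present perfect continuous) is essentially a regular expression. As a rough illustration of the idea only – plain Python regex over invented sentences, not SKELL’s actual query engine – it could be sketched like this:

```python
import re

# Auxiliary + "been" + an -ing form, in the spirit of the SKELL
# query "have been .*ing" (tweet 21). Illustrative sketch only.
pattern = re.compile(r"\b(?:have|has|had)\s+been\s+\w+ing\b", re.IGNORECASE)

sentences = [
    "They have been waiting for hours.",
    "She has been studying corpora since 2014.",
    "We could have done it differently.",
]

matches = [s for s in sentences if pattern.search(s)]
print(matches)  # the first two sentences match
```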


#IATEFL 2016 – Corpus Tweets 1

This is a storified version of the tweets by Sandy Millin on a talk called Making trouble-free corpus tasks in ten minutes by Jennie Wright @teflhelper. Hopefully there will be other tweeters attending the other corpus-based talks, who will be up to the standard set by Sandy : )

IATEFL 2016 Corpus Tweets 1

Making trouble free corpus tasks in ten minutes – Jennie Wright as reported by Sandy Millin

  1. Jennie Wright now in Hall 8b ‘Making trouble-free corpus tasks in ten minutes’
  2. Jennie Wright runs the TEFL helper blog:  http://teflhelperblog.wordpress.com 
  3. Jennie Wright All you need to make quick corpus tasks is a good copy-paster
  4. Jennie Wright Key terms: corpus/corpora: multi-million word collections of lang, concordance lines: search term presented in middle
  5. Jennie Wright POS/grammatical tagging is tagging with noun, verb etc; KWIC is key word in the middle (was wrong before! Sorry)
  6. Jennie Wright COCA is one her business English students go back to:  http://corpus.byu.edu/coca 
  7. #iatefl Jennie Wright This is the search for COCA. It's very intuitive. Put a key word in the box https://t.co/8YcJPuSaxh

  8. Jennie Wright You can click KWIC to see the key word in the centre. Very fast!
  9. Jennie Wright COCA is great because it’s colour coded and the parts of speech are tagged
  10. #iatefl Jennie Wright What's the missing word? Get your concordance lines, then blank out one word https://t.co/owVfQ4xU33

  11. Jennie Wright It’s ‘thingie’ – she wanted her students to use something other than ‘thing’ or ‘stuff’! Good for fossilised errors
  12. Jennie Wright To make it, use a screengrab or clipping tool to get the concordance lines, then block out words you want SS to guess
  13. Jennie Wright Don’t forget to read the concordance lines before you copy and paste! Avoid accidents 🙂
  14. Jennie Wright Activity 2: collocation gamble. Focus on strong and weak collocations and lexical chunking. Good for SS misusing them
  15. #iatefl Jennie Wright. What are the two most common adjective collocations each for bitterly, deeply and sincerely? https://t.co/UAj50bjmuP

  16. @teflhelper 2 most common=’bitterly’: ‘disappointed’/’cold’. ‘deeply’:’concerned’/’sorry’. ‘sincerely’:’interested’/’sorry’
  17. @teflhelper To make collocation gamble, use list function, type ‘bitterly [j*]’ in word box, then adj.ALL in POS list
  18. I think Sandy meant you can use the POS list to insert the code for adjective if you want, rather than in addition to the search term given [mura]
  19. #iatefl @teflhelper 3. Colour-code me. COCA helps you to do this easily. Can you colour-code these sentences? https://t.co/TuRkPYUdu8

  20. @teflhelper To make this, search for your word, screen shot the answers and retype the sentences for them to colour code
  21. @teflhelper Audience member suggests giving them a key of colours and get them to figure out a sentence which matches
  22. @teflhelper Tips: 1. Train a little: better to know one corpus well than a lot of them a little. 2. Imperfections exist in corpora
  23. @teflhelper Tips: Don’t be afraid to oppose what’s in the corpus – you’re the ‘live corpus in the classroom’ Read it carefully
  24. @teflhelper Tips: 3. Choose wisely – never more than 10 lines and don’t overwhelm. 4. What’s the problem you want to solve?
  25. @teflhelper COCA is very helpful when your students don’t believe you 🙂 maximal v. maximum [think Ngrams useful here too]
  26. @teflhelper Tip 5: consider how to do this online or offline. If online, what’s your backup plan? Have paper copies!
  27. @teflhelper COCA bites on YouTube are 2-minute tutorials on how to use COCA
  28. @teflhelper Thanks very much Jennie for an excellent talk. Exactly what I’ve needed for a long time 🙂

IATEFL 2016 – Corpus carnival!

I started following IATEFL online in 2012, but it is only after seeing this year’s programme that I felt any slight regret at not being a member and not going in person. The number of corpus-based talks is encouraging, though it seems from the presenters’ affiliations that most uses are still based in higher/tertiary education. There is also a forum on using corpora in the classroom.

I hope to blog the conference (do check the list of registered bloggers) and if you want to keep up with either TESOL 2016 or IATEFL 2016 tweeting do consider following the bot @TESOL_IATEFL_50.

Finally there is a new dedicated blog Corpus Linguistics for EFL, do check it.

Note: for folks interested in TESOL 2016 corpus-related talks, see this list.

If you spot any relevant corpus talks that are missing let me know, thanks.

Talks using the word corpus or corpora in the IATEFL 2016 programme (pdf):

Wednesday 13 April
Making trouble-free corpus tasks in ten minutes
Jennie Wright (Target Training)
For business English learners who repeatedly misuse specific vocabulary and grammar, using a corpus (electronic multi-million word collections of real-world language examples) significantly enhances accuracy and competence. Accessible to everyone, with masses of free material to exploit, workshop participants will leave knowing how to quickly and easily use corpora to design activities that take less than ten minutes to create.

Using corpora to remedy language errors in L2 writing
Hulya Can (Bilkent University)
I present a classroom study, conducted with 13 intermediate-level university students, which tested if corpora helped learners improve L2 writing. Participants were asked to use a corpus to correct their written language errors and later questionnaires and interviews were carried out. Data analysis suggested a decrease in the number of language errors; furthermore, participants believed it was an effective language tool.

Classroom applications of corpora training for learner autonomy
Federico Espinosa (The University of Birmingham)
There is an established belief in ELT that training learners in strategies for independent language analysis fosters a deeper understanding of English. Following up from last year’s research talk on corpora training for increasing learner autonomy, this practical workshop will present three fully-developed activities to use corpora with learners in a classroom environment.

Conceptual interface of corpus-based error analysis through error mapping
Paschalis Chliaras (University of Birmingham, UK)
This presentation examines the effectiveness of ‘error mapping’ as a macro and micro error analysis of non-native English language learners’ essays. The procedure involved collecting data from essays, interpreting it, reporting information, and implementing it to teaching and learning. Subsequently, students understood their mistakes, identified their needs, learned to avoid mother tongue interference and handed in a competently proofread essay.

Teaching the pragmatics of spoken requests in EAP
Christian Jones (University of Liverpool, UK)
This talk will describe the impact of one explicit interventional treatment on developing pragmatic awareness and production of spoken requests and apologies in an EAP context at a British higher education institution. The talk will describe the effectiveness of the instruction, the linguistic features of successful spoken requests and apologies in this context, and the implications for EAP teaching. (the presenter here assures us “I am not speaking directly about corpora but may slip some mentions in!”)

Thursday 14 April
Answering language questions from corpora
James Thomas (Masaryk University)
There are many language questions that dictionaries, grammar books and native speakers cannot and do not readily answer. The range of questions extends across the whole hierarchy of language from morphology to sentence building to discourse and pragmatics. This talk offers an approach to asking questions to thousands of native speakers whose language has been sampled and stored in corpora.

Using English Grammar Profile to improve curriculum design
Geraldine Mark (Gloucestershire College/Cambridge University Press) & Anne O’Keeffe (Mary Immaculate College, Limerick/Cambridge University Press)
This talk showcases the English Grammar Profile, a new open educational resource developed to enhance our understanding of English learner grammar. Based on the Cambridge Learner Corpus, it provides over 1,200 corpus-based grammar competency statements across the six levels of the CEFR. The talk will showcase the resource and explore its importance for the design of materials and curricula.

Focus on B2 writing: preparing students for Cambridge English: First
Annette Capel (Freelance)
How can students score top marks? What aspects of writing should they work on at B2? This practical session explores the strengths and weaknesses of candidate performance using real answers from the Cambridge Learner Corpus. Participants will work with the Cambridge English Assessment Scale and evaluate preparation strategies. Learner data from the English Grammar Profile will illustrate useful grammatical development.

Electronic theses online – developing domain-specific corpora from open access
Alannah Fitzgerald (Concordia University) & Chris Mansfield (Queen Mary University of London)
Research findings will be presented from a study into the development and evaluation of domain-specific corpora from the Electronic Theses Online Service (EThOS) at the British Library. These collections were built using the interactive FLAX open-source language software for uptake in English for Specific Academic Purposes (ESAP) programmes at Queen Mary University of London.

Friday 15 April
Grammar for academic purposes
Louise Greenwood (Zayed University, Dubai)
Does an explicit focus on grammar help our students? If so, which grammatical structures should we focus on? This talk will argue that form-focused instruction is valuable and that careful selection of structures based on evidence from a corpus is essential in order to plan a targeted syllabus that meets the needs of students preparing for higher education.

Teacher-driven corpus development: the online restaurant review
Chad Langford & Joshua Albair (University Lille 3, France)
We present our project to develop a user-friendly, high-quality corpus of online restaurant reviews, which we consider a specific genre. Our goals are threefold: to present the genesis and results of our project; to elaborate on concrete pedagogical applications (concerning lexis, discourse, grammar and genre-based writing); and to foster collaboration between colleagues eager to develop and share corpora.

Forum on using corpora in the classroom

Guiding EAP learners to autonomously use online corpora: lessons learned
Daniel Ruelle (RMIT University Vietnam)
This presentation outlines the lessons learned from an initiative to guide upper-intermediate EAP learners to independently use online corpora to improve their written lexical range and accuracy. Experienced and less-experienced educators will leave with a better understanding of the benefits and challenges of training learners to use corpora, and several online tools and practical resources to use with their learners.

Learning academic vocabulary through a discovery-based approach
Nicole Keng (University of Vaasa, Finland)
This talk will examine the effectiveness of using corpora to learn academic vocabulary. The learning experiences and vocabulary knowledge of two groups of Finnish students will be compared. The findings will show how a discovery-based approach to academic vocabulary acquisition can profitably be embedded in EAP course design in a Finnish university context.

Exploring EAP teachers’ familiarity and experiences of corpora
Rachel Peacock (University of Nottingham Ningbo China)
This talk will present findings of a questionnaire investigating 52 EAP teachers’ understanding and practical classroom experience of corpora. Results highlight that the pedagogical potential of corpus-based applications remains at the research level. To address this, three user-friendly online reference tools that can be used by students or teachers in various teaching contexts will be introduced.

Data-driven learning – 25 years on
Crayton Walker (University of Birmingham)
Tim Johns from the University of Birmingham came up with the term Data Driven Learning (DDL) to describe the different ways language teachers can use corpora and corpus-based evidence in the classroom to support learning. In this workshop, I revisit DDL in order to find out how the methodology can be used with the online resources we currently have available.

Chatting in the academy: exploring spoken English for academic purposes
Michael McCarthy (Cambridge University Press)
How does spoken academic English typically differ from academic writing in university settings and how might this influence EAP materials? Using illustrations from corpora, this talk will focus on some key differences to be taken into account when planning materials. Practical examples will be drawn from the
new edition of Academic Vocabulary in Use and from Viewpoint (both CUP).

Skylight interview with Gill Francis & Andy Dickinson

Skylight is a relatively new corpus interface designed with teachers and students in mind. Gill Francis, one of the developers, kindly answered some questions. The news about forthcoming suggestions for classroom activities is something to look forward to, as is the collocation feature. It is interesting to note that Gill is very much in favour of the use of keyword in context (KWIC) concordance lines. Others, such as the FLAX language learning team, see KWICs as more of a hindrance and propose their own novel interfaces.

Can you share a little of your background?

Andrew Dickinson is a software writer who is interested in the use of corpora in the classroom and Gill Francis (that’s me) is a corpus linguist. In 1991 I joined the pioneering Cobuild project as Senior Grammarian. Cobuild was founded in 1980 by Professor John Sinclair (University of Birmingham). Its aim was to compile and investigate huge collections of written and spoken language in order to produce a range of dictionaries and grammars for learners that reflect how English is actually spoken and written today. My interest and direction in corpus linguistics owes everything to John Sinclair and our colleagues at Cobuild.

The Bank of English corpora grew to about 450 million words by the late 1990s. We used a fast, versatile, and powerful corpus analysis tool called ‘lookup’. As a grammarian, I was responsible for the grammatical information in the second edition of the Collins Cobuild Advanced Learner’s Dictionary (1995), along with Susan Hunston and Elizabeth Manning. The three of us also wrote the Cobuild Grammar Patterns series (1996, 97, and 98). All these publications reflected a detailed study of corpus evidence.

I’ve continued to work and publish in corpus linguistics since leaving Cobuild. (A list of publications is available.) Then a few years ago I got together with Andy to design Skylight, a program with a clear, easy interface for use by teachers and learners. Since then we have presented Skylight at various corpus linguistics conferences and seminars, and are currently developing it for more general release.

You are targeting classroom use by teachers with Skylight so what do you hope to bring that other corpus tools don’t?

1 – A clear, simple interface

Skylight has a clear, visually attractive interface. The query language is simple and intuitive, and can be learned in a couple of minutes. You can make a query by simply typing in a word or phrase without any special spacing or punctuation, for example “in my opinion” or “in the middle of” or “it’s a case of”.

To vary any word in the query, you use a pipe: “in my|his|her opinion”, or “in the middle|midst of”.

If you want to vary the query and see the range of words in a particular phrase or frame, you use one or more asterisks, for example “in my * opinion” will return “in my humble opinion”, “in my honest opinion”, “in my personal opinion” and so on.

This is about as complex as the query language gets – click on the User Manual from any page of Skylight to see examples of each kind of query. The rules are few and easily mastered by teachers and learners.
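As an illustration of how a query language like this could map onto ordinary regular expressions (a sketch of the idea only, not Skylight’s actual implementation; the function name is invented):

```python
import re

def skylight_to_regex(query: str):
    """Translate a Skylight-style query into a regex (illustrative only).

    Per the description above:
      - a pipe varies a single word:         "in my|his|her opinion"
      - an asterisk stands for any one word: "in my * opinion"
    """
    parts = []
    for token in query.split():
        if token == "*":
            parts.append(r"\w+")  # any single word
        elif "|" in token:
            alts = "|".join(re.escape(a) for a in token.split("|"))
            parts.append(f"(?:{alts})")  # one of the listed alternatives
        else:
            parts.append(re.escape(token))
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

pattern = skylight_to_regex("in my|his|her * opinion")
print(bool(pattern.search("In my humble opinion, it works.")))  # True
```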

2 – Fast, easy alphabetical sorting

If you want to sort concordance lines to the right, or the left, you just click on a button above the lines. This helps you to see at a glance what the right-hand or left-hand collocates of a word or phrase are.

3 – Worksheets and classroom activities

If you are a teacher, you can use Skylight to prepare your own worksheets for corpus-based language activities. When you receive the results of a query, you can tailor the lines to fit your teaching point. This means that you can show only the lines you want, or hide those that you don’t, by clicking or entering text. You can copy the result into Word or another application using the Copy to Clipboard button. The results appear as a neat table, properly displayed and ready for your use. See the User Manual for further details and lots of examples.

Ideally, too, teachers and learners would be able to access a corpus at any point during a class, whenever they want to investigate how a word or phrase is used in a range of real language texts and situations.

For initial guidance and ideas, we are also preparing a large number of suggestions for stand-alone classroom activities practising points of grammar, lexis, and phraseology. Some of these activities address language change and the tension between prescription and description in language teaching. We’ll let you know when we release the first batch of these.

4 – A range of corpora

There are several corpora already available on Skylight – choose any one from the drop-down menu. For example, there is a very large general corpus, ukWaC, which contains 1.4 billion words, as well as smaller corpora like the BNC, BASE, and VOICE. Then there are even smaller corpora – for example a corpus of all Shakespeare’s plays and sonnets that is particularly useful for school children studying English literature.

In addition, any corpus can be compiled in response to the needs of groups of users, such as English school children or intermediate level EFL students. This depends, of course, on copyright restrictions. For more information, see the final sections of the User Manual.

Which other corpus tools would you recommend for teachers either in the classroom or outside?

We don’t feel particularly qualified to answer this question. There are a lot of tools that access huge corpora and are extremely useful to linguists and lexicographers, such as Sketch Engine, the COCA (a large corpus of American English) concordancer, and Lancaster’s Corpus Query Processor. If you look up ‘corpus’ and ‘classroom’ together in any search engine, there will be several hits, but we don’t know of anything that combines an easy-to-use interface with really good classroom applications. This doesn’t mean there isn’t anything, of course!

What present and/or future do you see for Google as a corpus in language learning?

One of the drawbacks of compiled corpora, such as ukWaC and the BNC, is that they are a snapshot of how language is used at a particular time (or at successive times, if a corpus is updated on a regular basis). The gathering and cleaning-up of text can take many months, so all corpora – even the most recent – are necessarily out of date by the time they appear.

The only way to get today’s language today is to use the web as a corpus (see for example Birmingham City University’s WebCorp). This gives results in the KWIC (Key Word in Context) format, with the word or phrase in the centre. The results are not cleaned up or processed, however, which limits their usefulness in the classroom.

But Google itself won’t give you the output you need for focusing on a word or phrase, sorting it, or looking at collocations. You’ll get plenty of examples, of course, but they won’t be shown in the KWIC format. The KWIC display is probably the most important and exciting development in modern corpus linguistics, and you need it if you are to do real corpus-based language work in the classroom or anywhere else.

Anything else you would like to add?

You asked whether we intend to add information about collocation. We are experimenting with a display modelled on the ‘Picture’ technique from the ‘lookup’ software used for the Bank of English, which shows where collocates appear in relation to the node (the central word or phrase) – whether they tend to occur before or after it, for example.

We call the collocation display ‘Searchlight’. The Searchlight display below shows that the most frequent words immediately after obvious are that, then reasons (plural), then choice, then reason (singular). The most frequent words two to the right are of, for, and is. And so on – the columns are not connected, of course; they simply give positional collocations.

The brilliant thing about ‘picture’ that we want to replicate is that you simply click on any word to go to the relevant concordance lines. So if you click on reasons, you’d get all the lines with the combination obvious reasons. So it gives you a subset of the lines, which can then be sorted and tailored in any way you like.


We will add Searchlight to the Skylight website as soon as possible, though we have not yet decided whether to add statistical information – probably not. In the meantime, I’d just like to say that in my many years of scrolling down concordance lines, I have found that alphabetical sorting is a very good guide to the collocations of a word. I happened to search for the word intuitively recently, and it returned 500 lines. If I sort them one to the right and scroll rapidly down, it’s clear that among the most frequent adjectives that follow it are appealing, correct, and obvious, while the verbs are know and understand. If I sort them one to the left, it is clear that one of the most frequent collocates is the verb be in various forms: ‘it is intuitively obvious’ and so on. Sorting one way and then the other gives you a quick thumbnail sketch of a word, and is extremely useful.
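The right- and left-sorting described above can be sketched in a few lines of Python. The concordance data and function names here are invented for the example – this shows the idea, not Searchlight’s or Skylight’s code:

```python
# Toy concordance lines as (left context, node, right context) tuples.
lines = [
    ("it is", "intuitively", "obvious that"),
    ("results are", "intuitively", "appealing to"),
    ("we", "intuitively", "know that"),
    ("seems", "intuitively", "correct in"),
]

def sort_right(kwic, position=1):
    """Sort lines alphabetically by the Nth word after the node."""
    return sorted(kwic, key=lambda line: line[2].split()[position - 1].lower())

def sort_left(kwic, position=1):
    """Sort lines alphabetically by the Nth word before the node."""
    return sorted(kwic, key=lambda line: line[0].split()[-position].lower())

for left, node, right in sort_right(lines):
    print(f"{left:>12} {node} {right}")
# Sorted one to the right: appealing, correct, know, obvious
```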

So go ahead and try Skylight. And above all, click onto the User Manual, which tells you all you need to know and provides lots of examples of searches using different features.

A huge thanks to the Skylight team and do comment here about your opinions of the interface.

Thanks for reading.

Quick cup of COCA – compound words

A new quick cup of COCA post, whayhay! Thanks to Mike Harrison (@harrisonmike) on Twitter, who was asking about finding compound adjectives.

Here we can use the wildcard asterisk, together with a part-of-speech tag.

So say we were looking for adjectives starting with well, we could use [well-*].[j*] to give the following top ten results –

(click on words to see full search results)

To find all compound adjectives we would simply replace the first part of the compound with another wildcard asterisk, like so:


which gives us the following top 10 results:

(click on words to see full search results)

Similarly, if you were looking for noun, adverb or verb compounds, simply add the appropriate POS tag, i.e. [n*], [r*] and [v*] respectively.
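Outside COCA, the same idea can be roughly approximated with a plain regular expression. This sketch (my own illustration over invented text) has no POS tagging, so anything it finds would still need checking against concordance lines:

```python
import re

text = ("A well-known author gave a well-received talk about "
        "well-being and other much-discussed topics.")

# Hyphenated compounds starting with "well", cf. [well-*].[j*] in COCA.
well_compounds = re.findall(r"\bwell-\w+\b", text)

# Any hyphenated compound (the all-compounds search above).
all_compounds = re.findall(r"\b\w+-\w+\b", text)

print(well_compounds)  # ['well-known', 'well-received', 'well-being']
print(all_compounds)   # adds 'much-discussed'
```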

Note: do double-check results in concordance lines, as sometimes the POS tagging is off.

As an interesting aside, a search for compound adjectives historically in COHA gives us a very nice ascending curve. I wonder what the significance of that is?

Compound adjectives over time in COHA (click on graph)

Finally, do check out the previous quick cup of COCA posts if you want help with searching in COCA.

IATEFL 2015: Recent corpus tools for your students

Jane Templeton’s talk 1 illustrated corpus use with the wordandphrase tool 2. (Lizzie Pinard has a write-up of the talk 3.) I have described using this and other tools on this blog, and there is a nice round-up of corpus tools written by Steve Neufield 4 that looks at just the word, ozdic, word neighbors, netspeak, and stringnet.

This post reports on some more recent tools you may not be aware of – WriteAway, Linggle, SkeLL and NetCollo – though I posted them some time ago in the G+ CL community, so do check that if you want the skinny early on :)

I list them in order of how easy to use and useful I think students will find them.

  1. WriteAway – this tool auto-completes words to help highlight typical structures. For example, it gives two common patterns for Jane’s example of weakness: weakness of something and weakness in something. The first example in pattern one includes the collocation overcomes.

WriteAway screenshot for word weakness


  2. Linggle – one could follow up with a search on Linggle, which is basically a souped-up version of just the word and uses a 1-trillion-word web-based corpus, as opposed to the much smaller BNC that just the word uses.

It is interesting that overcome weakness is not listed:

Linggle screenshot for verb + weakness
Linggle screenshot for verb + weakness (click image to see results)

but a search for overcome followed by a noun shows that this combination occurs less than 1% of the time in web pages:

Linggle screenshot for overcome + noun (click image to see results)


  3. SkeLL from Sketch Engine is neat for its word sketch feature, so a look at weakness brings up a nice set of collocations and colligations on one screen:

SkeLL wordsketch for weakness (click image to see results)


  4. NetCollo – this corpus tool can compare the BNC, a medical corpus and a law corpus, which is useful if you are looking at academic language in medicine and law. For example, with weakness we see that it is much more common in the BNC:

NetCollo result for weakness (click image to see results)

and we can see that the collocation with overcome only appears once in the Medical corpus.

As ever, do try these tools out yourself and then, as Jane says, show not tell your students as and when the need arises in class. By the way, do check out the integrative rationale for corpus use by Anna Frankenberg-Garcia 5.

Thanks for reading.


1. IATEFL 2015 video – Bringing corpus research into the language classroom

2. Word and phrase.info tool

3. IATEFL 2015 Bringing corpus research into the language classroom – Jane Templeton

4. Teacher Development: Five ways to introduce concordances to your students

5. Integrating corpora with everyday language teaching

Fav the PHaVE Pedagogical List for the New Year

Great New Year news for teachers: a new word list of phrasal verbs. The PHaVE List (Garnier & Schmitt, 2014) finds that the top 150 most common phrasal verbs have only 288 meanings in total. That is, on average, about 2 meanings per phrasal verb. Consider that some estimates put the total number of phrasal verbs at nearly 9,000.

You can try out the PHaVE Dictionary yourself.

What you will see are the 150 verbs ranked from 1 to 150 and their most common meanings.

The study used the following criteria to include verbs and their meanings:

Each of the top 150 verbs occurs at least 10 times per million words. For a meaning to be included, it needed to have 75 percent coverage in COCA-BYU; if the primary meaning did not reach this, secondary meanings of at least 10 percent were added until either 75 percent coverage was reached or all meanings of at least 10 percent had been used.

Thus 6 verbs have 4 meanings, 34 verbs have 3 meanings, 52 verbs have 2 meanings, and 58 have 1 meaning.
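The counts quoted above can be checked with a little arithmetic – they do sum to the 150 verbs and 288 meanings reported:

```python
# Number of verbs having each meaning count, from the figures above.
verbs_by_meaning_count = {4: 6, 3: 34, 2: 52, 1: 58}

total_verbs = sum(verbs_by_meaning_count.values())
total_meanings = sum(n * v for n, v in verbs_by_meaning_count.items())

print(total_verbs)     # 150
print(total_meanings)  # 288
print(round(total_meanings / total_verbs, 2))  # 1.92 meanings per verb
```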

As the study notes in the user manual for the list, some of the verbs may well be easier to understand than others, i.e. be more semantically transparent. This is a reminder to users that the list is a general guide and that teachers, as ever, need to exercise their judgement.

If you want the raw lists go check out the G+ Corpus Linguistics community.

So do go on and set about exploring the PHaVE pedagogical list for the new year.

A huge thanks to all the readers for your support of the blog these past couple of years, here’s to more and better for 2015.


Garnier, M., & Schmitt, N. (2014). The PHaVE List: A pedagogical list of phrasal verbs and their most frequent meaning senses. Language Teaching Research. Advance online publication. doi:10.1177/1362168814559798. Retrieved January 15, 2016, from http://www.norbertschmitt.co.uk/uploads/pdf-(418-kb).pdf