Quick cup of COCA – compound words

A new quick cup of coca post, whayhay. Thanks to Mike Harrison (@harrisonmike) on Twitter who was asking about finding compound adjectives.

Here we can use the wildcard asterisk together with a part of speech tag.

So say we were looking for adjectives starting with well, we could use [well-*].[j*] to give the following top ten results –

(click on words to see full search results)

To find all compound adjectives we would simply replace the first part of the compound with another wildcard asterisk, like so:

[*-*].[j*]

which gives us the following top 10 results:

(click on words to see full search results)

Similarly if you were looking for noun, adverb or verb compounds simply add the appropriate POS tag i.e. [n*], [r*] and [v*] respectively.
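To make the wildcard plus POS idea concrete, here is a minimal sketch that imitates this kind of query on a tiny hand-made list of tagged words. It is not how COCA works internally: the sample words, their CLAWS-style tags and the helper names parse_query and search are all made up for illustration.

```python
from fnmatch import fnmatchcase

def parse_query(query):
    """Split a query such as '[well-*].[j*]' into (word_pattern, tag_pattern)."""
    word_part, tag_part = query.split("].[")
    return word_part.lstrip("["), tag_part.rstrip("]")

def search(tagged_tokens, query):
    """Return the (word, tag) pairs whose word and tag both match the query."""
    word_pattern, tag_pattern = parse_query(query)
    return [
        (word, tag)
        for word, tag in tagged_tokens
        if fnmatchcase(word, word_pattern) and fnmatchcase(tag, tag_pattern)
    ]

# Hypothetical sample data, not taken from COCA
SAMPLE = [
    ("well-known", "jj"),
    ("well-being", "nn1"),
    ("long-term", "jj"),
    ("well", "rr"),
    ("so-called", "jj"),
    ("double-check", "vv0"),
]

print(search(SAMPLE, "[well-*].[j*]"))  # adjectives starting with well-
print(search(SAMPLE, "[*-*].[j*]"))     # any hyphenated adjective
print(search(SAMPLE, "[*-*].[n*]"))     # any hyphenated noun
```

Swapping the tag pattern is all it takes to move from adjectives to nouns, adverbs or verbs, which mirrors what the POS tag does in the real search box.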

Note: do double-check the results in the concordance lines, as sometimes the POS tagging is off.

As an interesting aside a search for compound adjectives historically in COHA gives us a very nice ascending curve. Wonder what the significance of that is?

Compound adjectives over time in COHA (click on graph)

Finally do check out the previous quick cup of coca posts if you want help with searching in COCA.


Corpus linguistics community news 4

Another installment of the Google+ Corpus Linguistics Community news. In addition I include links to some visual aids that were designed to answer questions people in the second round of #corpusmooc posed.

Bigger is not necessarily better – Here I talk about some minor aspects of the paper rather than its more interesting ones: the paper compiled a list of approximately 14000 pairs of collocations that are worth teaching, though the list is not yet available. The other main finding is that frequency is more important than dispersion or chronological data when identifying collocations, with human judgement remaining a key factor in deciding on useful collocations.

Google as a corpus with students – Some interesting recent developments on using Google as a corpus with a short list of relevant online reading.

How to develop effective concordance materials using online corpus – My interpretation of a slideshow by a Korean researcher on using data driven learning materials.

#corpusmooc Visual Aids:

Tokens, Types, Lemmas, Word families

Genre Register

Collocations and Colligations

Multi-word expressions

Do check out the other corpus linguistics community news if you haven’t already.


Fav the PHaVE Pedagogical List for the New Year

Great New Year news for teachers: a new word list of phrasal verbs, the PHaVE List (Garnier & Schmitt, 2014), finds that the 150 most common phrasal verbs have only 288 meanings in total. That is on average about 2 meanings per phrasal verb. Consider that some estimates put the total number of phrasal verbs at nearly 9000.

You can try out the PHaVE Dictionary yourself.

What you will see are the 150 verbs ranked from 1 to 150 and their most common meanings.

The study used the following criteria to include verbs and their meanings:

For the top 150 verbs, each occurs at least 10 times per million. For a meaning to be included it needed to have 75 percent coverage in COCA-BYU and if the primary meaning did not reach this then secondary meanings of at least 10 percent were added until either 75 percent was reached or all 10 percent meanings used.
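The inclusion rule for meanings can be sketched in a few lines of code. This is only my reading of the criteria as described above, not the authors' actual procedure: the function name select_meanings and the example percentages are invented, and I assume the primary meaning is always kept.

```python
def select_meanings(percentages, target=75.0, minimum=10.0):
    """Return the meaning shares (in percent) kept for one phrasal verb.

    Assumption: the primary meaning is always kept; further meanings are
    added, largest first, while coverage is below `target` and each added
    meaning accounts for at least `minimum` percent of occurrences.
    """
    ranked = sorted(percentages, reverse=True)
    if not ranked:
        return []
    kept = [ranked[0]]
    for share in ranked[1:]:
        if sum(kept) >= target or share < minimum:
            break
        kept.append(share)
    return kept

# Hypothetical percentage breakdowns for three imaginary verbs
print(select_meanings([80, 12, 8]))          # primary meaning alone reaches 75
print(select_meanings([50, 30, 12, 8]))      # second meaning needed to reach 75
print(select_meanings([30, 20, 12, 11, 9]))  # stops when meanings drop below 10
```

In the last case coverage never reaches 75 percent, so the verb simply ends up with all of its meanings that clear the 10 percent bar.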

Thus 6 verbs have 4 meanings, 34 verbs have 3 meanings, 52 verbs have two meanings and 58 have one meaning.
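As a quick sanity check, the breakdown above can be added up to confirm it matches the headline figures of 150 verbs and 288 meanings. The variable names are just for illustration.

```python
# Number of verbs for each meaning count, as given in the post
verbs_by_meaning_count = {4: 6, 3: 34, 2: 52, 1: 58}

total_verbs = sum(verbs_by_meaning_count.values())
total_meanings = sum(
    meanings * verbs for meanings, verbs in verbs_by_meaning_count.items()
)
average = total_meanings / total_verbs

print(total_verbs, total_meanings, round(average, 2))  # 150 288 1.92
```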

As the study notes, in the user manual for the list, some of the verbs may well be easier to understand than others i.e. be more semantically transparent. A reminder to users that the list is a general guide and teachers, as ever, need to exercise their judgement.

You can access raw lists.

So do go on and set about exploring the PHaVE pedagogical list for the new year.

A huge thanks to all the readers for your support of the blog these past couple of years, here’s to more and better for 2015.


Garnier, M., & Schmitt, N. (2014). The PHaVE List: A pedagogical list of phrasal verbs and their most frequent meaning senses. Language Teaching Research, 1362168814559798. Retrieved January 15 2016 https://afa4be34-0fda-46d9-8e64-5adf13d4216b.filesusr.com/ugd/5f2482_6f61568a40834f168f9424e0cf7d4448.pdf

Corpus linguistics community news 2

Another round of links (here’s first round) to some posts on the G+ corpus linguistics community to notify peeps interested in this sort of thing:

Using BAWE corpus in SketchEngine as a learner corpus (see 1st comment to post)

The Desolation of Smaug and the BYU COCA

Dogme concordancing

The Disabled Access Friendly Campaign, corpus use literacy and wordandphrase.info

Quick cup of COCA – lemma and POS

I was reading the following which is part of a forum discussion by a French poster:

This is clearly more complicated to port, but the benefit can be very important,

OpenPandora Boards comment

It caught my attention as I am interested in the uses French speakers of English make of the word important (e.g. see here). Often they use it instead of an appropriate size adjective, so in this case the forum poster could have written – the benefit can be large.

However the construction still sounded a little odd to me, so I used COCA to look at the collocates of the lemma benefit as a noun – [benefit].[n*]. A lemma covers all forms of a word and is indicated by square brackets. The part of speech can be selected from the POS (part of speech) List drop down box. To use a POS tag like this, you need to append it to the word you are looking at with a dot (full stop).
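For anyone curious what a lemma plus POS search is doing, here is a rough sketch on a hand-made mini-corpus. Everything in it is hypothetical: the tokens, the CLAWS-style tags and the function left_collocates. It also only looks at the single word to the left, which is far cruder than COCA's collocate search.

```python
from collections import Counter

# Hypothetical mini-corpus of (word, lemma, tag) triples
TOKENS = [
    ("the", "the", "at"), ("potential", "potential", "jj"), ("benefits", "benefit", "nn2"),
    ("are", "be", "vbr"), ("clear", "clear", "jj"),
    ("a", "a", "at1"), ("potential", "potential", "jj"), ("benefit", "benefit", "nn1"),
    ("will", "will", "vm"), ("benefit", "benefit", "vvi"), ("us", "we", "ppio2"),
    ("health", "health", "nn1"), ("benefits", "benefit", "nn2"),
]

def left_collocates(tokens, lemma, tag_prefix):
    """Count the word immediately before each token matching lemma and tag prefix."""
    counts = Counter()
    for i, (word, token_lemma, tag) in enumerate(tokens):
        if token_lemma == lemma and tag.startswith(tag_prefix) and i > 0:
            counts[tokens[i - 1][0]] += 1
    return counts

# Equivalent in spirit to [benefit].[n*]: all noun forms of the lemma benefit
print(left_collocates(TOKENS, "benefit", "n").most_common())
```

Note how the verb use of benefit is skipped because its tag does not start with n, while both benefit and benefits are caught because they share a lemma.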

From the results of this search, the rank 6 collocate is potential. Of course! Duh! That’s why "the benefit can be" sounds odd, whereas "the potential benefits are" would sound better.

Now you may be saying I did not need COCA to figure that out. Sure, I could have mulled it over during the morning, but COCA allowed me to get on with other trivial things rather than puzzling over this particular one. 🙂

That’s it for another quick cup of COCA. And if you haven’t already you can read some more quick cup of COCA posts.

The Tinkerer – a corpus informed video activity

This post is a response to the ELT blog carnival hosted by Vicki Hollett on the theme of teaching and learning using videos. I had sent in a previous video activity but, seeing as that was a bit old, a new one was in order. The video is most suited for engineering/technical students.

This is a video activity that is also a little corpus informed. The lead-in is words taken from COCA using its synonym function. So in this case the search term was [=tinker]. I have included my transcription so that variations/extensions can be done such as gap-fills for detailed listening, or noticing spoken grammar. The jumbled text was made from Textivate.

1. Dictate the following words to the class (the numbers are the rank order frequency from COCA):

tamper(7), fix(2), toy(4), fiddle(6), mend(8), play(1), interfere(5), repair(3).

2. These words are synonyms of this word T_ _K_R. //write gapped word on board, Tinker// What’s the word?

3. What do you call a person who does this? //Tinkerer; check that they understand the word, room here to personalise e.g. do you like tinkering?//

4. Re-arrange the text (that goes with the picture) into the correct order:

original text from Hackaday; scrambled text from Textivate


5. Watch the video. //approx 8 mins//

6. What word from the list do you think is the best synonym for JJ, the tinkerer? //you could comment on the rank order frequency of the words if most students pick play as best synonym//

7. Why do you think JJ says things are too easy now?

8. Do you agree with him? Why/why not?


When I was growing up, we grew, uh we grew up in the country. I didn’t have a whole lot. Uhm, my dad is very mechanical, uhm he owned a motorcycle shop when I was growing up. So a lot of what I worked on was with engines. Yeah if a go-kart breaks I would have to fix it myself. And sometimes it was held together with bailing twine and stuff just so that I could ride it, but.

It was in West Virginia [laughs] and I picked up a runt bicycle, a bicycle with little tiny wheels. Monkey bikes or whatever they’re called. Picked one up at a yard sale for about five bucks. And I put an engine on it. And I left the bike the way it was. So it was still a pretty big size. And then I thought to myself I’m gonna make it smaller. And then I cut the frame in half. And then I welded a bunch of stuff on there, a little tiny swing arm and used the wheels off a go-ped, uh the sprocket and chains off a go-ped the engine’s off a weed eater. It’s a micro-bike I like that. That one’s lingered I, I’ve had that a long time now and it just keeps going.

Yeah, yeah I do a lot of just research on the internet, or uh random stuff. I’ll get on tangents on scientific topics or, or on something engine related or on some sort of hacking thing. I’ll just absorb knowledge I suppose. I normally, I’ll have some sort of inspiration or see a video or something that, I’m like I gotta do that. Or I’ll do something similar or  beat it or something like that. In fact I gotta an idea. You got, you got a rolly chair and there’s a leaf blower right there. Do we want to interrupt this interview, and? [laughs] See if it works. Nope. Oh well. [laughs] It was stupid. But now we know.

And sometimes I feel like tinkering with engines and sometimes it’s that and I keep focus all my attention on that. And sometimes it’s something electronic. And sometimes it’s sumthin else. It’s just that, it varies. Right now it’s the Tesla coil ’cause I’ve been working on it a while all week. That’s, that’s my top priority. That’s what I’ve been researching. I dunno I saw a Tesla coil video, I think, on the internet when I was a teenager. And I just thought I gotta build one of those. I got my son now, and slowed, slowed down my projects. But that’s okay. He is a project, he’s a good project. I’m forming him, in, into what I want [laughs]. Did he do it? Yeah he did.

I, I’ve always had a knack for finding really good deals and stuff, like I’m good at negotiations, I’m good at spotting things that are worth money at thrift stores. It started at thrift stores. Uhm, go there and I would just see stuff that other people wouldn’t recognize. And I clean them up makes sure they work. Go through it, just resell it on Ebay.

You heard that? They shake their body. And hiss like that to sound like a rattlesnake. But you see there’s no rattle. I’ve always, I, I’ve always been a really really curious person. Hafta explore things if I see sumthin I sometimes have to just pull over and hafta look. I’d be the guy you want in a zombie apocalypse that’s for sure. [laughs] Cause I’m very uh, I’m very resourceful. I can pretty much make anything happen with whatever I’ve got on hand.

It’s too easy now. Like back in the day when you wanted a radio. Like you wanted a transmitter or something, you build it. People don’t build them now, you just go out ‘an buy it. You don’t hafta learn how it works, you just use it. Same with computers, back in the eighties and stuff you hadta know how the computer worked before you could just use one. So, stuff’s too easy nowadays.

Thanks to the ELT blog carnival for the inspiration.

Teaching writing with the aid of COCA – guest post by Monika Sobejko

I am very pleased that Monika Sobejko agreed to share some of her experiences of using corpora in class. If you have used corpora in your teaching and would like to write about it get in touch or send me a link to a post you have written or will write. And don’t forget the Google+ group (link in menu above) to link up with others interested in the language teaching and learning aspects of corpora. Over to you Monika.

I wish to begin with a word of warning: COCA is a highly addictive tool. I’ve spent many happy hours using it both in and out of the classroom. I think it’s been a good choice, though there are some limitations, of which I’ll say a few words later. COCA is freely available and it’s got a user-friendly interface (with lots of help menus whenever and wherever you need one). It’s also linked to related corpora, including the British National Corpus (BNC), and other resources such as WordAndPhrase.

Initially, I explored it mostly for fun, but I quickly realized that it can considerably speed up the process of preparing exercises and tests for my students. It offered plenty of authentic examples of lexical items and grammatical structures – COCA is huge (currently, 450 million words). I only had to sift through concordance lines, eliminating (or slightly adapting) those that would be too difficult for a particular group of students.

Still, I was somewhat reluctant to try any ‘hands-on’ activities with my students, using COCA directly in class. Our syllabi simply do not allow us any leeway, so to speak – no fooling around and wasting our students’ time with any ‘experimental tools’. However, I was teaching an academic writing class, and I thought that it could be a valuable tool for my students – side by side with dictionaries. After all, I’d been using COCA myself to confirm my poor, non-native speaker intuitions about language… If I found it valuable, so could they.

I also did a quick literature review and found some support in Ken Hyland’s positive approach to the use of corpora in the writing classroom. He claims that “the use of corpora and concordancing offers one of the most exciting applications of new technologies to the writing classroom” (Hyland, 2010: 167), and – now, having used it myself with an academic writing class – I couldn’t agree more.

I was going to use the corpus as “a reference tool” (Hyland, 2010: 170) and was hoping it could be more or less seamlessly integrated into what we were doing in class. Ana Frankenberg-Garcia (2012, pp.41-42) describes a similar approach – she even claims that there is no need to formally train students how to use a corpus. Speaking of integration, you can find very interesting practical ideas about integrating corpora into production activities (writing and speaking) in this post by eflnotes.

Initially, I showed students how to do basic, most useful searches. While writing collaboratively or individually, they were often hesitant about using a particular word or phrase or wanted an alternative to what they already knew. At first, I was doing most of the searches for them, but soon they started performing their own – with or without my help. All we needed was a computer with internet access. They were learning a new skill ‘on the fly’ – exactly when they needed it and as much as they needed in a particular moment.

Basically, four types of searches were most useful, or most often performed by my students: 1) frequency search across different registers; 2) collocations search; 3) synonyms search; 4) word comparison search. I will give some examples of searches 1) and 4) done by my students.

Frequency searches

Frequency searches across different registers of COCA were particularly useful when students wanted to use a word, but were not sure whether it is ‘formal enough’ for the academic register. Here’s an example of a search for the phrasal verb “boil down”. The student who was doing the search wasn’t sure whether or not he could use the verb in question – he’d been told in the past that in a more formal style, the use of phrasal verbs should be avoided. The settings for the search looked like this:

boil down

A quick search revealed that “boil down” does occur in the academic register, though not very frequently (138 tokens only, a frequency of 1.52 per million words), so far less frequently than in other registers, as you can see here:


However, while interpreting this, I think we must remember that, as Hunston puts it, “a corpus will not give information about whether something is possible or not, only whether it is frequent or not” (2010: 22). Ultimately, we have to rely on native speaker intuition to decide whether something is acceptable English or not. Still, I would argue that a lot of helpful information can be collected from frequency searches to help student writers make well-informed choices.
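The per-million figure quoted for “boil down” is simply the raw count scaled by the size of the register. As a small sketch, and assuming nothing beyond the two numbers given in the post (138 tokens, 1.52 per million), we can work backwards to the size of the academic section that they imply. The function names are my own.

```python
def per_million(tokens, corpus_size):
    """Normalised frequency: occurrences per one million words."""
    return tokens / corpus_size * 1_000_000

def implied_size(tokens, rate_per_million):
    """Corpus size (in words) implied by a raw count and a per-million rate."""
    return tokens / rate_per_million * 1_000_000

# Figures from the post; the section size is derived, not looked up
academic_size = implied_size(138, 1.52)
print(round(academic_size / 1_000_000, 1))        # 90.8 (million words)
print(round(per_million(138, academic_size), 2))  # 1.52
```

Normalising like this is what makes counts comparable across registers of different sizes, which is why the per-million column is the one to look at rather than the raw token count.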

The search for synonyms was a life-saver sometimes – but I’m really not sure whether  a thesaurus wouldn’t be enough for those particular searches. Often, however, they were then followed by other queries – when a student needed to look at more concordance lines to get a better ‘feel’ for the newly discovered synonym, or to better understand the differences between two synonyms. Do look here at eflnotes for an excellent example of such a life-saving search for a synonym.

Word comparisons

A very interesting option offered by COCA is the word comparison search. Below are the settings for such a word comparison search – between two noun + preposition combinations: “change of” and “change in” – as well as the results of that search.




On the basis of this information, my students concluded that “change of” meant that one thing was substituted for another, and “change in” implied a difference occurring in something, and hence – it was better to write ‘a change in temperature’ rather than ‘a change of temperature’. No dictionary could help us with that problem.

Student thoughts

Finally, I’ll just give voice to my students – most of those who responded to a short survey at the end of the course found the corpus useful (9 out of 12 students), and only one – not useful. And, surprisingly, most of them (eight students) admitted to using COCA both in and out of class. Many commented on the use of COCA in a positive way. Were they just being nice to the teacher while filling in the survey? I wonder.

Here’s a sample of their comments:

(…) I suggest organizing two or even three lessons only for learning how to exactly use COCA

It seems to be a great and a very helpful tool in writing articles. I have to admit that sometimes I had  got problems with using all its applications (…)

(…) on the plus side I’d mention the methods and tools used during the course. Especially original was the usage of the Corpus of American English, a tool I have not been aware of before attending the course (…)

an interesting tool, but you need to have effective  methods of working with it, so you must have some experience. I also think that the database of discipline-specific texts (for example, physics) is not developed well enough to reliably show how some rare words are used

The last comment reflects my own experience of using COCA – namely, it is a general corpus. If your students are interested in a specific field or in writing specific types of texts (genres), building a small, specialist corpus might actually be a better option.

Thanks for reading.


Frankenberg-Garcia, A. (2012). Integrating corpora with everyday language teaching. In: Thomas, J. and Boulton, A. (Eds.) Input, Process and Product: developments in teaching and language corpora. Brno: Masaryk University Press. 33-50. Retrieved from http://www.academia.edu/3368339/Integrating_corpora_with_everyday_language_teaching

Hunston, S. (2010). Corpora in Applied Linguistics. (7th ed.). Cambridge: Cambridge University Press.

Hyland, K. (2010). Second Language Writing. (8th ed.). Cambridge: Cambridge University Press.

Monika is a graduate of the Jagiellonian University in Krakow, Poland where she teaches EFL and ESP to students of archaeology and computer science. She also teaches an academic writing course to graduate students of the university. Her main interests include exploring effective ways to teach writing in a foreign language and using language corpora in EFL. She tweets @SobejM.

No time for corpora? No worries!

For the majority of the ELT world coursebooks and syllabi dominate. Consequently, teachers have little time for anything unrelated to what they teach from a book and from their set syllabus. This is arguably one of the reasons for the low take up of corpus based teaching.

Frankenberg-Garcia (2012) helpfully outlines several ways teachers can easily integrate corpus information into the classroom without having to invest much time (she does though assume that the teacher knows about corpora, can access them easily and knows the principles of corpus queries, Frankenberg-Garcia, 2012, p.35).

She divides approaches based on production vs reception activities and whole-class vs individual activities.

I have written about reception (e.g. Just the word and TOEIC), whole-class (e.g. general English lexis and DIY corpus) and individual activities (e.g. GloWbE and will suit you; do also see a recent post by Chia Suan Chong/@chiasuan on encouraging learner autonomy via corpora). What caught my attention was the description of the use of corpora in production activities.

Note: I was initially alerted to the Frankenberg-Garcia paper by Wilson (2013), another recommended read for corpora based teaching.

Frankenberg-Garcia gives the example of using collocations of the word beach as a warm-up to speaking or writing about beach holidays.

Looking at Unit 1 Careers in the Cambridge Target Score book (Talcott & Tullis, 2007), Wordandphrase.info gives us the following for career (click on image for larger resolution):

From the collocates (circled in red above) we can compile say the following list:

  • professional career, successful career
  • career choice, career path
  • begin career, build career

and ask students to use the list to speak about, say, their current career path, whether they know what professional career they want to follow, if so whether they know how to build their career, and so on. You could give fast finishers the list of synonyms:

  • business
  • profession
  • occupation
  • livelihood
  • calling
  • vocation

and ask them how they would use these when talking about careers.

More interestingly she describes using concordances for the bus that are given to students before they write about something happening on a bus. As the screenshot shows she also highlighted some potentially useful phrases with the bus:

the bus concordances (Frankenberg-Garcia, 2012, p.40)

Adapting this for the TOEIC we can use the keyword contract negotiation(s), which appears in Unit 1 Exercise 1 page 9. An extension to this exercise would ask students to write a short news report of the contract negotiation using the picture from the exercise as a prompt:

(Talcott & Tullis, 2007, p.9)

COCA tells us contract negotiation(s) is most frequent in the news register, which can guide us in selecting what examples to use. Wordandphrase.info gives concordances to use to help students before the writing task (note some sentences are adapted and are not the exact examples given by Wordandphrase.info):

  1. They were participating as mediators in contract negotiations and monitoring growers’ compliance with labor contracts.
  2. This is specifically for contract negotiations and recruitment.
  3. More than two weeks of contract negotiations between Air Canada and its pilots broke off this Friday.
  4. The contract negotiations had been confidential.
  5. Trouble has arisen over his fierce contract negotiations with the management.
  6. They averted a strike and completed the union’s contract negotiations with the three major North American car makers.
  7. The strike began last October after 10 months of stalled contract negotiations.
  8. During contract negotiations a few years later, resentment ran high.
  9. Randy Mueller handled contract negotiations and made all personnel decisions.
  10. They attempted to force a new round of contract negotiations.

Students can be asked to highlight words related to contract negotiations e.g. mediators in example 1 above. They can then proceed to the writing exercise.
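If you have a text of your own (say a news article on a strike), concordance lines like the ones above can be produced with a few lines of code. This is a bare-bones sketch of a keyword-in-context display, not what Wordandphrase.info does behind the scenes; the function kwic and the sample text are my own, stitched together from two of the example sentences.

```python
def kwic(text, keyword, width=30):
    """Return keyword-in-context lines with `width` characters either side."""
    lines = []
    lowered = text.lower()
    key = keyword.lower()
    start = lowered.find(key)
    while start != -1:
        end = start + len(key)
        left = text[max(0, start - width):start]
        right = text[end:end + width]
        # Right-align the left context so the keywords line up in a column
        lines.append(f"{left:>{width}} [{text[start:end]}] {right}")
        start = lowered.find(key, end)
    return lines

SAMPLE_TEXT = (
    "The contract negotiations had been confidential. "
    "During contract negotiations a few years later, resentment ran high."
)

for line in kwic(SAMPLE_TEXT, "contract negotiations"):
    print(line)
```

Lining the keyword up in a column is what lets students scan down the page and spot recurring words to the left and right.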

It is worth looking up Frankenberg-Garcia in full as she makes a great case for teachers to integrate corpora into the classroom. Thanks for reading.


Frankenberg-Garcia, A. (2012). Integrating corpora with everyday language teaching. In: Thomas, J. and Boulton, A. (Eds.) Input, Process and Product: developments in teaching and language corpora. Brno: Masaryk University Press. 33-50. Retrieved from http://www.academia.edu/3368339/Integrating_corpora_with_everyday_language_teaching

Talcott, C. & Tullis, G. (2007). Target Score: A communicative course for TOEIC Test preparation. (2nd ed.). Cambridge: Cambridge University Press.

Wilson, J. (2013). Technology, pedagogy and promotion: How can we make the most of corpora and Data-Driven Learning (DDL) in language learning and teaching? Higher Education Academy research report (July 2013). Retrieved from https://www.heacademy.ac.uk/sites/default/files/Corpus_Technology_pedagogy_promotion2.pdf

GenGen – a tool to encourage playing with COCA?

Many teachers may not use reference corpora directly in class but may do so for personal language development. One way to encourage more of this use is to play with tools such as COCA. @tinysubversions recently released a web tool GenGen which allows you to generate sentences with variable slots.

At about the same time @mikeharrison tweeted this –

Swimming pools are not places for chatting.

This makes a great sentence frame – X are not places for Y. We can use COCA to look for relevant Xs and Ys, in this case plural nouns and -ing verbs. The COCA code for plural nouns is [*nn2*] and for -ing verbs it is [v?g*]. Plural nouns for people won’t of course make as much sense in our sentence frame as inanimate plural nouns. We have more choice when choosing -ing verbs.
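As a rough illustration of the slot-filling idea, here is a small sketch that fills the frame from two word lists. It is not GenGen itself and the word lists are invented placeholders; in practice you would paste in plural nouns and -ing verbs harvested from your COCA searches.

```python
import random

FRAME = "{plural_noun} are not places for {ing_verb}."

# Placeholder lists; replace with results of [*nn2*] and [v?g*] searches
PLURAL_NOUNS = ["swimming pools", "libraries", "airports", "kitchens"]
ING_VERBS = ["chatting", "running", "sleeping", "singing"]

def generate(rng):
    """Fill the sentence frame with one random noun and one random verb."""
    noun = rng.choice(PLURAL_NOUNS)
    verb = rng.choice(ING_VERBS)
    return FRAME.format(plural_noun=noun.capitalize(), ing_verb=verb)

rng = random.Random(0)  # seeded so that repeated runs give the same sentences
for _ in range(3):
    print(generate(rng))
```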

Here is a quick example:


Having such playful immediate feedback on corpora searches using a tool like GenGen may prompt teachers to further explore the corpora playground.

This has been a quick fire blog post. Apologies if it does not seem to make much sense. I hope to refine my thoughts later 🙂

Quick cup of COCA – quantifier/determiner + preposition + relative pronoun

As part of teaching relative clauses, getting good examples of a structure such as one of which, many of which, some of whom, i.e. quantifier/determiner + preposition + relative pronoun, has always been a bit tricky. Recently I used COCA to help me find some useful sentences.

The appropriate search term is [mc*]|[d*] of which|whom|whose|who.

[mc*] matches the tags for cardinal numbers (one, two, three and so on)

[d*] is the tag for determiners

| is the syntax for OR operator

See here for the full list of the parts of speech tags, but usually the POS (part of speech) list drop down box is sufficient. And see here for info on the search syntax.
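To show what the OR operator is doing, here is a loose imitation of the search using a regular expression over a few sentences. It is only a sketch: the real query works on POS tags, whereas here I stand in for [mc*]|[d*] with a short hand-picked word list, and the sentences are invented.

```python
import re
from collections import Counter

# Hand-picked stand-ins for the [mc*] and [d*] tags (an assumption, not the tagset)
QUANTIFIERS = ["all", "both", "many", "some", "each", "most", "one", "two", "three"]

PATTERN = re.compile(
    r"\b(" + "|".join(QUANTIFIERS) + r") of (which|whom|whose|who)\b",
    re.IGNORECASE,
)

# Invented example sentences
SENTENCES = [
    "She has three brothers, two of whom live abroad.",
    "We tested several methods, some of which failed.",
    "There were many guests, some of whom left early.",
    "He wrote two novels, both of which sold well.",
]

def count_patterns(sentences):
    """Count each 'quantifier of relative-pronoun' sequence found."""
    counts = Counter()
    for sentence in sentences:
        for quantifier, relative in PATTERN.findall(sentence):
            counts[f"{quantifier.lower()} of {relative.lower()}"] += 1
    return counts

for phrase, count in count_patterns(SENTENCES).most_common():
    print(phrase, count)
```

The vertical bar plays the same role in the regular expression as it does in the COCA query: any one of the alternatives on either side of it will do.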


Click on above image to see results.

The results show that all of which, many of whom, and some of which are the top three.

Some of the interesting examples in the academic register are:

1. This study suggests several directions for further work, some of which we have already begun to investigate.

2. Bottlenecks at the Internet’s edge can easily move between the wireless access (when its bandwidth is low) and the provider’s uplink, both of which can have highly variable bandwidths.

3. In its next generation of development, the Internet could make its way onto a wider range of instruments, all of which will offer viewers far sharper images, a much quicker connection, and a more reliable service than at present.

There are plenty of other avenues to explore here but that would not be a quick cup of COCA :).