Locating collocation

The Wikipedia entry on collocation says:
“..a collocation is a series of words or terms that co-occur more often than would be expected by chance”. 1
This is the description of collocation that SketchEngine, a leader in corpus tools, links to in the syllabus for its new online course.2

Note the statistical aspect in the definition “more often than would be expected by chance”.
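To make "more often than would be expected by chance" concrete, here is a minimal sketch in Python that scores adjacent word pairs with pointwise mutual information (PMI). PMI is only one of several association measures used in corpus work, and I am not claiming it is the one Wikipedia or SketchEngine has in mind. The function name `pmi_scores`, the `min_count` threshold and the toy text are my own assumptions for illustration.

```python
import math
from collections import Counter

def pmi_scores(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    A positive score means the pair occurs together more often than
    we would expect if the two words were independent of each other.
    """
    n_tokens = len(tokens)
    n_pairs = n_tokens - 1
    word_counts = Counter(tokens)
    pair_counts = Counter(zip(tokens, tokens[1:]))

    scores = {}
    for (w1, w2), observed in pair_counts.items():
        if observed < min_count:
            continue  # PMI is unreliable for very rare pairs
        p_pair = observed / n_pairs
        p_w1 = word_counts[w1] / n_tokens
        p_w2 = word_counts[w2] / n_tokens
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores

# Hypothetical toy text, far too small for real conclusions
toy_text = ("strong tea and strong coffee but powerful computers "
            "strong tea again and powerful computers again").split()

scores = pmi_scores(toy_text)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

On a real corpus the same idea applies, but the counts would come from millions of tokens and the threshold would need to be higher.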

The wiki entry then reads: “There are about six main types of collocations: adjective + noun, noun + noun (such as collective nouns), verb + noun, adverb + adjective, verbs + prepositional phrase (phrasal verbs), and verb + adverb.”

Note the emphasis on the grammar aspect of collocation.

Bill Louw would place this wiki definition (alongside Goran Kjellmer’s definition of collocation – ‘sequence of words that occurs more than once in identical form…and which is grammatically well structured’) at the bottom of the diagram below:

(Louw & Milojkovic 2016: 53)

The diagram shows two dimensions. The vertical dimension is how restrictive a view of collocation is, with the most restrictive at the bottom and the least restrictive at the top. The horizontal dimension shows how much of the language a view of collocation covers: the top bulb of the diagram is larger than the bottom bulb.

Louw & Milojkovic (2016) argue that the link between collocation and context of situation is of great importance in applications of corpora in literature studies, i.e. corpus stylistics.

Context of situation was illustrated by Firth in the following way:

“In his article ‘Personality and language in context’ Firth offers us what he calls a typical Cockney event in ‘one brief sentence’.
‘Ahng gunna gi’ wun fer Ber’. (I’m going to get one for Bert)
What is the minimum number of participants? Three? Four? Where might it happen? In a pub? Where is Bert? Outside? Or playing darts? What are the relevant objects? What is the effect of the sentence? ‘Obvious!’ you say. So is the convenience of the schematic construct called ‘context of situation’. It makes sure of the sociological component.” (Firth 1957: 182 as quoted in Louw & Milojkovic, 2016:61, emphasis added)

Awareness of the importance of context of situation is reflected in the following small Twitter poll where a majority of the 24 respondents opted for “meanings have words” over “words have meanings”:

Twitter poll

Although Louw concedes that a view of collocation such as n-grams can reveal contexts of situation, opportunities to do so will be much rarer than if collocation is located near the top of the diagram – “abstracted at the level of syntax” as Firth put it.
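As a rough illustration of how much a restrictive view can miss, the sketch below contrasts contiguous n-grams with co-occurrence inside a window of words around a node word. The window approach is simply one looser operationalisation that corpus tools commonly offer. It is my own stand-in, not Louw's or Firth's method, and the function names, the span of 4 and the toy text are all assumptions. Note too that this simple window ignores sentence boundaries.

```python
from collections import Counter

def ngram_counts(tokens, n=2):
    """Count contiguous n-word sequences (the restrictive n-gram view)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def window_cooccurrences(tokens, node, span=4):
    """Count words found within `span` tokens either side of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        left = tokens[max(0, i - span):i]
        right = tokens[i + 1:i + 1 + span]
        counts.update(left + right)
    return counts

# Hypothetical toy text: "made" and "decision" are never adjacent
toy_text = ("he made a quick decision . she made the final decision . "
            "they made a very hard decision").split()

bigrams = ngram_counts(toy_text, 2)
near_decision = window_cooccurrences(toy_text, "decision", span=4)

print(bigrams[("made", "decision")])  # the bigram view never sees this pair
print(near_decision["made"])          # the window view does
```

Here the relationship between made and decision is invisible to a bigram count because other words always intervene, while the window count picks it up.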

Context of situation is also of great importance in language teaching and learning. For example, task-based teaching can be said to lay great weight on context of situation.

As Louw & Milojkovic (2016:26) put it:

“The closer collocation’s classifications are to context of situation, the more successful and enduring will be the approach of the scholars who placed them there. The more the term is constrained by the notion of language ‘levels’ and the linearity and other constraints of syntax, the less such classifications and the theories perched upon them are likely to endure. The reason for this is, as we shall see, that collocation takes us directly to situational meaning and acts as what Sinclair refers to as the ‘control mechanism’ for meaning”

Thanks for reading.


  1. Wikipedia Collocation https://en.wikipedia.org/wiki/Collocation
  2. Boot Camp online https://www.sketchengine.eu/bootcamp/boot-camp-online/#toggle-id-2


Louw, B., & Milojkovic, M. (2016). Corpus stylistics as contextual prosodic theory and subtext (Vol. 23). John Benjamins Publishing Company.

Discovering English with SketchEngine – James Thomas interview

2015 seems to be turning into a good year for corpus linguistics books on teaching and learning. You may have read about Ivor Timmis’s Corpus Linguistics for ELT: Research & Practice. There is also a book by Christian Jones and Daniel Waller called Corpus Linguistics for Grammar: A guide for research.

This post is an interview with James Thomas on Discovering English with SketchEngine.

1. Can you tell us a bit about your background?

2. Who is your audience for the book?

3. Can your book be used without Sketch Engine?

4. How do you envision people using your book?

5. Do you recommend any other similar books?

6. Anything else you would like to add?

1. Can you tell us a bit about your background?

Currently I’m head of teacher training in the Department of English and American Studies, Faculty of Arts, Masaryk University, Czech Republic. In addition to standard teacher training courses, I am active in e-learning, corpus work and ICT for ELT. In 2010 my co-author and I were awarded the ELTon for innovation in ELT publishing for our book, Global Issues in ELT. I am secretary of the Corpora SIG of EUROCALL, and a committee member of the biennial conference, TALC (Teaching and Language Corpora).

My work investigates the potential for applying language acquisition and contemporary linguistic findings to the pedagogical use of corpora, and training future teachers to include corpus findings in their lesson preparation and directly with students.

In 1990, I moved to the Czech Republic for a one-year contract with ILC/IH and have been here ever since. Up until that time, I had worked as a pianist and music teacher, and had two music theory books published in the early 1990s. Their titles also begin with “Discovering”! 🙂

2. Who is your audience for the book?

The book uses the acronym DESKE. Quite a broad catchment area:

  • Teachers of English as a foreign language.
  • Teacher trainees – the digital natives – whether they are doing degree courses or CELTA TESOL Trinity courses.
  • People doing any guise of applied linguistics that involves corpora.
  • Translators, especially those translating into their foreign language. (Only yesterday I presented the book at LEXICOM in Telč.)
  • Students and aficionados of linguistics.
  • Test writers.
  • Advanced students of English who want to become independent learners.

3. Can your book be used without Sketch Engine?

No. (The answer to the next question explains why not.)

Like any book it can be read cover to cover, or aspects of language and linguistics can be found via the indices: (1) Index of names and notions, (2) Lexical focus index.

4. How do you envision people using your book?

It is pretty essential that the reader has Sketch Engine open most of the time. Apart from some discussions of features of linguistics and English, the book primarily consists of 342 language questions/tasks which are followed by instructions – how to derive the data from the corpus recommended for the specific task, and then how to use Sketch Engine tools to process the data, so that the answer is clear.

Example questions:
About words
Can you say handsome woman in English?
Do marriages break up or down?
How is friend used as a verb?
Which two syllable adjectives form their comparatives with more?
Do men say sorry more than women?

About collocation
I’ve come across boldly go a few times and wonder if it is more than a collocation.
It would be reasonable to expect the words that follow the adverb positively to be positive, would it not?
Is there anything systematic about the uses of little and small?
What are some adjectives suitable for giving feedback to students?

About phrases and chunks
Does at all reinforce both positive and negative things?
What are those phrases with last … least; believe … ears; lead … horse?
How do the structures of to photograph differ from take a photo(graph), guess with make a guess, smile with give a smile?
Which –ing forms follow verbs like like?

About grammar
How do sentences start with Given?
Who or whom?
Which adverbs are used with the present perfect continuous?
Do the subject and verb typically change places in indirect questions?
How new and how frequent is the question tag, innit?

About text
Are both though and although used to start sentences? Equally?
How much information typically appears in brackets?
Does English permit numbers at the beginning of sentences?
Is it really true that academic prose prefers the passive?
In Pride and Prejudice, are the Darcies ever referred to with their first names?

There is an accompanying website with a glossary – a work eternally in progress, and a page with all the links which appear in the footnotes (142 of them), and another page with the list of questions, which a user might copy and paste into their own document so that they can make notes under them.

5. Do you recommend any other similar books?

The 223 page book has three interwoven training goals, the upper level being SKE’s interface and tools, the second being a mix of language and linguistics, while the third is training in deriving answers to pre-set questions from data.

AFAIK, there is nothing like this.

6. Anything else you would like to add?

In all the conference presentations and papers and articles that I have seen and heard over the years in connection with using corpora in ELT, with very few exceptions teachers and researchers focus on a very narrow range of language questions. When my own teacher trainees use corpora to discover features of English in the ways of DESKE, they realise that the steep learning curve is worth it. They are being equipped with a skill for life. It is a professional’s tool.

Sketch Engine consists of both data and software. Both are being constantly updated, which argues well for print-on-demand. It’ll be much easier to bring out updated versions of DESKE than through standard commercial publishers. I’m also expecting feedback from readers, which can also be incorporated into new editions.

My interests in self-publishing are partly related to my interest in ICT. This book is printed through the print-on-demand service, Lulu.com. One of the beauties of such a mode of publishing is the relative ease with which the book can be updated as the incremental changes in the software go online. This is in sharp contrast to the economies of scale that dictate large print runs to commercial publishers and the standard five-year interval between editions.

There is a new, free, student-friendly interface known as SKELL, which has its own corpus and has been available for less than a year. It is also undergoing development at the moment, and I will be preparing a book of worksheets for learners and their teachers (or the other way round). I see it as a 21st-century replacement for the much missed “COBUILD Corpus Sampler”.

Lastly, I must express my gratitude to Adam Kilgarriff, who owned Sketch Engine until his death from cancer on May 16th, at the age of 55. He was a brilliant linguist, teacher and presenter. He bought 250 copies of my book over a year before it was finished, which freed me up from other obligations – a typical gesture of a wonderful man, greatly missed.

Many thanks to James for taking the time to be interviewed, but pity my poor wallet, with some very neat CL books to purchase this year. James also mentioned that, for a second edition, Chapter 1 will be rewritten so that it can be used with the open corpora in SketchEngine.

Corpus linguistics community news 2

Another round of links (here’s the first round) to some posts on the G+ corpus linguistics community to notify peeps interested in this sort of thing:

Using BAWE corpus in SketchEngine as a learner corpus (see 1st comment to post)

The Desolation of Smaug and the BYU COCA

Dogme concordancing

The Disabled Access Friendly Campaign, corpus use literacy and wordandphrase.info