This corpora-bashing parrot has ceased to be

Hugh Dellar’s recent post “What have corpora ever done for us” dismisses, with typical gusto, the hype around corpora that was prevalent a few years back. I would like to look at some of the issues raised.

It is curious that his support of teacher intuition over the use of corpora seems to contrast with his support of coursebooks over teacher intuition in his dogme posts. Gabrielatos (2005) describes a case in which a teacher’s intuition that tag questions belonged to the “bowler-hat” past of English use clashed with a finding that one in four questions in dialogues was a question tag.

Another of Dellar’s objections echoes Widdowson’s dichotomy between genuine texts and authentic texts, as cited in Tribble (1997). Concordance lines from corpora represent instances of genuine language use: the products of language communication. They contrast with discourse texts, which are authentic and represent the process of language communication. Learners need to construct a relationship with language materials, so concordance lines need to be filtered to be useful in the classroom, a process Widdowson calls pedagogic mediation.

A related concern is the distinction between indirect uses of corpora by commercial publishers and direct uses by learners and teachers.

Both of these concerns are being addressed by specialised corpora, such as the Backbone pedagogic corpora for content and language integrated learning and MICASE (the Michigan Corpus of Academic Spoken English), and by the wider availability of general corpora such as COCA (the Corpus of Contemporary American English) and the BNC (British National Corpus).

For instance, Dellar’s question regarding [get on with it] and [let’s get down to business] can be answered using the Phrases in English tool, which draws on the BNC. Here we find that [get on with it] appears 401 times (4.11 instances per million words) vs 2 times (0.02 instances per million words) for [let’s get down to business].
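The per-million figures above are simply raw counts normalised by corpus size. A minimal sketch of that arithmetic follows. Note that the corpus size is an assumption: the post does not state it, so the value below (about 97.6 million words) is inferred from the reported figures (401 hits giving 4.11 per million), and the function name is made up for illustration.

```python
# Assumption: corpus size inferred from the figures quoted in the post
# (401 hits ~ 4.11 per million); it is not an official word count.
ASSUMED_CORPUS_SIZE = 97_600_000


def per_million(raw_count: int, corpus_size: int = ASSUMED_CORPUS_SIZE) -> float:
    """Convert a raw hit count into instances per million words."""
    return raw_count / corpus_size * 1_000_000


counts = {
    "get on with it": 401,
    "let's get down to business": 2,
}

for phrase, hits in counts.items():
    print(f"{phrase}: {hits} hits, {per_million(hits):.2f} per million words")

# How many times more frequent the first phrase is than the second.
ratio = counts["get on with it"] / counts["let's get down to business"]
print(f"ratio: {ratio:.1f}")
```

Normalising like this is what makes counts comparable across corpora of different sizes, which matters when checking the same phrase in both the BNC and COCA.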

The Backbone collection is very interesting as it provides a thematically focused database of spoken text for 5 languages plus English as lingua franca, backed up with an assortment of learning resources. The English corpus includes 50 interviews which are annotated for topic, grammar and lexis. This annotation goes some way to address the problem of the way text is coded.

Braun (2005) describes using a small corpus as a way to mediate pedagogically between corpora and learners using “coherent and relevant content, a restricted size, a multimedia format and a pedagogic annotation of the corpus” (Braun, 2005, p. 61).

The use of home-made corpora is another way to tackle the issue of authenticity. I will detail my use of the TextSTAT tool and similar software to build up a corpus of material for multimedia students in a later post (Update: see this series of posts). Although it takes some work, teachers can build up formal databases to complement their experience-based intuitive database.
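To give a rough idea of what a home-made corpus tool does, here is a minimal sketch of a word-frequency list and a keyword-in-context (KWIC) concordance using only the Python standard library. It is not how TextSTAT itself works; the sample sentences, function names and window width are all invented for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def kwic(tokens, keyword, width=3):
    """Return keyword-in-context lines: `width` tokens either side of each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Invented sample "corpus"; in practice these would be texts collected
# for a particular group of learners.
sample_texts = [
    "Render the scene before you export the clip.",
    "Export the clip, then render the audio track.",
]

tokens = [tok for text in sample_texts for tok in tokenize(text)]
freq = Counter(tokens)

print(freq.most_common(3))
for line in kwic(tokens, "render", width=2):
    print(line)
```

Even a sketch like this shows the two basic views a teacher gets from a small corpus: which words are frequent, and what company a given word keeps.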

Two other criticisms not mentioned by Dellar are that corpora promote both a bottom-up processing of text (vs top-down processing) and an inductive (vs deductive) approach to learning. Flowerdew (2009) discusses these and concludes that top-down processing can be used with corpus data, and that a mixed approach can be used which incorporates elements of a deductive approach into the inductive approach.

Finally, turning to learning effects, Oghigian and Chujo (2010) found that beginner students in a class using a contrastive (Japanese/English) corpus improved significantly on all six question types between pre- and post-test scores, whereas a class using a listening CD improved on only three question types.

Hopefully this short response shows that the corpora-bashing parrot has shuffled off this metaphorical coil. 🙂


Braun, S. (2005). From pedagogically relevant corpora to authentic language learning contents. ReCALL 17(1): 47-64.

Gabrielatos, C. (2005). Corpora and language teaching: Just a fling, or wedding bells? TESL-EJ 8(4), A1, 1-37.

Flowerdew, L. (2009). Applying corpus linguistics to pedagogy: A critical evaluation. International Journal of Corpus Linguistics 14(3), 393–417.

Oghigian, K. & Chujo, K. (2010). An Effective Way to Use Corpus Exercises to Learn Grammar Basics in English. Language Education in Asia 1(1), 200-214.

Tribble, C. (1997). Improvising corpora for ELT: Quick-and-dirty ways of developing corpora for language teaching. In J. Melia & B. Lewandowska-Tomaszczyk (Eds.), PALC ’97 Proceedings, Practical Applications in Language Corpora (pp. 106-117). Lodz: Lodz University Press.


Be sure to read Leo Selivan’s response to Hugh Dellar’s post, “What corpora have done for us”.

15 thoughts on “This corpora-bashing parrot has ceased to be”

  1. Hi there –
    I feel pretty much obliged to comment on a post so thoroughly versed in Monty Python – as well as so well read – so here goes!

    Firstly, as I think I make clear in the intro to that particular post, it was an old talk given maybe ten or twelve years ago at IATEFL York, I think, and I only decided to post it up as the issue of corpora use had come up in a discussion I’d been having over on facebook. It was written very much in response to what I felt was the hyperbole being gushed – by insiders – about corpora and their uses.

    I don’t personally see it as a problem to support teacher intuition whilst also supporting coursebooks. My main point about intuition is that often it’s ALL we have to go on in the heat of the moment of the classroom and so HAVE TO trust it (which isn’t, of course, to say we can’t be wrong or that we shouldn’t hone our intuition or check ideas against hard data later – just that we can’t always be stopping and feeling we can’t be sure until we’ve checked. That way lies deskilling). I also feel that much of what I’ve seen corpora linguists ‘reveal’ (and of course there are the odd exceptions) has been stuff that most teachers would – I hope – have known intuitively already. My support of coursebooks – or at least SOME coursebooks (and not only mine, I should hasten to add!!) is based partly on the fact that for the vast majority of teachers, coursebooks are a fact of life and help them get by, partly on a belief that coursebooks can be agents of change in themselves, partly out of a feeling that they can – and should – provide a solid base of good examples and good graded input, as well as stimuli, interest-value and so on. Coursebooks can ‘cook’ the raw data of corpora in a way that teachers can then serve up to students. None of this takes away from the fact that students will still often expect teachers to come up with other examples and / or explanations of things that are found in books – and that teachers will have to use intuition (or, if you prefer, will have to tap into their own ‘inner corpora’!!) to handle.

    If corpora have, on occasion, been able to rectify things like a teacher’s weird belief that question tags were no longer common currency then great. I’d suggest that half an hour watching TV or listening to online talk radio could’ve achieved the same ends, of course, but fine. The end is admirable. I just don’t think many of us persist in such weird beliefs – or that corpora have really done that much to shake many of our core beliefs about language to the core.

    Thanks for introducing me to the Phrases in English tool. Had never seen it before and don’t think it was around when I wrote the initial talk. I think that this – and the other sites you talk about above – are all ways in which corpora have changed partly to address the kinds of concerns I raised in my talk many years ago, and that others have also obviously made.

    That said, I STILL don’t think the examples you find in those concordance lines work as well for anyone below at least CAE level as good, scripted dictionary entries do – or as good well thought-out teacher-driven examples do either!

    Thanks again for mentioning me anyway.

    1. hi Hugh,

      thanks for reminding me that this was based on an old talk, a lot has moved on since then. but as you say the promise of corpora has never really materialised, a situation I think that advances in computer applications may help in improving. call me a techno utopian!

      regarding coursebooks-as-corpora compared to corpora-corpora there are a number of articles that I have not finished which I may write up as a post.

      the Oghigian and Chujo (2010) is an interesting read since they looked at beginner English learners using a contrastive corpus of Japanese/English.

      thanks for getting me to think about this area as up till now my intuitive reasons for liking corpora have been foggy to say the least.


      1. Inspired by your post the other day I did spend ten minutes showing my students yesterday how to use the Phrases search function on the BNC . . . .only for one of the keener ones to email me late last night saying she’d been tinkering with it – but couldn’t understand either the language or the context of a lot of the connected language!! So it goes.

        Still a useful tool to know about from a writing point of view, mind, so thanks again.

        Be interested to see the fruits of your reading around coursebooks versus corpora.

      2. hi Hugh

        really pleased to hear that the post inspired you, and no surprise that without some form of structured support your keen student felt lost. will post my readings when i get some time, past week been hectic with sick baby!


  2. I have recently been influenced by Daniel Kahneman’s book ‘Thinking, Fast and Slow’, which has rather confirmed my view that actually most teachers are quite bad at understanding frequency and giving appropriate examples, especially in the heat of the moment. If you’re interested you can read more in the CELT blog. Basically, though, when making fast, intuitive decisions we often replace rational ideas of frequency with ‘availability’: things we can think of examples of. So because we can easily think of examples of blonde people but not of an abstract thing like ‘arise’, we assume blonde is much more frequent than arise when the opposite is true (it’s actually some 8 times less common). Kahneman gives the example of chess players learning thousands of moves before they can become successful intuitive practitioners and I think this is likely to be true for teachers too. The problem is that because of our grammar obsession we underestimate the importance of lexis and good examples and (certainly in teacher education and development programmes) teachers are not encouraged to practise and so often don’t get the amount they need to become that intuitive teacher.

    1. hi Andrew,

      thanks for commenting. i have read that post on blondes before and it is a great example of what Kahneman (and Tversky) have found. and this is where corpora tools are useful to correct somewhat for these biases. although there is a danger of what Hugh terms “deskilling”, this is an ever-present concern with any digital tool we use to help us in the classroom.

      so the questions include how do teachers get training in using such tools? (assuming you agree they are necessary!)
      to what extent can they be used in class?
      are there better ways than corpora to correct for human bias?


      1. I think training probably comes from looking at language in coursebooks and considering frequency, examples, questions we might ask and retrospectively thinking about language that comes up in class. Both Cert and Diploma level courses could make teachers aware of the issues around frequency and put far more emphasis on vocabulary and exemplifying it and surrounding grammar and patterns. Corpora, sites such as Phrases in English, Google searches and dictionaries can all help with this, but I see a limited role for them in class. Good learner dictionaries (online or otherwise) are likely to be the best help for students on their own. I really don’t see referring to these sources as de-skilling – quite the opposite. As with the chess players, what we need teachers to be able to do is go beyond what the corpora provide: good examples at an appropriate level for students but which don’t oversimplify grammar and take into account frequency. Also making connections to other related language – the co-text, opposites, limited other words in a lexical set, collocations and related (but distinct) meanings, other word forms in the family and their related collocations, longer chunks and patterns in sentences, traditional grammar. These are things that neither corpora nor dictionaries can do very efficiently and they require immense skill on the part of the teacher. Getting better at this is down to planning – but not the games and adapting of coursebook material that is generally encouraged by initial teacher training.

  3. Interesting little discussion has developed here Mura.
    Nice stuff – and I’m in broad agreement with both you and Andrew about much of this.

    Two small things to add:

    Firstly, just for the record, my initial comment about deskilling over in my blog post was really to do with the way in which corpora linguists often (used to, perhaps) present findings in such a way as to mystify and complicate the bleeding obvious (the take the mickey / mickey-takers thing being a case in point!!) and that there was always a subtle implication that we couldn’t be completely sure until we’d checked the corpora. I think that THIS implication downplays the skills many of us bring to class. I agree with Andrew that hunches can be wrong, but I still think that for many teachers more often than not, they aren’t. I always wished the corpora folk would focus more on things that actually surprised or contradicted rather than simply confirming general hunches.

    Secondly, concordancing.

    Since you mentioned it to me, I’ve recommended the phrase finder to my Upper-Int students, but have found them generally very unsure about using it. One of the keenest and most able students emailed me in some distress after first trying it as she’d wanted more info on how to use the chunk ACCORDING TO and had got this:

    Just a very brief perusal of the first three lines and you’ll see a vast array of fairly scary lexis that’s enough to send even good students at this level running for cover! In contrast, the online Macmillan dictionary gives students this:

    Cooked trumps raw every time!

    1. hi Hugh

      thanks for clarifying your point about deskilling.

      regarding your sts experiences i think it is not an either or situation, depending on your classroom aims, clearly one could use the 3 definitions from the macmillan dictionary by getting sts to look for examples of each in the phrases-in-english results.

      Andrew also agrees with you about the use of a good learner dictionary, me too, yet one still needs to teach sts how to use a dictionary effectively, so similarly one could train sts to use corpora. undoubtedly this will be more difficult simply because people grow up using dictionaries, unlike corpora. maybe the growing dominance of electronic means to find info may impact this and make it easier to incorporate corpora techniques in the future?

      having said that currently i am more interested in teacher development hence some of my short posts here – quick cup of COCA to encourage teacher use.

  4. Have you seen the Macmillan Red Words game? You have to choose how frequent you think words are on a 1-3 star scoring system. It’s a bit of fun, really, but when I did it, I did pretty well, apart from the odd curve-ball, which would support what Hugh says about intuition. Having said that, I also like working with corpora from time to time… (on the fence, moi?)

  5. hi Rachael

    no not seen it before, that’s a neat game, did not do too well on it! they should have some sort of reference score to normalise your score, or a leaderboard!

