This corpora-bashing parrot has ceased to be

Hugh Dellar’s recent post, “What have corpora ever done for us”, dismisses with typical gusto the hype around corpora that was prevalent a few years back. I would like to look at some of the issues raised.

It is curious that his support of teacher intuition over the use of corpora seems to contrast with his support of coursebooks over teacher intuition in his dogme posts. Intuition can also mislead: Gabrielatos (2005) describes a teacher whose intuition that tag questions belonged to the “bowler-hat” past of English use clashed with the finding that one in four questions in dialogues was a question tag.

Another of Dellar’s objections echoes Widdowson’s dichotomy between genuine texts and authentic texts, as cited in Tribble (1997). Concordance lines from corpora are instances of genuine language use: the products of language communication. Authentic discourse texts, by contrast, represent the process of language communication. Because learners need to construct a relationship with language materials, concordance lines have to be filtered before they are useful in the classroom, a step Widdowson calls pedagogic mediation.

A related concern is the distinction between indirect uses of corpora by commercial publishers and direct uses by learners and teachers.

Both of these concerns are being addressed by specialised corpora, such as the Backbone pedagogic corpora for content and language integrated learning and the MICASE corpus of academic spoken English, and by the wider availability of general corpora such as COCA (Corpus of Contemporary American English) and the BNC (British National Corpus).

For instance, Dellar’s question regarding [get on with it] and [let’s get down to business] can be answered with the Phrases in English tool, which uses the BNC. Here we find that [get on with it] appears 401 times (4.11 instances per million words) versus 2 times (0.02 instances per million words) for [let’s get down to business].
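For readers who want to check the per-million figures quoted above, here is a minimal sketch of the normalisation. The corpus size is an assumption: it is back-calculated from the quoted numbers (401 / 4.11 gives roughly 97.6 million words) and is not an official word count for the BNC or for the Phrases in English tool. The function and variable names are my own.

```python
def per_million(raw_count, corpus_size):
    """Normalise a raw frequency to occurrences per million words."""
    return raw_count / corpus_size * 1_000_000

# ASSUMPTION: corpus size inferred from the figures quoted in the post
# (401 / 4.11 is about 97.6 million); not an official word count.
ASSUMED_CORPUS_SIZE = 97_600_000

# Raw counts as reported in the post.
counts = {
    "get on with it": 401,
    "let's get down to business": 2,
}

for phrase, count in counts.items():
    rate = per_million(count, ASSUMED_CORPUS_SIZE)
    print(f"{phrase}: {rate:.2f} per million words")

# How many times more frequent the first phrase is than the second.
ratio = counts["get on with it"] / counts["let's get down to business"]
print(f"ratio: {ratio:.1f} to 1")
```

Normalising matters mainly when comparing across corpora of different sizes; within a single corpus the raw counts already tell the story, with [get on with it] about 200 times more frequent.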

The Backbone collection is very interesting, as it provides a thematically focused database of spoken text for five languages plus English as a lingua franca, backed up with an assortment of learning resources. The English corpus includes 50 interviews, which are annotated for topic, grammar and lexis. This annotation goes some way towards addressing the problem of how text is coded.

Braun (2005) describes using a small corpus to mediate pedagogically between corpora and learners through “coherent and relevant content, a restricted size, a multimedia format and a pedagogic annotation of the corpus” (Braun, 2005, p. 61).

The use of home-made corpora is another way to address the issue of authenticity. I will detail my use of the TextSTAT tool and similar software to build up a corpus of material for multimedia students in a later post (Update: see this series of posts). Although it takes some work, teachers can build up formal databases to complement their experience-based intuitive database.
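To give a flavour of what a home-made corpus makes possible, here is a rough sketch of a word-frequency count and a keyword-in-context (concordance) search over a handful of texts. This is not how TextSTAT works internally; the function names, the tokenising rule and the three sample sentences are all invented for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens, keeping apostrophes."""
    text = text.lower().replace("’", "'")  # treat curly apostrophes as straight
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text)

def word_frequencies(texts):
    """Count how often each word token occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts

def concordance(texts, phrase, width=2):
    """Return keyword-in-context lines with up to `width` words either side."""
    target = tokenize(phrase)
    lines = []
    for text in texts:
        tokens = tokenize(text)
        for i in range(len(tokens) - len(target) + 1):
            if tokens[i:i + len(target)] == target:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + len(target):i + len(target) + width])
                lines.append(f"{left} [{' '.join(target)}] {right}".strip())
    return lines

# Hypothetical mini-corpus, standing in for a teacher's own collected texts.
texts = [
    "Stop talking and get on with it, please.",
    "We should get on with it before the deadline.",
    "Let's get down to business.",
]

for line in concordance(texts, "get on with it"):
    print(line)
print(word_frequencies(texts).most_common(3))
```

In practice the list of texts would be read from files the teacher has collected, and the concordance lines would still need the kind of filtering described above before they reach learners.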

Two other criticisms, not mentioned by Dellar, are that corpora promote bottom-up (rather than top-down) processing of text and an inductive (rather than deductive) approach to learning. Flowerdew (2009) discusses both and concludes that top-down processing can be used with corpus data, and that a mixed approach can be taken by building elements of a deductive approach into the inductive one.

Finally, turning to learning effects, Oghigian and Chujo (2010) compared pre- and post-test scores for two classes of beginner students. The class using a contrastive (Japanese/English) corpus improved significantly on all six question types, whereas the class using a listening CD improved on only three.

Hopefully this short response shows that the corpora-bashing parrot has shuffled off this metaphorical coil. 🙂


