BNCaudio corpus and TOEIC listening

Those of you who teach the TOEIC or other exams will at times have wanted to use “authentic” audio, complete with its hesitations, pauses, repetitions and so on.

Learners need to be exposed to the “jungle” English of the world outside, not just the “garden” English of the classroom. The terms were coined by Richard Cauldwell and Sheila Thorn; see the short YouTube clip.

John Hughes makes the case for materials that use such audio and video. He points out that using corpus data for this requires context. I agree, though if you want to focus on decoding and building bottom-up listening skills, context is less important.

I very recently used the Lancaster interface to the BNC audio data in my TOEIC exam class.

For details on getting access to this corpus, see this Google+ community post.

Once you have access, make sure the spoken restrictions link is clicked so that it is greyed out, as shown in the following screenshot:

Then, after entering the search word contract, I selected business as the domain:


I then looked through the results for some interesting snippets. Note that not all of the audio can be accessed. Also, as the interface is still in beta, there are some alignment issues between transcripts and audio, but you can adjust the alignment and give feedback so that it can be improved.

I told the students that they would listen to snippets of audio containing the word contract. I asked them to listen for other words (nouns, verbs, adjectives, adverbs) related to the use of contract in the audio.

The following is the transcript of the first audio I used:

All members of staff have standard conditions of service as set out here, with the exception of temporary staff or staff who are er [pause] on a short time contract or maternity leave cover who may have a short term er notice er [pause] erm for erm [pause] a period of notice.

After the first listen, one of the students recognised the word notice; after the second, two students recognised temporary, conditions and staff. A third listen produced recognition of standard conditions.

I then dictated the transcript (without the hesitations, pauses, etc.) for them to write down, and then went through other relevant lexis (short time/term contract, maternity leave cover) and checked understanding.
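Preparing the dictation version of a transcript like the one above is mostly a matter of deleting the pause tags and filler words. Here is a minimal Python sketch of that clean-up, assuming that pauses appear as bracketed tags such as [pause] and that the only fillers are er, erm, um and uh; the function name clean_transcript and the filler list are my own illustrative choices, not part of the BNC tools.

```python
import re

# Assumed filler words; extend this set for other transcripts.
FILLERS = {"er", "erm", "um", "uh"}

def clean_transcript(text: str) -> str:
    """Drop bracketed tags like [pause] and filler words, then tidy the spacing."""
    text = re.sub(r"\[[^\]]*\]", " ", text)  # remove [pause] and similar tags
    # Compare each word ignoring case and any trailing comma or full stop.
    words = [w for w in text.split() if w.lower().strip(",.") not in FILLERS]
    return " ".join(words)

TRANSCRIPT = (
    "All members of staff have standard conditions of service as set out here, "
    "with the exception of temporary staff or staff who are er [pause] on a short "
    "time contract or maternity leave cover who may have a short term er notice "
    "er [pause] erm for erm [pause] a period of notice."
)

print(clean_transcript(TRANSCRIPT))
```

The sketch deliberately leaves repetitions alone, since deciding which of those to keep is a teaching choice rather than a mechanical one.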

I repeated the procedure for two more audio snippets containing the keyword contract.

The students did, of course, find the audio difficult, but they liked that it was real and made a change from the coursebook audio. I plan to use this process in the remaining classes. Next time I will probably start with a much shorter clip and move on to longer ones.

The SpokesBNC interface can display only those BNC concordance lines that have audio recordings, which is very useful.
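For readers new to concordancers, a concordance line is simply the search word shown with a window of context on each side (keyword in context, or KWIC). The following minimal Python sketch illustrates the idea. It is not how SpokesBNC or the Lancaster interface work internally, and the function name kwic and the word-count window are assumptions made for illustration.

```python
def kwic(text: str, keyword: str, width: int = 4) -> list[str]:
    """Return keyword-in-context lines with up to `width` words on each side."""
    tokens = text.split()
    lines = []
    for i, tok in enumerate(tokens):
        # Match ignoring case and any trailing comma or full stop.
        if tok.lower().strip(",.") == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the keywords line up in a column.
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines

snippet = ("staff who are on a short time contract or maternity leave cover "
           "who may have a short term contract of employment")
for line in kwic(snippet, "contract"):
    print(line)
```

Aligning the keyword in a column is what makes collocates such as short time and short term easy to spot down the page.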

StringNet – exploring will suit you

Scott Thornbury recently tipped me off to the latest version of StringNet (in the comments section of this post), so I decided to follow up on my last post, which used GloWbE. If you recall, some students asked me about adverb placement in the sentence “That will suit you perfectly”. Looking back, I was not satisfied with the off-the-cuff way I dealt with that. Let’s see what StringNet can show us.

This is the screenshot for the search term will suit you:


Hmm, not much help so far; let’s press the expand icon:


Aha, there’s our adverb, but on clicking the blue [adv] placeholder:


Where’s our adverb perfectly? So I go back to my first screen and move up a level by pressing the Parent button:


I decide to expand 5 will suit [pers pn]:


Ooh, I see an adverb placed before the verb suit. Clicking on the [adv], I get:


So I could have told the students that adverbs like probably/not/best/really can come before the verb suit, and that adverbs like best/perfectly/down/just/very/better/too follow it. We can also tell them that adverbs don’t tend to appear in the patterns with that (though clicking on the examples for the that patterns, we find two sentences with the adverb best).

StringNet uses the British National Corpus, which is very outdated, so one could take the patterns found with StringNet and check them in a more recent corpus such as the Corpus of Global Web-based English (GloWbE). Admittedly, the above took some fiddling about, but I can see StringNet becoming very handy.

Read more about using StringNet and also check their blog.

Compare a GloWbE search for will [r*] suit [pp*] with a search for will suit [pp*] [r*].
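To see why these two queries return different hits, it helps to think of each one as a sequence of slots: a literal word, or a bracketed tag pattern in which * stands for any characters. The minimal Python sketch below mimics that query shape on two hand-tagged sentences. It is not GloWbE’s search engine; the lowercase CLAWS7-style tags (rr for a general adverb, ppy for you, vm for a modal) and the helper names slot_matches and search are assumptions for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical hand-tagged sentences using lowercase CLAWS7-style tags.
SENTENCES = [
    [("that", "dd1"), ("will", "vm"), ("suit", "vvi"), ("you", "ppy"), ("perfectly", "rr")],
    [("it", "pph1"), ("will", "vm"), ("probably", "rr"), ("suit", "vvi"), ("you", "ppy")],
]

def slot_matches(slot: str, word: str, tag: str) -> bool:
    """A slot is either a literal word or a bracketed tag pattern such as [r*]."""
    if slot.startswith("[") and slot.endswith("]"):
        return fnmatchcase(tag, slot[1:-1])  # * matches any run of characters
    return word.lower() == slot.lower()

def search(sentences, query: str) -> list[str]:
    """Return every contiguous word sequence in which all query slots match."""
    slots = query.split()
    hits = []
    for sent in sentences:
        for i in range(len(sent) - len(slots) + 1):
            window = sent[i:i + len(slots)]
            if all(slot_matches(s, w, t) for s, (w, t) in zip(slots, window)):
                hits.append(" ".join(w for w, _ in window))
    return hits

print(search(SENTENCES, "will [r*] suit [pp*]"))  # ['will probably suit you']
print(search(SENTENCES, "will suit [pp*] [r*]"))  # ['will suit you perfectly']
```

The first query only finds adverbs between will and suit, the second only adverbs after the pronoun, which is exactly the placement contrast the students asked about.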

This corpora-bashing parrot has ceased to be

Hugh Dellar’s recent post What have corpora ever done for us? dismisses, with typical gusto, the hype about corpora that was prevalent a few years back. I would like to look at some of the issues he raises.

It is curious that his support of teacher intuition over the use of corpora seems to contrast with his support of coursebooks over teacher intuition in his dogme posts. Gabrielatos (2005) describes a case in which a teacher’s intuition that tag questions belonged to the “bowler-hat” past of English clashed with the finding that one in four questions in dialogues was a question tag.

Another of Dellar’s objections echoes Widdowson’s distinction between genuine and authentic texts, as cited in Tribble (1997). Concordance lines from corpora are instances of genuine language use: the products of communication. They contrast with discourse, which is authentic and represents the process of communication. Because learners need to construct a relationship with the language materials they use, concordance lines have to be filtered before they are useful in the classroom, which is what Widdowson calls pedagogic mediation.

A related concern is the distinction between indirect uses of corpora by commercial publishers and direct uses by learners and teachers.

Both of these concerns are being addressed by specialised corpora, such as the Backbone pedagogic corpora for content and language integrated learning and the MICASE corpus of academic spoken English, and by the wider availability of general corpora such as COCA (Corpus of Contemporary American English) and the BNC (British National Corpus).

For instance, Dellar’s question about [get on with it] and [let’s get down to business] can be answered with the Phrases in English tool, which uses the BNC. There we find that [get on with it] appears 401 times (4.11 instances per million words), compared with 2 times (0.02 instances per million words) for [let’s get down to business].
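The per-million figures are just raw counts divided by corpus size and scaled up, which makes phrases comparable across corpora of different sizes. A minimal Python sketch of that normalisation follows. The corpus size is an assumption: about 97.6 million words, back-calculated from 401 hits giving 4.11 per million, not an official figure for the Phrases in English tool.

```python
# Assumption: ~97.6 million words, back-calculated from 401 hits = 4.11 per million.
CORPUS_WORDS = 97_600_000

def per_million(hits: int, corpus_words: int = CORPUS_WORDS) -> float:
    """Normalise a raw frequency to occurrences per million words."""
    return hits / corpus_words * 1_000_000

get_on = per_million(401)   # [get on with it]
get_down = per_million(2)   # [let's get down to business]
print(f"{get_on:.2f} vs {get_down:.2f} per million")  # 4.11 vs 0.02 per million
print(f"[get on with it] is about {get_on / get_down:.0f} times as frequent")
```

Note that the ratio between the two phrases does not depend on the assumed corpus size at all, since it cancels out; only the per-million values do.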

The Backbone collection is very interesting, as it provides a thematically focused database of spoken text for five languages plus English as a lingua franca, backed up with an assortment of learning resources. The English corpus includes 50 interviews annotated for topic, grammar and lexis. This annotation goes some way towards addressing the problem of how text is coded.

Braun (2005) describes using a small corpus to mediate pedagogically between corpora and learners, one with “coherent and relevant content, a restricted size, a multimedia format and a pedagogic annotation of the corpus” (Braun, 2005, p. 61).

The use of home-made corpora is another way to tackle the issue of authenticity. I will detail my use of the TextSTAT tool and similar software to build up a corpus of material for multimedia students in a later post (Update: see this series of posts). Although it takes some work, teachers can build up formal databases to complement their experience-based intuitive database.
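One of the first things a tool like TextSTAT gives you for a home-made corpus is a word frequency list. The core idea fits in a few lines of Python, sketched below. This is not TextSTAT’s own code; the tokenising pattern (lower-case letters and apostrophes only), the function name frequency_list and the two sample texts are hypothetical choices for illustration.

```python
import re
from collections import Counter

def frequency_list(texts: list[str], top: int = 10) -> list[tuple[str, int]]:
    """Count lower-cased word forms across all texts and return the most frequent."""
    counts = Counter()
    for text in texts:
        # Crude tokeniser: runs of letters and apostrophes after lower-casing.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common(top)

# Hypothetical mini-corpus of multimedia course texts.
corpus = [
    "Import the clip into the timeline, then render the timeline.",
    "Export the video after you render it.",
]
print(frequency_list(corpus, top=3))
```

Function words such as the will dominate any raw list, so in practice you would scan further down, or filter a stop list, to reach subject-specific vocabulary like timeline and render.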

Two other criticisms not mentioned by Dellar are that corpora promote bottom-up (rather than top-down) processing of text and an inductive (rather than deductive) approach to learning. Flowerdew (2009) discusses both and concludes that top-down processing can be used with corpus data and that a mixed approach should be adopted, combining elements of a deductive approach with the inductive one.

Finally, turning to learning effects, Oghigian and Chujo (2010) found that beginner students in a class using a contrastive (Japanese/English) corpus improved significantly on all six question types between pre- and post-test, whereas a class using a listening CD improved on only three question types.

