You should have the body!

Mark Hancock has a nice write-up of a talk given by Mike McCarthy on spoken English. The write-up concludes with an interesting metaphor of a corpus being a corpse, language that is no longer alive. It asks whether only using corpus examples is the best way of trying to improve a learner’s use of English.

Very few corpus folks would suggest only using corpus examples, and furthermore a lot of corpus work goes beyond the purely quantitative to also consider the teaching implications.

For example, there is a great paper, “Listening for needles in haystacks: how lecturers introduce key terms” by Ron Martinez, Svenja Adolphs, and Ronald Carter, on the spoken language of academic lecturers.

They extracted lexical bundles from a spoken corpus of 1.7 million words and then went through them manually to keep only the pedagogically interesting ones, e.g. “in other words” (kept) vs “er this is a” (discarded).
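To make the two steps concrete, here is a minimal sketch of that kind of workflow: count recurring word sequences, then filter by hand. It is not the authors' actual procedure. The function name `extract_bundles`, the bundle length, the frequency threshold and the toy sentences are all assumptions made up for illustration.

```python
import re
from collections import Counter

def extract_bundles(texts, n=3, min_freq=2):
    """Return word n-grams that occur at least min_freq times across texts.

    Hypothetical helper: real studies also apply dispersion criteria and
    much higher thresholds, which are left out here.
    """
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return {bundle: c for bundle, c in counts.items() if c >= min_freq}

# Invented toy data, not taken from the lecture corpus.
toy_lectures = [
    "in other words the model is er this is a simplification",
    "so in other words er this is a rough estimate",
]

# Step 1: automatic extraction of frequent three-word sequences.
candidates = extract_bundles(toy_lectures, n=3, min_freq=2)

# Step 2: the manual pass, represented here by a hand-made discard list.
discard = {"er this is", "this is a"}
kept = sorted(b for b in candidates if b not in discard)
print(kept)
```

The point of the sketch is that the counting is the easy part; deciding what goes in `discard` is where the teaching judgement comes in.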

Manual review of the list also showed them a hitherto under-emphasized aspect of spoken lectures – the introduction and definition of new terms.

Their analysis split these into the more transparent but less frequent cues such as “call” and “mean” (e.g. “…what theorists call…”, “…what do we mean by…”) and the less transparent but more frequent cues such as “basically” and “essentially” (e.g. “…which are basically…”, “…so it’s essentially…”).
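As a rough illustration of that split, the sketch below counts the two kinds of cue in a handful of transcript fragments. The cue lists contain only the four words mentioned above, and the names `CUES` and `count_cues` are hypothetical. A plain word match like this will also pick up uses that do not introduce a term (for instance “mean” in the statistical sense), so the counts would still need checking by hand.

```python
import re
from collections import Counter

# Only the cue words named in the post; the inflected forms are an assumption.
CUES = {
    "transparent": re.compile(r"\b(?:call(?:s|ed)?|mean(?:s|t)?)\b"),
    "less_transparent": re.compile(r"\b(?:basically|essentially)\b"),
}

def count_cues(utterances):
    """Count how many cue words of each type appear in the utterances."""
    counts = Counter()
    for utterance in utterances:
        lowered = utterance.lower()
        for label, pattern in CUES.items():
            counts[label] += len(pattern.findall(lowered))
    return counts

# The fragments quoted in the post, left incomplete as in the original.
fragments = [
    "what theorists call",
    "what do we mean by",
    "which are basically",
    "so it’s essentially",
]
cue_counts = count_cues(fragments)
print(dict(cue_counts))
```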

Further, they showed how complex the delivery of many definitions or concepts was: there was a lot of rephrasing, sometimes using the word “or” but often with no signposting language, and key definitions usually came at the end of a series of connected points (back-loading).

In addition, they found that lecturers often did not explicitly refer to their PowerPoint slides, which could make it difficult for students to pick out the key terms.

A corpus may be like a corpse but, as on the crime show CSI, there is an awful lot that dead bodies can reveal.

Habeas corpus, you should have the body! 🙂

14 thoughts on “You should have the body!”

  1. Agree with you Mura that Needles in Haystacks was a good example of how corpora can be judiciously used. My issue is only with corpus rigor mortis – does that over-stretch the metaphor? I guess corpus is evidence, not blueprint.

    1. hi Mark

      you have a case in the sense that we don’t see many papers like the needles in haystack one. though i do love Ray Carey’s interpretation below of corpus as heavenly choir.


      1. hi Mura

        Needles in haystacks is a very good metaphor for the problem of spotting definitions in lectures, and the research paper illuminates a complex but crucial area in EAP. There is a similar problem with written academic texts, although here of course the reader is not forced to process the information in real time. I found definitions were ubiquitous but often cryptic in written academic texts in the Heriot Watt University Science and Engineering (HWUSE) and Business Studies (HWUBS) corpora. Yet they were crucial to understanding the unfolding explanations of key concepts.

        I found “be + predicate” by far the most common way of presenting definitions, whereas “called”, “termed”, “named”, “defined as”, “said to be” and “refers to”, although relatively unambiguous signals, were less commonly used to cue them. To add to the problem, language exponents that generally signal description, such as “consists of”, “involves” and “is concerned with”, were also used to define, adding to the likelihood that key definitions would escape the reader’s notice. Added to this, “can be described as” was quite commonly used to define! Easiest to miss were running, or parenthetical, definitions, where the definition is slipped in almost as an aside – sometimes in brackets or commas, sometimes following “or”. All this leaves EAP students without much lexical support for spotting definitions.

        However, by working with written text (as well as transcripts of spoken text) students can learn to identify definitions in academic discourse and to produce their own definitions. When we were writing an EAP course to support students on distance / blended learning courses at Heriot Watt, we developed some strategies to help, and we also used these in tasks in Access EAP Frameworks.

        1. Students can be alerted to the commonalities in the structure of definitions, for example class nouns are almost always used in definitions:

        an X is a [class noun] + defining quality
        a [class noun] + defining quality is termed an X

        2. They can practise identifying common types of definitions. An extremely important category is stipulative (or working) definitions, signalled by such phrases as “in the context of this research”. There are also negative and contrastive definitions and definition by constituents, by purpose or function.

        3. The structures and lexical components of definitions will vary between disciplines, so it’s always useful to get students to bring in their own texts to analyse in this way.

        4. Students can explain to the teacher and to each other the key concepts in their fields.

      2. Hi Mura, Sorry, I got so excited by the paper you cited that I didn’t pay attention to the style of the forum you were posting in — I come across as very dry and dusty! But it’s interesting to see how corpus linguistics is developing. Unfortunately, very little published teaching material seems to apply the research in tasks.

        How I got into it was, in 2001 I was given a temporary post at Heriot Watt University to write support EAP materials for their very large distance learning degree project. Because of the nature of the courses, there was a vast amount of written lecturer input in electronic form and I had been using MicroConcord (Tim Johns’s MS-DOS search tool from the mid 80’s) for many years in my previous teaching. So I and some colleagues used bits of the texts as authentic reading for the EAP course and also made the whole collection of first year texts into a corpus to research the usage and lexis. I think lots of people are doing this kind of thing now.

        In those days we couldn’t access any of the big ones, like BNC, and anyway it was hard to find academic corpora. The 2 HWU corpora still exist. I suppose they are dated, but they were spot on for what we wanted at the time. They aren’t available for the public, but anyway, I now tend to use BYU BNC.

        We didn’t publish in any journals (too busy writing the materials), but there were eventually three of us: me, Olwyn Alexander and Jenifer Spencer, and we all spoke at conferences and BALEAP PIMs. The bit that you have found was from a 2008 PIM ‘Putting the E back in EAP’ and I was talking about developing a critical voice as a writer (I’ll attach the powerpoint, I don’t know why it wasn’t put up with the other stuff). Most of what we learned found its way into the books we published with Garnet Education: EAP Essentials, Access EAP Foundations and Access EAP Frameworks.

        General or class nouns have lots of different names, shell nouns being one of them. In these definition examples the class noun is ‘measure’:

        pH, a measure of the concentration of H+ ions in solution, is the negative log10 of the H+ ion concentration.

        Hardness is a measure of the resistance of a material to localised plastic deformation.

        I’ll also attach a list from the HWU corpora.

        Best wishes, Sue

      3. hi Sue

        hehe yes well i understand your excitement about the Martinez et al paper, i think he is one of the most interesting researchers and writers at the moment in this area

        and i did not mean to imply your previous comment was dry or dusty, simply very informative!

        i just finished watching your BC seminar here, the notion of graduate attributes is very useful


  2. Hi Mura,

    The corpus as corpse analogy is interesting, but considering that a corpus is made up of many language producers, it’s more like a mass grave. On the other hand, I don’t think the language is any more dead than the people in a photograph; it’s not as if the capturing of a moment in time causes its subjects to cease to exist.

    I see a corpus as a linguistic snapshot in time, but since we know that a great deal of language is formulaic and a reflection of our cumulative experience of it, a corpus can’t really be seen as “dead”, in the sense of “never to be heard from again”. Corpus speakers keep talking and writers keep writing, forming the input for others’ cumulative experience of language.

    Corpus as corpse might make sense to a generativist, but from an emergentist perspective, it consists of the trails of many individuals’ always-shifting language experience, but fixed in time and immortalized. So it’s just as easy to see a corpus as a heavenly choir instead of a pile of death. 🙂

  3. Hi Mura,
    Some interesting thoughts (both in your post and Mark’s). For me, I think it comes down to the skill of the corpus researcher. I’ve certainly come across some corpus researchers (dare I say at the more academic end of the scale) who seem to be rather forensically picking through the evidence in a way that may not be particularly relevant to the average language learner. But then, to be fair, that’s often not their primary audience. As a corpus researcher who’s also a language teacher and materials writer, I certainly look at the corpus with two hats on – I want to look objectively at the data to see what insights it can reveal, but then I also consider what might be useful to transfer to a teaching context and I make professional judgements. And of course, how you treat corpus data will also vary depending on the level of your target learners – for lower levels you’ll take quite a broad brush approach, whereas for advanced learners, you’ll get closer to ‘telling it like it is’. It’s all down to interpretation …

    1. hi Julie

      yes important point about adapting to learner levels, there is some good work relevant to language teaching going on, i hope to summarise another one in a bit more depth that is about DDL soonish


  4. Hello everyone,

    From what I gathered from talks by Nick Ellis and Averil Coxhead a few months back, the trend with some corpus research lately is to take the results of the study and interview teachers directly about what they think of the results to see if it is applicable to the context of the classroom. I think this is a fine practice to put life into a corpse, ooops I mean corpus. 😉
