What to teach from corpora output – frequency and transparency

Frequency of occurrence is the main way for teachers to choose what to teach when using corpora. However, as Andrew Walkley discusses in “Word choice, frequency and definitions”, relying on frequency alone has its limitations. In addition to frequency we can use semantic transparency/opacity, that is, how far the meaning of the whole differs from the meaning of its individual parts. This is also sometimes referred to as how idiomatic a phrase is. Martinez (2013) offers a Frequency-Transparency Framework that teachers can use to help them choose what phrases to teach. Using four collocates of take he presents the following graphic:

The Frequency-Transparency Framework (FTF) using four collocates of the verb take (Martinez, 2013)

The numbered quadrants give the suggested priority of the verb+noun pairs, i.e. the most frequent and most opaque phrase would be taught first (1), then the most frequent and transparent phrase (2), followed by the less frequent but opaque phrase (3) and last the least frequent and most transparent phrase (4). This is only a suggested priority, which can be changed according to the teaching context. For example, a further two factors (in addition to word-for-word decoding) can be considered when evaluating transparency:

  • Is the expression potentially deceptively transparent? – “every so often” can be misread as often; “for some time” can be misunderstood as a short amount of time (Martinez & Schmitt, 2012, p.309)
  • Could the learner’s L1 negatively influence accurate perception?
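
The quadrant ordering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Phrase` and `ftf_quadrant` are made up for this post, the framework itself is a graphic rather than code, and the frequent/opaque labels are a teacher's yes/no judgements (here matching the placements I give for my binomials below), not corpus counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phrase:
    """A candidate phrase plus two yes/no judgements (hypothetical structure)."""
    text: str
    frequent: bool  # judged frequent in the corpus output
    opaque: bool    # hard to decode word for word


def ftf_quadrant(phrase: Phrase) -> int:
    """Return the suggested teaching priority, where 1 means teach first."""
    if phrase.frequent:
        return 1 if phrase.opaque else 2
    return 3 if phrase.opaque else 4


# Labels below are illustrative judgements, not measured frequencies.
phrases = [
    Phrase("layout and design", frequent=False, opaque=False),
    Phrase("tried and true", frequent=False, opaque=True),
    Phrase("latest and greatest", frequent=True, opaque=False),
    Phrase("up and running", frequent=True, opaque=True),
]

# Order the phrases by suggested priority.
ranked = sorted(phrases, key=ftf_quadrant)
for p in ranked:
    print(ftf_quadrant(p), p.text)
```

Reducing frequency and transparency to two yes/no judgements is a simplification; the two extra factors in the list above (deceptive transparency and L1 influence) would still need to be weighed by hand.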

Applying the framework to the binomials list from my webmonkey corpus, I would place up and running in quadrant 1, latest and greatest in quadrant 2, tried and true in quadrant 3 and layout and design in quadrant 4. Note that I did not place drag and drop, the most frequent and somewhat opaque phrase, since it is so well known to my multimedia students (similar to cut and paste) that it would not need teaching. Thanks for reading.


Martinez, R. (2013). A framework for the inclusion of multi-word expressions in ELT. ELT Journal 67(2): 184-198.

Martinez, R. & Schmitt, N. (2012). A Phrasal Expressions List. Applied Linguistics 33(3): 299-320.


9 thoughts on “What to teach from corpora output – frequency and transparency”

  1. Hi,

    Is frequency of occurrence really the main way for teachers to choose what to teach when using corpora?

    1. hi geoff

      thanks for dropping by. i think certainly a lot of teachers equate corpora with frequency, and although other factors such as those you mention in your overview (http://canlloparot.wordpress.com/2013/05/16/concorndancers-and-elt/) – range, coverage, learnability and communicative need – are important, Leech (n.d.), for example, makes the case that these are in any case (highly) correlated with frequency.

      it’s a difficult question what language to pick to present to students, i guess what is important is your teaching context. for example i quite agree with this quote (if i am interpreting it correctly) by Sinclair i read cited in Wray (2012):

      “It is not the statistical tests in themselves which are problematic; the problem lies in not recognizing that when writers and speakers co-select words, they create a new meaning which makes other instances of the same individual words and other co-selections involving these same words irrelevant. (Cheng et al., 2009, p. 237)”


      Leech, G (n.d.) The Role of Frequency in ELT: New Corpus Evidence Brings a Re-appraisal http://www.lancs.ac.uk/fass/doc_library/linguistics/leechg/leech_2001.pdf

      Wray, A (2012) What Do We (Think We) Know About Formulaic Language? An Evaluation of the Current State of Play http://journals.cambridge.org/action/displayFulltext?type=6&fid=8771499&jid=APL&volumeId=32&issueId=-1&aid=8771498&bodyId=&membershipNumber=&societyETOCSession=&fulltextType=RA&fileId=S026719051200013X&specialArticle=Y

      1. Hi Mura,

        3 points need unpicking (not “deconstructing”, note 🙂 ) here.

        1. Leech’s seminal article, which you cite, argues forcefully for the importance of frequency, but I think it’s going too far to say (not that Leech does) that such matters as valency, for example, correlate with frequency. As I said in my post (and thanks for the plug), descriptions of language can’t be the criterion for prescriptions; i.e., they can’t determine (while they can certainly influence) teachers’ decisions about how to teach English. However much Leech and others might argue for the importance of frequency as a way of understanding how English works, they can’t go from there to prescribing what to teach – it’s a non-sequitur.

        2. Picking what language to present to students depends on so many factors as to make it impossible to suggest either what items to include or when to do so. That, I think, was the disastrous mistake of the Cobuild project, and is the necessary limitation of any coursebook.

        3. Sinclair’s statement “It is not the statistical tests in themselves which are problematic; the problem lies in not recognizing that when writers and speakers co-select words, they create a new meaning which makes other instances of the same individual words and other co-selections involving these same words irrelevant” itself needs contextualising. Sinclair was, as I’m sure you’re aware, emphasising the point that you have to be very careful when interpreting the data you get from a concordance. Frequency counts avoid context at their peril!

        May I finish by congratulating you on this excellent blog: I’m a signed up fan!

  2. Mura, every now and then, my reading (right now I’m reading Learning Vocabulary in Another Language by Nation) takes me back to your wonderful posts. I look forward to delving into corpora soon. For now, I’m trying to deepen my knowledge of vocabulary learning/teaching, and to find the balance between grammar and vocabulary focus in teaching. Your post came up in one of my searches and I’m so glad to find such a great discussion. Thank you and Geoff.

    1. hi rose,

      really glad you found this post and discussion useful, i look fwd to any write-ups u may do of your current reading in vocab 🙂

      i enjoyed the vocab related posts you did beginning of this year, was that related to the EVO on vocab?


      1. Sort of, Mura. I wish I had been able to participate in EVO vocab all through the course. I’m yet to catch up with it before next year’s session though (something I promised myself to do as soon as other things are taken care of). They were actually heavily based on the discussions I had with Anne, Anna and Kevin in the beginning of January in which Kevin added to the conversation the four strands (Nation). In fact the Pursue of Balance blogpost was the kick off of EVO Vocab and my attempt to keep track of EVO Vocab sessions but with Hubby’s surgery coming up it wasn’t possible. I’ll be blogging soon about it. For now, I’ve read the first chapter once, made notes, and am going back to add my own reflections. I may post my reflections on it someday.

        I also managed to go through weeks 1 and 2 of the Corpus Linguistic Mooc and learned how to use AntConc, which I find quite useful to help me analyse the texts that I’ll be choosing to use with my 9th graders. And I also went through Philip Kerr’s book on how to present and practice vocabulary.

        Ps. I always think of you when it comes to corpora. But I am far from knowing how to benefit from it yet.

  3. hi again rose,

    that’s great to hear you did wks1 and 2 of the #corpusmooc, i am falling behind it somewhat of late!

    i think you are already benefiting from corpora from the sounds of it 🙂

    i hope your husband’s surgery went well/goes well

