Getting learner data for vocabulary activities – EFCAMDAT

I was reading Philip Kerr’s great article on How to write vocabulary activities while thinking about creating an exercise on using the word important.

French students often make errors when using it to describe meanings related to size, e.g. “The Tour First is the most important office building in Paris”, where they meant to say the tallest/biggest office building.

One way to get ready-made examples of learner writing is to use a learner corpus. Unfortunately there is, to the best of my knowledge, only one learner corpus that is currently free. (Update: Diane Nicholls @lexicoloco has written about an Asian learner corpus; also available are an open Polish learners of English corpus, a Taiwanese learners of English corpus, and a Japanese EFL learner corpus.)

This post describes one way to make use of the EF-CAMbridge open language DATabase, EFCAMDAT. This is a corpus of written learner English containing over 30 million words.

You’ll need to register, which is free and painless. Other learner corpora peeps take note.

The Select scripts screen looks like this:

[Screenshot: EFCAMDAT Select scripts screen]

You can see that I have selected all the scripts (highlighted in blue) and that I have selected only French nationality (highlighted in red). You use the middle windows to make your selection by pressing the add button.

Next, you could query the system, but its search syntax takes a bit of getting used to. For example, a simple query for the word important is written [word="important"]. Furthermore, there is no easy way to see the wider text of a search result.

So it is better to just download the corpus by exporting it as shown in the next screen:

[Screenshot: EFCAMDAT export data screen]

You can see some info about the scripts you have selected (highlighted in green): approximately 1.46 million words, from 4138 learners, 1 nationality (French in this case), covering all 16 levels.

The unit of interest radio button should be set to selected scripts; under information included, raw script text should be ticked; and the export format radio button should be set to XML compressed.

Once you have this XML file, we can open it up in our favorite concordancer, AntConc.
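For readers who would rather inspect the export with a script than a concordancer, here is a minimal Python sketch for pulling the raw script text out of the XML. The export's schema is not described in this post, so the element name text and the toy sample below are assumptions; swap in whatever tag actually wraps the script text in your file.

```python
import xml.etree.ElementTree as ET


def script_texts(xml_string, tag="text"):
    """Return the whitespace-normalised text of every <tag> element.

    The default tag name "text" is an assumption about the export,
    not something confirmed by the EFCAMDAT documentation.
    """
    root = ET.fromstring(xml_string)
    return [
        " ".join("".join(element.itertext()).split())
        for element in root.iter(tag)
    ]


# Toy stand-in for an exported file (invented structure, real example sentences).
sample_xml = """<scripts>
  <script id="1"><text>The damage is very important.</text></script>
  <script id="2"><text>We are an important
      company in manufacturing.</text></script>
</scripts>"""

for script in script_texts(sample_xml):
    print(script)
```

For a real export you would read the unzipped file into a string first, or use ET.parse(path).getroot() in place of ET.fromstring.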

By sorting the wordlist as shown below we can see any spelling variations of important:

[Screenshot: AntConc word list showing spelling variations of important]

So there is some interference from French spelling, though it is not a major problem here.
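The same check can be roughed out in Python. This is not what AntConc does when it sorts a word list; it is a swapped-in approach that counts word forms and then uses fuzzy matching from the standard library (difflib) to pick out forms that look like important. The sample sentences and the 0.8 similarity cutoff are illustrative assumptions.

```python
import re
from collections import Counter
from difflib import get_close_matches


def spelling_variants(texts, target="important", cutoff=0.8):
    """Return (word, count) pairs for word forms similar to target.

    The cutoff of 0.8 is an arbitrary starting point, not a tuned value.
    """
    counts = Counter(
        word for text in texts for word in re.findall(r"[a-z]+", text.lower())
    )
    close = get_close_matches(target, list(counts), n=20, cutoff=cutoff)
    # Most frequent forms first.
    return sorted(((word, counts[word]) for word in close), key=lambda pair: -pair[1])


# Invented toy sentences, only to show the shape of the output.
toy_texts = [
    "It is important to me.",
    "This is importante and very importent.",
    "Important things take time.",
]

for word, count in spelling_variants(toy_texts):
    print(word, count)
```

Lowering the cutoff casts a wider net but starts to pull in unrelated words, so the list still needs checking by eye.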

Next we can look through the concordance lines for important and pick out some sentences to use in an exercise. The following have been adapted so as to only focus on the use of the word important:

Like a majority of people I have two TVs and watch it more than I did five years ago, mainly because of the important choice of channels.

Flyfair Airlines is one of the most important airline companies in Asia and Creamium aims to increase its market in Asia.

I will be a responsible student President and I believe that with your help we could achieve an important improvement in our study conditions.

I don’t have an important vocabulary in English and my accent is not very good.

The winner is the person with the most important score.

Although the international sales were higher during the 3 first years, the national sales have been more important since 2006.

I work in a big library, the most important in Montpellier.

It’s by far the most important salary I’ve ever seen for this kind of job!

We are an important company in manufacturing based in Manchester.

The damage is very important, everything was destroyed.
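Concordance lines like the ones above can also be pulled out with a few lines of Python rather than AntConc. The sketch below is a bare-bones keyword-in-context (KWIC) routine, not a reimplementation of AntConc; the function name kwic and the context width are illustrative choices.

```python
import re


def kwic(texts, word="important", width=40):
    """Return keyword-in-context lines for whole-word matches of word."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    lines = []
    for text in texts:
        for match in pattern.finditer(text):
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            # Right-align the left context so the keywords line up in a column.
            lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines


examples = [
    "The damage is very important, everything was destroyed.",
    "The winner is the person with the most important score.",
]

for line in kwic(examples, width=30):
    print(line)
```

Because the pattern matches whole words only, forms such as unimportant or importance are skipped; misspelt variants would need their own searches.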

Depending on the level of your group, you could give them appropriate words to substitute. Further work could be done on the most frequent collocates, e.g. asking which word is more frequent in:

The damage is very severe/extensive/large/bad.
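To check which candidate is actually the more frequent collocate, you would need a reference corpus of proficient English, which this post does not cover. Assuming you have one loaded as a list of strings, a rough Python sketch of the count might look like this. The three sentences below are invented toy data, not corpus results, and the window of three words either side is an arbitrary choice.

```python
import re
from collections import Counter


def collocate_counts(texts, node="damage",
                     candidates=("severe", "extensive", "large", "bad"),
                     window=3):
    """Count how often each candidate occurs within window words of node."""
    counts = Counter({candidate: 0 for candidate in candidates})
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, token in enumerate(tokens):
            if token == node:
                nearby = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
                for candidate in candidates:
                    counts[candidate] += nearby.count(candidate)
    return counts.most_common()


# Invented toy sentences standing in for a reference corpus.
toy_reference = [
    "The damage is very severe.",
    "Severe damage was reported after the storm.",
    "There was extensive damage to the roof.",
]

for word, count in collocate_counts(toy_reference):
    print(word, count)
```

Raw counts from a small sample say very little, so treat the ranking as a prompt for discussion rather than an answer key.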

Thanks for reading.
