IATEFL 2015: Recent corpus tools for your students

Jane Templeton’s talk 1 illustrated corpus use with the wordandphrase tool 2 (Lizzie Pinard has a write-up of the talk 3). I have described using this and other tools on this blog, and there is a nice round-up of corpus tools written by Steve Neufield 4 that looks at just the word, ozdic, word neighbors, netspeak, and stringnet.

This post reports on some more recent tools you may not be aware of (they were posted some time ago in the G+ CL community, so do check that if you want the skinny early on :)) – WriteAway, Linggle, SkeLL, NetCollo.

I list them in order of how easy to use and how useful I think students will find them.

1. WriteAway – this tool auto-completes words to help highlight typical structures. For Jane’s example of weakness it gives two common patterns: weakness of something and weakness in something. The first example in pattern one includes the collocation overcomes.

WriteAway screenshot for word weakness

2. Linggle – one could follow up with a search on Linggle, which is basically a souped-up version of just the word and uses a 1-trillion-word web-based corpus, as opposed to the much smaller BNC that just the word uses.

It is interesting that overcome weakness is not listed:

Linggle screenshot for verb + weakness (click image to see results)

but a search for overcome followed by a noun shows that the combination accounts for less than 1% of occurrences in the web corpus:

Linggle screenshot for overcome + noun (click image to see results)

3. SkeLL from Sketch Engine is neat for its word-sketch feature so a look at weakness brings up a nice set of collocations and colligations in one screen:

SkeLL wordsketch for weakness (click image to see results)

4. The NetCollo corpus tool can compare the BNC, a medical corpus and a law corpus, which is useful if you are looking at academic language in medicine and law. For example, using the example of weakness we see that it is much more common in the BNC:

NetCollo result for weakness (click image to see results)

and we can see that the collocation with overcome only appears once in the Medical corpus.

As ever, do try these tools out yourself and then show, not tell, your students (as Jane says) as and when the need arises in class. By the way, do check out the integrative rationale for corpus use by Anna Frankenberg-Garcia 5.

Thanks for reading.


1. IATEFL 2015 video – Bringing corpus research into the language classroom

2. Word and phrase.info tool

3. IATEFL 2015 Bringing corpus research into the language classroom – Jane Templeton

4. Teacher Development: Five ways to introduce concordances to your students

5. Integrating corpora with everyday language teaching


Corpus linguistics community news 2

Another round of links (here’s the first round) to some posts on the G+ corpus linguistics community, to notify peeps interested in this sort of thing:

Using BAWE corpus in SketchEngine as a learner corpus (see 1st comment to post)

The Desolation of Smaug and the BYU COCA

Dogme concordancing

The Disabled Access Friendly Campaign, corpus use literacy and wordandphrase.info

Interview with Phil Edmonds from just-the-word.com

Phil Edmonds, the current developer of www.just-the-word.com, kindly answered some questions about the origins of the tool, future developments, and general thoughts on educational technology. If you have used this tool and/or recommended it to students, please do offer any suggestions or questions you may have about the tool in the comments.

1. Can you tell us a little about how the just-the-word tool got started?

Phil: Back in grad school, I was working on artificial intelligence and the idea of developing an intelligent thesaurus that could help you choose the right word based on what you had already written on the page. After graduating, I joined Sharp Laboratories of Europe and had the pleasure of working with Pete Whitelock and a great team. The team had developed an intelligent dictionary that could prioritize the definitions of words given the context. Together, Pete and I realized we could use some of that technology and the recent developments in statistical natural language processing to create a tool to help writers.

2. Did you envision it as a tool more for learners or for teachers or both? Why?

Phil: I always saw Just The Word as a tool to help writers, both English native speakers and advanced learners. English has so many different ways to say the same thing, but if you choose the wrong word you might not get your precise meaning across or, worse, actually imply the wrong message entirely. Even native speakers have trouble finding the right words. I hadn’t really thought about teachers using it, but it’s been very popular. I would like to hear about how teachers use it. Perhaps teachers could share with each other how they use it.

3. Can you give any breakdown of stats for people who visit the site? Numbers? From where? Time of day?

Phil: The site gets around 1000 queries per day from all over the world. The top 10 countries are UK, USA, China, Taiwan, Turkey, Hong Kong, Brazil, Iran, Poland, and Korea.

4. I am particularly interested in having the learner errors features working. What are the chances of that?

Phil: The feature wasn’t used very much according to my logs. Then I found a small bug in it, so I took it down. However, if there is enough interest I’ll put it back; it is useful.

5. What other language/education projects are you working on?

Phil: At Sharp I worked on a number of language and education projects. One of the hardest things in learning a language is to stay motivated. It’s been shown that giving immediate and relevant feedback on a learner’s work and efforts creates focus and engagement. This applies to learning any skill such as driving a car, excelling at a sport, and learning school-based subjects. My team used this idea to develop an app that helps beginners learn English through extensive reading of e-books, with the feedback given by algorithms. The app tracks you while you read, working out your current word level (e.g. 1000-word or 2000-word). Then it highlights study words in the text at your level that you can focus more attention on. It learns your level by collecting your right/wrong answers to short vocabulary quizzes on any of the words you click in the story. The quizzes are generated automatically using artificial intelligence algorithms, and your level is tracked using a statistical model that weighs the evidence for and against your progress. We worked with a major educational publisher in Oxford and another in Japan to launch the app into the Japanese market. Mobile learning was also important for us since it can be done anytime, anywhere, so we called our technology ELMO – English Language MObile. In the Japanese Android Play Store the app is called Tadoku Academy – for Extensive Reading Academy.
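As an aside from me rather than from Phil: the idea of a statistical model that weighs right/wrong quiz answers for and against a word level can be sketched with a simple Bayesian update. This is only my illustration and not the ELMO algorithm, whose details are not given in the interview. The level bands, the two probabilities and the function names are all assumptions made up for the example.

```python
# A toy Bayesian tracker for a learner's word level.
# NOT the ELMO algorithm; every constant here is an assumption for illustration.

LEVELS = (1000, 2000, 3000)  # hypothetical word-level bands

P_KNOWN = 0.9    # assumed chance of a right answer when the word is within the learner's level
P_UNKNOWN = 0.3  # assumed chance of a right answer (e.g. a lucky guess) when it is not


def update(belief, word_band, correct):
    """Return the updated belief over LEVELS after one quiz answer (Bayes' rule)."""
    posterior = {}
    for level, prior in belief.items():
        p_right = P_KNOWN if word_band <= level else P_UNKNOWN
        likelihood = p_right if correct else 1.0 - p_right
        posterior[level] = prior * likelihood
    total = sum(posterior.values())
    return {level: p / total for level, p in posterior.items()}


def estimate_level(answers):
    """answers is a list of (word_band, correct) pairs; returns (best_level, belief)."""
    belief = {level: 1.0 / len(LEVELS) for level in LEVELS}  # start with no preference
    for word_band, correct in answers:
        belief = update(belief, word_band, correct)
    return max(belief, key=belief.get), belief


# Example: right on 1000- and 2000-band words, wrong on 3000-band words.
answers = [(1000, True), (2000, True), (3000, False), (2000, True), (3000, False)]
level, belief = estimate_level(answers)
print(level, belief)
```

Each answer nudges the belief towards the levels that make that answer most likely, so a run of wrong answers on harder words counts as evidence against the higher level without ruling it out completely.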

6. What are your opinions about corpus-based tools for language learning?

Phil: There is great promise in using corpora and real text from the web for language learning. The best way to learn is to study genuine language and use it in real conversations to complete genuine tasks – as you do when you find yourself in a foreign country. Corpora can also be used to automatically create language resources such as dictionaries, Just The Word, and Elmo. But most tools and resources I’ve seen on the web are run as hobbies – Just The Word is no exception. Building a business is difficult and risky.

7. What are your thoughts on the current and future educational technology scene?

Phil: The educational technology scene is exploding. There are so many opportunities now. After many years of challenge to the very premise of using technology in education, it seems we have reached a turning point where now it is just accepted that it is useful and even necessary. Attention in research has turned to identifying and promoting how it can be used effectively and in what context. In the business world established companies and hundreds of startups are finding traction. I believe the most innovative areas will be in educational data from a small scale (like Elmo) up to large scale (like using the student data collected from MOOCs); in statistical modelling to both give feedback to students and to study the efficacy of teaching and learning approaches; in connecting learners locally and globally; in game-based learning; in spreading and improving education in disadvantaged countries; and in new modes of delivery such as mobile learning and connected classrooms.

Many thanks to Phil for taking the time to respond. You can read a bit more about Phil here:

Android learning: Your phone is becoming a classroom

Phil Edmonds: I’ve been coding since I was 12

Personal web site

As Phil says, just-the-word is very much a hobby project, so donations are always welcome to keep the tool online, developed and maintained. I have often recommended just-the-word to my TOEIC students; you can read about one of the ways I have used it. I am very much looking forward to having the learner errors function working, and so is at least one other teacher.

Thanks for reading and don’t forget questions/suggestions.

Affixes, IntelliText and corpus use literacy

If you follow social media education talk, you will have heard a lot about digital literacies, or 21st century skills. It is an open question how relevant most such literacies are to language learners. However, language teachers will recognize that being able to use a dictionary is a key skill, and I would argue that being able to use a corpus is another crucial skill.

This post looks at using the IntelliText corpora interface to extend an exercise from a TOEIC coursebook on prefixes and suffixes.

On page 22 of the Cambridge Target Score coursebook (Talcott & Tullis, 2007) there is an exercise on using prefixes and suffixes to construct a word family diagram for the root word “form”. Question A asks students to add a list of prefixes and suffixes to the root word, grouped by part of speech (see the figure below):


(Talcott & Tullis, 2007, p.22)

The last question, D, asks students to choose one of 6 listed words (draw, present, quest, sign, move, employ) and to use a dictionary to make a word family diagram. This is a major task, as dictionaries do not list prefixes and suffixes in an easily accessed way.

The Macmillan Online dictionary is useful, though, for seeing which of the words presented are frequent. All the words here except for quest are three-star words, meaning they are in the 2500 most common words. Quest is a one-star word, meaning that it appears in the 7500 most common words.

The IntelliText interface has a dedicated feature to look up affixes. To get to this page, as shown below, follow Home Page > Search the Standard Corpora > Choose Language > English > Choose Corpora > BNC > Choose Type of Search > Affixes:

IntelliText screenshot of the affixes search page

The base word draw has been entered and the [with Prefixes] tick box checked; the results are shown in the next screenshot:

IntelliText screenshot of prefix results for draw

Students can do similar searches for suffixes, and both prefixes and suffixes. This feature is certainly much quicker than using just a dictionary to build a word family diagram. Also there is an option to search using part of speech which is handy.
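For readers who like to see the idea behind such a search, here is a rough sketch in Python of matching a base word against a word list with the prefix and suffix options switched on or off. It is not how IntelliText itself works, which I don't know; the affix lists, the sample vocabulary and the function name are all made up for illustration.

```python
# A toy affix lookup over a small word list.
# Not IntelliText's implementation; the affix lists below are illustrative only.

PREFIXES = ("re", "with", "over", "un", "out")
SUFFIXES = ("er", "ing", "s", "n")


def affixed_forms(base, vocabulary, with_prefixes=True, with_suffixes=False):
    """Return words built from base plus a listed prefix and/or suffix."""
    matches = []
    for word in sorted(set(vocabulary)):
        if word == base or base not in word:
            continue
        start = word.find(base)
        prefix = word[:start]                 # whatever comes before the base
        suffix = word[start + len(base):]     # whatever comes after the base
        prefix_ok = prefix == "" or (with_prefixes and prefix in PREFIXES)
        suffix_ok = suffix == "" or (with_suffixes and suffix in SUFFIXES)
        if prefix_ok and suffix_ok:
            matches.append(word)
    return matches


# A made-up sample vocabulary, standing in for a corpus word list.
vocabulary = ["draw", "redraw", "withdraw", "overdraw", "drawer",
              "drawing", "withdrawn", "drawbridge", "straw"]

print(affixed_forms("draw", vocabulary))                       # prefixes only
print(affixed_forms("draw", vocabulary, with_suffixes=True))   # prefixes and suffixes
```

A compound like drawbridge is left out because bridge is not in the suffix list, which shows how much the results depend on the affix inventory a tool uses.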

There are many other features in IntelliText, e.g. annotation of concordances with CEFR classification, that make this interface worth exposing to students and which I may write about later. Do note that certain searches using IntelliText take some time compared with the speed of, say, COCA (Corpus of Contemporary American English).

If you teach the TOEIC you may be interested in using Just the Word and an exercise from the Cambridge Target Score book; and using Wordandphrase.info with production activities.

Thanks for reading.


Talcott, C. & Tullis, G. (2007). Target Score: A communicative course for TOEIC Test preparation. (2nd ed.). Cambridge: Cambridge University Press.