Successful Spoken English – interview with authors

The following is an email interview with Christian Jones, Shelley Byrne and Nicola Halenko, authors of the recent Routledge publication Successful Spoken English: Findings from Learner Corpora. Note that I have not yet read this (waiting for a review copy!).

Successful Spoken English

1. Can you explain the origins of the book?

We wanted to explore what successful learners do when they speak, in particular learners at levels B1-C1, which we feel are the most common and important levels. The CEFR gives “can do” statements at each level, but these are often quite vague and thus open to interpretation. We wanted to discover what successful learners do in terms of their linguistic, strategic, discourse and pragmatic competence, and how this differs from level to level.

We realised it would be impossible to use data from all the interactions a successful speaker might have, so we used interactive speaking tests at each level. We wanted to encourage learners and teachers to look at what successful speakers do and to use that, at least in part, as a model to aim for, as in many cases the native speaker model is an unrealistic target.

2. What corpora were used?

The main corpus we used was the UCLan Speaking Test Corpus (USTC). This contained data only from students, of a range of nationalities, who had been successful (based on holistic test scoring) at each level, B1-C1. As points of comparison, we also recorded native speakers undertaking each test. We also made some comparisons to the LINDSEI (Louvain International Database of Spoken English Interlanguage) corpus and, to a lesser extent, the spoken section of the BYU-BNC corpus.

Test data does not really provide much evidence of pragmatic competence, so we constructed a Speech Act Corpus of English (SPACE) using recordings of computer-animated production tasks by B2 level learners for requests and apologies in a variety of contexts. These were also rated holistically, and we used only those rated as appropriate or very appropriate in each scenario. Native speakers also recorded responses, and these were used as a point of comparison.

3. What were the most surprising findings?

In terms of the language learners used, it was a little surprising that as levels increased, learners did not always display a greater range of vocabulary. In fact, at all levels (and in the native speaker data) there was a heavy reliance on the top two thousand words. Instead, what changes as the levels increase is the flexibility with which learners use these words: they begin to use them in more collocations and chunks and with different functions. There was also a tendency across levels to favour chunks which can be used for a variety of functions. For example, although we can presume that learners may have been taught phrases such as ‘in my opinion’, this was infrequent; instead they favoured ‘I think’, which can be used to give opinions, to hedge, to buy time etc.

In terms of discourse, the data showed that we really need to pay attention to what McCarthy has called ‘turn grammar’. A big difference as the levels increased was the increasing ability of learners to co-construct conversations, developing ideas from and contributing to the turns of others. At B1 level, understandably, the focus was much more on the development of their own turns.
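The finding about reliance on the top two thousand words can be checked on any transcript with a lexical coverage figure: the share of running words (tokens) that belong to a given high-frequency list. Below is a minimal sketch, not the authors' procedure (the interview does not describe it). The `coverage` name, the crude regex tokenizer and the toy word list are my assumptions; a real check would use an established 2,000-word list and lemmatise or group word families first.

```python
import re


def coverage(text: str, frequent_words: set) -> float:
    """Share of tokens in `text` that appear in `frequent_words` (hypothetical helper)."""
    # Crude tokenizer (an assumption): lowercase letter runs, keeping internal apostrophes
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in frequent_words)
    return hits / len(tokens)


# Toy stand-in for a real top-2,000 word list
toy_list = {"i", "think", "it", "is", "a", "good", "idea"}
print(coverage("I think it is a good idea, in my opinion.", toy_list))  # 0.7
```

Here 7 of the 10 tokens are on the toy list; "in", "my" and "opinion" are not.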

4. What findings would be most useful to language teachers?

Hopefully, in the lists of frequent words, keywords and chunks they have something which can inform their teaching at each of these levels. It would seem to be reasonable to use, as an example, the language of successful B2 level speakers to inform what we teach to B1 level speakers. Also, though tutors may present a variety of less frequent or ‘more difficult’ words and chunks to learners, successful speakers will ultimately employ lexis which is more common and more natural sounding in their speech, just as the native speakers in our data also did.

We hope the book will also give clearer guidance as to what the CEFR levels mean in terms of communicative competence and what learners can actually do at different levels. Finally, and related to the last point, we hope that teachers will see how successful speakers need to develop all aspects of communicative competence (linguistic, strategic, discourse and pragmatic competence) and that teaching should focus on each area rather than on only one or two of them.

There has been some criticism, notably by Stefan Th. Gries and collaborators, that much learner corpus research considers too few factors when explaining a linguistic phenomenon. Gries calls for a multi-factor approach, whose power can be seen in a study with Sandra C. Deshors (Deshors & Gries, 2014) on the uses of may, can and pouvoir by native English speakers and French learners of English. Using nearly 4,000 examples from 3 corpora, annotated for over 20 morphosyntactic and semantic features, they found, for example, that French learners of English see pouvoir as closer to can than to may.

The analysis for Successful Spoken English was described as follows:

“We examined the data with a mixture of quantitative and qualitative data analysis, using measures such as log-likelihood to check significance of frequency counts but then manual examination of concordance line to analyse the function of language.”
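For readers new to concordances: a concordance (KWIC, key word in context) lists every occurrence of a search word with a little context on either side, which is what makes manual analysis of function possible. A minimal sketch follows; the `kwic` name, whitespace tokenization and the context window size are my assumptions, and real concordancers offer far more (sorting, wider context, part-of-speech filters).

```python
def kwic(tokens, node, width=3):
    """Return (left context, hit, right context) for each occurrence of `node`."""
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


tokens = "well I think that I think so".split()
for left, hit, right in kwic(tokens, "think", width=2):
    # Right-align the left context so the node words line up in a column
    print(f"{left:>15} [{hit}] {right}")
```

Lining the node word up in a column is what lets you scan, say, dozens of I think lines and sort them by function (giving opinions, hedging, buying time).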

Hopefully, with the increasing use of multi-factor methods, learner corpus analysis can yield even more interesting and useful results than current approaches allow.

Chris and his colleagues kindly answered some follow-up questions:

5. How did you measure/assign CEFR level for students?  

Students were often already in classes where they had been given a proficiency test and placed in a level. We then gave them our speaking test and only took data from students who had been given a global pass score of 3.5 or 4 (on a scale of 0-5). The borderline pass mark was 2.5, so we only chose students who had clearly passed but were not at the very top of the level, and of course only those who gave us permission to use their data. The speaking tests we used were based on Canale’s (1984) oral proficiency interview design and consisted of a warm-up phase, a paired interactive discussion task and a topic-specific conversation based on the discussion task. Each test lasted 10-15 minutes.

6. So most of the analysis was in relation to successful students who were measured holistically?  


7. And could you explain what holistically means here?

Yes, we looked at successful learners at each CEFR level, according to the test marking criteria. They were graded for grammar, vocabulary, pronunciation, discourse management and interactive ability based on criteria such as the following (grade 3-3.5) for discourse management: ‘Contributions are normally relevant, coherent and of an appropriate length’. These scores were then amalgamated into a global score. These scales are holistic in that they try to assess what learners can do in terms of these competences to gain an overall picture of their spoken English, rather than ticking off a list of items they can or cannot use.
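To make the selection rule concrete, here is a sketch of how a learner's five criterion scores might be combined and then filtered. The interview says only that the scores were "amalgamated" into a global score; the simple mean rounded to the nearest half point used here is my assumption, not the authors' stated method. The 3.5 or 4 selection band (on 0-5) comes from the answer to question 5.

```python
import math

CRITERIA = ("grammar", "vocabulary", "pronunciation",
            "discourse_management", "interactive_ability")


def global_score(scores: dict) -> float:
    """ASSUMPTION: mean of the five criterion scores, rounded half-up to the nearest 0.5."""
    mean = sum(scores[c] for c in CRITERIA) / len(CRITERIA)
    return math.floor(mean * 2 + 0.5) / 2


def selected(scores: dict) -> bool:
    """Keep clear passes not at the very top of the level: global score 3.5 or 4."""
    return global_score(scores) in (3.5, 4.0)


learner = dict(grammar=3, vocabulary=4, pronunciation=4,
               discourse_management=3, interactive_ability=4)
print(global_score(learner), selected(learner))  # 3.5 True
```

Half-up rounding is written out with `math.floor` because Python's built-in `round` rounds halves to the nearest even number.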

8. Do I understand correctly that comparisons with native speaker corpora were used less than comparisons between successful and unsuccessful students?

No, we did not look at unsuccessful students at all. We were trying to compare successful students at B1-C1 levels and to draw some comparison to native speakers. We also compared our data to the LINDSEI spoken learner corpus to check the use of key words.

9. For the native speaker comparisons what kind of things were compared?

We compared each aspect of communicative competence (linguistic, strategic, discourse and pragmatic competence) to some degree. The native speakers took exactly the same tests, so we compared, as one example, the most frequent words they used.


Thanks for reading.



Deshors, S. C., & Gries, S. T. (2014). A case for the multifactorial assessment of learner language. Human Cognitive Processing (HCP), 179. Retrieved from



CORE blimey – genre language

A #corpusmooc participant, answering a discussion question on what they would like to use corpora for, replied that they wanted a reference book showing common structures in various genres such as “letters of condolence, public service announcements, obituaries”.

The CORE (Corpus of Online Registers) corpus at BYU, along with its virtual corpora feature, offers a way to approach this.

For example, the screenshot below shows the keywords of verbs & adjectives in the Reviews genre:

Before I briefly show how to make a virtual corpus, note that the standard interface allows you to do a lot with the various registers, and the CORE interface shows you examples of this. For example, the following shows the distribution of the present perfect across the genres:

Create virtual corpora

To create a virtual corpus first go to the CORE start page:

Then click on Texts/Virtual and get this screen:

Next press Create corpus to get this screen:

We want the Reviews Genre so choose it from the drop down box:

Then press Submit to get the following screen:

Here you can either accept these texts or, if you want to build, say, a film-review-only corpus, manually look through the links and keep only film reviews. Then give your corpus a name or add it to an already existing corpus. Here we give it the name “review”:

After submitting, you will be taken to the following screen, which shows your whole collection of virtual corpora; the corpus we just created is number 5:

Now you can list keywords.
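The BYU interface computes keyword lists for you, and I don't know its exact formula. As a rough idea of what a keyword is, here is a sketch that compares a word's relative frequency in the virtual corpus with its relative frequency in a reference corpus and ranks words by log ratio (the binary log of the ratio of the two relative frequencies). The `keywords` helper, the 0.5 added to reference counts to avoid dividing by zero, and the `min_freq` cutoff are all my assumptions.

```python
import math
from collections import Counter


def keywords(target_tokens, reference_tokens, min_freq=2, top=10):
    """Rank words by log ratio of relative frequency, target vs reference (sketch)."""
    target, reference = Counter(target_tokens), Counter(reference_tokens)
    t_total, r_total = sum(target.values()), sum(reference.values())
    scores = {}
    for word, freq in target.items():
        if freq < min_freq:
            continue  # skip rare words, whose ratios are unstable
        # ASSUMPTION: add 0.5 to the reference count so absent words don't divide by zero
        ratio = (freq / t_total) / ((reference[word] + 0.5) / r_total)
        scores[word] = math.log2(ratio)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top]


reviews = "the film has a great plot and the film is fun".split()
general = "the city has a park and the park is big".split()
print(keywords(reviews, general))
```

In this toy run film ranks first with a positive score, while the, which is just as common in the reference text, scores below zero: a keyword is unusually frequent, not merely frequent.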

Do note that the virtual corpora feature is available in most of the BYU collection, so if genre is not your thing, other corpora in the collection might be useful.

Thanks for reading and do let me know if anything appears unclear.


#TESOL2017 – Corpus related talks and posters

While IATEFL2017 may well have the razzle-dazzle, TESOL2017 is the big kahuna. Find below corpus-related talks and posters (program pdf). There are some well-known names here: Kiyomi Chujo, Randi Reppen, Diane Schmitt, Dilin Liu, Keith Folse.

Does TESOL record talks like IATEFL does? Otherwise I am putting my faith in tweeters to get an inkling of what goes down. You know what to do, folks.

Tuesday 21 March
Developing Academic Discourse Competence Through Formulaic Sequences
Content Area: Vocabulary/Lexicon
The Academic Formulas List and Phrasal Expressions List include formulaic sequences that build on traditional lists, such as the Academic Word List, to better meet student proficiency needs at the discourse level. Participants investigate the lists; experience collaborative activities designed to assist students in acquisition, including online and corpus-based; and discuss considerations for adaptation and implementation. Step-by-step guides provided.
Alissa Nostas, Arizona State University, USA
Mariah Fairley, American University in Cairo, Egypt
Susanne Rizzo, American University in Cairo, Egypt

Wednesday 22 March
Engaging Students in Making Grammar Choices: An In‑Depth Approach
Content Area: Grammar
Appropriate use of grammar structures in academic writing can be a challenge even for advanced ESL writers. Drawing on corpus research on the characteristics of written discourse, the presenters demonstrate how to engage students in making effective grammar choices to improve their academic writing. Sample instructional materials are provided.
Wendy Wang, Eastern Michigan University, USA
Susan Ruellan, Eastern Michigan University, USA

Lexical Bundles in L1 and L2 University Student Argumentative Essays
Content Area: Second Language Writing/Composition
This presentation reports findings of a corpus-based analysis of the use, overuse, and misuse of lexical bundles in L2 university student argumentative essays. The presentation also provides ways ESL composition instructors can assist learners in using lexical bundles more appropriately.
Tetyana Bychkovska, Ohio University, USA

Teachers’ U.S. Corpus
Content Area: Research/Research Methodology
The presenters amassed a linguistic corpus, TUSC, representing approximately 4 million words based on over 50 K–12 content area textbooks. Findings of the corpus, including word lists representative of academic language, are offered. Participants are invited to discuss ways this corpus may assist K–12 teachers, especially teachers of ELLs.
Seyedjafar Ehsanzadehsorati, Florida International University, USA

And Furthermore
Content Area: Discourse and Pragmatics
Advanced learner materials offer few guidelines for the use of the expressions “moreover,” “furthermore,” “in fact,” “likewise,” “in turn,” and other additive connectors. Grounded in pragmatic theory and drawing on written corpus examples and experimental speaker judgement data, this talk defines optimal uses and paves a path to enlightened class instruction.
Howard Williams, Teachers College, Columbia University, USA

Teacher Electronic Feedback in ESL Writing Course Chats
Content Area: Second Language Writing/Composition
This corpus-based study analyzes the rhetorical moves, uptake, and student perceptions of the teacher-student chats from five freshman ESL writing courses taught by three expert teachers. Findings show that chats are useful for establishing rapport and clarifying feedback, but we suggest that longer chat sessions may be more effective.
Estela Ene, Indiana University Purdue University Indianapolis, USA
Thomas Upton, Indiana University Purdue University Indianapolis, USA

Using Corpus Linguistics in Teaching ESL Writing
Content Area: Applied Linguistics
This session explores the use of corpus linguistics in teaching L2 writing as an effective way to bring authentic language into the classroom. The presenters discuss ways of incorporating corpora in teaching L2 writing and demonstrate a sample activity of how to use a corpus to address discourse competence.
Gusztav Demeter, Case Western Reserve University, USA
Ana Codita, Case Western Reserve University, USA
Hee-Seung Kang, Case Western Reserve University, USA

How Technology Shapes Our Language and Feedback: Mode Matters
Content Area: Applied Linguistics
This presentation explores how the use of evaluative language differs between parallel corpora of text and screencast feedback and what this means for the role of feedback and position of instructor. In understanding the implications of technology choices, instructors can better match tools to their pedagogical purposes.
Kelly Cunningham, Iowa State University, USA

An Effective Bilingual Sentence Corpus for Low-Proficiency EFL Learners
Content Area: CALL/Computer-Assisted Language Learning/Technology in Education
Kiyomi Chujo, Nihon University, Japan

Propositional Precision in Learner Corpora: Turkish and Greek EFL Learners
Content Area: English as a Foreign Language
Jülide Inözü, Cukurova University, Turkey
Cem Can, Cukurova University, Turkey

Thursday 23 March
Corpus‑Based Learning of Reporting Verbs in L2 Academic Writing
Content Area: Higher Education
We present findings from our study on the effectiveness of corpus-based learning of reporting verbs during a multidraft literature review assignment. The results suggest corpus-based instruction can improve L2 students’ genre awareness and lexical variety without time-consuming training. Participants receive sample corpus-based teaching materials used in the revision workshop.
Ji-young Shin, Purdue University, USA
R. Scott Partridge, Purdue University, USA
Ashley J. Velázquez, Purdue University, USA
Aleksandra Swatek, Purdue University, USA
Shelley Staples, University of Arizona, USA

Providing EAP Listening Input: An Evaluation of Recorded Listening Passages
Content Area: Listening, Speaking/Speech
Are the recorded passages that accompany listening textbooks providing students with exposure to all the necessary elements of academic lecture language? The presenter shares results of a corpus-based study, illustrating what recorded passages do well, where they fall short, and providing activities designed to supplement EAP listening instruction.
Erin Schnur, Northern Arizona University, USA

Developing Learner Resources Using Corpus Linguistics
Randi Reppen, Northern Arizona University, USA

Applying Research Findings to L2 Writing Instruction
Content Area: Second Language Writing/Composition
Effective pedagogical practices have a strong research base and respond directly to students’ learning needs. Presenters share materials developed for such needs in EAP writing classrooms, drawing on grammar/vocabulary corpus research, integration of CBI principles with current L2 writing approaches, and research findings regarding assignment sequencing for larger end-products.
Margi Wald, UC Berkeley, USA
Jan Frodesen, UC Santa Barbara, USA
Diane Schmitt, Nottingham Trent University, United Kingdom (Great Britain)
Gena Bennett, Independent, USA

Teaching Students Self‑Editing in Writing With Interactive Online Corpus Tool
Content Area: CALL/Computer-Assisted Language Learning/Technology in Education
L2 academic writers often struggle with word choice and collocates when composing in academic English. In this teaching tip, the presenter uses, a free corpus-based online interactive tool, to show how to teach self-editing strategies to L2 writers and demonstrates activities that can be incorporated into EAP writing courses.
Aleksandra Swatek, Purdue University, USA

Corpus 101: Navigating the Corpus of Contemporary American English (COCA)
Content Area: Vocabulary/Lexicon
The Corpus of Contemporary American English (COCA) may look overwhelming at first, but it is in fact an easy-to-use resource. Presenters guide participants through step-by-step navigation of this valuable tool, sharing tips and ideas for teachers and tasks for students that relate to several of COCA’s search and analysis functions.
Heather Gregg Zitlau, Georgetown University, USA
Heather Weger, Georgetown University, USA
Kelly Hill Zirker, Diplomatic Language Services, USA

Using a Medical Research Corpus to Teach ESP Students
Content Area: English for Specific Purposes
The study discussed investigated how expert writers use lexical bundles in medical research articles. More than 200 bundles were identified using a corpus of more than 1 million words. A structural and functional analysis revealed patterns that can be used in developing materials for medical students in international ESP classes.
Ndeye Bineta Mbodj, Health Department Thies University, Senegal

Using Corpora for Engaging Language Teaching: Effective Techniques and Activities
Using concrete examples from their new book published by TESOL, the presenters introduce some common useful procedures and activities for using corpora to teach various aspects of English, including vocabulary, grammar, and writing. They also explain how to develop and use corpora to assess learner language and develop teaching materials.
Dilin Liu, University of Alabama, USA
Lei Lei, Huazhong University of Science and Technology, China

Flexible, Free, and Open Data‑Driven Learning for the Masses
Content Area: Media (Print, Broadcast, Video, and Digital)
This presentation shares findings from multisite research with the open-source FLAX (Flexible Language Acquisition) project. Open digital collections used in formal classroom-based language education and in non-formal online education (MOOCs) are presented to demonstrate how openly licensed linguistic content using data-driven methods can support learning, teaching, and materials development.
Alannah Fitzgerald, Concordia University, USA

Visualizing Vocabulary Across Cultures: Web Images as a Corpus
Content Area: Vocabulary/Lexicon
Cameron Romney, Doshisha University, Japan
John Campbell-Larsen, Kyoto Women’s University, Japan

Developing Autonomous Academic Writing Competence Through Corpus Linguistics
Content Area: CALL/Computer-Assisted Language Learning/Technology in Education
Chinger Zapata, Universidad Católica del Norte, Chile
Hugo Keith

Data-Driven Learning (DDL) for Teaching Vocabulary and Grammar
Content Area: Teaching Methodology and Strategy
Pramod Sah, University of British Columbia, Canada
Anu Upadhaya, Tribhuvan University, Nepal

Friday 24 March
16 Keys to Teaching ESL Grammar and Vocabulary
Content Area: Grammar
This session uses corpus linguistics data to examine not only which grammar points should be taught but which vocabulary should be taught with each key grammar point. Sample lessons for teaching vocabulary with grammar and tips for designing and teaching these activities are presented.
Keith Folse, University of Central Florida, USA

Beyond Word Lists: Approaching Verbal Complements Lexicogrammatically and Cognitively
Content Area: Grammar
Gerund and infinitive verbal complements are often taught back-to-back via the use of memorization and word lists. This presentation suggests varying lesson placement, approaching the subject from a position of conceptualization of components drawn from Conti’s rule, and incorporating corpus data in classroom materials to improve salience thereof.
Miranda Hartley, University of Alabama, USA

Corpus‑Based Comparison Between Two Lists of Academic English Words
Content Area: Vocabulary/Lexicon
The study discussed compares Coxhead’s Academic Word List and Gardner and Davies’ Academic Vocabulary List in an independently developed 72-million-token university academic corpus to reveal which list is more suitable for academic vocabulary education across different academic disciplines to improve the effectiveness of English‑medium instruction.
Huamin Qi, Western University, Canada

Fostering Effective Participation in L1 Discourse Communities Through Formulaic Sequences
Content Area: Vocabulary/Lexicon
While vocabulary lists contribute substantially to lexical knowledge, discourse-level proficiency remains a challenge. The Academic Formulas List and Phrasal Expressions List, sets of formulaic sequences, address this challenge, helping learners participate more effectively in L1 discourse communities. Facilitators share online and corpus-based activities for formulaic sequence acquisition.
Susanne Rizzo, American University in Cairo, Egypt
Alissa Nostas, Arizona State University, USA
Mariah Fairley, American University in Cairo, Egypt

Developing an Open Educational Resources EAP Corpus
Content Area: English for Specific Purposes
This presentation focuses on the development of an open educational resources EAP corpus. Presenters demonstrate how the corpus can be accessed and downloaded, reused in a variety of ways, revised, remixed, and redistributed to other interested teachers, researchers, and/or students.
Brent Green, Salt Lake Community College, USA
Dean Huber, Salt Lake Community College, USA
George Ellington, Salt Lake Community College, USA

The Emergence of Academic Language Among Advanced Learners
Content Area: Second Language Writing/Composition
This session addresses the gradual changes of academic language based on a pilot study of 35 students over a 16-week graduate course. Suggestions and practical activities, informed by these findings, are demonstrated, including academic discourse techniques and the use of corpora and other online tools for text analysis.
Cheryl Zimmerman, California State University, Fullerton, USA
Jun Li, California State University, Fullerton, USA

Alphabet Street aka Corpus Symposium at VRTwebcon 8

I was delighted to be able to take part in my first webinar as a presenter. Leo Selivan (@leoselivan) asked me to join the corpus symposium for the 8th VRT web conference alongside Jennie Wright (@teflhelper) and Sharon Hartle (@hartle). You can find links to our talks at the end of this post, as well as my slides.

Presenting in a webinar is definitely a unique experience, like talking to yourself while knowing others are watching and listening in. Other things to note: make sure your microphone is loud enough, and make sure PowerPoint files uploaded to online systems like Adobe Connect don’t show your slide notes!

My talk was about using the BYU-Wikipedia corpus to help recycle coursebook vocabulary and was titled Darling (BYU) Wiki in homage to the great musician Prince, who recently passed away. Another webinar note: people can’t hear the music from your computer if you have headphones on!

As I have already posted about using BYU-Wiki for vocabulary recycling, in this post I want to give some brief notes on designing worksheets using some principles from the research literature. When talking about the slide below, I did not really explain in the talk what input enhancement and input flood were. Nor did I point out that my adaptation from Barbieri & Eckhardt (2007) was very loose : ).


“Input enhancement draws learners’ attention to targeted grammatical features by visually or acoustically flagging L2 input to enhance its perceptual saliency but with no guarantee that learners will attend to the features” (Kim, 2006: 345).

For written text, such flags include underlining, bolding, italicizing, capitalizing and colouring. Note that the KWIC output from COCA uses colour to label parts of speech.
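When a worksheet is built from corpus lines, this kind of typographic flagging can be automated. Here is a minimal sketch that wraps every whole-word occurrence of the target items in a marker, with `**` as a plain-text stand-in for bold. The `enhance` name and the marker are my assumptions; in a word processor you would apply real bold or colour instead.

```python
import re


def enhance(text: str, targets: list, marker: str = "**") -> str:
    """Flag whole-word occurrences of target items with `marker` (hypothetical helper)."""
    if not targets:
        return text
    # Longest items first so a multiword item like "job fair" wins over "job"
    ordered = sorted(targets, key=len, reverse=True)
    pattern = r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b"
    return re.sub(pattern, lambda m: f"{marker}{m.group(1)}{marker}", text,
                  flags=re.IGNORECASE)


print(enhance("Go to the job fair to find a job.", ["job", "job fair"]))
# Go to the **job fair** to find a **job**.
```

Matching is case-insensitive but the original capitalisation is kept, so sentence-initial targets are flagged too.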

Input flood similarly enhances saliency, but through frequency, and draws its basis from studies showing the importance of repetition in language learning.

Szudarski & Carter (2015) concluded that a combination of input enhancement and input flood can lead to performance gains in collocational knowledge.

Hopefully this post has briefly highlighted some points I did not cover in my 20-minute talk. A huge thanks to those who took the time to attend, to Leo and Heike Philp (@heikephilp) for organizing things smoothly, and to my co-presenters Jennie and Sharon. Do browse the recordings of the other talks, as there are some very interesting ones to check out.

Talk recording links, slides and related blog posts

Jennie Wright, Making trouble-free tasks with corpora

Sharon Hartle, SkELL as a Key to Unlock Exam Preparation

Mura Nava, Darling (BYU) Wiki

Question and Answer Round

My talk slides (pdf)

Summary Post by Sharon Hartle

8th Virtual Round Table Web Conference 6-8 May 2016 program overview

References and further reading:

Barbieri, F., & Eckhardt, S. E. (2007). Applying corpus-based findings to form-focused instruction: The case of reported speech. Language Teaching Research, 11(3), 319–346.

Han, Z., Park, E. S., & Combs, C. (2008). Textual enhancement of input: Issues and possibilities. Applied Linguistics, 29(4), 597–618.

Kim, Y. (2006). Effects of input elaboration on vocabulary acquisition through reading by Korean learners of English as a foreign language. TESOL Quarterly, 40(2), 341–373.

Szudarski, P., & Carter, R. (2015). The role of input flood and input enhancement in EFL learners’ acquisition of collocations. International Journal of Applied Linguistics.

Monco and “fresh” make & do collocations

Monco, the web news monitor corpus (monitor meaning it is continuously updated), has a tremendous collocation feature. I first saw a reference to it in a tweet by Diane Nicholls @lexicoloco, but when I tried it the server was acting up. I was reminded to try again by a tweet from Dr. Michael Riccioli @curvedway, and whoa, it is impressive.
For example, let’s see what the collocates of the famous make and do verbs are.

For make, here is a screenshot of the search settings for collocation (to get to the collocation function, look under the Tools menu on the main Monco page). Note I am looking for nouns that come after the verb make. Also, the double asterisk is a shortcut to look for all forms of make (try it without the asterisks and see what you get).
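Monco does all this for you. To show roughly what such a collocate search counts, here is a sketch over a hypothetical list of (word, lemma, tag) tuples: for each form of the lemma make (similar in spirit to the double-asterisk search), it counts nouns up to three words to the right. The data format, the window size, the `right_noun_collocates` name and the rule that noun tags start with "NN" (as in CLAWS-style tags) are my assumptions; Monco's own settings and ranking may well differ.

```python
from collections import Counter


def right_noun_collocates(tagged, node_lemma, window=3):
    """Count noun lemmas up to `window` tokens to the right of any form of `node_lemma`."""
    counts = Counter()
    for i, (_word, lemma, _tag) in enumerate(tagged):
        if lemma != node_lemma:
            continue
        for _w, right_lemma, right_tag in tagged[i + 1:i + 1 + window]:
            if right_tag.startswith("NN"):  # ASSUMPTION: noun tags begin with "NN"
                counts[right_lemma] += 1
    return counts.most_common()


sample = [("It", "it", "PPH1"), ("makes", "make", "VVZ"), ("sense", "sense", "NN1"),
          (".", ".", "."), ("She", "she", "PPHS1"), ("made", "make", "VVD"),
          ("her", "her", "APPGE"), ("debut", "debut", "NN1")]
print(right_noun_collocates(sample, "make"))
```

Matching on the lemma rather than the word form is what lets makes and made both count towards make, which is why the search without asterisks gives different results.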


I get as results for the top 10 collocates (for all forms of make) the following:

Top 10 collocates-make

Interesting collocations include make sense, make way, make debut. The results can show you at a glance the types of constructions involved:


Or you can open another window for more details:


The top 10 collocates for do are:

Top 10 collocates-do

Interesting collocates here are do thing, do anything, do something and do nothing, which makes a change from do shopping, cooking etc. : )

Thanks for reading.

Using BYU-Wikipedia corpus to answer genre related questions

A link was recently posted on Twitter to an IELTS site on writing about processes and describing graphs.
The following caught my eye:

…natural processes are often described using the active voice, whereas man-made or manufacturing processes are usually described using the passive.

The claim seems to go back to 2011 online (

This is an interesting claim. It has been shown that passives are more common in abstract, technical and formal writing (Biber, 1988 as cited by McEnery & Xiao, 2005). Here the claim is about specific written texts on natural processes and man-made processes.

We can simplify this by asking: are more passives used when writing about man-made processes than when writing about natural processes? Since a clause that is passive is not active, finding more passives in one kind of writing implies relatively fewer actives there, so we can reach a conclusion by deduction.

The BYU-Wikipedia corpus can be used to get approximations of natural-process writing and man-made-process writing. The keywords I used (for the title word) were ecology and manufacturing. Filtering out unwanted texts took longer than expected, especially for the manufacturing corpus. In the end I had an ecology corpus of 77 articles and 153,621 words and a manufacturing corpus of 116 articles and 98,195 words.

The search term I used to look for passives was are|were [v?n*], i.e. are or were followed by a past participle. This gave me a total of 293 passives for ecology and 304 for manufacturing. According to the Lancaster LL calculator, this shows a significant overuse of passives in manufacturing compared to ecology. Working the log ratio out from these counts gives about 0.70; since log ratio is the binary logarithm of the ratio of the two relative frequencies, passives are about 1.6 times (2^0.70) as common per word in manufacturing. This does not mean much on its own, as a lot of the texts in the Wikipedia corpora won’t be specifically about processes, but it is still interesting.
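For anyone who wants to check figures like these without an online calculator, here is a sketch of the two statistics. Log-likelihood follows the standard two-corpus formula (observed versus expected frequencies) that calculators such as Lancaster's compute, and log ratio is the binary logarithm of the ratio of the two relative frequencies. The function names are my own, and log ratio as written assumes both frequencies are non-zero.

```python
import math


def log_likelihood(a, b, c, d):
    """LL (G2) for frequency a in a corpus of c words vs frequency b in a corpus of d words."""
    e1 = c * (a + b) / (c + d)  # expected frequency in corpus 1
    e2 = d * (a + b) / (c + d)  # expected frequency in corpus 2
    ll = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # a zero observed count contributes nothing
            ll += observed * math.log(observed / expected)
    return 2 * ll


def log_ratio(a, b, c, d):
    """Binary log of (relative frequency in corpus 2) / (relative frequency in corpus 1)."""
    return math.log2((b / d) / (a / c))


# Passive counts and corpus sizes from this post: ecology first, manufacturing second
ll = log_likelihood(293, 304, 153_621, 98_195)
lr = log_ratio(293, 304, 153_621, 98_195)
print(f"LL = {ll:.2f}, log ratio = {lr:.2f}, times as common = {2 ** lr:.2f}")
```

With these counts LL comes out at roughly 34.8, well beyond the 15.13 critical value for p < 0.0001, and log ratio at roughly 0.70, i.e. about 1.6 times as frequent.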

What is more interesting are the types of verbs used in passives in ecology and manufacturing. The top ten in each case:

Thanks for reading.


Biber, D. (1988). Variation across speech and writing. Cambridge: Cambridge University Press.

McEnery, A. M., & Xiao, R. Z. (2005). Passive constructions in English and Chinese: A corpus-based contrastive study. Proceedings from the Corpus Linguistics Conference Series, 1(1). ISSN 1747-9398. Retrieved from

Using BYU Wiki corpus to recycle coursebook vocabulary in a variety of contexts

Recycling vocabulary in a variety of contexts is recommended by the vocabulary literature. Simply going back to texts one has used in a coursebook is an option but it misses the variety of context.

I need to recycle vocabulary from Unit 1 of my TOEIC book, so I take the topics from the table of contents as input to create a wiki corpus.

The main title of Unit 1 in my book is careers, with subtopics of professions, recruitment and training. I could also add in job interview, job fair and temp agency.

For more details on the various features of the BYU WIKI corpus, do see the videos by Mark Davies; for the rest of this post I assume you have some familiarity with them.

So, when creating a corpus in the BYU WIKI corpus, in the Title word(s) search I enter career* to find all titles with career or careers.

Then in the Words in pages box I enter professions, profession, recruitment, training. Note that I search for the plural as well as the singular, and set 300 as the number of pages:

Screenshot 1: corpus search terms

After pressing Submit, you are presented with a list of wiki pages; scroll through it to find pages that may be irrelevant to you:

Screenshot 2: wiki pages

After unticking any irrelevant pages, press Submit. I won’t talk a lot about filtering your corpus build here; as mentioned, do make sure to watch Mark Davies’ series of videos for more details.

Now you will see your newly created corpus:

Screenshot 3: my virtual corpora

Tick the Specific radio button:

Screenshot 4: specific key word radio button

and then click the noun keywords. Skill is the top keyword here, and it also appears in the wordlist in my book:

Screenshot 5: noun keywords

What I am more interested in is verbs so I click that:

Screenshot 6: verb keywords

The noun requirement, which by the way does not come from the careers unit, appears in the book wordlist, but the verb require does not. So now I can look at some example uses of the verb require that I could use in class.

One step is to see what collocates with require:

Screenshot 7: collocates of require

Clicking on the top 5 collocates brings up some potential language.

Another interesting use: once you have a number of corpora, you can see which words appear most in each corpus. The following screenshot shows corpora related to the first 3 units of my book, i.e. Careers, Workplaces, Communications:

Screenshot 8: my virtual corpora

The greyed lines mean those corpora are omitted from my search. This could be a nice exercise where you take a word and get students to see how it is distributed. So, for example, you may show the distribution of the verb fill:

Screenshot 9: distribution of verb fill

We see that it appears most in the recruit* corpus. One option now is to get students to predict how the verb is used in that corpus and then click the bar to see some examples.
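A distribution chart like this is only a fair comparison if frequencies are normalised for corpus size, usually per million words, as the BYU charts appear to do; a raw count would simply favour the biggest corpus. Here is a sketch of that normalisation. The helper names and the corpus names and counts in the usage lines are invented toy values, not figures from my corpora.

```python
def per_million(hits: dict, sizes: dict) -> dict:
    """Frequency per million words in each corpus: hits / corpus size * 1,000,000."""
    return {name: hits.get(name, 0) / size * 1_000_000 for name, size in sizes.items()}


def most_frequent_in(hits: dict, sizes: dict) -> str:
    """Name of the corpus in which the word is relatively most frequent."""
    normalised = per_million(hits, sizes)
    return max(normalised, key=normalised.get)


# Invented toy values: same raw count, but corpus_b is half the size
toy_hits = {"corpus_a": 10, "corpus_b": 10}
toy_sizes = {"corpus_a": 1_000_000, "corpus_b": 500_000}
print(most_frequent_in(toy_hits, toy_sizes))  # corpus_b
```

Equal raw counts hide the fact that the word is twice as dense in the smaller corpus (20 vs 10 per million), which is exactly what the chart makes visible to students.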

After this demonstration, you can ask students to guess which words will appear most in the various corpora, then run the searches so they can see the resulting graphs.

I hope this has shown how we can use the BYU WIKI corpus to recycle vocabulary in different contexts.

Do shoot me any questions as this post may indeed be confusing.