#IATEFL 2016 – Corpus Tweets 4

It is good to see a talk on how to create your own corpus, as this is arguably one of the key strengths of corpus linguistics, i.e. language that is mined for your students in your particular context. That is very far from what a coursebook can address. As ever, much appreciation to Sandy Millin for bringing us this talk.

Teacher-driven corpus development: the online restaurant review by Chad Langford & Joshua Albair as reported by Sandy Millin

  1. Chad Langford and Josh Albair on creating a corpus of restaurant reviews based on TripAdvisor, as they are linguists and teachers

  2. CL/JA They teach adults, not degree-seeking, but find writing is a challenge, esp as learners don’t write much, even in L1

  3. CL/JA Genre of these reviews works as learners can relate to it and feel empowered, members of non-geographically bound community

  4. CL/JA By creating a corpus, they believed they would characterise the genre as objectively as possible, and improve materials development

  5. #iatefl CL/JA Basic steps for treating the data to make corpus https://t.co/NobXQJ2782

  6. CL/JA They narrowed down TripAdvisor reviews to London, with 100-200 reviews per restaurant, with 3-dot average

  7. CL/JA They copied over 8000 reviews and pasted them into Word – pretty tedious! Huge amount of text and lots to be manually deleted

  8. CL/JA Cleaned data in Word is readable and only has tagline and body of review, maintaining paragraphs for later research

  9. CL/JA Needed to standardise, e.g. three dots for ellipsis, standardise common misspellings, remove extraneous spacing

  10. CL/JA To do this they used Notepad++, a free, powerful text editor, to tidy up formatting

  11. #iatefl CL/JA Examples of coding they were able to learn very quickly https://t.co/fk7uHn48zM

  12. CL/JA Then added POS tagging, metadata about tagline and types of restaurant etc. Used WordSmith Tools which is cheap, but good

  13. CL/JA They used wordlist, keywords and concord tools within WordSmith

  14. CL/JA Final corpus has 67 restaurants, over 8000 reviews and over 1 million words. Can start to identify restaurant review genre

  15. CL/JA Identified positive/neg evaluative adjectives, restaurant-related vocab: experience, description, food, non-food, person, place

  16. CL/JA Also very high frequency of first person pronouns, overwhelming use of was/were (copulative use?)

  17. CL/JA Discourse showed it is very common to use ‘but’ as marker in 3-dot reviews, very rare in 1/5. “good but” v “but good” – meaning change

  18. CL/JA One was much more common than the other. Think it was “good but” – missed it!

  19. CL/JA High instance of subject-less clauses, determiner ellipsis and one more grammar feature I missed

  20. CL/JA Determiner ellipsis is very rarely pointed out to our students, except in headlines. e.g. restaurant was dirty, fish was tasty

  21. CL/JA In class they’ve used it for ranking activity – place five taglines on cline on board, next group can add 5 more/move first

  22. CL/JA Second activity is guided discovery sheet based on authentic review which exemplifies characteristics they’ve identified

  23. CL/JA Can get in touch with them at the University of Lille if you’d like to find out more

  24. CL/JA Tilly Harrison brings up the point that this corpus data draws on comments that perhaps people haven’t given permission to use
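
The standardising steps in tweets 9 and 10 (three dots for ellipsis, common misspellings, extraneous spacing) were done by the speakers in Notepad++. As a rough equivalent for anyone who prefers a script, here is a minimal Python sketch. The function name `normalise_review` and the entries in `MISSPELLINGS` are my own hypothetical choices, since the tweets do not list the actual replacements used in the talk.

```python
import re

# Hypothetical misspelling map: the talk did not list the real corrections.
MISSPELLINGS = {"resturant": "restaurant", "definately": "definitely"}

# One pattern matching any listed misspelling as a whole word, ignoring case.
_MISSPELLING_RE = re.compile(
    r"\b(" + "|".join(re.escape(w) for w in MISSPELLINGS) + r")\b",
    re.IGNORECASE,
)


def _fix_spelling(match):
    """Replace a misspelling, keeping an initial capital if there was one."""
    word = match.group(0)
    fixed = MISSPELLINGS[word.lower()]
    return fixed.capitalize() if word[0].isupper() else fixed


def normalise_review(text):
    """Standardise ellipses, listed misspellings and spacing in one review."""
    # The single ellipsis character and any run of two or more dots
    # both become exactly three dots.
    text = text.replace("\u2026", "...")
    text = re.sub(r"\.{2,}", "...", text)
    # Correct the listed misspellings.
    text = _MISSPELLING_RE.sub(_fix_spelling, text)
    # Collapse runs of spaces and tabs, but leave newlines alone so that
    # paragraph breaks survive for later research.
    text = re.sub(r"[ \t]+", " ", text)
    # Drop spaces left at the start or end of a line.
    text = re.sub(r" *\n *", "\n", text)
    # Three or more newlines become a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Calling `normalise_review` on each pasted review before saving it as plain text would give the kind of consistent input that a concordancer needs. The misspelling list would have to be built up from your own data.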


#IATEFL 2016 – Corpus Tweets 3

The tweeting game is on point this year, which means us poor folk at home can feel involved. Cambridge ELT tweeted out Using English Grammar Profile to improve curriculum design by Geraldine Mark & Anne O’Keeffe. One thing to note about this talk is that it, along with all the Cambridge-related ones, has been recorded.

Using English Grammar Profile to improve curriculum design by Geraldine Mark & Anne O’Keeffe as reported by Cambridge ELT.

  1. We’ll shortly be live-tweeting Anne O’Keeffe & Geraldine Mark’s #IATEFL talk ‘Using English Grammar Profile to improve curriculum design’
  2. The English Grammar Profile (EGP) helps us see how learners develop competence in grammatical form/meaning through the CEFR levels #IATEFL
  3. It provides us with typical grammar profiles for each CEFR level – you can explore EGP here:  http://www.englishprofile.org/english-grammar-profile  #IATEFL
  4. O’Keeffe: The profile is made up of 1222 grammar descriptors describing what learners can do at various CEFR levels #IATEFL
  5. O’Keeffe: As teachers, we think we know a lot about what learners can/can’t do in terms of grammar from intuition/experience #IATEFL
  6. O’Keeffe: EGP shows what we know learners can do with grammar, based on evidence from the Cambridge Learner Corpus #IATEFL
  7. O’Keeffe: The Cambridge Learner Corpus is made up of 200,000 exam scripts, across 140 languages in 200 countries #IATEFL
  8. O’Keeffe: What conditionals do you think learners know at B1 level? #IATEFL
  9. Mark: through the EGP we can identify what learners at B1 can do with conditionals and clauses #IATEFL
  10. Mark: we can see evidence of the 1st, 2nd and 3rd use of the ‘if’ clause for B1 from the EGP for example #IATEFL
  11. Mark: EGP research into adverb & adjective combinations shows a more pragmatic use from C1 learners #IATEFL
  12. O’Keeffe: When you look at errors learners make, there are peaks and troughs – for example Past Simple errors are prolific at B1 #IATEFL
  13. O’Keeffe: When investigating: at A1 they can do 2 things with the Past Simple… But at B1 there are 7 uses with a wider vocab range #IATEFL
  14. O’Keeffe: This explains the relatively high numbers of errors at B1 as they develop this more complex use of the Past Simple #IATEFL
  15. O’Keeffe: More sophisticated uses of the same form are evident at B2 ‘I wondered if you could introduce me…’ #IATEFL
  16. O’Keeffe: This shows how the use of grammatical structures develops incrementally – a developing, non-linear path of language use #IATEFL
  17. O’Keeffe: Considering un/countable nouns, errors are evident up to and beyond C1 – but at A1/2 level you know many fewer nouns! #IATEFL
  18. O’Keeffe: informations, advices and equipments are some of the most error-prone uncountable nouns – but wouldn’t be taught at A1/A2 #IATEFL
  19. Mark: If you do get an uncountable noun wrong it has a ripple effect! The Countability topic across the levels should be recycled #IATEFL
  20. Mark: possessive pronouns are identified at A2 – only ‘mine’ gets used correctly at this level, the EGP shows #IATEFL
  21. O’Keeffe: We hope the EGP will provide a ‘bigger picture’ of grammar – beyond ticking off achieved structures at various levels #IATEFL
  22. O’Keeffe: It could also help with ‘gap analysis’ – where more teaching or attention might be needed – helpful for syllabus design #IATEFL
  23. O’Keeffe: We can plot the lag between explicit input and output of grammar – implicit learning – showing how much time is needed! #IATEFL
  24. O’Keeffe: It can also highlight interesting grammar competencies for teaching advanced level grammar – more sophisticated uses etc. #IATEFL
  25. O’Keeffe: Visit  http://www.englishprofile.org  to explore EGP and the English Vocabulary Profile too! #IATEFL
  26. Make sure to catch up with O’Keeffe and Mark’s English Grammar Profile talk recording tomorrow on  http://iatefltalks.org  #IATEFL


#IATEFL 2016 – Corpus Tweets 2

This is a storify of tweets by Sandy Millin, Dan Ruelle and Leo Selivan on the talk Answering language questions from corpora by James Thomas. Hats off to the tweeters; I know it’s not an easy task!

Answering language questions from corpora by James Thomas as reported by Sandy Millin, Dan Ruelle & Leo Selivan

  1. James Thomas on answering language questions from corpora. Did not know Masaryk uni was home of Sketch Engine!
  2. JT has written a book about discovering English through SketchEngine with lots of ways you can search and use the corpus
  3. JT trains his trainees how to use SketchEngine, so they can teach learners how to learn language from language
  4. JT Need to ensure that tasks and texts have a lot of affordances
  5. We live in an era of collocation, multi-word units, pragmatic competence, fuzziness and multiple affordances – James Thomas
  6. JT Why do SS have language questions? Are the rules inadequate? It’s about hierarchy of choice…
  7. JT Not much choice in terms of letters or morphemes, but lots of choice at text level
  8. JT Patterns are visible in corpora. They are regular features and cover a lot of core English
  9. JT What counts as a language pattern? Collocation, word grammar, language chunks, colligation (and more I didn’t get!)
  10. JT Students have questions about lexical cohesion, spelling mistakes, collocations: at every level of hierarchy
  11. JT Examples of q’s: Does whose refer only to people? Can women be described as handsome? Any patterns with tense/aspect clauses?
  12. JT q’s: Does the truth lie? What is friendly fire? What are the collocations of rule?
  13. JT introduces SKELL: Sketch Engine for Language Learning http://skell. (don’t know!)
  14. “Rules don’t tell whole story” – James Thomas making an analogy w/ Einstein who said same about both the wave & the particle theory
  15. JT SKELL selects useful sentences only, excludes proper nouns, obscure words etc. 40 sentences
  16.  http://skell.sketchengine.co.uk 

    Nice simple interface – need to play with it more. #iatefl

  17. JT searched for mansplain in SKELL and it already has 7 or 8 examples in there
  18. JT Algorithm to reduce number of sentences only works when there are a lot of examples. With a few, sentences often longer
  19. Sketch Engine is a pretty hardcore linguistic tool, but I can see the use of Skell for language learners. #iatefl
  20. JT Corpora can also teach you more about grammar patterns, for example periphrasis (didn’t get definition fast enough!)
  21. JT Can search for present perfect continuous for example: have been .*ing
  22. JT You can search for ‘could of’ in SKELL – appears fairly often, but relatively insignificant compared to ‘could have’
  23. Can use frequency in corpus search results to gauge which is “more correct” / “the norm”. #iatefl
  24. JT SKELL can sort collocations by whether a noun is the object or subject of a word for example. Can use ‘word sketch’ function
  25. Unclear whether collocation results in Skell are sorted according to “significance” / frequency or randomly #iatefl
  26. JT See @versatilepub for discounts on book about SKELL
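
To make tweets 21 to 23 a little more concrete, here is a minimal Python sketch of the two ideas: a pattern search for the present perfect continuous, and a frequency comparison between ‘could of’ and ‘could have’. It is not SKELL’s own query interface, which the tweets do not show. It runs ordinary regular expressions over a made-up `SAMPLE` string, and `count_phrase` is a hypothetical helper. I have also narrowed the tweeted query `have been .*ing` to a single following word and allowed `has` as well as `have`, which is my assumption rather than anything from the talk.

```python
import re

# Made-up sample standing in for a corpus; this is not real SKELL data.
SAMPLE = (
    "I have been waiting for an hour. She has been working here since May. "
    "They could have called first. He could of told us, but we could have guessed."
)

# have/has + been + one word ending in -ing.
PRESENT_PERFECT_CONTINUOUS = re.compile(
    r"\b(?:have|has)\s+been\s+\w+ing\b", re.IGNORECASE
)


def count_phrase(phrase, text):
    """Count whole-word, case-insensitive occurrences of a phrase in text."""
    words = [re.escape(w) for w in phrase.split()]
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


ppc_hits = PRESENT_PERFECT_CONTINUOUS.findall(SAMPLE)
could_of = count_phrase("could of", SAMPLE)
could_have = count_phrase("could have", SAMPLE)

print(ppc_hits)
print("could of:", could_of, "| could have:", could_have)
```

On a real corpus the raw counts would be far larger, and it is the ratio between the two forms that suggests which one is the norm.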