#IATEFL 2016 – Corpus Tweets 4

It is good to see a talk on how to create your own corpus, as this is arguably one of the key strengths of corpus linguistics, i.e. language that is mined for your students in your particular context. That is very far from what a coursebook can address. As ever, much appreciation to Sandy Millin for bringing us this talk.

Teacher-driven corpus development: the online restaurant review by Chad Langford & Joshua Albair as reported by Sandy Millin

  1. Chad Langford and Josh Albair on creating a corpus of restaurant reviews based on TripAdvisor, as they are linguists and teachers

  2. CL/JA They teach adults, not degree seeking, but find writing is a challenge, esp as learners don’t write much, even in L1

  3. CL/JA Genre of these reviews works as learners can relate to it and feel empowered, members of non-geographically bound community

  4. CL/JA By creating a corpus, they believed that would characterise the genre as objectively as possible, and improve materials devmnt

  5. #iatefl CL/JA Basic steps for treating the data to make corpus https://t.co/NobXQJ2782


  6. CL/JA They narrowed down TripAdvisor reviews to London, with 100-200 reviews per restaurant, with 3-dot average


  7. CL/JA They copied over 8000 reviews and pasted them into Word – pretty tedious! Huge amount of text and lots to be manually deleted

  8. CL/JA Cleaned data in Word is readable and only has tagline and body of review, maintaining paragraphs for later research


  9. CL/JA Needed to standardise, e.g. three dots for ellipsis, standardise common misspellings, removing extraneous spacing


  10. CL/JA To do this they used Notepad++ which is a free powerful text editor which they used to tidy up formatting

  11. #iatefl CL/JA Examples of coding they were able to learn very quickly https://t.co/fk7uHn48zM


  12. CL/JA Then added POS tagging, metadata about tagline and types of restaurant etc. Used WordSmith Tools which is cheap, but good


  13. CL/JA They used wordlist, keywords and concord tools within WordSmith


  14. CL/JA Final corpus has 67 restaurants, over 8000 reviews and over 1 million words. Can start to identify restaurant review genre


  15. CL/JA Identified positive/neg evaluative adjectives, restaurant-related vocab: experience, description, food, non-food, person, place


  16. CL/JA Also very high frequency of first person pronouns, overwhelming use of was/were (copulative use?)


  17. CL/JA Discourse showed very common to use ‘but’ as marker in 3-dot reviews, very rare in 1/5. “good but” v “but good” – meaning change


  18. CL/JA One was much more common than the other. Think it was “good but” – missed it!


  19. CL/JA High instance of subject-less clauses, determiner ellipsis and one more grammar feature I missed


  20. CL/JA Determiner ellipsis is very rarely pointed out to our students, except in headlines. e.g. restaurant was dirty, fish was tasty


  21. CL/JA In class they’ve used it for ranking activity – place five taglines on cline on board, next group can add 5 more/move first


  22. CL/JA Second activity is guided discovery sheet based on authentic review which exemplifies characteristics they’ve identified


  23. CL/JA Can get in touch with them at the University of Lille if you’d like to find out more


  24. CL/JA Tilly Harrison brings up the point that this corpus data draws on comments that perhaps people haven’t given permission to use
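
The standardising step in tweets 9 and 10 was done by the speakers with find-and-replace in Notepad++. As a rough illustration only, the same kind of clean-up can be sketched in Python with regular expressions. The function name and the misspelling list below are hypothetical (the talk did not say which misspellings were standardised), and this is not the speakers' actual procedure.

```python
import re

# Hypothetical examples; the talk did not list the misspellings they fixed.
MISSPELLINGS = {"resturant": "restaurant", "definately": "definitely"}


def normalise_review(text: str) -> str:
    """Apply the three clean-up steps mentioned in the talk to one review."""
    # 1. Any run of two or more dots, or the single ellipsis character,
    #    becomes exactly three dots.
    text = re.sub(r"\.{2,}|\u2026", "...", text)

    # 2. Replace known misspellings as whole words.
    #    Note: the replacement is always lower case, whatever the original case.
    for wrong, right in MISSPELLINGS.items():
        text = re.sub(rf"\b{re.escape(wrong)}\b", right, text, flags=re.IGNORECASE)

    # 3. Remove extraneous spacing while keeping paragraph breaks,
    #    since paragraphs were kept for later research.
    text = re.sub(r"[ \t]+", " ", text)     # collapse runs of spaces and tabs
    text = re.sub(r" *\n *", "\n", text)    # trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)  # at most one blank line between paragraphs
    return text.strip()


if __name__ == "__main__":
    raw = "Food was  great.....  but resturant was   dirty\n\n\n\nWould not return\u2026"
    print(normalise_review(raw))
```

Running the clean-up as a script rather than by hand makes it repeatable across all 8000+ reviews, although anything learner-facing would still need a manual read-through.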

 

#IATEFL 2016 – Corpus Tweets 2

This is a storify of tweets by Sandy Millin, Dan Ruelle and Leo Selivan on the talk Answering language questions from corpora by James Thomas. Hats off to the tweeters; I know it’s not an easy task!

Answering language questions from corpora by James Thomas as reported by Sandy Millin, Dan Ruelle & Leo Selivan

  1. James Thomas on answering language questions from corpora. Did not know Masaryk uni was home of Sketch Engine!
  2. JT has written a book about discovering English through SketchEngine with lots of ways you can search and use the corpus
  3. JT trains his trainees how to use SketchEngine, so they can teach learners how to learn language from language
  4. JT Need to ensure that tasks and texts have a lot of affordances
  5. We live in an era of collocation, multi-word units, pragmatic competence, fuzziness and multiple affordances – James Thomas
  6. JT Why do SS have language questions? Are the rules inadequate? It’s about hierarchy of choice…
  7. JT Not much choice in terms of letters or morphemes, but lots of choice at text level
  8. JT Patterns are visible in corpora. They are regular features and cover a lot of core English
  9. JT What counts as a language pattern? Collocation, word grammar, language chunks, colligation (and more I didn’t get!)
  10. JT Students have questions about lexical cohesion, spelling mistakes, collocations: at every level of hierarchy
  11. JT Examples of q’s: Does whose refer only to people? Can women be described as handsome? Any patterns with tense/aspect clauses?
  12. JT q’s: Does the truth lie? What is friendly fire? What are the collocations of rule?
  13. JT introduces SKELL: Sketch Engine for Language Learning http://skell. (don’t know!)
  14. “Rules don’t tell whole story” – James Thomas making an analogy w/ Einstein who said same about both the wave & the particle theory
  15. JT SKELL selects useful sentences only, excludes proper nouns, obscure words etc. 40 sentences
  16.  http://skell.sketchengine.co.uk 

    Nice simple interface – need to play with it more. #iatefl

  17. JT searched for mansplain in SKELL and it already has 7 or 8 examples in there
  18. JT Algorithm to reduce amount of sentences only works when there are a lot of examples. With a few, sentences often longer
  19. Sketch Engine is a pretty hardcore linguistic tool, but I can see the use of Skell for language learners. #iatefl
  20. JT Corpora can also teach you more about grammar patterns, for example periphrasis (didn’t get definition fast enough!)
  21. JT Can search for present perfect continuous for example: have been .*ing
  22. JT You can search for ‘could of’ in SKELL – appears fairly often, but relatively insignificant compared to ‘could have’
  23. Can use frequency in corpus search results to gauge which is “more correct” / “the norm”. #iatefl
  24. JT SKELL can sort collocations by whether a noun is the object or subject of a word for example. Can use ‘word sketch’ function
  25. Unclear whether collocation results in Skell are sorted according to “significance” / frequency or randomly #iatefl
  26. JT See @versatilepub for discounts on book about SKELL
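
Tweets 21 to 23 describe two kinds of query: a pattern search (have been .*ing) and a frequency comparison (‘could of’ against ‘could have’). The sketch below imitates both in plain Python over a handful of invented sentences. It does not use SKELL or Sketch Engine, and the sample data, names and counts are made up for illustration. In a token-based corpus tool `.*ing` matches one word ending in -ing; on raw text the closest equivalent is `\w+ing`, which is what is used here.

```python
import re
from collections import Counter

# Invented sample sentences standing in for corpus hits.
SAMPLE = [
    "We have been waiting for an hour.",
    "I could have told you that.",
    "She could of called first.",
    "They have been to Paris twice.",
    "You could have asked me.",
]

# Rough equivalent of the "have been .*ing" query: "have been" + one -ing word.
# It will over-match nouns such as "something", so results need checking by eye.
PPC = re.compile(r"\bhave been \w+ing\b", re.IGNORECASE)


def find_pattern(pattern, sentences):
    """Return the sentences that contain at least one match for the pattern."""
    return [s for s in sentences if pattern.search(s)]


def phrase_counts(phrases, sentences):
    """Count whole-phrase occurrences of each phrase across the sentences."""
    counts = Counter()
    for phrase in phrases:
        rx = re.compile(rf"\b{re.escape(phrase)}\b", re.IGNORECASE)
        counts[phrase] = sum(len(rx.findall(s)) for s in sentences)
    return counts


if __name__ == "__main__":
    print(find_pattern(PPC, SAMPLE))
    print(phrase_counts(["could of", "could have"], SAMPLE))
```

With a real corpus the raw counts would normally be read relative to corpus size, which is the point of tweet 22: ‘could of’ can appear fairly often and still be marginal next to ‘could have’.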

 

#IATEFL 2016 – Corpus Tweets 1

This is a storified version of the tweets by Sandy Millin on a talk called Making trouble free corpus tasks in ten minutes by Jennie Wright @teflhelper. Hopefully there will be other tweeters attending the other corpus-based talks who will be up to the standard set by Sandy : )

Making trouble free corpus tasks in ten minutes – Jennie Wright as reported by Sandy Millin

  1. Jennie Wright now in Hall 8b ‘Making trouble-free corpus tasks in ten minutes’
  2. Jennie Wright runs the TEFL helper blog:  http://teflhelperblog.wordpress.com 
  3. Jennie Wright All you need to make quick corpus tasks is a good copy-paster
  4. Jennie Wright Key terms: corpus/corpora: multi-million word collections of lang, concordance lines: search term presented in middle
  5. Jennie Wright POS/grammatical tagging is tagging with noun, verb etc; KWIC is key word in context, with the search term in the middle (was wrong before! Sorry)
  6. Jennie Wright COCA is one her business English students go back to:  http://corpus.byu.edu/coca 
  7. #iatefl Jennie Wright This is the search for COCA. It's very intuitive. Put a key word in the box https://t.co/8YcJPuSaxh
  8. Jennie Wright You can click KWIC to see the key word in the centre. Very fast!
  9. Jennie Wright COCA is great because it’s colour coded and the parts of speech are tagged
  10. #iatefl Jennie Wright What's the missing word? Get your concordance lines, then blank out one word https://t.co/owVfQ4xU33
  11. Jennie Wright It’s ‘thingie’ – she wanted her students to use something other than ‘thing’ or ‘stuff’! Good for fossilised errors
  12. Jennie Wright To make it, use a screengrab or clipping tool to get the concordance lines, then block out words you want SS to guess
  13. Jennie Wright Don’t forget to read the concordance lines before you copy and paste! Avoid accidents 🙂
  14. Jennie Wright Activity 2: collocation gamble. Focus on strong and weak collocations and lexical chunking. Good for SS misusing them
  15. #iatefl Jennie Wright. What are the two most common adjective collocations each for bitterly, deeply and sincerely? https://t.co/UAj50bjmuP
  16. @teflhelper 2 most common: ‘bitterly’: ‘disappointed’/‘cold’. ‘deeply’: ‘concerned’/‘sorry’. ‘sincerely’: ‘interested’/‘sorry’
  17. @teflhelper To make collocation gamble, use list function, type ‘bitterly [j*]’ in word box, then adj.ALL in POS list
  18. I think Sandy meant you can use the POS list to insert the code for adjective if you want, not in addition to the search term given [mura]
  19. #iatefl @teflhelper 3. Colour-code me. COCA helps you to do this easily. Can you colour-code these sentences? https://t.co/TuRkPYUdu8
  20. @teflhelper To make this, search for your word, screen shot the answers and retype the sentences for them to colour code
  21. @teflhelper Audience member suggests giving them a key of colours and get them to figure out a sentence which matches
  22. @teflhelper Tips: 1. Train a little: better to know one corpus well than a lot of them a little. 2. Imperfections exist in corpora
  23. @teflhelper Tips: Don’t be afraid to oppose what’s in the corpus – you’re the ‘live corpus in the classroom’ Read it carefully
  24. @teflhelper Tips: 3. Choose wisely – never more than 10 lines and don’t overwhelm. 4. What’s the problem you want to solve?
  25. @teflhelper COCA is very helpful when your students don’t believe you 🙂 maximal v. maximum [think Ngrams useful here too]
  26. @teflhelper Tip 5: consider how to do this online or offline. If online, what’s your backup plan? Have paper copies!
  27. @teflhelper COCA bites on Youtube are 2-minute tutorials on how to use COCA
  28. @teflhelper Thanks very much Jennie for an excellent talk. Exactly what I’ve needed for a long time 🙂
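
Jennie Wright's ‘What's the missing word?’ task was made with screengrabs from COCA. For anyone working from their own text instead, the same idea (concordance lines with the key word centred, then blanked out) can be sketched in a few lines of Python. The function names and sample sentences are invented for illustration, this is not how COCA produces its KWIC view, and the default cap of ten lines simply follows tip 3 above.

```python
import re


def kwic(sentences, keyword, width=30, max_lines=10):
    """Collect (left context, key word, right context) for each hit.

    The result is capped at max_lines, following the advice not to give
    learners more than ten concordance lines.
    """
    rx = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    lines = []
    for sentence in sentences:
        for match in rx.finditer(sentence):
            left = sentence[:match.start()][-width:]   # last `width` chars before the hit
            right = sentence[match.end():][:width]     # first `width` chars after the hit
            lines.append((left, match.group(0), right))
    return lines[:max_lines]


def format_lines(lines, width=30, gap=False):
    """Right-align the left context so the key word sits in one column.

    With gap=True the key word is replaced by underscores for a gap-fill task.
    """
    formatted = []
    for left, key, right in lines:
        shown = "_" * len(key) if gap else key
        formatted.append(f"{left:>{width}}{shown}{right}")
    return formatted


if __name__ == "__main__":
    # Invented sentences; read real lines before handing them to a class.
    sample = [
        "Pass me that thingie over there.",
        "What is this thingie called again?",
    ]
    for line in format_lines(kwic(sample, "thingie"), gap=True):
        print(line)
```

Calling `format_lines` with `gap=False` prints the answer key from the same data.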