Corpus linguistics community news 7

If you follow events in the UK you can say, without much risk of being accused of hyperbole, that these are indeed strange times.

So why not turn to the relative sanity of corpus linguistics community news 7?

First up is an example of searching BYU-COCA for use of a preposition of place.

Next, a post on one way to explore some recent audio-video corpora.

A couple of posts related to the history of CL.

A top tip when using BootCaT.

My recommended link is to a mini or maybe it’s a micro CL course by Oxford Dictionaries.

A tool that uses TF-IDF scores to extract n-grams, using as an example Prime Minister's Questions exchanges between ex-prime minister David Cameron and the still leader of the opposition Jeremy Corbyn.
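For anyone curious how TF-IDF can pick out n-grams that are distinctive of one speaker, here is a minimal sketch in Python. It is not the linked tool's code: the function names (`ngrams`, `tfidf_ngrams`), the simple weighting formula and the two toy texts are my own assumptions, and the texts are invented rather than real quotes.

```python
import math
import re
from collections import Counter


def ngrams(text, n=2):
    """Lowercase the text and return its word n-grams as strings."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_ngrams(docs, n=2):
    """Score each n-gram per speaker: relative frequency x log(N / document frequency)."""
    counts = {name: Counter(ngrams(text, n)) for name, text in docs.items()}
    doc_freq = Counter()
    for counter in counts.values():
        doc_freq.update(counter.keys())
    scores = {}
    for name, counter in counts.items():
        total = sum(counter.values())
        scores[name] = {
            gram: (freq / total) * math.log(len(docs) / doc_freq[gram])
            for gram, freq in counter.items()
        }
    return scores


# Invented toy text, NOT real quotes from either politician.
docs = {
    "speaker_a": "the right honourable gentleman will know the long term economic plan",
    "speaker_b": "the right honourable gentleman will know the email from a worried nurse",
}

scores = tfidf_ngrams(docs)
for speaker, grams in scores.items():
    best = sorted(grams, key=grams.get, reverse=True)[:3]
    print(speaker, best)
```

N-grams that both speakers use get a score of zero under this weighting, so only the bigrams particular to one speaker rise to the top.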

Do check previous corpus linguistics community posts if you haven’t yet.

Thanks for reading and have a good summer/winter.

Alphabet Street aka Corpus Symposium at VRTwebcon 8

I was delighted to be able to take part in my first webinar as a presenter. Leo Selivan (@leoselivan) asked me to join the corpus symposium for the 8th VRT web conference alongside Jennie Wright (@teflhelper) and Sharon Hartle (@hartle). You can find links to our talks at the end of this post as well as my slides.

Presenting on a webinar is definitely a unique experience, like talking to yourself while knowing others are watching and listening in. Other things to note: make sure your microphone is loud enough, and that PowerPoints uploaded to online systems like Adobe Connect don’t show your slide notes!

My talk was about using the BYU-Wikipedia corpus to help recycle coursebook vocabulary and was titled Darling (BYU) Wiki in homage to the recent passing of the great musician Prince. Another webinar note – people can’t hear the music from your computer if you have headphones on!

As I have already posted about using BYU-Wiki for vocabulary recycling, in this post I want to give some brief notes on designing worksheets using some principles from the research literature. When talking about the slide below I did not really explain in the talk what input enhancement and input flood were. And I also did not point out that my adaptation from Barbieri & Eckhardt (2007) was very loose : ).

[Slide image: worksheet-design2]

“Input enhancement draws learners’ attention to targeted grammatical features by visually or acoustically flagging L2 input to enhance its perceptual saliency but with no guarantee that learners will attend to the features” (Kim, 2006: 345).

For written text, enhancement techniques include underlining, bolding, italicizing, capitalizing, and colouring. Note that the KWIC output from COCA uses colour to label parts of speech.

Input flood similarly enhances saliency, in this case through frequency, and draws its basis from studies showing the importance of repetition in language learning.

Szudarski & Carter (2015) concluded that a combination of input enhancement and input flood can lead to performance gains in collocational knowledge.
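To make the two ideas concrete, here is a small hedged sketch of how a worksheet maker might combine them: flood by keeping only sentences that contain the target item, and enhancement by flagging the item visually. The function names `enhance` and `flood`, the asterisk marker and the sample sentences are all my own inventions for illustration, not something taken from the studies cited.

```python
import re


def enhance(text, targets, marker="*"):
    """Input enhancement: wrap each target item in a visual marker."""
    if not targets:
        return text
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in targets) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: marker + m.group(0) + marker, text)


def flood(sentences, target, limit=5):
    """Input flood: keep up to `limit` sentences that contain the target item."""
    pattern = re.compile(r"\b" + re.escape(target) + r"\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)][:limit]


# Invented example sentences standing in for corpus lines.
sentences = [
    "The committee made a decision last night.",
    "The weather was fine all week.",
    "She made a decision to leave early.",
]

worksheet = [enhance(s, ["made a decision"]) for s in flood(sentences, "made a decision")]
for line in worksheet:
    print(line)
```

In a word processor the asterisks would of course be replaced by bolding, underlining or colour.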

Hopefully this post has briefly highlighted some points I did not cover in my 20 min talk. A huge thanks to those who took the time to attend, to Leo and Heike Philp (@heikephilp) for organizing things smoothly and to my co-presenters Jennie and Sharon. Do browse the recordings of the other talks as there are some very interesting ones to check out.

Talk recording links, slides and related blog posts

Jennie Wright, Making trouble-free tasks with corpora

Sharon Hartle, SkELL as a Key to Unlock Exam Preparation

Mura Nava, Darling (BYU) Wiki

Question and Answer Round

My talk slides (pdf)

Summary Post by Sharon Hartle

8th Virtual Round Table Web Conference 6-8 May 2016 program overview

References and further reading:

Barbieri, F., & Eckhardt, S. E. (2007). Applying corpus-based findings to form-focused instruction: The case of reported speech. Language Teaching Research, 11(3), 319–346.

Han, Z., Park, E. S., & Combs, C. (2008). Textual enhancement of input: Issues and possibilities. Applied Linguistics, 29(4), 597–618.

Kim, Y. (2006). Effects of input elaboration on vocabulary acquisition through reading by Korean learners of English as a foreign language. TESOL Quarterly, 40(2), 341–373.

Szudarski, P., & Carter, R. (2015). The role of input flood and input enhancement in EFL learners’ acquisition of collocations. International Journal of Applied Linguistics.

Interview with Mike Scott, WordSmith Tools developer

WordSmith Tools, a corpus linguistics program, turned 20 this year, quite a feat for software from an independent developer. I don’t use WordSmith myself, partly because I have an OSX system (though it can be run using Wine and/or virtualization) and partly because it is a paid program, always an issue for us poor language teachers. However, with the great support and new features on offer, the fee seems more and more tempting. Mike Scott kindly answered some questions.

1. Who are you?
A language teacher whose hobby turned into a new career, software development for corpus linguistics. Lucky to get into Corpus Linguistics early on (1980s) and before that lucky to get into EAP early on (in the 1970s). Basically, lucky!

2. What do you think is the most useful feature in WordSmith Tools for language teachers?
WordSmith is used by loads of different types of researchers, many of them not in language teaching: literature, politics, history, medicine, law, sociology. Not many language students use it because they can get free tools elsewhere and many just use Google however much we might wish otherwise. Language teachers probably find the Concord tool and its collocates feature the most useful. 

3. Of the new features in the latest WordSmith Tools which are you most excited about and why?
I put in new features as I think of them or as people request them. I am usually most excited by the one I’m currently working on because then I’m in the process of struggling to get it working and get it designed elegantly if I can. One I tweeted about recently was video concordancing. I think it will be great when we can routinely concordance enhanced corpora with sound and images as well as words! 

4. How do you see the current corpus linguistic software landscape?
Very much in its infancy. Computer software is only about as old as I am (born soon after WWII). Most other fields of human interest are as old as the hills. We are still feeling our way in a dark cavern full of interesting veins to explore, with only the weakest of illumination. Fun!

Many thanks to Mike for taking the time to respond and to you for reading.

#IATEFL 2016 – Corpus Tweets 4

It is good to see a talk on how to create your own corpus as this is arguably one of the key strengths of corpus linguistics i.e. language that is mined for your students in your particular context. Very far from what a coursebook can address. As ever much appreciation to Sandy Millin for bringing us this talk.

IATEFL 2016 Corpus Tweets 4

Teacher-driven corpus development: the online restaurant review by Chad Langford & Joshua Albair as reported by Sandy Millin

  1. Chad Langford and Josh Albair on creating a corpus of restaurant reviews based on TripAdvisor, as they are linguists and teachers

  2. CL/JA They teach adults, not degree seeking, but find writing is a challenge, esp as learners don’t write much, even in L1

  3. CL/JA Genre of these reviews works as learners can relate to it and feel empowered, members of non-geographically bound community

  4. CL/JA By creating a corpus, they believed that would characterise the genre as objectively as possible, and improve materials devmnt

  5. #iatefl CL/JA Basic steps for treating the data to make corpus https://t.co/NobXQJ2782



  6. CL/JA They narrowed down TripAdvisor reviews to London, with 100-200 reviews per restaurant, with 3-dot average


  7. CL/JA They copied over 8000 reviews and copied them into Word – pretty tedious! Huge amount of text and lots to be manually deleted

  8. CL/JA Cleaned data in Word is readable and only has tagline and body of review, maintaining paragraphs for later research


  9. CL/JA Needed to standardise, e.g. three dots for ellipsis, standardise common misspellings, removing extraneous spacing


  10. CL/JA To do this they used Notepad++ which is a free powerful text editor which they used to tidy up formatting

  11. #iatefl CL/JA Examples of coding they were able to learn very quickly https://t.co/fk7uHn48zM



  12. CL/JA Then added POS tagging, metadata about tagline and types of restaurant etc. Used Wordsmith tools which is cheap, but good


  13. CL/JA They used wordlist, keywords and concord tools within WordSmith


  14. CL/JA Final corpus has 67 restaurants, over 8000 reviews and over 1 million words. Can start to identify restaurant review genre


  15. CL/JA Identified positive/neg evaluative adjectives, restaurant-related vocab: experience, description, food, non-food, person, place


  16. CL/JA Also very high frequency of first person pronouns, overwhelming use of was/were (copulative use?)


  17. CL/JA Discourse showed very common to use ‘but’ as marker in 3dot reviews, very rare in 1/5 “good but” v “but good” – meaning change


  18. CL/JA One was much more common than the other. Think it was “good but” – missed it!


  19. CL/JA High instance of subject-less clauses, determiner ellipsis and one more grammar feature I missed


  20. CL/JA Determiner ellipsis is very rarely pointed out to our students, except in headlines. e.g. restaurant was dirty, fish was tasty


  21. CL/JA In class they’ve used it for ranking activity – place five taglines on cline on board, next group can add 5 more/move first


  22. CL/JA Second activity is guided discovery sheet based on authentic review which exemplifies characteristics they’ve identified


  23. CL/JA Can get in touch with them at the University of Lille if you’d like to find out more


  24. CL/JA Tilly Harrison brings up the point that this corpus data draws on comments that perhaps people haven’t given permission to use
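The talk used Notepad++ for the clean-up, and I don't know which exact search-and-replace patterns the presenters ran. As a rough illustration of the same kind of tidying (three dots for ellipsis, stray spaces removed, paragraphs kept), plus the "good but" versus "but good" comparison, here is a sketch in Python. The function names `clean_review` and `count_phrase` and the sample reviews are made up for this example.

```python
import re


def clean_review(text):
    """Standardise ellipses and spacing while keeping paragraph breaks."""
    text = text.replace("\u2026", "...")       # single-character ellipsis -> three dots
    text = re.sub(r"\.{2,}", "...", text)      # any run of 2+ dots -> three dots
    text = re.sub(r"[ \t]+", " ", text)        # collapse runs of spaces and tabs
    text = re.sub(r" *\n *", "\n", text)       # trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)     # at most one blank line between paragraphs
    return text.strip()


def count_phrase(texts, phrase):
    """Count whole-phrase occurrences across a list of texts, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    return sum(len(pattern.findall(t)) for t in texts)


# Invented reviews, not taken from the presenters' corpus.
reviews = [
    "Food was good but service slow.",
    "Pricey but good.",
    "Good but noisy, good but cramped.",
]

print(clean_review("Great  food....  but   slow service\u2026 \n\n\n Would return"))
print(count_phrase(reviews, "good but"), count_phrase(reviews, "but good"))
```

The same regular expressions should carry over to the Notepad++ replace dialog with little change, though I have not checked each one there.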

 

#IATEFL 2016 – Corpus Tweets 3

The tweeting game is on point this year, which means us poor folk at home can feel involved. Cambridge ELT tweeted out Using English Grammar Profile to improve curriculum design by Geraldine Mark & Anne O’Keeffe. One thing to note about this talk is that it, along with all the Cambridge related ones, has been recorded.

IATEFL 2016 Corpus Tweets 3

Using English Grammar Profile to improve curriculum design by Geraldine Mark & Anne O’Keeffe as reported by Cambridge ELT.

  1. We’ll shortly be live-tweeting Anne O’Keeffe & Geraldine Mark’s #IATEFL talk ‘Using English Grammar Profile to improve curriculum design’
  2. The English Grammar Profile (EGP) helps us see how learners develop competence in grammatical form/meaning through the CEFR levels #IATEFL
  3. It provides us with typical grammar profiles for each CEFR level – you can explore EGP here: http://www.englishprofile.org/english-grammar-profile #IATEFL
  4. O’Keeffe: The profile is made up of 1222 grammar descriptors describing what learners can do at various CEFR levels #IATEFL
  5. O’Keeffe: As teachers, we think we know a lot about what learners can/can’t do in terms of grammar from intuition/experience #IATEFL
  6. O’Keeffe: EGP shows what we know learners can do with grammar, based on evidence from the Cambridge Learner Corpus #IATEFL
  7. O’Keeffe: The Cambridge Learner Corpus is made up of 200,000 exam scripts, across 140 languages in 200 countries #IATEFL
  8. O’Keeffe: What conditionals do you think learners know at B1 level? #IATEFL
  9. Mark: through the EGP we can identify what learners at B1 can do with conditionals and clauses #IATEFL
  10. Mark: we can see evidence of the 1st, 2nd and 3rd use of the ‘if’ clause for B1 from the EGP for example #IATEFL
  11. Mark: EGP research into adverb & adjective combinations shows a more pragmatic use from C1 learners #IATEFL
  12. O’Keeffe: When you look at errors learners make, there are peaks and troughs – for example Past Simple errors are prolific at B1 #IATEFL
  13. O’Keeffe: When investigating: at A1 they can do 2 things with the Past Simple… But at B1 there are 7 uses with a wider vocab range #IATEFL
  14. O’Keeffe: This explains the relatively high numbers of errors at B1 as they develop this more complex use of the Past Simple #IATEFL
  15. O’Keeffe: More sophisticated uses of the same form are evident at B2 ‘I wondered if you could introduce me…’ #IATEFL
  16. O’Keeffe: This shows how the use of grammatical structures develops incrementally – a developing, non-linear path of language use #IATEFL
  17. O’Keeffe: Considering un/countable nouns, errors are evident up to after C1 – but at A1/2 level you know many fewer nouns! #IATEFL
  18. O’Keeffe: informations, advices and equipments are some of the most error-prone uncountable nouns – but wouldn’t be taught at A1/A2 #IATEFL
  19. Mark: If you do get an uncountable noun wrong it has a ripple effect! The Countability topic across the levels should be recycled #IATEFL
  20. Mark: possessive pronouns are identified at A2 – only ‘mine’ gets used correctly at this level, the EGP shows #IATEFL
  21. O’Keeffe: We hope the EGP will provide a ‘bigger picture’ of grammar – beyond ticking off achieved structures at various levels #IATEFL
  22. O’Keeffe: It could also help with ‘gap analysis’ – where more teaching or attention might be needed – helpful for syllabus design #IATEFL
  23. O’Keeffe: We can plot the lag between explicit input and output of grammar – implicit learning – showing how much time is needed! #IATEFL
  24. O’Keeffe: It can also highlight interesting grammar competencies for teaching advanced level grammar – more sophisticated uses etc. #IATEFL
  25. O’Keeffe: Visit http://www.englishprofile.org to explore EGP and the English Vocabulary Profile too! #IATEFL
  26. Make sure to catch up with O’Keeffe and Mark’s English Grammar Profile talk recording tomorrow on http://iatefltalks.org #IATEFL

 

#IATEFL 2016 – Corpus Tweets 2

This is a storify of tweets by Sandy Millin, Dan Ruelle and Leo Selivan on the talk Answering language questions from corpora by James Thomas. Hats off to the tweeters, I know it’s not an easy task!

IATEFL 2016 Corpus Tweets 2

Answering language questions from corpora by James Thomas as reported by Sandy Millin, Dan Ruelle & Leo Selivan

  1. James Thomas on answering language questions from corpora. Did not know Masaryk uni was home of Sketch Engine!
  2. JT has written a book about discovering English through SketchEngine with lots of ways you can search and use the corpus
  3. JT trains his trainees how to use SketchEngine, so they can teach learners how to learn language from language
  4. JT Need to ensure that tasks have a lot of affordances of tasks and texts
  5. We live in an era of collocation, multi-word units, pragmatic competence, fuzziness and multiple affordances – James Thomas
  6. JT Why do SS have language questions? Are the rules inadequate? It’s about hierarchy of choice…
  7. JT Not much choice in terms of letters or morphemes, but lots of choice at text level
  8. JT Patterns are visible in corpora. They are regular features and cover a lot of core English
  9. JT What counts as a language pattern? Collocation, word grammar, language chunks, colligation (and more I didn’t get!)
  10. JT Students have questions about lexical cohesion, spelling mistakes, collocations: at every level of hierarchy
  11. JT Examples of q’s: Does whose refer only to people? Can women be described as handsome? Any patterns with tense/aspect clauses?
  12. JT q’s: Does the truth lie? What is friendly fire? What are the collocations of rule?
  13. JT introduces SKELL: Sketch Engine for Language Learning http://skell. (don’t know!)
  14. “Rules don’t tell whole story” – James Thomas making an analogy w/ Einstein who said same about both the wave & the particle theory
  15. JT SKELL selects useful sentences only, excludes proper nouns, obscure words etc. 40 sentences
  16.  http://skell.sketchengine.co.uk 

    Nice simple interface – need to play with it more. #iatefl

  17. JT searched for mansplain in SKELL and it already has 7 or 8 examples in there
  18. JT Algorithm to reduce amount of sentences only works when there are a lot of examples. With a few, sentences often longer
  19. Sketch Engine is a pretty hardcore linguistic tool, but I can see the use of Skell for language learners. #iatefl
  20. JT Corpora can also teach you more about grammar patterns too, for example periphrasis (didn’t get definition fast enough!)
  21. JT Can search for present perfect continuous for example: have been .*ing
  22. JT You can search for ‘could of’ in SKELL – appears fairly often, but relatively insignificant compared to ‘could have’
  23. Can use frequency in corpus search results to gauge which is “more correct” / “the norm”. #iatefl
  24. JT SKELL can sort collocations by whether a noun is the object or subject of a word for example. Can use ‘word sketch’ function
  25. Unclear whether collocation results in Skell are sorted according to “significance” / frequency or randomly #iatefl
  26. JT See @versatilepub for discounts on book about SKELL
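The two search ideas from the tweets, a pattern like have been .*ing and comparing ‘could of’ with ‘could have’, can be tried on any text you have to hand. The sketch below uses Python regular expressions rather than SKELL's own query syntax, which I am not reproducing here. The names `find_ppc` and `compare_variants` and the sample sentences are assumptions of mine.

```python
import re
from collections import Counter

# have/has + been + a word ending in -ing. This will also catch adjectives
# such as "has been interesting", so the hits still need reading by eye.
PPC = re.compile(r"\b(?:have|has) been \w+ing\b", re.IGNORECASE)


def find_ppc(sentences):
    """Return every 'have/has been ...ing' string found in the sentences."""
    return [m.group(0) for s in sentences for m in PPC.finditer(s)]


def compare_variants(sentences, variants):
    """Count how often each competing form occurs, to gauge which is the norm."""
    counts = Counter()
    for variant in variants:
        pattern = re.compile(r"\b" + re.escape(variant) + r"\b", re.IGNORECASE)
        counts[variant] = sum(len(pattern.findall(s)) for s in sentences)
    return counts


# Invented sentences standing in for corpus output.
sentences = [
    "I have been waiting for an hour.",
    "She has been working here since May.",
    "They have been to Paris.",
    "You could have told me.",
    "He could of said something.",
    "We could have left earlier.",
]

print(find_ppc(sentences))
print(compare_variants(sentences, ["could have", "could of"]))
```

Using `\w+ing` instead of `.*ing` keeps the match to a single word, which avoids running on to the last -ing word in a long sentence.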

 

#IATEFL 2016 – Corpus Tweets 1

This is a storified version of the tweets by Sandy Millin on a talk called Making trouble free corpus tasks in ten minutes by Jennie Wright @teflhelper. Hopefully there will be other tweeters attending the other corpus based talks, who will be up to the standard set by Sandy : )

IATEFL2016 Corpus Tweets 1

Making trouble free corpus tasks in ten minutes – Jennie Wright as reported by Sandy Millin

  1. Jennie Wright now in Hall 8b ‘Making trouble-free corpus tasks in ten minutes’
  2. Jennie Wright runs the TEFL helper blog:  http://teflhelperblog.wordpress.com 
  3. Jennie Wright All you need to make quick corpus tasks is a good copy-paster
  4. Jennie Wright Key terms: corpus/corpora: multi-million word collections of lang, concordance lines: search term presented in middle
  5. Jennie Wright POS/grammatical tagging is tagging with noun, verb etc; KWIC is key word in the middle (was wrong before! Sorry)
  6. Jennie Wright COCA is one her business English students go back to:  http://corpus.byu.edu/coca 
  7. #iatefl Jennie Wright This is the search for COCA. It's very intuitive. Put a key word in the box https://t.co/8YcJPuSaxh

  8. Jennie Wright You can click KWIC to see the key word in the centre. Very fast!
  9. Jennie Wright COCA is great because it’s colour coded and the parts of speech are tagged
  10. #iatefl Jennie Wright What's the missing word? Get your concordance lines, then blank out one word https://t.co/owVfQ4xU33

  11. Jennie Wright It’s ‘thingie’ – she wanted her students to use something other than ‘thing’ or ‘stuff’! Good for fossilised errors
  12. Jennie Wright To make it, use a screengrab or clipping tool to get the concordance lines, then block out words you want SS to guess
  13. Jennie Wright Don’t forget to read the concordance lines before you copy and paste! Avoid accidents:)
  14. Jennie Wright Activity 2: collocation gamble. Focus on strong and weak collocations and lexical chunking. Good for SS misusing them
  15. #iatefl Jennie Wright. What are the two most common adjective collocations each for bitterly, deeply and sincerely? https://t.co/UAj50bjmuP

  16. @teflhelper 2 most common=’bitterly’: ‘disappointed’/’cold’. ‘deeply’:’concerned’/’sorry’. ‘sincerely’:’interested’/’sorry’
  17. @teflhelper To make collocation gamble, use list function, type ‘bitterly [j*]’ in word box, then adj.ALL in POS list
  18. I think Sandy meant you can use the POS list to insert the code for adjective if you want, not in addition to the search term given [mura]
  19. #iatefl @teflhelper 3. Colour-code me. COCA helps you to do this easily. Can you colour-code these sentences? https://t.co/TuRkPYUdu8

  20. @teflhelper To make this, search for your word, screen shot the answers and retype the sentences for them to colour code
  21. @teflhelper Audience member suggests giving them a key of colours and get them to figure out a sentence which matches
  22. @teflhelper Tips: 1.train a little: better to know one corpus well than a lot of them a little. 2. Imperfections exist in corpora
  23. @teflhelper Tips: Don’t be afraid to oppose what’s in the corpus – you’re the ‘live corpus in the classroom’ Read it carefully
  24. @teflhelper Tips: 3. Choose wisely – never more than 10 lines and don’t overwhelm. 4. What’s the problem you want to solve?
  25. @teflhelper COCA is very helpful when your students don’t believe you:) maximal v. maximum [think Ngrams useful here too]
  26. @teflhelper Tip 5: consider how to do this online or offline. If online, what’s your backup plan? Have paper copies!
  27. @teflhelper COCA bites on Youtube are 2-minute tutorials on how to use COCA
  28. @teflhelper Thanks very much Jennie for an excellent talk. Exactly what I’ve needed for a long time:)
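If you have your own text rather than COCA, the "what's the missing word?" activity can be roughed out with a few lines of Python. This is only a sketch under my own assumptions: `kwic` and `format_lines` are names I made up, the sample text is invented, and the output is plain KWIC (key word in context) lines without COCA's colour coding. The default cap of 10 lines follows the tip in the talk.

```python
import re


def kwic(text, keyword, width=30, limit=10):
    """Return up to `limit` (left context, key word, right context) tuples."""
    text = re.sub(r"\s+", " ", text)  # flatten line breaks so contexts stay on one line
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits[:limit]


def format_lines(hits, width=30, blank=False):
    """Line up the key word in a column; with blank=True replace it with a gap."""
    lines = []
    for left, node, right in hits:
        shown = "_____" if blank else node
        lines.append(f"{left:>{width}}{shown}{right}")
    return lines


# Invented text standing in for your own corpus.
text = "Pass me that thingie on the shelf. What is this thingie for?"

hits = kwic(text, "thingie", width=15)
for line in format_lines(hits, width=15, blank=True):
    print(line)
```

With a wide context window the key word can reappear in the left or right context of a neighbouring line, so, as the talk advises, read the lines before handing them out.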