Interview with Mike Scott, WordSmith Tools developer

WordSmith Tools, a corpus linguistics program, turned 20 this year, quite a feat for software from an independent developer. I don’t use WordSmith myself, partly because I have an OS X system (though it can be run using Wine and/or virtualization) and partly because it is a paid program – always an issue for us poor language teachers. However, with the great support and new features on offer, the fee seems more and more tempting. Mike Scott kindly answered some questions.

1. Who are you?
A language teacher whose hobby turned into a new career, software development for corpus linguistics. Lucky to get into Corpus Linguistics early on (1980s) and before that lucky to get into EAP early on (in the 1970s). Basically, lucky!

2. What do you think is the most useful feature in WordSmith Tools for language teachers?
WordSmith is used by loads of different types of researchers, many of them not in language teaching: literature, politics, history, medicine, law, sociology. Not many language students use it because they can get free tools elsewhere, and many just use Google, however much we might wish otherwise. Language teachers probably find the Concord tool and its collocates feature the most useful.

3. Of the new features in the latest WordSmith Tools which are you most excited about and why?
I put in new features as I think of them or as people request them. I am usually most excited by the one I’m currently working on because then I’m in the process of struggling to get it working and get it designed elegantly if I can. One I tweeted about recently was video concordancing. I think it will be great when we can routinely concordance enhanced corpora with sound and images as well as words! 

4. How do you see the current corpus linguistic software landscape?
Very much in its infancy. Computer software is only about as old as I am (born soon after WWII). Most other fields of human interest are as old as the hills. We are still feeling our way in a dark cavern full of interesting veins to explore, with only the weakest of illumination. Fun!

Many thanks to Mike for taking the time to respond and to you for reading.

#IATEFL 2016 – Corpus Tweets 4

It is good to see a talk on how to create your own corpus, as this is arguably one of the key strengths of corpus linguistics, i.e. language that is mined for your students in your particular context, which is very far from what a coursebook can address. As ever, much appreciation to Sandy Millin for bringing us this talk.

IATEFL 2016 Corpus Tweets 4

Teacher-driven corpus development: the online restaurant review by Chad Langford & Joshua Albair as reported by Sandy Millin

  1. Chad Langford and Josh Albair on creating a corpus of restaurant reviews based on TripAdvisor, as they are linguists and teachers

  2. CL/JA They teach adults, not degree seeking, but find writing is a challenge, esp as learners don’t write much, even in L1

  3. CL/JA Genre of these reviews works as learners can relate to it and feel empowered, members of non-geographically bound community

  4. CL/JA By creating a corpus, they believed that would characterise the genre as objectively as possible, and improve materials devmnt

  5. #iatefl CL/JA Basic steps for treating the data to make corpus https://t.co/NobXQJ2782



  6. CL/JA They narrowed down TripAdvisor reviews to London, with 100-200 reviews per restaurant, with 3-dot average


  7. CL/JA They copied over 8000 reviews and copied them into Word – pretty tedious! Huge amount of text and lots to be manually deleted

  8. CL/JA Cleaned data in Word is readable and only has tagline and body of review, maintaining paragraphs for later research


  9. CL/JA Needed to standardise, e.g. three dots for ellipsis, standardise common misspellings, removing extraneous spacing


  10. CL/JA To do this they used Notepad++ which is a free powerful text editor which they used to tidy up formatting

  11. #iatefl CL/JA Examples of coding they were able to learn very quickly https://t.co/fk7uHn48zM



  12. CL/JA Then added POS tagging, metadata about tagline and types of restaurant etc. Used WordSmith Tools which is cheap, but good


  13. CL/JA They used wordlist, keywords and concord tools within WordSmith


  14. CL/JA Final corpus has 67 restaurants, over 8000 reviews and over 1 million words. Can start to identify restaurant review genre


  15. CL/JA Identified positive/neg evaluative adjectives, restaurant-related vocab: experience, description, food, non-food, person, place


  16. CL/JA Also very high frequency of first person pronouns, overwhelming use of was/were (copulative use?)


  17. CL/JA Discourse showed very common to use ‘but’ as marker in 3-dot reviews, very rare in 1/5 “good but” v “but good” – meaning change


  18. CL/JA One was much more common than the other. Think it was “good but” – missed it!


  19. CL/JA High instance of subject-less clauses, determiner ellipsis and one more grammar feature I missed


  20. CL/JA Determiner ellipsis is very rarely pointed out to our students, except in headlines. e.g. restaurant was dirty, fish was tasty


  21. CL/JA In class they’ve used it for ranking activity – place five taglines on cline on board, next group can add 5 more/move first


  22. CL/JA Second activity is guided discovery sheet based on authentic review which exemplifies characteristics they’ve identified


  23. CL/JA Can get in touch with them at the University of Lille if you’d like to find out more


  24. CL/JA Tilly Harrison brings up the point that this corpus data draws on comments that perhaps people haven’t given permission to use
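
The standardising step in tweets 9 and 10 (three dots for ellipsis, common misspellings, extraneous spacing) was done in Notepad++ in the talk. For anyone who would rather script it, here is a minimal Python sketch of the same idea. It is not the presenters' method: the function name `normalise` and the misspelling list are my own assumptions for illustration.

```python
import re

# Hypothetical misspelling list, for illustration only;
# the talk did not say which misspellings were standardised.
MISSPELLINGS = {"resturant": "restaurant", "definately": "definitely"}


def normalise(text: str) -> str:
    """Tidy one review while keeping its paragraph breaks."""
    # Turn the single-character ellipsis into dots, then reduce
    # any run of two or more dots to exactly three.
    text = text.replace("\u2026", "...")
    text = re.sub(r"\.{2,}", "...", text)
    # Standardise the listed misspellings (whole words, lower case only).
    for wrong, right in MISSPELLINGS.items():
        text = re.sub(rf"\b{re.escape(wrong)}\b", right, text)
    # Collapse runs of spaces and tabs, but leave newlines alone
    # so paragraphs survive for later research.
    text = re.sub(r"[ \t]+", " ", text)
    # Drop spaces left hanging around line breaks.
    text = re.sub(r" *\n *", "\n", text)
    return text.strip()
```

Calling `normalise` on each cleaned review before saving it as plain text would give a consistent file to load into a concordancer.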

 

#IATEFL 2016 – Corpus Tweets 3

The tweeting game is on point this year, which means us poor folk at home can feel involved. Cambridge ELT tweeted out Using English Grammar Profile to improve curriculum design by Geraldine Mark & Anne O’Keeffe. One thing to note about this talk is that it, along with all the Cambridge related ones, has been recorded.

IATEFL 2016 Corpus Tweets 3

Using English Grammar Profile to improve curriculum design by Geraldine Mark & Anne O’Keeffe as reported by Cambridge ELT.

  1. We’ll shortly be live-tweeting Anne O’Keeffe & Geraldine Mark’s #IATEFL talk ‘Using English Grammar Profile to improve curriculum design’
  2. The English Grammar Profile (EGP) helps us see how learners develop competence in grammatical form/meaning through the CEFR levels #IATEFL
  3. It provides us with typical grammar profiles for each CEFR level – you can explore EGP here:  http://www.englishprofile.org/english-grammar-profile  #IATEFL
  4. O’Keeffe: The profile is made up of 1222 grammar descriptors describing what learners can do at various CEFR levels #IATEFL
  5. O’Keeffe: As teachers, we think we know a lot about what learners can/can’t do in terms of grammar from intuition/experience #IATEFL
  6. O’Keeffe: EGP shows what we know learners can do with grammar, based on evidence from the Cambridge Learner Corpus #IATEFL
  7. O’Keeffe: The Cambridge Learner Corpus is made up of 200,000 exam scripts, across 140 languages in 200 countries #IATEFL
  8. O’Keeffe: What conditionals do you think learners know at B1 level? #IATEFL
  9. Mark: through the EGP we can identify what learners at B1 can do with conditionals and clauses #IATEFL
  10. Mark: we can see evidence of the 1st, 2nd and 3rd use of the ‘if’ clause for B1 from the EGP for example #IATEFL
  11. Mark: EGP research into adverb & adjective combinations shows a more pragmatic use from C1 learners #IATEFL
  12. O’Keeffe: When you look at errors learners make, there are peaks and troughs – for example Past Simple errors are prolific at B1 #IATEFL
  13. O’Keeffe: When investigating: at A1 they can do 2 things with the Past Simple… But at B1 there are 7 uses with a wider vocab range #IATEFL
  14. O’Keeffe: This explains the relatively high numbers of errors at B1 as they develop this more complex use of the Past Simple #IATEFL
  15. O’Keeffe: More sophisticated uses of the same form are evident at B2 ‘I wondered if you could introduce me…’ #IATEFL
  16. O’Keeffe: This shows how the use of grammatical structures develops incrementally – a developing, non-linear path of language use #IATEFL
  17. O’Keeffe: Considering un/countable nouns, errors are evident up to after C1 – but at A1/2 level you know many fewer nouns! #IATEFL
  18. O’Keeffe: informations, advices and equipments are some of the most error-prone uncountable nouns – but wouldn’t be taught at A1/A2 #IATEFL
  19. Mark: If you do get an uncountable noun wrong it has a ripple effect! The Countability topic across the levels should be recycled #IATEFL
  20. Mark: possessive pronouns are identified at A2 – only ‘mine’ gets used correctly at this level, the EGP shows #IATEFL
  21. O’Keeffe: We hope the EGP will provide a ‘bigger picture’ of grammar – beyond ticking off achieved structures at various levels #IATEFL
  22. O’Keeffe: It could also help with ‘gap analysis’ – where more teaching or attention might be needed – helpful for syllabus design #IATEFL
  23. O’Keeffe: We can plot the lag between explicit input and output of grammar – implicit learning – showing how much time is needed! #IATEFL
  24. O’Keeffe: It can also highlight interesting grammar competencies for teaching advanced level grammar – more sophisticated uses etc. #IATEFL
  25. O’Keeffe: Visit  http://www.englishprofile.org  to explore EGP and the English Vocabulary Profile too! #IATEFL
  26. Make sure to catch up with O’Keeffe and Mark’s English Grammar Profile talk recording tomorrow on  http://iatefltalks.org  #IATEFL

 

#IATEFL 2016 – Corpus Tweets 2

This is a storify of tweets by Sandy Millin, Dan Ruelle and Leo Selivan on the talk Answering language questions from corpora by James Thomas. Hats off to the tweeters, I know it’s not an easy task!

IATEFL 2016 Corpus Tweets 2

Answering language questions from corpora by James Thomas as reported by Sandy Millin, Dan Ruelle & Leo Selivan

  1. James Thomas on answering language questions from corpora. Did not know Masaryk uni was home of Sketch Engine!
  2. JT has written a book about discovering English through SketchEngine with lots of ways you can search and use the corpus
  3. JT trains his trainees how to use SketchEngine, so they can teach learners how to learn language from language
  4. JT Need to ensure that tasks have a lot of affordances of tasks and texts
  5. We live in an era of collocation, multi-word units, pragmatic competence, fuzziness and multiple affordances – James Thomas
  6. JT Why do SS have language questions? Are the rules inadequate? It’s about hierarchy of choice…
  7. JT Not much choice in terms of letters or morphemes, but lots of choice at text level
  8. JT Patterns are visible in corpora. They are regular features and cover a lot of core English
  9. JT What counts as a language pattern? Collocation, word grammar, language chunks, colligation (and more I didn’t get!)
  10. JT Students have questions about lexical cohesion, spelling mistakes, collocations: at every level of hierarchy
  11. JT Examples of q’s: Does whose refer only to people? Can women be described as handsome? Any patterns with tense/aspect clauses?
  12. JT q’s: Does the truth lie? What is friendly fire? What are the collocations of rule?
  13. JT introduces SKELL: Sketch Engine for Language Learning http://skell. (don’t know!)
  14. “Rules don’t tell whole story” – James Thomas making an analogy w/ Einstein who said same about both the wave & the particle theory
  15. JT SKELL selects useful sentences only, excludes proper nouns, obscure words etc. 40 sentences
  16.  http://skell.sketchengine.co.uk 

    Nice simple interface – need to play with it more. #iatefl

  17. JT searched for mansplain in SKELL and it already has 7 or 8 examples in there
  18. JT Algorithm to reduce amount of sentences only works when there are a lot of examples. With a few, sentences often longer
  19. Sketch Engine is a pretty hardcore linguistic tool, but I can see the use of Skell for language learners. #iatefl
  20. JT Corpora can also teach you more about grammar patterns too, for example periphrasis (didn’t get definition fast enough!)
  21. JT Can search for present perfect continuous for example: have been .*ing
  22. JT You can search for ‘could of’ in SKELL – appears fairly often, but relatively insignificant compared to ‘could have’
  23. Can use frequency in corpus search results to gauge which is “more correct” / “the norm”. #iatefl
  24. JT SKELL can sort collocations by whether a noun is the object or subject of a word for example. Can use ‘word sketch’ function
  25. Unclear whether collocation results in Skell are sorted according to “significance” / frequency or randomly #iatefl
  26. JT See @versatilepub for discounts on book about SKELL
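
The two search ideas in tweets 21 to 23 (a pattern search for the present perfect continuous, and comparing the frequency of ‘could of’ with ‘could have’) can be tried on any plain text you have to hand. The Python sketch below is my own illustration, not how SKELL works internally: the sample text is made up, `count_phrase` is a hypothetical helper, and I have added ‘has’ to the pattern from the tweet.

```python
import re

# Tiny made-up sample standing in for a real corpus.
SAMPLE = (
    "I have been waiting for an hour. She has been working here since May. "
    "They have been to Paris. You could have called. He could of told me. "
    "We could have been eating by now."
)

# "have/has been" followed by a single word ending in -ing.
# This is a rough pattern and will over-match words like "nothing".
PPC = re.compile(r"\b(?:have|has) been \w+ing\b", re.IGNORECASE)


def count_phrase(phrase: str, text: str) -> int:
    """Count whole-word occurrences of a phrase, ignoring case."""
    return len(re.findall(rf"\b{re.escape(phrase)}\b", text, flags=re.IGNORECASE))


hits = PPC.findall(SAMPLE)
could_have = count_phrase("could have", SAMPLE)
could_of = count_phrase("could of", SAMPLE)
```

On a real corpus, the two counts give a rough sense of which form is the norm, which is the point made in tweet 23.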

 

#IATEFL 2016 – Corpus Tweets 1

This is a storified version of the tweets by Sandy Millin on a talk called Making trouble free corpus tasks in ten minutes by Jennie Wright @teflhelper. Hopefully there will be other tweeters attending the other corpus based talks, who will be up to the standard set by Sandy : )

IATEFL 2016 Corpus Tweets 1

Making trouble free corpus tasks in ten minutes – Jennie Wright as reported by Sandy Millin

  1. Jennie Wright now in Hall 8b ‘Making trouble-free corpus tasks in ten minutes’
  2. Jennie Wright runs the TEFL helper blog:  http://teflhelperblog.wordpress.com 
  3. Jennie Wright All you need to make quick corpus tasks is a good copy-paster
  4. Jennie Wright Key terms: corpus/corpora: multi-million word collections of lang, concordance lines: search term presented in middle
  5. Jennie Wright POS/grammatical tagging is tagging with noun, verb etc; KWIC is key word in the middle (was wrong before! Sorry)
  6. Jennie Wright COCA is one her business English students go back to:  http://corpus.byu.edu/coca 
  7. #iatefl Jennie Wright This is the search for COCA. It's very intuitive. Put a key word in the box https://t.co/8YcJPuSaxh

  8. Jennie Wright You can click KWIC to see the key word in the centre. Very fast!
  9. Jennie Wright COCA is great because it’s colour coded and the parts of speech are tagged
  10. #iatefl Jennie Wright What's the missing word? Get your concordance lines, then blank out one word https://t.co/owVfQ4xU33

  11. Jennie Wright It’s ‘thingie’ – she wanted her students to use something other than ‘thing’ or ‘stuff’! Good for fossilised errors
  12. Jennie Wright To make it, use a screengrab or clipping tool to get the concordance lines, then block out words you want SS to guess
  13. Jennie Wright Don’t forget to read the concordance lines before you copy and paste! Avoid accidents:)
  14. Jennie Wright Activity 2: collocation gamble. Focus on strong and weak collocations and lexical chunking. Good for SS misusing them
  15. #iatefl Jennie Wright. What are the two most common adjective collocations each for bitterly, deeply and sincerely? https://t.co/UAj50bjmuP

  16. @teflhelper 2 most common=’bitterly’: ‘disappointed’/’cold’. ‘deeply’:’concerned’/’sorry’. ‘sincerely’:’interested’/’sorry’
  17. @teflhelper To make collocation gamble, use list function, type ‘bitterly [j*]’ in word box, then adj.ALL in POS list
  18. I think Sandy meant you can use POS list to insert code for adjective if you want, not in addition to the search term given [mura]
  19. #iatefl @teflhelper 3. Colour-code me. COCA helps you to do this easily. Can you colour-code these sentences? https://t.co/TuRkPYUdu8

  20. @teflhelper To make this, search for your word, screen shot the answers and retype the sentences for them to colour code
  21. @teflhelper Audience member suggests giving them a key of colours and get them to figure out a sentence which matches
  22. @teflhelper Tips: 1.train a little: better to know one corpus well than a lot of them a little. 2. Imperfections exist in corpora
  23. @teflhelper Tips: Don’t be afraid to oppose what’s in the corpus – you’re the ‘live corpus in the classroom’ Read it carefully
  24. @teflhelper Tips: 3. Choose wisely – never more than 10 lines and don’t overwhelm. 4. What’s the problem you want to solve?
  25. @teflhelper COCA is very helpful when your students don’t believe you:) maximal v. maximum [think Ngrams useful here too]
  26. @teflhelper Tip 5: consider how to do this online or offline. If online, what’s your backup plan? Have paper copies!
  27. @teflhelper COCA bites on Youtube are 2-minute tutorials on how to use COCA
  28. @teflhelper Thanks very much Jennie for an excellent talk. Exactly what I’ve needed for a long time:)
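
Jennie's ‘What's the missing word?’ task (tweets 10 to 12) was made with screengrabs from COCA. If you have your own text rather than COCA output, the same kind of worksheet can be sketched in a few lines of Python. This is only an illustration under my own assumptions: `kwic` and `gap_fill` are hypothetical helper names, and the text is expected as one string without line breaks.

```python
import re


def kwic(text: str, keyword: str, width: int = 30) -> list[str]:
    """Return concordance lines with the key word in a fixed middle column."""
    lines = []
    pattern = rf"\b{re.escape(keyword)}\b"
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Pad both sides so every key word lines up in the same column.
        lines.append(f"{left:>{width}} {m.group()} {right:<{width}}")
    return lines


def gap_fill(lines: list[str], keyword: str) -> list[str]:
    """Blank out every occurrence of the key word in each line."""
    blank = "_" * len(keyword)
    pattern = rf"\b{re.escape(keyword)}\b"
    return [re.sub(pattern, blank, line, flags=re.IGNORECASE) for line in lines]
```

Something like `print("\n".join(gap_fill(kwic(text, "thingie"), "thingie")))` would print the gapped lines, and as tweet 13 says, read them before handing them out.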

#IATEFL 2016 – Can a language test measure integration

This was billed as TELC Signature Event – Can a language test measure integration. The discussion is interesting with loads of great quotes. I do recommend watching it. One of the lines I liked most was by someone called Horatio Clare, who abhors the notion of borders; at the 25:31 mark he says:

A lot of Brits don’t eat turkey on Christmas day, most Brits can’t tell you when women got the vote. Knowing these things doesn’t tell you that most women in Britain don’t vote and that the government was elected with 24 percent of the electorate. These things are the real Britain to which they are coming.

Horatio Clare

All the participants recognised the deep political aspects of this issue, which goes beyond any simple debate on test validity.

Video – TELC Signature Event – Can a language test measure integration

#IATEFL 2016 – conference app review

Four years ago my first IATEFL 2012 blog post was a quick review of the conference mobile app. Back then mobile apps and mobile learning were the buzz. Back then I stubbornly used the term program in my post instead of app as my way of highlighting the consumerisation of software. Either that or I just wanted to be contrary and pretentious. Anyhoo, onto 2016 and another quick & dirty review of the conference app.

By some accounts there were issues with its availability and some problems with some of the functions. Now everything seems to be sorted and the app is ok. When I say ok, this is less a comment on the app itself and more a comment on today’s blasé adoption of apps into the everyday. I’ll stop there and spare you any more musings, onto the app!

When downloading the app from the Google Play Store, the permissions pop-up raises the question of why the identity permission is asked for.

permissions

In my case I am not too bothered, as I have a program that can disable any permissions I do not want.

The home page shows a number of sections on the left and two sections on the bottom:
home2

A nice touch is the keyword hunt in the Reward Card section, which enters you into a prize draw (anyone know what the prize is?):

rewardcard

Adding items to My Agenda is straightforward and it’s nice to be able to link through to item details easily from an agenda entry. I recall in the 2012 version this was more cumbersome.

agenda

Likewise My Notes is ok, except I am not sure how to add an entry in a category other than General Notes.

notes

Clicking on Notes on Documentation or Notes on Conference had no effect.

The Business Card Swap is a nice idea; as I rarely remember to print business cards, this could be a nice replacement:

businesscard

The Floor Plan section seems standard, good resolution, able to zoom in and out:

floorplan

So all in all an ok app. I wonder how the software landscape will change in 4 more years?

Thanks for reading and enjoy the conference.