Corpus linguistics community news 5

It’s been a long time, I shouldn’t have left you
Without some corpus news to read through
(I know you got corpus soul)

First off, if you use the BYU suite of corpus tools, do consider helping to correct their corpus of soap operas; should you get to 500 words, your name will appear in the acknowledgements. Nice.

Next up are some interviews: Alex Boulton on issues in data-driven learning (DDL), Ivor Timmis on his new corpus book for ELT, and Andrew Caines on a spoken corpus project.

For those interested in XML tagging, there is a tutorial on rhetorical tagging that uses the UK 2015 election forewords as its material.

For those interested in multi-word tagging, there is a two-part description (part 1 and part 2) of using a program called AMALGrAM 2.0.

Finally, an FYI: check out the latest version of the TED Corpus Search Engine, which now has translations and synced transcriptions.

Till next time.

Thanks for reading.

A 2nd tipple of the TED Corpus Search Engine

Here is a short (and, to some, no doubt obvious) note about working with the TED Corpus Search Engine (TCSE). I was looking at examples of the use of "route", and there are 126 results. I want my students to go through these and transcribe the two different ways the word is said. However, 126 examples is too many, so a simple way to reduce the number is to use the n-gram feature.

For 2-grams we get:

1. route to: 13
2. the route: 12
3. route of: 5
4. a route: 6
5. this route: 4

So I could give "route to" to stronger students, who would have 13 examples to listen to, and "this route" to weaker students, who would only need to examine 4 examples.
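For anyone curious about what the n-gram feature is doing, here is a minimal Python sketch of the idea, assuming you already have the concordance lines for the search word as plain text. It is not the TCSE's own code: the function name target_bigrams and the sample sentences are hypothetical, made up purely for illustration. It counts every adjacent word pair (2-gram) containing the target word and then picks the most and least frequent pairs, mirroring the stronger/weaker split above.

```python
import re
from collections import Counter


def target_bigrams(lines, target):
    """Count 2-grams (adjacent word pairs) that contain `target`."""
    counts = Counter()
    for line in lines:
        # Lower-case and keep only word characters and apostrophes.
        words = re.findall(r"[a-z']+", line.lower())
        # Pair each word with the next one to form 2-grams.
        for left, right in zip(words, words[1:]):
            if target in (left, right):
                counts[f"{left} {right}"] += 1
    return counts


# Hypothetical concordance lines, NOT real TCSE data.
lines = [
    "we found a route to the sea",
    "the route to success is long",
    "this route was closed",
    "the route of the river",
]

counts = target_bigrams(lines, "route")
ranked = counts.most_common()  # most frequent 2-grams first

# Most examples for stronger students, fewest for weaker students.
strong_set, strong_n = ranked[0]
weak_set, weak_n = ranked[-1]
print(f"stronger students: '{strong_set}' ({strong_n} examples)")
print(f"weaker students: '{weak_set}' ({weak_n} examples)")
```

With real data you would paste in the actual TCSE concordance lines; the counts would then match the n-gram list the search engine gives you.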

FYI: the TCSE can now play the search term immediately, with an option to start 10 seconds earlier.

Thanks for reading.

P.S. Do check out the first tipple of the TCSE if you haven’t yet.

A tipple of the TED Corpus Search Engine

This may become a series of short posts if things pan out 🙂
Two of my classes are learning the phonetic alphabet. They have already been introduced to it, have done a couple of exercises on it, and have had a go with the Cambridge English Phonetics Focus set of games and activities.

To keep a low level of revision going, I think the TED Corpus Search Engine (TCSE) could be useful. Taking the example of "neither" (borrowed from a Guy Aston workshop on spoken corpora at TaLC 11 in Lancaster this summer), I intend to ask them how they think it is written phonetically.

Then I will ask them to search for the word in the TCSE, look at entry 555 (Michelle Obama) and then entry 768 (David Cameron), and see if they can transcribe the phonetic differences (/niːðər/ and /naɪðə/ respectively).

Update 1:

I used the above in my classes recently and it went very well; I integrated it with another worksheet on pronunciation and phonetics that they were already doing. I introduced it with Google Images pictures of Michelle Obama and David Cameron.

The following are some more talks I may try in future classes (TCSE entry number, speaker: talk title, speaker's background):

880 Rory Sutherland: Sweat the small stuff UK

1931 Christopher Ryan: Are we designed to be sexual omnivores? US

1911 Yves Morieux: As work gets more complex, 6 rules to simplify Fr

561 Yann Arthus-Bertrand: A wide-angle view of fragile Earth Fr

1768 Didier Sornette: How we can predict the next financial crisis Fr

535 Al Gore: What comes after An Inconvenient Truth? US

1699 Richard Turere: My invention that made peace with lions Kenyan

735 Kiran Sethi: Kids, take charge Ind

1701 Colin Camerer: Neuroscience, game theory, monkeys US

1103 Paul Root Wolpe: It’s time to question bio-engineering US

2069 Andrew Connolly: What’s the next window into our universe? UK

2067 Martin Rees: Can we prevent the end of the world? UK

2035 Chris Domas: The 1s and 0s behind cyber warfare US

1979 Michel Laberge: How synchronized hammer strikes could generate nuclear fusion Fr

Update 2:

The TCSE puts in a 10-second lead-in when playing the YouTube video. To get YouTube to play your search term immediately, you need to add 10 seconds to the start time; have a read here of the developer's explanation of how to do this.
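As a rough illustration of "adding 10 seconds", here is a hypothetical Python sketch. It is not the developer's method. It assumes the video link carries its start time as a YouTube t query parameter written as plain seconds (for example t=95 or t=95s), and it simply moves that start time forward by the 10-second lead-in. The function name skip_lead_in and the VIDEO_ID placeholder are made up for illustration.

```python
from urllib.parse import urlencode, urlparse, urlunparse, parse_qs

LEAD_IN_SECONDS = 10  # the delay described above


def skip_lead_in(url: str, offset: int = LEAD_IN_SECONDS) -> str:
    """Return `url` with its 't' start time moved forward by `offset` seconds.

    ASSUMPTION: 't' is a plain number of seconds, optionally ending in 's'
    (e.g. t=95 or t=95s). Formats such as t=1m35s are not handled.
    """
    parts = urlparse(url)
    query = parse_qs(parts.query)  # e.g. {'v': ['VIDEO_ID'], 't': ['95s']}
    start = query.get("t", ["0"])[0].rstrip("s")
    query["t"] = [f"{int(start) + offset}s"]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


# Hypothetical link; VIDEO_ID is a placeholder, not a real video.
print(skip_lead_in("https://www.youtube.com/watch?v=VIDEO_ID&t=95s"))
```

If the link has no start time at all, the sketch adds t=10s, i.e. it skips just the lead-in from the beginning of the video.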

Update 3:

The TCSE now plays your search term immediately, with an option to start 10 seconds earlier.