Guy Aston talks speech corpora

I had the pleasure of chatting to Guy Aston as he was staying in Paris on his way back to Italy, where he works at the University of Bologna. Guy has been an active researcher in corpora over the years. Here he recollects one significant event that encouraged him to pursue his interest in corpora and mentions his current area of investigation (best heard using headphones):

Regular readers may know that I have been using the TED Corpus Search Engine a few times recently to get my students to work on phonetic transcriptions. Multimedia corpora make it possible to examine the prosodic features of language, and this is what interests Guy about speech corpora.

For example, the phrase Thank you was found to have a falling tone most of the time, and frequently occurring phrases such as don’t you, last year, I don’t know and I don’t care are expected to have a fast rhythm (examples taken from The prosody of formulaic expressions in the IBM Lancaster Spoken English Corpus by Phoebe Lin).

Guy went on to detail some requirements and challenges involved when setting up speech corpora:

In the next audio Guy gives some examples of a learner using a speech corpus:

The following phone-camera video shows Guy’s TED speech corpus in Mike Scott’s WordSmith (version 5 or later) and illustrates listening to concordances of matter of fact:

Finally, I asked Guy the old favourite about the misplaced early optimism over using corpora in the language classroom:

Guy hinted that a version of his corpus may become available for AntConc (it is currently only compatible with WordSmith), but at the same time he hinted that you should not hold your breath waiting for one :).

Once again thanks to Guy for sharing some of his current work. Check out some of his publications.

Do also check out an interview with another corpus linguist, Costas Gabrielatos.

Thanks for reading.

A 2nd tipple of the TED Corpus Search Engine

Here is a short (and, no doubt to some, obvious) note about working with the TED Corpus Search Engine (TCSE). I was looking at examples of the use of route; there are 126 results. I want my students to go through these and transcribe the two different ways the word is said. However, 126 is too many, so a simple way to reduce the number is to use the n-gram feature.

For 2-grams we get:

1 route to 13
2 the route 12
3 route of 5
4 a route 6
5 this route 4

So I could use route to with strong students, who would have 13 examples to listen to, and this route with weak students, who would only need to examine 4 examples.
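
For anyone who wants to try the same idea on their own transcripts, here is a rough Python sketch of counting the 2-grams that contain a search word. I don't know how TCSE computes its n-grams, so this is only an illustration of the general idea; the function name bigrams_with and the sample sentences are made up and are not TCSE data.

```python
from collections import Counter
import re


def bigrams_with(word, sentences):
    """Count 2-grams that contain `word` (case-insensitive).

    Hypothetical helper for illustration; not how TCSE itself works.
    """
    counts = Counter()
    for sentence in sentences:
        # Very rough tokenisation: lowercase runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", sentence.lower())
        # Pair each token with the one that follows it.
        for left, right in zip(tokens, tokens[1:]):
            if word in (left, right):
                counts[f"{left} {right}"] += 1
    return counts


# Made-up sentences, only to show the shape of the result.
sample = [
    "This is the route to the summit.",
    "We took a route to the coast.",
    "The route of the river changed.",
]

counts = bigrams_with("route", sample)
for gram, freq in counts.most_common():
    print(gram, freq)
```

Each 2-gram then gives a smaller set of examples, which is what makes it easy to hand different groups of students a workload of a different size.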

FYI, TCSE can now play the search term immediately, with an option to start 10 seconds earlier.

Thanks for reading.

P.S. Do check out the first tipple of the TCSE if you haven’t yet.

A tipple of the TED Corpus Search Engine

Maybe a series of short posts if things pan out 🙂
Two of my classes are learning the phonetic alphabet. They have already been introduced to it, have had a couple of exercises on it, and have had a go at playing with the Cambridge English phonetics focus set of games and activities.

In a bid to keep a low level of revision going, the TED Corpus Search Engine (TCSE) could be useful. Taking the example of neither (borrowed from a Guy Aston workshop on spoken corpora at Lancaster TaLC 11 this summer), I intend to ask them how they think it is spelt phonetically.

Then I will ask them to search for the word in the TCSE, look at entry 555 (Michelle Obama) and then entry 768 (David Cameron), and see if they can transcribe the phonetic differences (/niːðər/ and /naɪðə/ respectively).

Update 1:

I used the above in my classes recently and it went very well. It was integrated with another worksheet they were already doing on pronunciation and phonetics. I introduced it with Google images of Michelle Obama and David Cameron.

The following are some more words I may try in future classes:

880 Rory Sutherland: Sweat the small stuff UK

1931 Christopher Ryan: Are we designed to be sexual omnivores? US

1911 Yves Morieux: As work gets more complex, 6 rules to simplify Fr

561 Yann Arthus-Bertrand: A wide-angle view of fragile Earth Fr

1768 Didier Sornette: How we can predict the next financial crisis Fr

535 Al Gore: What comes after An Inconvenient Truth? US

1699 Richard Turere: My invention that made peace with lions Kenyan

735 Kiran Sethi: Kids, take charge Ind

1701 Colin Camerer: Neuroscience, game theory, monkeys US

1103 Paul Root Wolpe: It’s time to question bio-engineering US

2069 Andrew Connolly: What’s the next window into our universe? UK

2067 Martin Rees: Can we prevent the end of the world? UK

2035 Chris Domas: The 1s and 0s behind cyber warfare US

1979 Michel Laberge: How synchronized hammer strikes could generate nuclear fusion Fr

Update 2:

The TCSE puts in a delay of 10 seconds when playing the YouTube video. To get YouTube to play your search term immediately you need to add in 10s; have a read here by the developer on how to do this.

Update 3:

TCSE now plays your search term immediately, with an option to play 10 seconds earlier.