Learning vocabulary through subs2srs and Anki

This post reports on a way to learn vocabulary using your favorite film or TV show. You need two programs: subs2srs and Anki. I first came across subs2srs in a post by Olya Sergeeva, which is a great read by the way.

subs2srs lets you cut up a video file according to its subtitles. You can then import the resulting files into Anki. I won’t go into detail about doing this, as the user guide for subs2srs covers it well. I will just post some screen recordings to show how it looks in use. In my case I am using it to learn more conversational and idiomatic French via the TV show Les Revenants.
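For readers curious about what "cutting up by subtitles" involves, here is a rough sketch in Python. It is not how subs2srs itself is implemented. It only reads the timings and text from a subtitle file in SRT format and builds tab-separated rows that Anki can import as text. The sample timings and the clip file names (clip_0001.mp3 and so on) are invented for illustration; cutting the actual audio or video is left out.

```python
import re

# Matches an SRT timing line such as "00:01:02,500 --> 00:01:04,000".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def to_seconds(h, m, s, ms):
    """Convert the four captured timing parts to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(text):
    """Return a list of (start_seconds, end_seconds, subtitle_text) cues."""
    cues = []
    # Cues in an SRT file are separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                parts = match.groups()
                start, end = to_seconds(*parts[:4]), to_seconds(*parts[4:])
                cues.append((start, end, " ".join(lines[i + 1:]).strip()))
                break
    return cues


def to_anki_rows(cues, prefix="clip"):
    """One tab-separated row per cue: an audio field, then the subtitle text."""
    rows = []
    for n, (_start, _end, text) in enumerate(cues, start=1):
        # Hypothetical file name; [sound:...] is how Anki refers to audio in a field.
        rows.append(f"[sound:{prefix}_{n:04d}.mp3]\t{text}")
    return rows


# Invented sample: the timings are made up for this sketch.
sample = """1
00:00:01,000 --> 00:00:03,500
Je peux pas aller plus vite que la musique.

2
00:00:04,000 --> 00:00:05,250
D'accord.
"""

cues = parse_srt(sample)
rows = to_anki_rows(cues)
for row in rows:
    print(row)
```

Each cue's start and end times are what a tool would need in order to cut the matching clip; the rows are what ends up as one card each.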

The first recording shows what happens as you use Anki with your subs2srs cut-up file. Near the end of the recording I demonstrate one of the features of Anki which allows you to hide/bury cards you don’t want to use:

The second recording shows how to browse cards in a deck and tag them for use in a custom deck:

The third video shows the use of a custom deck made from a particular tag:

A post by polyglot Judith Meyer shows how she used it to study Japanese vocabulary. Most of the instructions for subs2srs in that post are dated but further down she has some nice advice on how to use any Anki decks you may make from subs2srs.

I am not sure how efficient this method is since after about a month of occasional use I have only really learned one expression – je peux pas aller plus vite que la musique/I haven’t got wings! But I feel being able to have the audio is helping.

One thing to be aware of: make backups of the Anki collections you use on your phone. Otherwise you risk resetting all the cards you’ve been studying when you add, say, a new film or episode converted by subs2srs to your mobile version of Anki.

Thanks for reading and feel free to ask any questions you may have.

Using BYU Wiki corpus to recycle coursebook vocabulary in a variety of contexts

Recycling vocabulary in a variety of contexts is recommended by the vocabulary literature. Simply going back to texts one has used in a coursebook is an option but it misses the variety of context.

I need to recycle vocabulary from Unit 1 of my TOEIC book, so I take the topics from the table of contents as input to create a wiki corpus.

The main title of Unit 1 in my book is careers, with sub topics of professions, recruitment, training. I could also add in job interview, job fair, temp agency.

Note: for more details on the various features of the BYU WIKI corpus, do see the videos by Mark Davies. For the rest of this post I assume you have some familiarity with these.

So when creating a corpus in BYU WIKI corpus in my Title word(s) search I enter career* to find all titles with career and careers.

Then in the Words in pages box I enter professions, profession, recruitment, training. Note that I include the plural form in the search and set 300 as the number of pages:

Screenshot 1: corpus search terms
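To make the logic of the two search boxes concrete, here is a small Python sketch. It is only an illustration of the idea, not the way the BYU WIKI corpus works internally. The page titles and texts are invented, and I am assuming that a page qualifies if it contains any one of the words entered.

```python
import re
from fnmatch import fnmatchcase


def select_pages(pages, title_pattern, page_words, limit=300):
    """Keep pages whose title has a word matching the pattern
    and whose text contains at least one of the page words."""
    wanted = {w.lower() for w in page_words}
    selected = []
    for title, body in pages:
        title_words = re.findall(r"\w+", title.lower())
        title_hit = any(fnmatchcase(w, title_pattern) for w in title_words)
        body_hit = wanted & set(re.findall(r"\w+", body.lower()))
        if title_hit and body_hit:
            selected.append(title)
    return selected[:limit]


# Invented pages, for illustration only.
pages = [
    ("Career counseling", "Advisers discuss training and recruitment with clients."),
    ("Careers in science", "Many professions require long study."),
    ("Career break", "A period away from work to travel."),
    ("Volcano", "Training for geologists involves field work."),
]

result = select_pages(
    pages, "career*", ["professions", "profession", "recruitment", "training"]
)
print(result)
```

The asterisk in career* matches both career and careers in a title, while the page words are matched as whole words, which is why both profession and professions are listed.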

After pressing submit you are presented with a list of wiki pages. You can scroll through this to find pages that may be irrelevant to you:

Screenshot 2: wiki pages

After unticking any irrelevant pages, press submit. I won’t say much about filtering your corpus build here. As mentioned, do make sure to watch Mark Davies’s series of videos for more details.

Now you will see your newly created corpus:

Screenshot 3: my virtual corpora

Tick the Specific radio button:

Screenshot 4: specific key word radio button

and then click the nouns keywords. Skill is the top keyword here which also appears in the wordlist in my book:

Screenshot 5: noun keywords

What I am more interested in is verbs so I click that:

Screenshot 6: verb keywords

The noun requirement (which, by the way, does not come from the careers unit) appears in the book wordlist, but the verb require does not. So now I can look at some example uses of the verb require that I could use in class.

One step is to see what collocates with require:

Screenshot 7: collocates of require

Clicking on the top 5 collocates brings up some potential language.
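For anyone wondering what a collocate count amounts to, here is a minimal Python sketch. It simply counts the words that occur within a few words either side of a form of require. The sentences are invented, the span of four words is my assumption, and the corpus interface no doubt does more than this (scoring, part-of-speech filtering and so on).

```python
import re
from collections import Counter


def collocates(sentences, node_forms, span=4):
    """Count words occurring within `span` words either side of any node form."""
    counts = Counter()
    for sentence in sentences:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        for i, token in enumerate(tokens):
            if token in node_forms:
                window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
                counts.update(w for w in window if w not in node_forms)
    return counts


# Invented sentences, for illustration only.
sentences = [
    "Most careers in nursing require a licence.",
    "The role requires extensive training and a licence.",
    "Employers may require training before promotion.",
]
forms = {"require", "requires", "required", "requiring"}

counts = collocates(sentences, forms, span=4)
for word, n in counts.most_common(5):
    print(word, n)
```

A raw count like this is dominated by grammatical words such as a and the, which is one reason corpus tools let you restrict collocates by part of speech.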

Another interesting use: once you have a number of corpora, you can see which words appear most in each corpus. The following screenshots show corpora related to the first 3 units of my book, i.e. Careers, Workplaces, Communications:

Screenshot 8: my virtual corpora

The greyed lines mean those corpora are omitted from my search. This could make a nice exercise where you take some words and get students to see how they are distributed. For example, you could show the distribution of the verb fill:

Screenshot 9: distribution of verb fill

We see that it appears most in the recruit* corpus. One option now is to get students to predict how the verb is used in that corpus and then click the bar to see some examples.
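The bars in such a chart are essentially normalised frequencies. As a rough Python sketch, assuming each corpus is just a string of text, the calculation could look like this. The three mini corpora are invented and far too small to mean anything; the names other than recruit* are placeholders.

```python
import re


def per_million(corpora, forms):
    """Frequency of the given word forms per million words, for each named corpus."""
    result = {}
    for name, text in corpora.items():
        tokens = re.findall(r"[a-z']+", text.lower())
        hits = sum(1 for t in tokens if t in forms)
        result[name] = round(hits / len(tokens) * 1_000_000) if tokens else 0
    return result


# Invented mini corpora, for illustration only.
corpora = {
    "career*": "She changed career twice before she found work she liked.",
    "recruit*": "Agencies fill vacancies quickly. The post was filled within a week.",
    "workplace*": "Open offices fill with noise by mid morning.",
}

dist = per_million(corpora, {"fill", "fills", "filled", "filling"})
for name, freq in dist.items():
    print(name, freq)
```

Normalising per million words matters because the corpora differ in size, so raw counts alone would not be comparable.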

After this demonstration you can now ask students to guess what words will appear most in the various corpora and do the search for the students to see the resulting graphs.

Hope this has shown how we can use BYU WIKI corpus to recycle vocabulary in different contexts.

Do shoot me any questions as this post may indeed be confusing.

Skylighting: sub-query searching

I was using a press release text from the company of a student recently. He was drawn immediately to two items – We raise the bar every year to remain contemporary; the high standards that we hold ourselves to in our people practices.

We did a bit of work on the use of these two items in the text.

After the session I was thinking that it would have been good for the student to have been able to see other examples of the language he had pointed out. That is, although the language in the press release was authentic, to what extent was it typical and so possibly worth the student learning?

Skylight offers a sub-query search feature which allows one to see collocating words that can appear with several intervening words and in any position.

For example, does the phrase raise the bar always appear in this form, or are there other versions?

Enter bar into the search (with corpus selection of ukWaC):

Enter initial search term (bar)

You will get a result screen such as:

Result screen from initial search (bar)

then enter raise and you will get results such as:

Result screen from second term (raise)

The results show that yes, raise the bar is the most common form, but there are some uses (which can be found through the sort feature) where a modifier such as quality or performance is placed in between, e.g. “raise the quality bar”; “raise the performance bar”. An interesting use is with a film that manages to raise an emotional bar.

One could then further filter results by looking for instances of standard or standards (use the | pipe character as an OR operator, i.e. standard|standards), which gives uses such as “set up new standards that raise the emissions bar extremely high”.
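The idea of narrowing results one sub-query at a time can be sketched in a few lines of Python. This is only an illustration of the principle and not Skylight's own code. I am assuming exact word forms are matched, and the lines are adapted from the examples in this post plus one invented distractor.

```python
import re


def subquery(lines, term):
    """Keep lines containing any alternative in `term` (alternatives separated by |)."""
    alternatives = "|".join(re.escape(t) for t in term.split("|"))
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return [line for line in lines if pattern.search(line)]


lines = [
    "We raise the bar every year to remain contemporary.",
    "The film manages to raise an emotional bar.",
    "New standards that raise the emissions bar extremely high.",
    "He ordered a drink at the bar.",  # invented distractor
]

# Each term filters the results of the previous one, in any position in the line.
hits = lines
for term in ["bar", "raise", "standard|standards"]:
    hits = subquery(hits, term)
    print(term, "->", len(hits), "line(s)")
```

Because each step only checks that the word is somewhere in the line, intervening words and word order do not matter, which is what lets variations such as raise the emissions bar surface.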

Turning now to we hold ourselves to:

first plug in hold
then to
then ourselves|myself|himself|herself|themselves
and then finally standard|standards.

We get such variations as:

“People hold footballers to standards that they wouldn’t dream of”
“can law schools hold themselves accountable to other people’s standards”
“Schools that hold parents to account, and that are themselves accountable”
“For university law schools to hold themselves accountable to externally generated criteria”

Whether a student like mine, who was interested in human resources and training issues, would be interested in such sentences is worth asking and exploring. Since the ukWaC corpus samples general web texts, we can assume some of the example uses would interest our learner.

The key point is that this method offers a quick way to extend a text without relying on one’s wits in class.

Thanks for reading.

The Pirate Bay AFK – web related lexis

The documentary The Pirate Bay Away from Keyboard was released recently. I used some of the text from the English subtitles in a gapped sentences exercise (most of the film is in Swedish):

1: Half of all BitTorrent tr_____ is coordinated by the Pirate Bay. It’s extreme amounts of tr_____.

2: There are 22-25 million us____ at this very moment.

3: A user is defined as one ongoing up_____ or download.

4: This is the web s____. Data base and search fu____n.

5: The _______ are over there. This little piece is the ______. It’s the world’s biggest ______.

6: Not so many computers, but powerful and well-con_____.

7: How the hell can prosecutor Roswall mix up mega___ and megabyte?

8: Generally speaking, for st_____ you use byte and when you measure speed you use bit.

9: I had a spare l___ which I let him use for the site. It was from British Telecom.

10: So the US government ordered us to re_____ the site. We fought them for a long time before we re_____ed it.

11: After a while we cl____ it d____, when it became too much of a fuss.

12: Two months later Gottfrid needed more ba_____ for the Pirate Bay. I still had that line available.

Unsurprisingly sentence 5 caused the most difficulty; sentence 9 was also tricky. I then showed the film up to the 19 minute 29 seconds mark.

I asked the class to watch the rest of the film and think about two things – Was it a good documentary? and What issues were raised in the film? They should be prepared to discuss this for the next class.

You may be wondering why I chose to use a film that is mainly spoken in Swedish. The average listening comprehension level of my multi-media classes is A2/B1, and from previous experience asking them to listen to a 1h 22m film in English would be asking too much. The translated subtitles are simple enough for them to read.

I intend to work with the gapped sentences again in the next class, possibly in the form of a dictogloss.

Key to gapped sentences:

1: traffic

2: users

3: upload

4: server; function

5: tracker

6: configured

7: megabit

8: storage

9: link

10: remove

11: close down

12: bandwidth
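If you want to make this kind of exercise from other subtitle lines, the gapping can be done with a short script. The Python sketch below is one possible way and not what I used for the sentences above. It keeps the first few letters of the target word and replaces the rest with a fixed-length gap, so the gap length gives no clue to the word length.

```python
import re


def gap(sentence, word, keep=2):
    """Replace each occurrence of `word` with its first `keep` letters plus a gap."""
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    return pattern.sub(lambda m: m.group(0)[:keep] + "_____", sentence)


# (sentence, target word, number of letters to keep)
items = [
    ("Half of all BitTorrent traffic is coordinated by the Pirate Bay.", "traffic", 2),
    ("Two months later Gottfrid needed more bandwidth for the Pirate Bay.", "bandwidth", 2),
    ("How the hell can prosecutor Roswall mix up megabit and megabyte?", "megabit", 4),
]

for number, (sentence, word, keep) in enumerate(items, start=1):
    print(f"{number}: {gap(sentence, word, keep)}")
```

Setting keep to 0 gives fully blank gaps, as in sentence 5 above.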

Hope you found this activity useful.

Quick cup of COCA – bring to * boil


Click on image above to go to result screen.

A short post to show another example of the power of the wildcard asterisk. The image shows the results of comparing /bring to * boil/ in the American COCA and the British BNC.

A tweet by @AnneHendler asked /In British English, is it considered more acceptable to say “Bring to the boil” or “bring to a boil”?/

@Marie_Sanako replied /I would always say ‘bring to the boil’./, both @cgoodey /I’d bring something to THE boil too!/ and @GemL1 agreed /”bring to the boil” is what I would say./ whilst @michaelegriffin added /if it’s about cooking I can only imagine myself or other am eng users sayin “a boil”. If metaphoric I dont know./.
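The wildcard search behind the image can be imitated with a regular expression, where the asterisk becomes a one-word slot. The Python sketch below counts what fills that slot in two lists of sentences. The sentences are invented for illustration and are not corpus data, so the counts say nothing about real usage; for that, see the COCA and BNC results themselves.

```python
import re
from collections import Counter

# The * in /bring to * boil/ is treated as exactly one word.
SLOT = re.compile(r"\bbring\s+to\s+(\w+)\s+boil\b", re.IGNORECASE)


def slot_fillers(texts):
    """Count the words that appear in the wildcard slot."""
    counts = Counter()
    for text in texts:
        counts.update(word.lower() for word in SLOT.findall(text))
    return counts


# Invented sentences, NOT taken from COCA or the BNC.
sample_a = [
    "Bring to a boil, then simmer.",
    "Bring to a boil and stir.",
    "Bring to the boil slowly.",
]
sample_b = [
    "Bring to the boil and simmer.",
    "Bring to the boil, then cover.",
]

fillers_a = slot_fillers(sample_a)
fillers_b = slot_fillers(sample_b)
print(fillers_a)
print(fillers_b)
```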


I was reminded recently by a Twitter exchange with @rosemerebard that this online workshop is a great primer on using the BYU corpora (BNC and COCA).

One of the things I hope is clear with these Quick cup of COCA posts is that having a clearly stated problem/question will facilitate the search process.

Quick cup of COCA – synonym brackets equals

Click image to see results of search.

The class was looking at phrasal verbs and one of the example sentences had the word “chimney”. Someone asked what synonyms there are for chimney. I was stumped, but a quick cuppa COCA came to the rescue. In order from most to least frequent, COCA gave: pipe, stack, chimney, funnel, conduit, flue, smokestack. From this list I suggested smokestack and flue. A student helpfully added that stack is the word used in industry.

Bonus points to readers – can you guess the phrasal verb used in the example sentence with “chimney”?

Quick cup of COCA – wildcard asterisk

Click image to see results of search.

This may or may not turn into a series of short posts on my experience of trying to use the COCA corpus in class. Inspired by the comments to this post by Kevin Stein/@kevchanwow.

A student in my TOEIC class, when we were looking at adjective endings -ED and -ING, asked what the difference was between “unmotivated” and “demotivated”. I replied that demotivated describes someone after some experience whereas unmotivated is a general state of being. I wasn’t too sure if that was sufficient, so whilst the class was engaged in the following part of the lesson I used the wildcard asterisk (see image above).

I found that instances of “demotivated” were pretty low compared to “unmotivated”. I only passed the frequency information on to the student. If I had had more time and a projector hooked up to the computer, I would have looked at the example sentences in each case.

Do let me know of any searches you have done in class.