Grassroots language technology: Wiktor Jakubczyc, vocab.today

It’s been a while since the last post on teachers doing it for themselves, technology-wise. Do check those posts out if you have not, or if you need a reminder. I stumbled across Wiktor Jakubczyc, the teacher/developer who kindly answered questions for this post, when looking for a GitHub source on vocabulary profilers. And what a find his GitHub pages are.

I think there are good reasons for teaching and education to have a default “inertia” regarding “innovation” (which Wiktor laments in one of his responses) but I won’t discuss this here. Perhaps readers will prod me on this in the comments? 😁 I would like to refer to a point I’ve made before (pdf) – that there is a middle ground for teachers to explore regarding grassroots technology.

Anyway, enough of my rambling. Here’s Wiktor, and there is a marvelous bonus at the end for all you CALL geeks:

1. Can you explain your background a little?

I’m an English teacher with over 10 years of experience and an IT freelancer. I’ve taught English all over Europe, in London, Moscow, Warsaw, Bratislava, Sevilla and Wrocław, my home town in Poland. Since I was a kid I’ve loved computers – and that was in the ’80s when an Atari couldn’t really do very much. I passionately want teachers to make the most of digital technologies.

2. What was the first tool you designed for learning languages?

The first tool I designed to help students learn English was a dictionary lookup program for Windows, way back in 2007. Back then, there were good dictionaries you could get for your computer, but I wanted to be able to look up a word in many dictionaries at once. That option simply didn’t exist, so I created The Ultimate Dictionary (http://creative.sourceforge.net). I got great feedback from my students, fellow teachers and friends – they still use it, and they love it! It’s a very rewarding feeling to create something of value for other people, and to be able to give it to them for free.

A few years later, I discovered that another developer, Konstantin Isakov, had the same idea and made an even better dictionary application – GoldenDict. I used his source code as the base for a redesign of my dictionary, now called Nomad Dictionary. Nomad Dictionary now has Windows, Android and MacOS editions, all available to download at http://dictionaries.sf.net.

My second project was a Half a Crossword creator. Half a crossword is a type of communicative activity for ESL classrooms which emphasizes speaking and vocabulary, two key skills in using a language. The words in a crossword are split evenly between two students, so each gets half a crossword, and they have to ask each other for missing information and give definitions for the words they have in their own half. It’s a fantastic way to revise and recycle vocabulary, while practicing the much-needed skills of asking for and giving information. And students love it!
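The splitting step described above can be sketched in a few lines of Python. This is only an illustration of dealing a wordlist into two halves, not the code of Half a Crossword itself; the function name `split_words`, the `seed` parameter and the sample wordlist are all assumptions made for the example.

```python
import random

def split_words(words, seed=0):
    """Deal a wordlist into two halves of (almost) equal size."""
    shuffled = list(words)
    # A seeded shuffle keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    # Even positions go to student A, odd positions to student B.
    return shuffled[0::2], shuffled[1::2]

# Hypothetical wordlist, for illustration only.
words = ["career", "profession", "recruitment", "training", "interview"]
student_a, student_b = split_words(words)
print("Student A defines:", student_a)
print("Student B defines:", student_b)
```

Each student then gives definitions for their own words and asks for the ones they are missing.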

Again, no such tool existed, which is why I decided to create one. I first made a version of Half a Crossword for Windows (http://creative.sourceforge.net) because at the time Delphi was the only language I could program in. I found it immensely useful in my classes – it was a perfect activity to check how many words students knew before moving on to new material. I tried to get other teachers involved, to spread the word and encourage them to use it, but I found a lot of people were resistant. They loved the idea, but few actually decided to use it in their classrooms.

A few years later, thinking that maybe the problem was accessibility – you needed to download a program, install it, write a wordlist in Word and then save it… it was a bit complicated – I decided to create an online version written in JavaScript. I posted the code for Half a Crossword Online on GitHub (https://github.com/monolithpl/half-a-crossword). Despite the fact that it wasn’t advertised anywhere, quite a few people found out about it, and two people even contributed code! Teachers I talked to also found the online version easier to use, and came to use it with their classes.

3. What do you think of as a relevant tool?

That’s a very good question, which is to say a very hard question. I think a relevant tool has to be both personally important enough for the creator to design it (especially if it’s a hobby project) and at the same time good enough that other people later find it useful too. It’s rare for these two things to coincide.

Another difficulty lies in the fact that the world of teaching, broadly speaking, is averse to innovation. Very few teachers care to experiment with new methodologies, paradigms or teaching tools. There’s extreme inertia. So getting teachers to change their habits and try something new is very challenging, especially when it comes to technology.

Relevant tools, in my mind, would be those that embrace the DOGME/Teaching Unplugged methodology, the Lexical Approach, personalized teaching, the explosion of mobile computing, just to name a few – all the radical new ideas that have appeared in the last 10 years in language teaching. And they would have to be loved by students, teachers and administrators alike.

4. Do you create tools for languages other than English?

I would love to, someday. I simply don’t have the time to do that now. This is a hobby, after all. The language learning tools I create are useful to my students, my colleagues and myself in learning and teaching English, which is what we do every day. So that is the priority for now.

I hope other people around the world will find the time and be inspired to create tools for their languages. Unfortunately, there is a huge gap between the English-speaking world and the rest of the people out there when it comes to technology: just compare the size of the English Wikipedia versus editions in other languages. The same is true for language data: there are far fewer corpora, frequency wordlists, audiovisual materials etc for languages other than English. There’s lots of catching up to do.

I also think that the world needs a world language, so that we can all start to understand things not just around us, in our local environment, but on a more global level. For that, we need English, so I can understand why most of the interesting developments in language teaching are designed for English students. It’s simply the largest market and user base.

5. What tools are you working on at the moment? What do you have planned for future developments?

Right now I’m working on projects related to wordlists. I have a new version of a Vocabulary Profiler (https://github.com/monolithpl/range.web) almost ready. It’s an app that visualizes word frequency in a text, or in more practical terms tells a teacher how difficult a text is and which words are going to be most challenging for their students. Developing it was an incredible learning experience as I had to figure out how to compress large wordlists so that the app could work on mobile phones and discovered trie algorithms, which are a super clever concept of packing words into a small space. I’d like to mention the groundbreaking work of Paul Nation on teaching and researching vocabulary, especially his Range program (https://www.victoria.ac.nz/lals/about/staff/paul-nation#vocab-programs), which I tried to recreate for the modern web.
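To give a feel for why a trie packs a wordlist into less space, here is a minimal Python sketch. It is not the profiler’s actual code (which is written for the web and uses a compressed format); the nested-dictionary layout, the `"$"` end-of-word marker and the function names are assumptions chosen for readability.

```python
def build_trie(words):
    """Store words as nested dicts so that shared prefixes are kept only once."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True  # marks the end of a complete word
    return root

def contains(trie, word):
    """Return True only if the whole word was inserted into the trie."""
    node = trie
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return "$" in node

def count_nodes(trie):
    """Count letter nodes, ignoring the end-of-word markers."""
    return sum(1 + count_nodes(child) for key, child in trie.items() if key != "$")

words = ["visit", "visible", "vision", "revise", "revision"]
trie = build_trie(words)
print(contains(trie, "vision"), contains(trie, "vis"))
print(count_nodes(trie), "letter nodes for", sum(map(len, words)), "letters")
```

Because visit, visible and vision share the prefix visi-, those letters are stored once, so the trie needs fewer letter nodes than the wordlist has letters. Real implementations usually compress further, for example by also merging shared suffixes.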

My most ambitious project to date is an extension of this work – it’s an app to highlight collocations, chunks etc. in a text called Fraze Finder (https://github.com/monolithpl/fraze-finder). It takes the concept of profiling vocabulary to the next level by analyzing multi-word elements, like phrasal verbs, which students most often struggle with. The idea is to help students and teachers notice collocations, to identify them and understand their importance in written and spoken language. The difficulty here is building a good library of these expressions and accurately finding them (with all their variations) in texts. I have lots of ideas for future projects, which I’ve tried to gather together on my personal website vocab.today (https://vocab.today/teacher). I hope one day to complete them all!
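The difficulty of finding expressions “with all their variations” can be illustrated with a small Python sketch. This is not how Fraze Finder works internally; it is one assumed approach using a regular expression that accepts several inflected forms of a verb and up to two words between verb and particle. The function name `phrase_pattern`, the `max_gap` parameter and the sample sentence are hypothetical.

```python
import re

def phrase_pattern(verb_forms, particle, max_gap=2):
    """Build a regex for a phrasal verb, allowing a few words before the particle."""
    verbs = "|".join(map(re.escape, verb_forms))
    return re.compile(
        rf"\b(?:{verbs})\b(?:\s+\w+){{0,{max_gap}}}?\s+{re.escape(particle)}\b",
        re.IGNORECASE,
    )

pattern = phrase_pattern(["look", "looks", "looked", "looking"], "up")
text = "She looked the word up, and then he looks up another one."
matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)
```

A pattern like this over-generates (it cannot tell look up a word from look up the street), which is part of why building a good library of expressions is hard.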

6. Are there any tools (not yours) that you yourself use for learning languages?

Over the years, I’ve tried and experimented with dozens of language learning solutions. Let me focus on three main areas:

Learning Management Systems (LMSs) – these are content delivery platforms, basically, websites where teachers upload material for their classes and students do their homework, complete tests, review their progress and exchange messages with one another.

I gave Moodle a try, but it was just horrible to use for both teachers and students, and I think other people agreed with me for it seems to be fading away into a well-deserved oblivion.

Later, I tried Edmodo, which was a lot easier to use, and obviously inspired by Facebook, which was just starting to be the big thing at the time. I ran into numerous limitations using it, and finally, out of sheer frustration, just gave up. It was very pretty on the surface, but you couldn’t do much with it. And students preferred to use Facebook for their day-to-day communication, so it was difficult to make them use something else.

So today, I create Facebook groups for my students and use Google Drive, Forms and Docs to share documents and tests. It’s still not a perfect solution, but it has the advantage of being familiar to everyone and easy to use. Unlike the many solutions I’ve used before, I think these are versatile enough to do the job and are actively being developed and improved.

Flashcards – There are hundreds of apps and websites that help students learn through flashcards. I’ve tried many of them with my students, including Anki (which is a great piece of software). However, I’ve found that Quizlet is the easiest to set up and use. And there’s a huge library of flashcards made by talented teachers around the world available for anyone to use. It’s quite amazing, and it’s free.

Mobile Apps – I’ve also experimented with several dozen different learning tools for mobile phones. This is a very new market, as the iPhone only came out ten years ago. There is currently much hype around apps like DuoLingo, Babbel or Memrise, but personally I found them to be quite boring. The activities are very repetitive, and apart from situations where I would be forced to use them (on a crowded train with nothing else to do), I can’t imagine myself ever using them long-term.

This is still a very experimental field, which is why I find it shocking that the three biggest apps offer just two types of activities: multiple choice or fill-in-the-gap exercises. I would love to see more variety. There’s also the fact that, due to their novelty, the claims of effectiveness these apps advertise with are often greatly overstated – just see what happened to all the “brain training” apps like Lumosity which now have to pay multi-million dollar fines for lying to their customers (https://arstechnica.com/science/2016/06/billion-dollar-brain-training-industry-a-sham-nothing-but-placebo-study-suggests/). There’s definitely room for improvement.

7. Any advice for people interested in learning to design such tools?

The most important thing is to have an idea of what to create: something that would be useful for you or your students that doesn’t yet exist, a faster and better way of doing something you do every day or a radical improvement on a tool or solution you currently use.

Programming skills are secondary and you can always find people who can help you out with technical stuff on StackOverflow. I’ve met a few programmers who after completing their studies had no idea what they wanted to create. Knowing what you’d like to create is the key.

It’s much easier to get into hobby development than it was 5 or 10 years ago. GitHub makes it super easy to upload your code and create a website for your project – all for free! It’s also a great way to discover other projects, make use of ready-made components and participate in the open source community by commenting or finding bugs.

JavaScript is one of the easiest programming languages you can learn, and it’s everywhere – on PCs, Macs, iPhones and Androids. With just one language, you can design for almost any device out there – the developments on the technological front are simply amazing.

On the teaching side, I could recommend no better than Scott Thornbury’s excellent article How could SLA research inform EdTech? (https://eltjam.com/how-could-sla-research-inform-edtech) which describes the needs of language learners and offers a list of requirements that should be met in order to create a truly excellent, cutting-edge language learning tool. To my knowledge, no such tool exists. Not by a long shot. It’s a great opportunity for creative minds.

8. Anything you want to add?

Thank you for noticing my work and giving me an opportunity to speak about it. Up until now I’ve been working on my projects almost in secret. It would be amazing if this interview inspired creative young minds to design new tools for language teaching, especially in languages other than English. I hope teachers will discover new tools that will help them teach better with less effort.

Technology has so much to offer in the field of learning languages, and there’s so much innovation to come. I’m looking forward to the bold new ideas of the future. Follow my work at vocab.today or on GitHub!

Many thanks to Wiktor for spending time answering these questions. And here is the bonus link – Wiktor is compiling classic CALL programs that you can run in your browser, how awesome is that?! I am sure Wiktor would be glad to take some suggestions of some classic gems.

Affix knowledge test and word part technique

CAT-WPLT

There is a new online test, the CAT-WPLT (computerized adaptive testing of Word Part Levels Test), to assess students’ word part knowledge, i.e. knowledge of prefixes, suffixes and stems (though the test only uses affixes for receptive use). The (diagnostic) test is composed of three parts – form, meaning and use. The form part presents 1 real affix and 4 distractor affixes for the test taker to choose from. The meaning part presents 1 correct meaning and 3 distractor meanings, and the use part presents 4 parts of speech, one of which has to be matched correctly to the affix.

Try out the test – CAT-WPLT.

The online test takes about 10-15 minutes to complete and results in a nice feedback screen showing how the test taker did on the form, meaning and use of the affixes. There are advanced, intermediate and beginner profiles for comparison.

Figure from Mizumoto, Sasao, & Webb (2017) pg. 14

So say you have a profile of a student who shows weakness in form and meaning. What now? Mizumoto, Sasao, & Webb (2017) suggest giving learners their pdf list of 118 affixes (assuming you don’t need to use the test again). So if your learner is at level 1 for recognizing the form of an affix, the affixes listed as level 2 can be focused on.

Another possibility is a memory technique called the word part technique.

Word part technique
Very simply, it means using an already known word which contains the same word stem/root as the new word to be remembered.

More specifically the system Wei and Nation (2013) describe lists very frequent stems i.e. stems which appear in words in the most frequent 2000 words of the BNC. These are then used to learn stems appearing in the remaining 8000 mid-frequency words in the BNC wordlist. For example a high frequency word like visit has the root -vis- which appears in mid-frequency words such as visible, envisage, revise.

Once a form connection is seen between a known high frequency word and a mid-frequency word a meaning connection needs to be made i.e. explaining the form connection. So to explain the word visible we can say visible is something that you can see. Here the explanation uses the meaning of -vis- i.e. see.

(high freq. word) visit -> go to see someone
|
|
(stem)                  vis -> see
|
|
(mid-freq. word)  visible -> something that you can see
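The form-connection step in the diagram above could be roughed out in Python as follows. This is a minimal sketch using plain substring matching, which is an assumption of mine and not the procedure Wei and Nation (2013) describe; the `stems` table holds only the -vis- example from this post, and bandwidth is included simply as a word without the stem.

```python
# Stem -> (known high-frequency word, meaning of the stem); example taken from this post.
stems = {"vis": ("visit", "see")}

# Candidate mid-frequency words; "bandwidth" does not contain the stem.
mid_frequency_words = ["visible", "envisage", "revise", "bandwidth"]

def words_sharing_stem(stem, candidates):
    """Return the candidates whose spelling contains the stem."""
    return [word for word in candidates if stem in word]

for stem, (known_word, meaning) in stems.items():
    for word in words_sharing_stem(stem, mid_frequency_words):
        print(f"{known_word} -> -{stem}- ({meaning}) -> {word}")
```

Substring matching will also pick up accidental look-alikes, so any list produced this way would still need checking by the teacher before the meaning connection is explained.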

According to Wei & Nation (2013) the most difficult step is explaining the connection. I think, though, that the most difficult step is the first one – seeing the connection, i.e. spotting the shared stem/root. Wei & Nation (2013) encouragingly state that making the connection and explaining it can develop with practice.

 


Click here to see top 25 word stems taken from Wei & Nation (2013)


They go on to recommend that once students have worked with this technique with the teacher they can go on to use it themselves as a strategy.

The technique’s efficacy is on par with the keyword technique and learners’ own methods or self-strategies (Wei, 2015). The word part technique has the added benefit of introducing learners to etymology and the history of words.

Thanks for reading.

References

Mizumoto, A., Sasao, Y., & Webb, S. A. (2017). Developing and evaluating a computerized adaptive testing version of the Word Part Levels Test. Language Testing, 0265532217725776.

Wei, Z., & Nation, P. (2013). The word part technique: A very useful vocabulary teaching technique. Modern English Teacher, 22, 12–16.

Wei, Z. (2015). Does teaching mnemonics for vocabulary learning make a difference? Putting the keyword method and the word part technique to the test. Language Teaching Research, 19(1), 43-69.

Learning vocabulary through subs2srs and Anki

This post reports on a way to learn vocabulary using your favorite film or TV show. You need two programs: subs2srs and Anki. I first saw the reference to subs2srs via a post by Olya Sergeeva, a great read by the way.

subs2srs allows you to cut up your video file by its subtitles. Then you can import the resulting files into Anki. I won’t go into detail about doing this as the user guide for subs2srs does this well. I will just post some screen recordings to demonstrate how it appears as you use it. In my case I am using it to learn more conversational and idiomatic French via the TV show Les Revenants.

The first recording shows what happens as you use Anki with your subs2srs cut-up file. Near the end of the recording I demonstrate one of the features of Anki which allows you to hide/bury cards you don’t want to use:

The second recording shows how to browse cards in a deck and tag them for use in a custom deck:

The third video shows the use of a custom deck made from a particular tag:

A post by polyglot Judith Meyer shows how she used it to study Japanese vocabulary. Most of the instructions for subs2srs in that post are dated but further down she has some nice advice on how to use any Anki decks you may make from subs2srs.

I am not sure how efficient this method is since after about a month of occasional use I have only really learned one expression – je peux pas aller plus vite que la musique/I haven’t got wings! But I feel being able to have the audio is helping.

One thing to be aware of: make backups of the Anki collections you use on your phone. Otherwise you risk resetting all the cards you’ve been studying when you add, say, a new film or episode converted by subs2srs to your mobile version of Anki.

Thanks for reading and feel free to ask any questions you may have.

Update:

Alternative program to subs2srs for non-Windows systems – youtube flashcards

Using BYU Wiki corpus to recycle coursebook vocabulary in a variety of contexts

Recycling vocabulary in a variety of contexts is recommended by the vocabulary literature. Simply going back to texts one has used in a coursebook is an option but it misses the variety of context.

I need to recycle vocabulary from Unit 1 of my TOEIC book, so I take the topics from the table of contents as input to create a wiki corpus.

The main title of Unit 1 in my book is careers, with sub topics of professions, recruitment, training. I could also add in job interview, job fair, temp agency.

Note: for more details on various features of the BYU WIKI corpus do see the videos by Mark Davies. For the rest of this post I assume you have some familiarity with these.

So when creating a corpus in the BYU WIKI corpus, I enter career* in the Title word(s) search to find all titles with career and careers.

Then in the Words in pages box I enter professions, profession, recruitment, training. Note that I search for both singular and plural forms and set the number of pages to 300:

Screenshot 1: corpus search terms

After pressing submit, a list of wiki pages is presented. You can scroll through this to find pages that may be irrelevant to you:

Screenshot 2: wiki pages

After unticking any irrelevant pages, press submit. I won’t talk a lot about filtering your corpus build here. As mentioned, do make sure to watch Mark Davies’s series of videos to get more details.

Now you will see your newly created corpus:

Screenshot 3: my virtual corpora

Tick the Specific radio button:

Screenshot 4: specific key word radio button

and then click on the noun keywords. Skill is the top keyword here, and it also appears in the wordlist in my book:

Screenshot 5: noun keywords

What I am more interested in is verbs, so I click on the verb keywords:

Screenshot 6: verb keywords

The noun requirement (which, by the way, does not come from the careers unit) appears in the book wordlist, but the verb require does not. So now I can look at some example uses of the verb require that I could use in class.

One step is to see what collocates with require:

Screenshot 7: collocates of require

Clicking on the top 5 collocates brings up some potential language.

Another interesting use is that once you have a number of corpora you can see which words appear most in each corpus. The following screenshots show corpora related to the first 3 units of my book, i.e. Careers, Workplaces, Communications:

Screenshot 8: my virtual corpora

The greyed lines mean those corpora are omitted from my search. This could be a nice exercise where you take some words and get students to see how they are distributed. So for example you may show the distribution of the verb fill:

Screenshot 9: distribution of verb fill

We see that it appears most in the recruit* corpus. One option now is to get students to predict how the verb is used in that corpus and then click the bar to see some examples.

After this demonstration you can now ask students to guess what words will appear most in the various corpora and do the search for the students to see the resulting graphs.

Hope this has shown how we can use BYU WIKI corpus to recycle vocabulary in different contexts.

Do shoot me any questions as this post may indeed be confusing.

Skylighting: sub-query searching

I was using a press release text from the company of a student recently. He was drawn immediately to two items – We raise the bar every year to remain contemporary; the high standards that we hold ourselves to in our people practices.

We did a bit of work on the use of these two items in the text.

After the session I was thinking that it would have been good for the student to have been able to see other examples of use of the language he had pointed out. That is, although the language in the press release was authentic, to what extent was it typical and so possibly worth learning by the student?

Skylight offers a sub-query search feature which allows one to see collocating words that can appear with several intervening words and in any position.

For example does the phrase raise the bar always appear in this form or are there other versions?

Enter bar into the search (with corpus selection of ukWaC):

Enter initial search term (bar)

You will get a result screen such as:

Result screen from initial search (bar)

then enter raise and you will get results such as:

Result screen from second term (raise)

The results show that yes, raise the bar is the most common form, but there are some uses (which can be found through the sort feature) where a modifier such as quality or performance is placed in between, i.e. “raise the quality bar”; “raise the performance bar”. An interesting use is with a film that manages to raise an emotional bar.

One could then further filter results by looking for instances of standard or standards (use the | pipe command as an OR operator i.e. standard|standards) and we get uses such as “set up new standards that raise the emissions bar extremely high”
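The idea of narrowing results term by term can be sketched in Python. This is only an illustration of successive filtering with | as OR; it says nothing about how Skylight is implemented, and the function name `sub_query` is an assumption. The sample lines are the uses quoted above.

```python
import re

def sub_query(lines, *terms):
    """Keep lines matching every term in turn; '|' inside a term means OR.

    Each term is treated as a small regular expression, so 'standard|standards'
    matches either word, in any position in the line.
    """
    for term in terms:
        pattern = re.compile(rf"\b(?:{term})\b", re.IGNORECASE)
        lines = [line for line in lines if pattern.search(line)]
    return lines

lines = [
    "raise the quality bar",
    "raise the performance bar",
    "set up new standards that raise the emissions bar extremely high",
]
print(sub_query(lines, "bar", "raise"))
print(sub_query(lines, "bar", "raise", "standard|standards"))
```

The first call keeps all three lines; adding the third term leaves only the line that mentions standards.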

Turning now to we hold ourselves to:

first plug in hold
then to
then ourselves|myself|himself|herself|themselves
and then finally standard|standards.

We get such variations as:

“People hold footballers to standards that they wouldn’t dream of”
“can law schools hold themselves accountable to other people’s standards”
“Schools that hold parents to account, and that are themselves accountable”
“For university law schools to hold themselves accountable to externally generated criteria”

Whether a student like mine, who was interested in human resources/training issues, would be interested in such sentences is worth asking and exploring. Since the ukWaC corpus samples general web texts we can assume some example uses would interest our learner.

The key point is that this method offers a way to quickly extend a text without relying on one’s wits in class.

Thanks for reading.

The Pirate Bay AFK – web related lexis

The documentary The Pirate Bay Away from Keyboard was released recently. I used some of the text from the English subtitles in a gapped sentences exercise (most of the film is in Swedish):

1: Half of all BitTorrent tr_____ is coordinated by the Pirate Bay. It’s extreme amounts of tr_____.

2: There are 22-25 million us____ at this very moment.

3: A user is defined as one ongoing up_____ or download.

4: This is the web s____. Data base and search fu____n.

5: The _______ are over there. This little piece is the ______. It’s the world’s biggest ______.

6: Not so many computers, but powerful and well-con_____.

7: How the hell can prosecutor Roswall mix up mega___ and megabyte?

8: Generally speaking, for st_____ you use byte and when you measure speed you use bit.

9: I had a spare l___ which I let him use for the site. It was from British Telecom.

10: So the US government ordered us to re_____ the site. We fought them for a long time before we re_____ed it.

11: After a while we cl____ it d____, when it became too much of a fuss.

12: Two months later Gottfrid needed more ba_____ for the Pirate Bay. I still had that line available.

Unsurprisingly, sentence 5 caused the most difficulty; sentence 9 was also tricky. I then showed the film up to the 19 minute 29 seconds mark.

I asked the class to watch the rest of the film and think about two things – Was it a good documentary? and What issues were raised in the film? They should be prepared to discuss this for the next class.

You may be wondering why I chose to use a film that was mainly spoken in Swedish. The average listening comprehension level of my multi-media classes is at A2/B1, and from previous experience asking them to listen to a 1h 22m film in English would be asking too much. The translated subtitles are simple enough for them to read.

I intend to work with the gapped sentences again in the next class, possibly in the form of a dictogloss.

Key to gapped sentences:

1: traffic

2: users

3: upload

4: server; function

5: tracker

6: configured

7: megabit

8: storage

9: link

10: remove

11: close down

12: bandwidth
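For anyone who wants to produce gapped sentences like the ones above from a subtitle file, here is a minimal Python sketch. It is not how I made the exercise (that was done by hand); the function name `make_gap` and its `keep` and `blank` parameters are assumptions for illustration.

```python
import re

def make_gap(sentence, word, keep=2, blank="_____"):
    """Replace every whole-word occurrence of `word` with its first letters plus a blank."""
    hint = word[:keep] + blank
    # A function is used as the replacement so the hint is inserted literally.
    return re.sub(rf"\b{re.escape(word)}\b", lambda match: hint, sentence)

print(make_gap("Half of all BitTorrent traffic is coordinated by the Pirate Bay.", "traffic"))
print(make_gap("How the hell can prosecutor Roswall mix up megabit and megabyte?",
               "megabit", keep=4, blank="___"))
```

Setting `keep=0` gives a fully blank gap, as in sentence 5.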

Hope you found this activity useful.

Quick cup of COCA – bring to * boil

Image

Click on image above to go to result screen.

A short post to show another example of the power of the wildcard asterisk. The image shows the results of comparing /bring to * boil/ in the American COCA and the British BNC.

A tweet by @AnneHendler asked /In British English, is it considered more acceptable to say “Bring to the boil” or “bring to a boil”?/

@Marie_Sanako replied /I would always say ‘bring to the boil’./, both @cgoodey /I’d bring something to THE boil too!/ and @GemL1 agreed /”bring to the boil” is what I would say./ whilst @michaelegriffin added /if it’s about cooking I can only imagine myself or other am eng users sayin “a boil”. If metaphoric I dont know./.
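What the asterisk does in /bring to * boil/ can be mimicked with a regular expression. The Python sketch below is only an analogy, written under the assumption that * stands for exactly one word; it is not the corpus interface’s own query engine, and the function name `wildcard_to_regex` is hypothetical. The first two sample strings come from the tweet above; the third is added as a non-match.

```python
import re

def wildcard_to_regex(query):
    """Turn a query like 'bring to * boil' into a regex where * matches one word."""
    parts = [r"\w+" if token == "*" else re.escape(token) for token in query.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

pattern = wildcard_to_regex("bring to * boil")
samples = ["Bring to the boil", "bring to a boil", "bring to boil"]
hits = [sample for sample in samples if pattern.search(sample)]
print(hits)
```

Both bring to the boil and bring to a boil match, which is what lets a single search compare the two variants across corpora.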

Update:

I was reminded recently by a Twitter exchange with @rosemerebard that this online workshop is a great primer to using a corpus BYU: BNC & COCA.

One of the things I hope is clear with these Quick cup of COCA posts is that having a clearly stated problem/question will facilitate the search process.