Grassroots language technology: Adam Leskis, grammarbuffet.org

Language learning technology can be so much more than what commercial groups are offering right now. The place to look is among the independent developers and teachers who are innovating in this area. Adam Leskis is one such person, and here he discusses his views and projects.

1. Can you tell us a little about your background?

I started out in my first career as an English teacher, and it was clear to me that there were better ways we could both create and distribute digital materials for our students. As an example, during my last year of professional teaching (2015), the state of cutting-edge tech integration was taking a PowerPoint from class and uploading it to YouTube.


What struck me in particular was that technology was being used mainly to reproduce traditional classroom methods of input rather than to take advantage of the advanced capabilities of the digital medium. I saw paper handouts being replaced by uploaded PDFs, classroom discussions replaced by online forums, and teacher-fronted lectures replaced by videos of teachers speaking.


I knew I wanted to at least try to do something about it, so I set off teaching myself how to use the tools to create things on the internet. I eventually got good enough to be hired to do web development full time, and that’s what I’ve been doing ever since.

2. In what ways do you feel technology can help with learning languages?

Obviously, given the very social nature of education and human language use, technology could never fully replace a teacher, and so this isn’t really what I’m setting out to do. Where I see technology being able to make an enormous impact, though, is in its ability to automate and scale a lot of the things on the periphery that language learning involves.


As an example, vocabulary is a very important component to being able to use and understand language. Thankfully, we now have the insights from corpus-based methods to help us identify which vocabulary items deserve primary focus, and it’s a fairly straightforward task to create materials including these.


However, what this means in practice is either students need to pay for expensive course books containing materials created with a corpus-informed approach to vocabulary, or the teachers and students themselves need to spend time creating these materials. Course books tend to be very expensive, and even those which come with online materials aren’t updated very frequently. Teachers and students creating their own materials are left to scour the internet for items to then analyze and filter for appropriate vocabulary inclusion, and then beyond that they need to construct materials to target the particular skill areas they would like to use the vocabulary for (eg, writing, listening), and which target the authentic contexts they are interested in, which is a very time-consuming manual process.


Technology has the ability to address both of these concerns (lack of updates and requirements of time). As one example, I created a very simple web app that pulls in content from the writing prompts sub-reddit (https://www.reddit.com/r/WritingPrompts/) and uses it to help students work on identifying appropriate articles (a/an/the) to accompany nouns and noun phrases. The content is accessed in real time when the student is using the application, and given the fast turnover in this particular sub-reddit, this means that using it once a day would incorporate completely different content, essentially forming a completely new set of activities.
One of the other advantages to this approach is the automated feedback available to the user. So in essence, it’s a completely automated system that uses authentic materials (created largely by native speakers for native speaker consumption) to instantly generate and assess activities focused on one specific learning objective.


The approach does still have its shortcomings, in that this particular system is just finding all the articles and replacing them with a selection drop-down, so it’s only able to give feedback on whether the user’s selection is the same as the original article. Also, since this is a very informal genre, the language used might not be suitable for all ages of users.
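As a rough illustration of the mechanism described above (not the app's actual code), the sketch below replaces every article in a text with a numbered gap and marks a learner's choice as correct only when it matches the original. The function names and the sample sentence are assumptions made for this example; a real app would pull its text from the subreddit rather than a hard-coded string.

```python
import re

# Matches the three English articles as whole words, in any letter case.
ARTICLE_RE = re.compile(r"\b(a|an|the)\b", re.IGNORECASE)


def make_article_cloze(text):
    """Replace each article with a numbered gap and record the original."""
    answers = []

    def blank(match):
        answers.append(match.group(0).lower())
        return f"[{len(answers)}]"

    return ARTICLE_RE.sub(blank, text), answers


def check_answers(answers, guesses):
    """A guess is correct only if it equals the article in the source text."""
    return [guess.strip().lower() == answer
            for answer, guess in zip(answers, guesses)]


prompt = "The dragon found an egg in a cave."
cloze, answers = make_article_cloze(prompt)
print(cloze)                                      # [1] dragon found [2] egg in [3] cave.
print(check_answers(answers, ["the", "a", "a"]))  # [True, False, True]
```

The shortcoming mentioned above is visible here: the check knows only what the original writer used, so a different but equally acceptable article would still be marked wrong.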


3. What are your current projects?


I wish I had more time to work on these, since I currently only have early mornings and commuting time on the train to use for side projects, but there are a few things I’m working on that I’m really excited about.


Now that I have one simple grid-based game up and running (https://www.grammarbuffet.org/rhyme-game/), I’m thinking about how I can re-use that same user interface to target other skills. If, instead of needing to tap on the words that rhyme, we could just have the users say them, that would be a much more authentic way to assess whether the user is able to “do something” with their knowledge of rhymes. Browsers now offer a Web Speech API that I’ve been meaning to play around with, so that could be a potential way to create an alternate version based on actual speaking skills rather than just reading skills.


Another permutation of the grid-based game template would be integrating word stress instead of rhymes. I’m currently trying to get a good dataset containing word stress information for all the words in the Academic Word List (Coxhead, 2000), which I suppose is a bit dated now as a corpus-based vocabulary list, but it was my first introduction to the power of a corpus approach, and so I’ve always wanted to use it to generate materials on the web. The first version of this will probably also just involve seeing the word and using stress knowledge to tap it, rather than speaking, but I’m also imagining how we could use the capabilities of mobile devices to allow the user to shake or just move their phone up and down to give their answers on word stress. Once that’s up and running, it will be very simple to incorporate more modern corpus-based vocabulary lists (eg, the Academic Spoken Word List, 2017). Moreover, since this is all open source, anyone could adapt it for their particular vocabulary needs and deploy a custom web app via tech like Netlify.


Beyond these simple games, I’m also starting to work on a way to take authentic texts (possibly from a more academic genre on reddit like /r/science or text of articles on arXiv) to create cloze test types of materials using the AWL. The user would need to supply the words instead of select, which is a much more authentic assessment of their ability to understand and actually use these words in written English.


4. I really like the idea of offline access, how can people interested in this learn more?


The technology that enables this is currently referred to as Progressive Web Apps (PWAs), and relies on Service Workers (a specialised kind of Web Worker). Essentially, because websites run JavaScript, we’re able to put a JavaScript process between the user’s browser and the network to intercept network requests and just return things that have already been downloaded. So for applications where all the data is included in the initial page load, this means that the entire website will work offline.


It’s a very relevant concept for our users who either have very unreliable network access, or even relatively expensive network costs. If we’re discussing applications that users engage with every single day, the network usage becomes non-trivial, especially if the application uses the old website model of a full page reload on every change in the view, rather than a modern single page app, written in, for example, Angular or React. So absolutely, I would say it matters whether modern learning materials are using the latest technology to enable all of these enhancements to traditional webpages.

Much of this movement towards “offline-first” is informed by the JAMstack, which itself is a movement towards static sites that are deployable without any significant backend resources. This speaks to one of the goals of the micromaterials movement, which is the separation of getting the data from actually doing something with it in the web application. One early attempt in terms of setting up a backend API to be consumed is https://micromaterials.org, which just returns sentences from the WritingPrompts subreddit. It’s admittedly very crude (and even written in Python 2, yuck!), but shows what could eventually be a model for data services that could feed into front-end micromaterials apps.


5. Ideas/Plans for the future?


These disadvantages are a lot more obvious if this remains one of only a few such applications, but imagine if there were hundreds or even thousands of these forming something much more like an ecosystem. And then extrapolate that further to imagine thousands of backend server-side APIs for each conceivable genre of English enabling a multitude of frontend applications to consume the data and create materials for different learners. As soon as you have one server-side service providing data on AWL words, that allows any number of web applications to consume and transform that data into activities.


The plan all along was not for me to create all of these applications, but to inspire others to begin creating similar types of micromaterials. It hasn’t yet caught on, and clearly, expecting teachers to take up this kind of development is not sustainable. I’m hoping that other developers see the value in these and join the movement.


In a sense, the server-side APIs are a bigger prerequisite to getting this whole thing off the ground, so I’m very happy to work with any backend developers on what we need going forward, but I’m also going to continue developing things myself until we have a big enough community to take over.


I think whether all of these micromaterials exist under the umbrella of one single sign-on with tracking and auditing is beyond the scope of where we’re currently at, though I’m imagining a world where users could initiate their journey into the service, take a simple test involving all four of the main skills (reading, writing, speaking, and listening), and then be recommended a slew of micromaterials to help them out. 


For some users that might focus more on the reading and writing components, whereas for others that might focus more on the speaking and listening ones. The barrier to this currently being available is not at all significant and just involves getting development time invested in creating the materials. If I had them all created right now, I would be able to deploy them today with modern tooling like Netlify.


The problem is more one of availability and time, and I’m more than happy to work with other developers and teachers to bring this closer to a reality for our students.

Thanks for reading and many thanks to Adam for sharing his time on this blog; you can follow Adam on his blog [https://micromaterialsblog.wordpress.com/] and on twitter @BaronVonLeskis.

Please do read the other posts in the Grassroots language technology series if you have not done so already.

Grassroots language technology: Fred Lieutaud, FLGames, Planet Alert

I recently used the soccer game from FLGames, an open source set of language games designed by Fred Lieutaud. It worked really well in a revision class; I often forget how competitive my first-year engineering students are.

With no further ado here is Fred talking about technology and language learning. Many thanks to Fred and if you are interested in talking about this area do get in touch.

1. Can you share some of your background?
I’m a French teacher of English working in a middle-school in the North East of France. I’ve been teaching in the same school for 18 years now. I am interested in computers, but mostly in knowledge sharing through open-source licenses and I like creating things, so I started developing my own tools to teach.

2. You mentioned your current project, Planet Alert. Can you explain that a little?

Planet Alert is the answer I’ve found so far to the problem of student motivation. One of my goals is to have the kids take pleasure in coming to class and learning English. I have a feeling that this is possible through the use of ‘games’ (hence my FLGames – sources on GitHub).

Planet Alert is then a sort of ‘game’ providing class management tools, while keeping in mind as much as possible that technology should serve the classroom and help students improve their skills. If it doesn’t fulfill these goals, it shouldn’t be used in class.

In Planet Alert, students have their own avatar, and they need to take care of it to help the team (i.e. the class) succeed in the ‘game’. The scenario is not that interesting in itself: humans wanted to conquer Mars, but Martians were first: they have invaded the Earth, and they have emptied human brains. To resist and free the planet, humans have to re-learn a language (English!).

The game is strongly connected to the classroom in many ways. Lots of ‘real’ actions have an effect on the avatars: participation, group work, individual exercises, helping other kids. Each positive action increases the player’s experience (XP), but also increases his or her gold coins (GC). Each negative action causes a loss of health points and might also cause a loss of GC. Thanks to the GC, a player can free places throughout the world (famous monuments – the goal is to have them develop geographical skills), free people, buy equipment (to earn even more), buy protections (to lose less than normal), buy potions (no homework for 1 lesson, changing seat in class for 1 lesson, assisting the teacher), donate to another player (to help him buy a health potion, for example). Once an element is bought, the player gets it to stick in their copybook and scores are updated for real in the classroom and on the website.

I try to encourage team work with special elements (group items) such as the Memory helmet or the Book of Knowledge: the first is a helmet giving access to online exercises (created inside Planet Alert), the second gives access to lessons that can be copied in the copybook (to validate extra homework, which gets credited with extra XP and extra GC). This also gives the kids a possibility to work outside of the classroom and revise vocabulary or go a little further than what has been done in class. Students who do not have easy internet access can also do extra work in their copybook. When it is shown in class, they get credited with a positive action.

At the beginning of the lesson, I often check the ‘Main Office’ page so we have the recent news and discuss things (someone needs help, monuments). An exercise in class becomes a ‘Group mission’, a test is a ‘Monster Attack’ and so on. Most things are related to the ‘game’.

Some roles exist: ‘Ambassadors’ for players having 10 positive actions in a row, ‘Captains’ for players having the best karma in each group. This is useful in class, for example to start an activity: Captains first!

Anyway, I guess you get the picture. It’s hard to be concise since Planet Alert offers many possibilities. It is really a way to manage class differently. Teachers can also generate reports over selected periods and see who has done extra-work, who has forgotten their material, who has participated. This is a great help for parents’ meetings.

Well, I could go on for hours about everything that is behind this website. But from my own (very biased!) point of view, the results are encouraging. If you want to have a look, the official website is https://planetalert.tuxfamily.org.

3. How do you decide on whether to use technology or not in class?
From my answer to the preceding question, you can imagine that using computers in the classroom is often a necessity for me. My focus, though, is not to use the tool for its own sake. I want to use it to share with the class. It has to prove its added value: either in helping communication, or in helping students learn. Planet Alert is an example of this kind of sharing, but the FLGames are another example, for helping memorization (Soccer for increasing speed, Grammar Gamble to improve written skills, Car Race to encourage group work and cooperation). I believe technology in class should always be a means to promote real interaction. It should trigger some sort of desire to work, to speak, to get involved.

4. What kinds of tools (apart from your own) have you found most useful?
As you can see, I mostly use my own tools. But I also use OpenBoard to manage all my documents on my interactive whiteboard. I have used open-source tools exclusively for many years now and that is something very important for me. With Planet Alert, I try to introduce students to open-source licenses: they have already drawn some of the monsters used in the game and agreed to share them on Open Clipart Library :). Other important aspects are the possibility to customize the tools and the ability to do so quickly (I like working with simple .txt files as data source).

5. Anything else you would like to comment on about technology in language learning?
I have a feeling it would be hard to do without technology when teaching, but this is a personal opinion. It is fundamental to understand that teaching relies much more on the teacher than on technology! Some teachers are not ‘techies’ but they still do extraordinary work. I think a teacher has to find his or her way of teaching. And all sorts of teaching may work!

 

Practice in second language learning – interview with the editor

I was working with an individual student at about A2 level a few weeks back. Her speaking skills are relatively weak compared to her listening skills. I decided some job related drilling would be appropriate. As she was going through the drill I was hesitating about how much would be of use. Before the advent of the modern communicative approach, practice in language teaching was often associated with such mechanical type activities. And such exercises have been criticized as using decontextualized and inauthentic language. So on this point (decontextualised/inauthentic language) I was more confident (as the student was using example language related to her work) than on the value of the drilling i.e. repetitive production of language.

In a new book edited by Christian Jones – Practice in second language learning – practice is defined broadly as “specific activities in the second language, engaged in systematically, deliberately, with the goal of developing knowledge of and skills in the second language”. Although there is no explicit discussion of drilling, the chapters within do cover many interesting issues related to practice.

Christian Jones kindly answered some questions about the book:

1. What made you decide there was a need for this book at this time?
Practice is a central part of second language teaching and learning in many contexts and yet remains somewhat under-researched. This seems something of a gap in the literature. Teachers and researchers need evidence about what seems to work and what doesn’t in various contexts and with different language areas/skills. There has not been a volume focused on this area since Robert DeKeyser’s book in 2007 and we wanted to add research to the field.

2. What would readers get from this book that they wouldn’t from DeKeyser 2007?
The DeKeyser book is, in my view, a very important contribution to our field. Robert DeKeyser was kind enough to add a foreword to this volume as we wanted to acknowledge his important work in this area. In our book, we have tried to explore practice as we might find it in classrooms, online and in periods of study abroad. We wanted to research practice in different second languages and contexts, using different research designs, and we hope this will be of interest to a variety of teachers and researchers.

3. The definition given in the book for practice is described as “broadly defined”. What would a more narrowly defined version say?
A narrowly defined version of practice might view it as something tied to a particular framework such as PPP. In fact, practice forms a part of many types of methodology. For example, in the TBLT literature, task repetition is undoubtedly a form of practice. A narrowly defined version might view it as something connected to learner output. In fact, we can and do talk of receptive and productive practice. A narrow version of practice might view it as connected only to skill building theories of second language acquisition but we can link it to several others, including the noticing hypothesis and input processing.

4. What in your view is the most outstanding question on the topic of practice (both for teaching and research)?
There are several! But here is one. Chapter one by Mike McCarthy and Jeanne McCarten makes the point that practising conversation and speaking practice are not the same. CLT often features activities we can term ‘speaking practice’ but it is something of a stretch to think that typical activities such as information gaps etc (as helpful as they are in some ways) allow learners to practise conversations. In order to develop conversational skills, learners will need to practise aspects of conversation such as good listenership and linking their turn to another speaker. We need to investigate ways to practise these things. One way is to research the effectiveness of an Illustration-Interaction-Induction (III) framework which McCarthy and McCarten suggest can be useful for practising aspects of conversation. Such research might be undertaken by comparing III to other methodologies.

I have yet to form a definite opinion on drilling, but having read only the first two chapters of the book, I hope any future opinion on drills and practice in general will be better informed.

Thanks for reading and do note I was kindly sent a review copy of the book. But don’t hold your breath for a proper review : )

Grassroots language technology: Wiktor Jakubczyc, vocab.today

It’s been a while since the last post on teachers doing it for themselves technology wise. Do check those out if you have not or need a reminder. I stumbled across the teacher/developer who kindly answered questions for this post, Wiktor Jakubczyc, when looking for a GitHub source on vocabulary profilers. And what a find his GitHub pages are.

I think there are good reasons for teaching and education to have a default “inertia” regarding “innovation” (which Wiktor laments in one of his responses) but I won’t discuss this here. Maybe readers may prod me on this in the comments? 😁 I would like to refer to a (pdf) point I’ve made before – that there is a middle ground for teachers to explore regarding grassroots technology.

Anyway, enough of my rambling. Here’s Wiktor, and there is a marvelous bonus at the end for all you CALL geeks:

1. Can you explain your background a little?

I’m an English teacher with over 10 years of experience and an IT freelancer. I’ve taught English all over Europe, in London, Moscow, Warsaw, Bratislava, Sevilla and Wrocław, my home town in Poland. Since I was a kid I’ve loved computers – and that was in the ’80s when an Atari couldn’t really do very much. I passionately want teachers to make the most of digital technologies.

2. What was the first tool you designed for learning languages?

The first tool I designed to help students learn English was a dictionary lookup program for Windows, way back in 2007. Back then, there were good dictionaries you could get for your computer, but I wanted to be able to look up a word in many dictionaries at once. That option simply didn’t exist, so I created The Ultimate Dictionary (http://creative.sourceforge.net). I got great feedback from my students, fellow teachers and friends – they still use it, and they love it! It’s a very rewarding feeling to create something of value for other people, and to be able to give it to them for free.

A few years later, I discovered that another developer, Konstantin Isakov, had the same idea and made an even better dictionary application – GoldenDict. I used his source code as the base for a redesign of my dictionary, now called Nomad Dictionary. Nomad Dictionary now has Windows, Android and MacOS editions, all available to download at http://dictionaries.sf.net.

My second project was a Half a Crossword creator. Half a crossword is a type of communicative activity for ESL classrooms which emphasizes speaking and vocabulary, two key skills in speaking a language. Students get half a crossword each, split evenly between two students, and have to ask each other for missing information and give definitions for the words they have in their crossword. It’s a fantastic way to revise and recycle vocabulary, while practicing the much-needed skills of asking for and giving information. And students love it!
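The core of the activity, dealing the answers out evenly so that each student holds the words the other is missing, can be sketched in a few lines. This is only an illustration of the split, under the assumption that alternating entries is an acceptable way to divide them; it does not lay out a crossword grid and is not taken from the Half a Crossword source. The function name and word list are made up for the example.

```python
def split_crossword(words):
    """Give each student every other word; the rest stay blank on their sheet."""
    sheet_a, sheet_b = {}, {}
    for number, word in enumerate(words, start=1):
        blank = "_" * len(word)
        if number % 2:   # odd-numbered entries are filled in for student A
            sheet_a[number], sheet_b[number] = word, blank
        else:            # even-numbered entries are filled in for student B
            sheet_a[number], sheet_b[number] = blank, word
    return sheet_a, sheet_b


sheet_a, sheet_b = split_crossword(["river", "bridge", "castle", "tower"])
print(sheet_a)  # {1: 'river', 2: '______', 3: 'castle', 4: '_____'}
print(sheet_b)  # {1: '_____', 2: 'bridge', 3: '______', 4: 'tower'}
```

Each student then has to define their own words and ask about the blanks, which is where the speaking practice comes from.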

Again, no such tool existed, which is why I decided to create one. I first made a version of Half a Crossword for Windows (http://creative.sourceforge.net) because at the time Delphi was the only language I could program in. I found it immensely useful in my classes – it was a perfect activity to check how many words students knew before moving on to new material. I tried to get other teachers involved, to spread the word and encourage them to use it, but I found a lot of people were resistant. They loved the idea, but few actually decided to use it in their classrooms.

A few years later, thinking that maybe the problem was accessibility – you needed to download a program, install it, write a wordlist in Word and then save it… it was a bit complicated – I decided to create an online version written in JavaScript. I posted the code for Half a Crossword Online on GitHub (https://github.com/monolithpl/half-a-crossword). Despite the fact that it wasn’t advertised anywhere, quite a few people found out about it, and two people even contributed code! Teachers I talked to also found the online version easier to use, and came to use it with their classes.

3. What do you think of as a relevant tool?

That’s a very good question, which is to say a very hard question. I think a relevant tool has to be personally important enough for the creator to design it (especially if it’s a hobby project) and at the same time good enough that other people later find it useful too. It’s rare for these two things to coincide.

Another difficulty lies in the fact that the world of teaching, broadly speaking, is averse to innovation. Very few teachers care to experiment with new methodologies, paradigms or teaching tools. There’s extreme inertia. So getting teachers to change their habits and try something new is very challenging, especially when it comes to technology.

Relevant tools, in my mind, would be those that embrace the DOGME/Teaching Unplugged methodology, the Lexical Approach, personalized teaching, the explosion of mobile computing, just to name a few – all the radical new ideas that have appeared in the last 10 years in language teaching. And they would have to be loved by students, teachers and administrators alike.

4. Do you create tools for languages other than English?

I would love to, someday. I simply don’t have the time to do that now. This is a hobby, after all. The language learning tools I create are useful to my students, my colleagues and myself in learning and teaching English, which is what we do every day. So that is the priority for now.

I hope other people around the world will find the time and be inspired to create tools for their languages. Unfortunately, there is a huge gap between the English-speaking world and the rest of the people out there when it comes to technology: just compare the size of the English Wikipedia versus editions in other languages. The same is true for language data: there are far fewer corpora, frequency wordlists, audiovisual materials etc for languages other than English. There’s lots of catching up to do.

I also think that the world needs a world language, so that we can all start to understand things not just around us, in our local environment, but on a more global level. For that, we need English, so I can understand why most of the interesting developments in language teaching are designed for English students. It’s simply the largest market and user base.

5. What tools are you working on at the moment? What do you have planned for future developments?

Right now I’m working on projects related to wordlists. I have a new version of a Vocabulary Profiler (https://github.com/monolithpl/range.web) almost ready. It’s an app that visualizes word frequency in a text, or in more practical terms tells a teacher how difficult a text is and which words are going to be most challenging for their students. Developing it was an incredible learning experience as I had to figure out how to compress large wordlists so that the app could work on mobile phones and discovered trie algorithms, which are a super clever concept of packing words into a small space. I’d like to mention the groundbreaking work of Paul Nation on teaching and researching vocabulary, especially his Range program (https://www.victoria.ac.nz/lals/about/staff/paul-nation#vocab-programs), which I tried to recreate for the modern web.
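To make the two ideas in this answer concrete, here is a minimal sketch of a trie used as the lookup structure for a vocabulary profiler. It is not the range.web code: the class, the band labels ("K1", "K3") and the handful of sample words are assumptions chosen for illustration, and a plain Python dictionary trie shows the shared-prefix idea without achieving the compact storage a mobile app would need.

```python
import re


class Trie:
    """Prefix tree: words that share a prefix share the same nodes."""

    def __init__(self):
        self.root = {}

    def insert(self, word, band):
        node = self.root
        for letter in word:
            node = node.setdefault(letter, {})
        node["$"] = band  # "$" marks the end of a word and stores its band

    def lookup(self, word):
        node = self.root
        for letter in word:
            node = node.get(letter)
            if node is None:
                return None
        return node.get("$")


def profile(text, trie):
    """Count how many tokens fall into each frequency band."""
    counts = {}
    for token in re.findall(r"[a-z]+", text.lower()):
        band = trie.lookup(token) or "off-list"
        counts[band] = counts.get(band, 0) + 1
    return counts


# Hypothetical band assignments, for illustration only.
trie = Trie()
for word in ["the", "cat", "sat", "on"]:
    trie.insert(word, "K1")
trie.insert("catalogue", "K3")  # reuses the c-a-t nodes created by "cat"

print(profile("The cat sat on the quokka catalogue.", trie))
# {'K1': 5, 'off-list': 1, 'K3': 1}
```

A teacher reading the output would see at a glance how much of a text falls outside the most frequent bands, which is the practical question a profiler answers.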

My most ambitious project to date is an extension of this work – it’s an app to highlight collocations, chunks etc. in a text called Fraze Finder (https://github.com/monolithpl/fraze-finder). It takes the concept of profiling vocabulary to the next level by analyzing multi-word elements, like phrasal verbs, which students most often struggle with. The idea is to help students and teachers notice collocations, to identify them and understand their importance in written and spoken language. The difficulty here is building a good library of these expressions and accurately finding them (with all their variations) in texts. I have lots of ideas for future projects, which I’ve tried to gather together on my personal website vocab.today (https://vocab.today/teacher). I hope one day to complete them all!
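The difficulty named here, finding an expression "with all its variations", can be illustrated with a small sketch. It is not how Fraze Finder works; it assumes a hand-written mini-library of phrasal verbs with their verb forms listed explicitly, and it allows up to two words between verb and particle so that separated forms such as "gave it up" are caught. All names and entries are hypothetical.

```python
import re

# Hypothetical mini-library: each phrasal verb lists the verb forms to accept.
PHRASAL_VERBS = {
    "give up": (["give", "gives", "gave", "given", "giving"], "up"),
    "look after": (["look", "looks", "looked", "looking"], "after"),
}


def find_phrases(text, library=PHRASAL_VERBS, max_gap=2):
    """Return (phrase, matched text) pairs for each phrasal verb found."""
    found = []
    for phrase, (forms, particle) in library.items():
        # Verb form, then at most max_gap intervening words, then the particle.
        pattern = re.compile(
            r"\b(?:%s)\b(?:\s+\w+){0,%d}?\s+%s\b"
            % ("|".join(forms), max_gap, particle),
            re.IGNORECASE,
        )
        for match in pattern.finditer(text):
            found.append((phrase, match.group(0)))
    return found


print(find_phrases("She gave it up after he looked after the dog."))
# [('give up', 'gave it up'), ('look after', 'looked after')]
```

A pattern this simple will also produce false positives (a verb followed a word or two later by an unrelated "up", for instance), which hints at why building an accurate library and matcher is the hard part.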

6. Are there any tools (not yours) that you yourself use for learning languages?

Over the years, I’ve tried and experimented with dozens of language learning solutions. Let me focus on three main areas:

Learning Management Systems (LMSs) – these are content delivery platforms, basically, websites where teachers upload material for their classes and students do their homework, complete tests, review their progress and exchange messages with one another.

I gave Moodle a try, but it was just horrible to use for both teachers and students, and I think other people agreed with me for it seems to be fading away into a well-deserved oblivion.

Later, I tried Edmodo, which was a lot easier to use, and obviously inspired by Facebook, which was just starting to be the big thing at the time. I ran into numerous limitations using it, and finally, out of sheer frustration, just gave up. It was very pretty on the surface, but you couldn’t do much with it. And students preferred to use Facebook for their day-to-day communication, so it was difficult to make them use something else.

So today, I create Facebook groups for my students and use Google Drive, Forms and Docs to share documents and tests. It’s still not a perfect solution, but it has the advantage of being familiar to everyone and easy to use. Unlike the many solutions I’ve used before, I think these are versatile enough to do the job and are actively being developed and improved.

Flashcards – There are hundreds of apps and websites that help students learn through flashcards. I’ve tried many of them with my students, including Anki (which is a great piece of software). However, I’ve found that Quizlet is the easiest to set up and use. And there’s a huge library of flashcards made by talented teachers around the world available for anyone to use. It’s quite amazing, and it’s free.

Mobile Apps – I’ve also experimented with several dozen different learning tools for mobile phones. This is a very new market, as the iPhone only came out ten years ago. There is currently much hype around apps like DuoLingo, Babbel or Memrise, but personally I found them to be quite boring. The activities are very repetitive, and apart from situations where I would be forced to use them (on a crowded train with nothing else to do), I can’t imagine myself ever using them long-term.

This is still a very experimental field, which is why I find it shocking that the three biggest apps offer just two types of activities: multiple choice or fill-in-the-gap exercises. I would love to see more variety. There’s also the fact that, due to their novelty, the claims of effectiveness these apps advertise with are often greatly overstated – just see what happened to all the “brain training” apps like Lumosity which now have to pay multi-million dollar fines for lying to their customers (https://arstechnica.com/science/2016/06/billion-dollar-brain-training-industry-a-sham-nothing-but-placebo-study-suggests/). There’s definitely room for improvement.

7. Any advice for people interested in learning to design such tools?

The most important thing is to have an idea of what to create: something that would be useful for you or your students that doesn’t yet exist, a faster and better way of doing something you do every day or a radical improvement on a tool or solution you currently use.

Programming skills are secondary and you can always find people who can help you out with technical stuff on StackOverflow. I’ve met a few programmers who after completing their studies had no idea what they wanted to create. Knowing what you’d like to create is the key.

It’s much easier to get into hobby development than it was 5 or 10 years ago. GitHub makes it super easy to upload your code and create a website for your project – all for free! It’s also a great way to discover other projects, make use of ready-made components and participate in the open source community by commenting or finding bugs.

JavaScript is one of the easiest programming languages you can learn, and it’s everywhere – on PCs, Macs, iPhones and Androids. With just one language, you can design for almost any device out there – the developments on the technological front are simply amazing.

On the teaching side, I could recommend no better than Scott Thornbury’s excellent article How could SLA research inform EdTech? (https://eltjam.com/how-could-sla-research-inform-edtech) which describes the needs of language learners and offers a list of requirements that should be met in order to create a truly excellent, cutting-edge language learning tool. To my knowledge, no such tool exists. Not by a long shot. It’s a great opportunity for creative minds.

8. Anything you want to add?

Thank you for noticing my work and giving me an opportunity to speak about it. Up until now I’ve been working on my projects almost in secret. It would be amazing if this interview inspired creative young minds to design new tools for language teaching, especially in languages other than English. I hope teachers will discover new tools that will help them teach better with less effort.

Technology has so much to offer in the field of learning languages, and there’s so much innovation to come. I’m looking forward to the bold new ideas of the future. Follow my work at vocab.today or on github!

Many thanks to Wiktor for spending time answering these questions. And here is the bonus link – Wiktor is compiling classic CALL programs that you can run in your browser, how awesome is that?! I am sure Wiktor would be glad to take some suggestions of some classic gems.

Successful Spoken English – interview with authors

The following is an email interview with the authors, Christian Jones, Shelley Byrne, Nicola Halenko, of the recent Routledge publication Successful Spoken English: Findings from Learner Corpora. Note that I have not yet read this (waiting for a review copy!).

Successful Spoken English

1. Can you explain the origins of the book?

We wanted to explore what successful learners do when they speak and in particular learners from B1-C1 levels, which are, we feel, the most common and important levels. The CEFR gives “can do” statements at each level but these are often quite vague and thus open to interpretation. We wanted to discover what successful learners do in terms of their linguistic, strategic, discourse and pragmatic competence and how this differs from level to level.  

We realised it would be impossible to use data from all the interactions a successful speaker might have so we used interactive speaking tests at each level. We wanted to encourage learners and teachers to look at what successful speakers do and use that, at least in part, as a model to aim for as in many cases the native speaker model is an unrealistic target.

2. What corpora were used?

The main corpus we used was the UCLan Speaking Test Corpus (USTC). This contained data only from students, from a range of nationalities, who had been successful (based on holistic test scoring) at each level, B1-C1. As points of comparison, we also recorded native speakers undertaking each test. We also made some comparisons to the LINDSEI (Louvain International Database of Spoken English Interlanguage) corpus and, to a lesser extent, the spoken section of the BYU-BNC corpus.

Test data does not really provide much evidence of pragmatic competence so we constructed a Speech Act Corpus of English (SPACE) using recordings of computer-animated production tasks by B2 level learners  for requests and apologies in a variety of contexts. These were also rated holistically and we used only those which were rated as appropriate or very appropriate in each scenario. Native speakers also recorded responses and these were used as a point of comparison. 

3. What were the most surprising findings?

In terms of the language learners used, it was a little surprising that as levels increased, learners did not always display a greater range of vocabulary. In fact, at all levels (and in the native speaker data) there was a heavy reliance on the top two thousand words. Instead, it is the flexibility with which learners can use these words which changes as the levels increase, so they begin to use them in more collocations and chunks and with different functions. There was also a tendency across levels to favour use of chunks which can be used for a variety of functions. For example, although we can presume that learners may have been taught phrases such as ‘in my opinion’, this was infrequent and instead they favoured ‘I think’, which can be used to give opinions, to hedge, to buy time etc.

In terms of discourse, the data showed that we really need to pay attention to what McCarthy has called ‘turn grammar’. A big difference as the levels increased was the increasing ability of learners to co-construct  conversations, developing ideas from and contributing to the turns of others. At B1 level, understandably, the focus was much more on the development of their own turns.

4. What findings would be most useful to language teachers?

Hopefully, in the lists of frequent words, keywords and chunks they have something which can inform their teaching at each of these levels. It would seem to be reasonable to use, as an example, the language of successful B2 level speakers to inform what we teach to B1 level speakers. Also, though tutors may present a variety of less frequent or ‘more difficult’ words and chunks to learners, successful speakers will ultimately employ lexis which is more common and more natural sounding in their speech, just as the native speakers in our data also did.

We hope the book will also give clearer guidance as to what the CEFR levels mean in terms of communicative competence and what learners can actually do at different levels. Finally, and related to the last point, we hope that teachers will see how successful speakers need to develop all aspects of communicative competence (linguistic, strategic, discourse and pragmatic competence) and that teaching should focus on each area rather than only one or two of these areas.

There has been some criticism, notably by Stefan Th. Gries and collaborators, that much learner corpus research considers too few factors when explaining a linguistic phenomenon. Gries calls for a multi-factor approach whose power can be seen in a study conducted with Sandra C. Deshors (2014) on the uses of may, can and pouvoir with native English users and French learners of English. Using nearly 4000 examples from 3 corpora, annotated with over 20 morphosyntactic and semantic features, they found for example that French learners of English see pouvoir as closer to can than may.

The analysis for Successful Spoken English was described as follows:

“We examined the data with a mixture of quantitative and qualitative data analysis, using measures such as log-likelihood to check significance of frequency counts but then manual examination of concordance line to analyse the function of language.”
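For readers unfamiliar with the log-likelihood measure mentioned in the quote, the following Python sketch shows the two-corpus formulation commonly used in corpus linguistics for comparing word frequencies. The authors do not say which exact formulation or software they used, so treat this as an assumed, illustrative version; the function name and the figures in the usage lines are invented.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood (G2) for one word's frequency in two corpora.

    freq_a, freq_b: observed counts of the word in corpus A and corpus B.
    size_a, size_b: total number of words in each corpus.
    """
    total = freq_a + freq_b
    # Expected counts if the word were equally likely in both corpora.
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    score = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


if __name__ == "__main__":
    # Invented figures: a word occurring 100 times in 10,000 words of
    # learner speech and 50 times in 10,000 words of native-speaker speech.
    print(round(log_likelihood(100, 10_000, 50, 10_000), 2))
```

A score above 3.84 is conventionally taken as significant at p < 0.05 (one degree of freedom). As the quote notes, such a score only flags a frequency difference; working out what the word is doing still requires reading the concordance lines.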

Hopefully with the increasing use of multi-factor methods learner corpus analysis can yield even more interesting and useful results than current approaches allow.

Chris and his colleagues kindly answered some follow-up questions:

5. How did you measure/assign CEFR level for students?  

Students were often already in classes where they had been given a proficiency test and placed in a level. We then gave them our speaking test and only took data from students who had been given a global pass score of 3.5 or 4 (on a scale of 0-5). The borderline pass mark was 2.5, so we only chose students who had clearly passed but were not at the very top of the level and, obviously, only those who gave us permission to do so. The speaking tests we used were based on Canale’s (1984) oral proficiency interview design and consisted of a warm up phase, a paired interactive discussion task and a topic specific conversation based on the discussion task. Each lasted between 10 and 15 minutes.

6. So most of the analysis was in relation to successful students who were measured holistically?  

Yes.

7. And could you explain what holistically means here?

Yes, we looked at successful learners at each CEFR level, according to the test marking criteria. They were graded for grammar, vocabulary, pronunciation, discourse management and interactive ability based on criteria such as  the following (grade 3-3.5) for discourse management ‘Contributions are normally relevant, coherent and of an appropriate length’. These scores were then amalgamated into a global score. These scales are holistic in that they try to assess what learners can do in terms of these competences to gain an overall picture of their spoken English rather than ticking off a list of items they can or cannot use. 

8. Do I understand correctly that comparisons with native speaker corpora were not as much used as with successful vs unsuccessful students? 

No, we did not look at unsuccessful students at all. We were trying to compare successful students at B1-C1 levels and to draw some comparison to native speakers. We also compared our data to the LINDSEI spoken learner corpus to check the use of key words.

9. For the native speaker comparisons what kind of things were compared?

We compared each aspect of communicative competence – linguistic, strategic, discourse and pragmatic competences to some degree. The native speakers took exactly the same tests so we compared (as one example), the most frequent words they used.

 

Thanks for reading.

 

References:

Deshors, S. C., & Gries, S. T. (2014). A case for the multifactorial assessment of learner language. Human Cognitive Processing (HCP), 179. Retrieved from https://www.researchgate.net/publication/300655572_A_case_for_the_multifactorial_assessment_of_learner_language

 

Interview with Mike Scott, WordSmith Tools developer

WordSmith Tools, a corpus linguistics program, turned 20 this year, quite a feat for software from an independent developer. I have an OSX system so I don’t use WordSmith (though it can be run using Wine and/or virtualization), and also because it is a paid program – always an issue for us poor language teachers. However, with the great support and new features on offer, the fee seems more and more tempting. Mike Scott kindly answered some questions.

1. Who are you?
A language teacher whose hobby turned into a new career, software development for corpus linguistics. Lucky to get into Corpus Linguistics early on (1980s) and before that lucky to get into EAP early on (in the 1970s). Basically, lucky!

2. What do you think is the most useful feature in WordSmith Tools for language teachers?
WordSmith is used by loads of different types of researchers, many of them not in language teaching: literature, politics, history, medicine, law, sociology. Not many language students use it because they can get free tools elsewhere and many just use Google however much we might wish otherwise. Language teachers probably find the Concord tool and its collocates feature the most useful. 

3. Of the new features in the latest WordSmith Tools, which are you most excited about and why?
I put in new features as I think of them or as people request them. I am usually most excited by the one I’m currently working on because then I’m in the process of struggling to get it working and get it designed elegantly if I can. One I tweeted about recently was video concordancing. I think it will be great when we can routinely concordance enhanced corpora with sound and images as well as words! 

4. How do you see the current corpus linguistic software landscape?
Very much in its infancy. Computer software is only about as old as I am (born soon after WWII). Most other fields of human interest are as old as the hills. We are still feeling our way in a dark cavern full of interesting veins to explore, with only the weakest of illumination. Fun!

Many thanks to Mike for taking the time to respond and to you for reading.

Coming up in BYU-COCA, mini-interview with Prof. Mark Davies

I had the great pleasure to be able to put 4 questions to Professor Mark Davies about the BYU corpora tools and the upcoming changes (http://corpus.byu.edu/upcoming.asp).

1. Would you share what you think are interesting site user statistics?

There’s some good data at http://corpus.byu.edu/users.asp; let me know if you have questions about any of that.

2. What is the motivation behind the upcoming user interface changes?

The main thing is to have an interface that works well on laptops/desktops, as well as tablets, as well as mobile devices (cell phones, etc). More and more people are connecting to websites that work well as they are on-the-go with mobile devices, and the current corpus interface doesn’t work well for that.

3. Any chance of screenshot previews of the upcoming user interface?

Please see attached. As you can see, each of the four main “pages” — search, results, KWIC, and help — are in their own “page”, which takes up the whole screen. Users click on the tabs at the top of the page to move between these (just as they would click on the different “frames” in the existing interface). But because each “page” takes up the entire page, it will still work fine with cell phones, for example (where frames don’t work well at all).

The other cool thing in the new interface is the ability to create and use “virtual corpora” (see http://corpus.byu.edu/wikipedia.asp#tutorials for their implementation in Wikipedia corpus, using the current interface).

[Screenshot: the new BYU corpus interface]

4. One tweeter (@cainesap) was wondering in what register would cat pictures go in the web genre corpus?

🙂 🙂 Good question. See the core.png file attached 🙂

[Image: core.png]

The first screenshot indicates the new interface looks much cleaner and easier to use from mobile devices and, as a bonus, my ebook – Quick Cups of COCA – won’t be needing too much change for the new edition : )

Thanks to Professor Davies for taking time to answer this mini-interview and to you for reading.

Bonus Questions!

5. Do the search results screens stay the same as now?

Pretty similar; yes.

6. Still a mystery where cat pics and graphical memes, animated gifs go? : )

This and many other equally profound questions can be answered with the new corpus :-).

Corpus Linguistics for Grammar – Christian Jones & Daniel Waller interview

Following on from James Thomas’s Discovering English with SketchEngine and Ivor Timmis’s Corpus Linguistics for ELT: Research & Practice, I am delighted to add an interview with Christian Jones and Daniel Waller, authors of Corpus Linguistics for Grammar: A guide for research.

An added bonus are the open access articles listed at the end of the interview. I am very grateful to Christian and Daniel for taking time to answer my questions.

1. Can you relate some of your background(s)?

We’ve both been involved in ELT for over twenty years and we both worked as teachers and trainers abroad for around a decade; Chris in Japan, Thailand and the UK and Daniel in Turkey. We are now both senior lecturers at the University of Central Lancashire (UCLan, Preston, UK),  where we’ve been involved in a number of programmes including MA and BA TESOL as well as EAP courses.

We both supervise research students and undertake research. Chris’s research is in the areas of spoken language, corpus-informed language teaching and lexis while Daniel focuses on written language, language testing (and the use of corpora in this area) and discourse. We’ve published a number of research papers in these areas and have listed some of these below. We’ve indicated which ones are open-access.

2. The focus in your book is on grammar; could you give us a quick (or not so quick) description of how you define grammar in your book?

We could start by saying what grammar isn’t. It isn’t a set of prescriptive rules or the opinion of a self-appointed expert, which is what the popular press tend to bang on about when they consider grammar! Such approaches are inadequate in the definition of grammar and are frequently contradictory and unhelpful (we discuss some of these shortcomings in the book).  Grammar is defined in our book as being (a) descriptive rather than prescriptive (b) the analysis of form and function (c) linked at different levels (d) different in spoken and written contexts (e) a system which operates in contexts to make meaning (f) difficult to separate from vocabulary (g) open to choice.

The use of corpora has revolutionised the ways in which we are now able to explore language and grammar and provides opportunities to explore different modes of text (spoken or written) and different types of text. Any description of grammar must take these into account and part of what we wanted to do was to give readers the tools to carry out their own research into language. When someone is looking at a corpus of a particular type of text, they need to keep in mind the communicative purpose of the text and how the grammar is used to achieve this.

For example, a written text might have a number of complex sentences containing both main and subordinate clauses. It may do so in order to develop an argument but it can also be more complex because the expectation is that a reader has time to process the text, even though it is dense, unlike in spoken language. If we look at a corpus we can discover if there is a general tendency to use a particular pattern such as complex sentences across a number of texts and how it functions within these texts.

3. What corpora do you use in the book?

We have only used open-access corpora in the book, including BYU-BNC, COCA, GloWbE and the Hong Kong Corpus of Spoken English. The reason for using open-access corpora was to enable readers to carry out their own examinations of grammar. We really want the book to be a tool for research.

4. Do you have any opinions on the public availability of corpora and whether wider access is something to push for?

Short answer: yes. Longer answer: We would say it’s essential for the development of good language teaching courses, materials and assessments as well as democratising the area of language research. To be fair to many of the big corpora, some like the BNC have allowed limited access for a long time.

5. The book is aimed at research so what can Language Teachers get out of it?

By using the book teachers can undertake small-scale investigations into a piece of language they are about to teach even if it is as simple as finding out which of two forms is the more frequent. We’ve all had situations in our teaching where we’ve come across a particular piece of language and wondered if a form is as frequent as it is made to appear in a text-book, or had a student come up and say ‘can I say X in this text’ and struggled with the answer. Corpora can help us with such questions. We hope the book might make teachers think again about what grammar is and what it is for.

For example, when we consider three forms of marry (marry, marries and married) we find that married is the most common form in both the BYU-BNC newspaper corpus and the COCA spoken corpus. But in the written corpus, the most common pattern is in non-defining relative clauses (Mark, who is married with two children, has been working for two years…). In the spoken corpus, the most common pattern is going to get married e.g. When are they going to get married?

We think that this shows that separating vocabulary and grammar is not always helpful because if a word is presented without its common grammatical patterns then students are left trying to fit the word into a structure and in fact words are patterned in particular ways. In the case of teachers, there is no reason why an initially small piece of research couldn’t become larger and ultimately a publication, so we hope the book will inspire teachers to become interested in investigating language.
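As a small illustration of the kind of check described in this answer, the Python sketch below counts the three forms of marry in a text and then looks at the word immediately before married. The sample sentences and function names are invented for the example; a real investigation would run the equivalent queries in the BYU-BNC or COCA interfaces rather than on a few lines of text.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_forms(text, forms):
    """Count how often each of the given word forms occurs in the text."""
    counts = Counter(token for token in tokenize(text) if token in forms)
    return {form: counts[form] for form in forms}


def left_neighbours(text, target):
    """Count the words that appear immediately before the target word."""
    tokens = tokenize(text)
    return Counter(
        tokens[i - 1] for i, token in enumerate(tokens) if token == target and i > 0
    )


if __name__ == "__main__":
    # Invented sample text, echoing the two patterns mentioned above.
    sample = (
        "Mark, who is married with two children, wants his sister to marry soon. "
        "When are they going to get married? She marries for love."
    )
    print(count_forms(sample, ["marry", "marries", "married"]))
    print(left_neighbours(sample, "married"))
```

On this sample, married is the most frequent form and is preceded once by "is" and once by "get", a toy version of the written versus spoken patterns described above.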

6. Anything else you would like to add?

One of the things that got us interested in writing the book was the need for a book pitched at undergraduate students in their final year of their programme and those starting an MA, CELTA or DELTA programme who may not have had much exposure to corpus linguistics previously. We wanted to provide tools and examples to help these readers carry out their own investigations.

Sample Publications

Jones, C., & Waller, D. (2015). Corpus Linguistics for Grammar: A guide for research. London: Routledge.

Jones, C. (2015). In defence of teaching and acquiring formulaic sequences. ELT Journal, 69(3), pp. 319–322.

Golebiewska, P., & Jones, C. (2014). The Teaching and Learning of Lexical Chunks: A Comparison of Observe Hypothesise Experiment and Presentation Practice Production. Journal of Linguistics and Language Teaching, 5(1), pp. 99–115. OPEN ACCESS

Jones, C., & Carter, R. (2014). Teaching spoken discourse markers explicitly: A comparison of III and PPP. International Journal of English Studies, 14(1), pp. 37–54. OPEN ACCESS

Jones, C., & Halenko, N. (2014). What makes a successful spoken request? Using corpus tools to analyse learner language in a UK EAP context. Journal of Applied Language Studies, 8(2), pp. 23–41. OPEN ACCESS

Jones, C., & Horak, T. (2014). Leave it out! The use of soap operas as models of spoken discourse in the ELT classroom. The Journal of Language Teaching and Learning, 4(1), pp. 1–14. OPEN ACCESS

Jones, C., Waller, D., & Golebiewska, P. (2013). Defining successful spoken language at B2 Level: Findings from a corpus of learner test data. European Journal of Applied Linguistics and TEFL, 2(2), pp. 29–45.

Waller, D., & Jones, C. (2012). Equipping TESOL trainees to teach through discourse. UCLan Journal of Pedagogic Research, 3, pp. 5–11. OPEN ACCESS

Discovering English with SketchEngine – James Thomas interview

2015 seems to be turning into a good year for corpus linguistics books on teaching and learning. You may have read about Ivor Timmis’s Corpus Linguistics for ELT: Research & Practice. There is also a book by Christian Jones and Daniel Waller called Corpus Linguistics for Grammar: A guide for research.

This post is an interview with James Thomas on Discovering English with SketchEngine.

1. Can you tell us a bit about your background?

2. Who is your audience for the book?

3. Can your book be used without Sketch Engine?

4. How do you envision people using your book?

5. Do you recommend any other similar books?

6. Anything else you would like to add?

1. Can you tell us a bit about your background?

Currently I’m head of teacher training in the Department of English and American Studies, Faculty of Arts, Masaryk University, Czech Republic. In addition to standard teacher training courses, I am active in e-learning, corpus work and ICT for ELT. In 2010 my co-author and I were awarded the ELTon for innovation in ELT publishing for our book, Global Issues in ELT. I am secretary of the Corpora SIG of EUROCALL, and a committee member of the biennial conference, TALC (Teaching and Language Corpora).

My work investigates the potential for applying language acquisition and contemporary linguistic findings to the pedagogical use of corpora, and training future teachers to include corpus findings in their lesson preparation and directly with students.

In 1990, I moved to the Czech Republic for a one year contract with ILC/IH and have been here ever since. Up until that time, I had worked as a pianist and music teacher, and had two music theory books published in the early 1990s. Their titles also began with “Discovering”! 🙂

2. Who is your audience for the book?

The book uses the acronym DESKE. Quite a broad catchment area:

  • Teachers of English as a foreign language.
  • Teacher trainees – the digital natives – whether they are doing degree courses or CELTA TESOL Trinity courses.
  • People doing any guise of applied linguistics that involve corpora.
  • Translators, especially those translating into their foreign language. (Only yesterday I presented the book at LEXICOM in Telč.)
  • Students and aficionados of linguistics.
  • Test writers.
  • Advanced students of English who want to become independent learners.

3. Can your book be used without Sketch Engine?

No (the answer to the next question explains why not).

Like any book it can be read cover to cover, or aspects of language and linguistics can be found via the indices: (1) Index of names and notions, (2) Lexical focus index.

4. How do you envision people using your book?

It is pretty essential that the reader has Sketch Engine open most of the time. Apart from some discussions of features of linguistics and English, the book primarily consists of 342 language questions/tasks which are followed by instructions – how to derive the data from the corpus recommended for the specific task, and then how to use Sketch Engine tools to process the data, so that the answer is clear.

Example questions:
About words
Can you say handsome woman in English?
Do marriages break up or down?
How is friend used as a verb?
Which two syllable adjectives form their comparatives with more?
Do men say sorry more than women?

About collocation
I’ve come across boldly go a few times and wonder if it is more than a collocation.
It would be reasonable to expect the words that follow the adverb positively to be positive, would it not?
Is there anything systematic about the uses of little and small?
What are some adjectives suitable for giving feedback to students?

About phrases and chunks
Does at all reinforce both positive and negative things?
What are those phrases with last … least; believe … ears; lead … horse?
How do the structures of to photograph differ from take a photo(graph), guess with make a guess, smile with give a smile?
Which –ing forms follow verbs like like?

About grammar
How do sentences start with Given?
Who or whom?
Which adverbs are used with the present perfect continuous?
Do the subject and verb typically change places in indirect questions?
How new and how frequent is the question tag, innit?

About text
Are both though and although used to start sentences? Equally?
How much information typically appears in brackets?
Does English permit numbers at the beginning of sentences?
Is it really true that academic prose prefers the passive?
In Pride and Prejudice, are the Darcies ever referred to with their first names?

There is an accompanying website with a glossary – a work eternally in progress, and a page with all the links which appear in the footnotes (142 of them), and another page with the list of questions, which a user might copy and paste into their own document so that they can make notes under them.

5. Do you recommend any other similar books?

The 223 page book has three interwoven training goals, the upper level being SKE’s interface and tools, the second being a mix of language and linguistics, while the third is training in deriving answers to pre-set questions from data.

AFAIK, there is nothing like this.

6. Anything else you would like to add?

In all the conference presentations and papers and articles that I have seen and heard over the years in connection with using corpora in ELT, with very few exceptions teachers and researchers focus on a very narrow range of language questions. When my own teacher trainees use corpora to discover features of English in the ways of DESKE, they realise that the steep learning curve is worth it. They are being equipped with a skill for life. It is a professional’s tool.

Sketch Engine consists of both data and software. Both are being constantly updated, which argues well for print-on-demand. It’ll be much easier to bring out updated versions of DESKE than through standard commercial publishers. I’m also expecting feedback from readers, which can also be incorporated into new editions.

My interests in self-publishing are partly related to my interest in ICT. This book is printed through the print-on-demand service, Lulu.com. One of the beauties of such a mode of publishing is the relative ease with which the book can be updated as the incremental changes in the software go online. This is in sharp contrast to the economies of scale that dictate large print runs to commercial publishers and the standard five-year interval between editions.

There is a new, free, student-friendly tool known as SKELL, which has its own corpus and interface and has been available for less than a year. It is also undergoing development at the moment, and I will be preparing a book of worksheets for learners and their teachers (or the other way round). I see it as a 21st cent. replacement of the much missed “COBUILD Corpus Sampler”.

Lastly, I must express my gratitude to Adam Kilgarriff, who owned Sketch Engine until his death from cancer on May 16th, at the age of 55. He was a brilliant linguist, teacher and presenter. He bought 250 copies of my book over a year before it was finished, which freed me up from other obligations – a typical gesture of a wonderful man, greatly missed.

Many thanks to James for taking the time to be interviewed but pity my poor wallet with some very neat CL books to purchase this year. James also mentioned that, for a second edition file, Chapter 1 will be re-written to be able to use the open corpora in SketchEngine.

Skylight interview with Gill Francis & Andy Dickinson

Skylight is a relatively new corpus interface designed with teachers and students in mind. Gill Francis, one of the developers, kindly answered some questions. The news about forthcoming suggestions for classroom activities is something to look forward to, as is the collocation feature. It is interesting to note that Gill is very much in favour of the use of keyword in context (KWIC) concordance lines. Others, such as the FLAX language learning team, see KWICs as more of a hindrance and propose their own novel interfaces.

Can you share a little of your background?

Andrew Dickinson is a software writer who is interested in the use of corpora in the classroom, and Gill Francis (that’s me) is a corpus linguist. In 1991 I joined the pioneering Cobuild project as Senior Grammarian. Cobuild was founded in 1980 by Professor John Sinclair (University of Birmingham). Its aim was to compile and investigate huge collections of written and spoken language in order to produce a range of dictionaries and grammars for learners that reflect how English is actually spoken and written today. My interest and direction in corpus linguistics owe everything to John Sinclair and our colleagues at Cobuild.

The Bank of English corpora grew to about 450 million words by the late 1990s. We used a fast, versatile, and powerful corpus analysis tool called ‘lookup’. As a grammarian, I was responsible for the grammatical information in the second edition of the Collins Cobuild Advanced Learner’s Dictionary (1995), along with Susan Hunston and Elizabeth Manning. The three of us also wrote the Cobuild Grammar Patterns series (1996, 97, and 98). All these publications reflected a detailed study of corpus evidence.

I’ve continued to work and publish in corpus linguistics since leaving Cobuild. (A list of publications is available.) Then a few years ago I got together with Andy to design Skylight, a program with a clear, easy interface for use by teachers and learners. Since then we have presented Skylight at various corpus linguistics conferences and seminars, and are currently developing it for more general release.

You are targeting classroom use by teachers with Skylight so what do you hope to bring that other corpus tools don’t?

1 – A clear, simple interface

Skylight has a clear, visually attractive interface. The query language is simple and intuitive, and can be learned in a couple of minutes. You can make a query by simply typing in a word or phrase without any special spacing or punctuation, for example “in my opinion” or “in the middle of” or “it’s a case of”.

To vary any word in the query, you use a pipe: “in my|his|her opinion”, or “in the middle|midst of”.

If you want to vary the query and see the range of words in a particular phrase or frame, you use one or more asterisks, for example “in my * opinion” will return “in my humble opinion”, “in my honest opinion”, “in my personal opinion” and so on.

This is about as complex as the query language gets – click on the User Manual from any page of Skylight to see examples of each kind of query. The rules are few and easily mastered by teachers and learners.
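As a rough illustration of how a query language like this can work, here is a minimal Python sketch that turns a query with pipes and asterisks into a regular expression. It is not Skylight's own code: the function name `query_to_regex` is made up for this example, and it assumes that each asterisk stands for exactly one word.

```python
import re

def query_to_regex(query: str) -> re.Pattern:
    """Translate a pipe-and-asterisk query into a compiled regular expression."""
    parts = []
    for token in query.split():
        if token == "*":
            # Assumption: one asterisk stands for exactly one arbitrary word.
            parts.append(r"[\w'’-]+")
        elif "|" in token:
            # A pipe lists the alternatives allowed at one word position.
            options = "|".join(re.escape(opt) for opt in token.split("|"))
            parts.append(f"(?:{options})")
        else:
            # Any other token must match literally.
            parts.append(re.escape(token))
    # Words are separated by whitespace; \b stops matches inside longer words.
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

sample = "In my humble opinion, and in his opinion too, it sat in the midst of town."
print(query_to_regex("in my * opinion").findall(sample))
print(query_to_regex("in my|his|her opinion").findall(sample))
print(query_to_regex("in the middle|midst of").findall(sample))
```

A real corpus tool would run queries against an index rather than scanning raw text, but the mapping from query to pattern is the same idea.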

2 – Fast, easy alphabetical sorting

If you want to sort concordance lines to the right, or the left, you just click on a button above the lines. This helps you to see at a glance what the right-hand or left-hand collocates of a word or phrase are.
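To make the idea of left and right sorting concrete, here is a small Python sketch. It assumes each concordance line is stored as a (left context, node, right context) tuple; that representation and the names `sort_right` and `sort_left` are invented for this example and say nothing about how Skylight does it internally.

```python
from typing import List, Tuple

# Hypothetical representation: (left context, node, right context)
Line = Tuple[str, str, str]

def sort_right(lines: List[Line]) -> List[Line]:
    """Sort on the words after the node, reading left to right."""
    return sorted(lines, key=lambda ln: [w.lower() for w in ln[2].split()])

def sort_left(lines: List[Line]) -> List[Line]:
    """Sort on the words before the node, nearest word first."""
    return sorted(lines, key=lambda ln: [w.lower() for w in reversed(ln[0].split())])

lines = [
    ("it is", "intuitively", "obvious that"),
    ("we all", "intuitively", "know this"),
    ("seems", "intuitively", "appealing to many"),
    ("it was", "intuitively", "correct"),
]

for left, node, right in sort_right(lines):
    # Right-justify the left context so the node lines up in one column.
    print(f"{left:>10} {node} {right}")
```

Sorting to the right groups lines by the word immediately after the node, so repeated collocates end up next to each other on the page.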

3 – Worksheets and classroom activities

If you are a teacher, you can use Skylight to prepare your own worksheets for corpus-based language activities. When you receive the results of a query, you can tailor the lines to fit your teaching point. This means that you can show only the lines you want, or hide those that you don’t, by clicking or entering text. You can copy the result into Word or another application using the Copy to Clipboard button. The results appear as a neat table, properly displayed and ready for your use. See the User Manual for further details and lots of examples.

Ideally, too, teachers and learners would be able to access a corpus at any point during a class, whenever they want to investigate how a word or phrase is used in a range of real language texts and situations.

For initial guidance and ideas, we are also preparing a large number of suggestions for stand-alone classroom activities practising points of grammar, lexis, and phraseology. Some of these activities address language change and the tension between prescription and description in language teaching. We’ll let you know when we release the first batch of these.

4 – A range of corpora

There are several corpora already available on Skylight – choose any one from the drop-down menu. For example, there is a very large general corpus, ukWaC, which contains 1.4 billion words, as well as smaller corpora like the BNC, BASE, and VOICE. Then there are even smaller corpora – for example a corpus of all Shakespeare’s plays and sonnets that is particularly useful for school children studying English literature.

In addition, any corpus can be compiled in response to the needs of groups of users, such as English school children or intermediate level EFL students. This depends, of course, on copyright restrictions. For more information, see the final sections of the User Manual.

Which other corpus tools would you recommend for teachers either in the classroom or outside?

We don’t feel particularly qualified to answer this question. There are a lot of tools that access huge corpora and are extremely useful to linguists and lexicographers, such as Sketch Engine, the COCA (a large corpus of American English) concordancer, and Lancaster’s Corpus Query Processor. If you look up ‘corpus’ and ‘classroom’ together in any search engine, there will be several hits, but we don’t know of anything that combines an easy-to-use interface with really good classroom applications. This doesn’t mean there isn’t anything, of course!

What present and/or future do you see for Google as a corpus in language learning?

One of the drawbacks of compiled corpora, such as ukWaC and the BNC, is that they are a snapshot of how language is used at a particular time (or at successive times, if a corpus is updated on a regular basis). The gathering and cleaning-up of text can take many months, so all corpora – even the most recent – are necessarily out-of-date by the time they appear.

The only way to get today’s language today is to use the web as a corpus (see for example Birmingham City University’s WebCorp). This gives results in the KWIC (Key Word in Context) format, with the word or phrase in the centre. The results are not cleaned up or processed, however, which limits their usefulness in the classroom.

But Google itself won’t give you the output you need for focusing on a word or phrase, sorting it, or looking at collocations. You’ll get plenty of examples, of course, but they won’t be shown in the KWIC format. The KWIC display is probably the most important and exciting development in modern corpus linguistics, and you need it if you are to do real corpus-based language work in the classroom or anywhere else.
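For readers who have not met the KWIC format, the following Python sketch shows the basic idea: every occurrence of the search word is printed on its own line, with the word lined up in the same column and a fixed amount of context on either side. The function name `kwic` and the `width` parameter are assumptions made for this illustration, not part of any tool mentioned here.

```python
import re
from typing import List

def kwic(text: str, node: str, width: int = 30) -> List[str]:
    """Return one line per occurrence of `node`, with the node in a fixed column."""
    # Collapse line breaks and repeated spaces so the context stays on one line.
    text = " ".join(text.split())
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-justify the left context so every node starts at column `width`.
        lines.append(f"{left:>{width}}{m.group(0)}{right}")
    return lines

text = "It is obvious that this works. For obvious reasons we left."
for line in kwic(text, "obvious", width=10):
    print(line)
```

Because the node always starts in the same column, the eye can run straight down the page and compare what comes before and after it.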

Anything else you would like to add?

You asked whether we intend to add information about collocation. We are experimenting with a display modelled on the ‘Picture’ technique in the lookup software used for the Bank of English, which shows where collocates appear in relation to the node (the central word or phrase) – whether they tend to occur before or after it, for example.

We call the collocation display ‘Searchlight’. The Searchlight display below shows that the most frequent words immediately after obvious are that, then reasons (plural), then choice, then reason (singular). The most frequent words two to the right are of, for, and is. And so on – the columns are not connected, of course; they simply give positional collocations.

The brilliant thing about ‘Picture’ that we want to replicate is that you simply click on any word to go to the relevant concordance lines. If you click on reasons, you get all the lines with the combination obvious reasons. This gives you a subset of the lines, which can then be sorted and tailored in any way you like.

[Image: Searchlight collocation display for the word obvious]
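A positional collocation display of this kind can be approximated in a few lines of Python. The sketch below counts which words occur at each position around the node, one independent column per position. The function name `positional_collocates`, the `span` parameter, and the toy sentence are all made up for this example; it is not the Searchlight code, and it uses raw counts with no statistics.

```python
from collections import Counter
from typing import Dict, List

def positional_collocates(tokens: List[str], node: str, span: int = 2) -> Dict[int, Counter]:
    """Count the words at each offset from -span to +span around every `node`."""
    # One counter per column; offset 0 is the node itself, so it is skipped.
    columns = {off: Counter() for off in range(-span, span + 1) if off != 0}
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        for off, counter in columns.items():
            j = i + off
            if 0 <= j < len(tokens):
                counter[tokens[j]] += 1
    return columns

# Toy data, invented purely for illustration.
tokens = ("for obvious reasons it is obvious that the obvious choice "
          "is obvious that way").split()
columns = positional_collocates(tokens, "obvious")

for off in sorted(columns):
    print(off, columns[off].most_common(3))
```

Each column is counted separately, so the top word one to the right and the top word two to the right need not come from the same lines.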

We will add Searchlight to the Skylight website as soon as possible, though we have not yet decided whether to add statistical information – probably not. In the meantime, I’d just like to say that in my many years of scrolling down concordance lines, I have found that alphabetical sorting is a very good guide to the collocations of a word. I happened to search for the word intuitively recently, and it returned 500 lines. If I sort them one to the right and scroll rapidly down, it’s clear that among the most frequent adjectives that follow it are appealing, correct, and obvious, while the verbs are know and understand. If I sort them one to the left, it is clear that one of the most frequent collocates is the verb be in various forms: ‘it is intuitively obvious’ and so on. Sorting one way and the other gives you a quick thumbnail sketch of a word, and is extremely useful.

So go ahead and try Skylight. And above all, click on the User Manual, which tells you all you need to know and provides lots of examples of searches using different features.

A huge thanks to the Skylight team and do comment here about your opinions of the interface.

Thanks for reading.