Signs o’ the times – some/any invariant meanings and COCA

I am glad to be writing this particular (rushed, see end) post as it involves corpus linguistics and I have not done such a post for a while. It is also about my current interest – Columbia School linguistics.

Over the years I have become less enamored of the power of corpus linguistics for language teaching. Corpora are certainly very useful for accessing descriptions of language, but description is not enough: explanations are also needed. Columbia School (CS) linguistics is about analyzing the invariant meanings that motivate choices in both grammar and lexis. It is about one-form-to-one-meaning mappings, an ideal aim when looking to help students.

In a 2016 paper Nadav Sabar analyses the use of some and any, and the following borrows heavily from that paper.

Most pedagogical grammars state (formal) rules such as “any is used in negative sentences and not in affirmative statements”. Yet such rules cannot account for why some is used in contexts said to call for any. Sabar gives the following attested example:

1) When Yvonne lived in Italy, where it seems like the whole country is married, people always wanted to know about her personal life. I remember her telling me that every time she’d come back from a great vacation, the first question from married friends was, “Did you meet anybody?” It was as if the whole point of going on vacation was to meet someone. That she had a great time and saw something new and interesting didn’t matter. The entire vacation was cancelled or a flop because she didn’t meet someone. (http://www.yvonneandyvettetiquette.com/2008_09_01_archive.html)

Formal accounts could only say that any is also acceptable here, as in she didn’t meet anyone, and are unconcerned with why the writer chose some in this case.

Formal accounts take the sentence as the unit of analysis and treat meaning as compositional, i.e. the meanings of the individual words in a sentence add up to the meaning of the whole. CS takes signs (pairings of a signal with a meaning) as the unit of analysis and treats meaning as instrumental rather than compositional: the individual meanings of signals need not add up to the sentence meaning. There is a distinction between the linguistic code, in which a signal always corresponds to an invariant meaning, and the interpretation of that code, whose subjective outcome is the message. Meanings are very sparse in that they do not encode messages; they only offer prompts that may suggest message elements.

The meaning hypotheses of some and any are shown below:

That is, some, meaning RESTRICTED, suggests limits, internal divisions and boundaries, while any, meaning UNRESTRICTED, suggests no boundaries, limits or divisions. Note that this does not mean that the domain in question has, or lacks, divisions or boundaries in reality, just that reality is irrelevant to the message. Also note that a pedagogical grammar such as Martin Parrott’s describes this division between restricted and unrestricted meaning only for stressed SOME and ANY.

Sabar uses the following as examples:

2) If you see something, say something. (New York City public safety slogan)
3) No parking any time (street sign)

In 2) some is used because the intended message involves a restriction on the set of things people see and say. The context drives the inference as to the nature of that restriction: suspicious-looking things. Any could also have been used, but the message would have been less effective, since any would have suggested no restriction, i.e. that people should call no matter what they see.

Similarly, in 3) any is used because there is no restriction on the domain of times of day.

So in 1) we can see that some is used because the message suggests a restriction on the set of people Yvonne did not meet, and the context shows this restriction to be people who might qualify as potential marriage partners.

Now the interesting corpus linguistics part.

The CS methodology first involves a qualitative step in which some aspect of the sign in question is examined. So for some, which suggests restriction, we look for another element that suggests the same thing:

4) Some Feds [Federal workers] are held up as national heroes while others are considered a national joke. (ABC Nightline: Income Tax)

Here others refers to a different subset of people within the domain of federal workers. This message element of internal division is also suggested by some (RESTRICTED). This does not mean there is only one reason for the choice of these forms; rather, the message feature of internal division is one of many possible reasons that motivated the choice of both forms.

To test this claim more generally, we can look at a corpus to see whether others co-occurs with some more often than it co-occurs with any, beyond what chance would predict.

We can do this in COCA by using these search terms:

COCA searches for others:

Favoured: some [up to 9 slots] others
Disfavoured: any [up to 9 slots] others

The following screenshot shows how to find some [up to 9 slots] others (do similar for any):

To find some without others, use the minus sign (-), as shown in the next screenshot:

Tabulating the data gives the following contingency table:

Form     others present (N)   others present (%)   others absent (N)   others absent (%)
some     19078                90                   8946046             65
any      2022                 10                   4841946             35
Total    21100                100                  13787992            100

p < .0001

The table percentages and the significance test support the claim that a shared message feature motivates the use of both some and others. Note that the meaning hypothesis itself is not tested directly; it is only tested indirectly, via the counts in COCA. Sabar goes on to test, both qualitatively and quantitatively, other signals that contribute to the meaning hypotheses of some (RESTRICTED) and any (UNRESTRICTED).

I wondered how singular other would be distributed across any and some:

Form     other present (N)   other present (%)   other absent (N)   other absent (%)
any      39244               53                  4811937            35
some     35175               47                  8930621            65
Total    74419               100                 13742558           100

p < .0001
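The post reports p < .0001 for both tables but does not say which test produced it. A Pearson chi-square test of independence (1 degree of freedom, no continuity correction) is one standard choice for a 2×2 contingency table, so the Python sketch below, using only the standard library, recomputes the column percentages and such a test from the counts in the two tables above. The function names are my own, the rows of the second table are reordered to put some first (which does not change the test), and this is an illustration, not a reconstruction of Sabar’s procedure.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence (no continuity correction)
    for the 2x2 table [[a, b], [c, d]]. Returns (statistic, p_value)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, the chi-square upper-tail probability
    # equals erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

def column_percentages(a, b, c, d):
    """Share of each column total taken by row 1 and row 2, in percent."""
    col1, col2 = a + c, b + d
    return (100 * a / col1, 100 * c / col1), (100 * b / col2, 100 * d / col2)

# Counts from the two COCA tables above, with the some row first in both:
# (some & word present, some & word absent, any & word present, any & word absent)
tables = {
    "others": (19078, 8946046, 2022, 4841946),
    "other": (35175, 8930621, 39244, 4811937),
}

for word, cells in tables.items():
    statistic, p = chi_square_2x2(*cells)
    present, absent = column_percentages(*cells)
    print(f"{word}: chi-square = {statistic:.1f}, p = {p:.3g}")
    print(f"  {word} present: some {present[0]:.1f}%, any {present[1]:.1f}%")
    print(f"  {word} absent:  some {absent[0]:.1f}%, any {absent[1]:.1f}%")
```

For both tables the p-value underflows to 0.0 in double precision, which is consistent with the reported p < .0001. With totals in the millions even small differences reach significance, so the column percentages are arguably the more informative figures.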

Can we say here that singular other contributes to a message meaning of UNRESTRICTED? I have no idea, as I have not had time to explore this further!

I hope, dear reader, you will forgive the rushed nature of this post, but I wanted to get something up before holiday haze made me forget it!

Thanks for indulging.

Update 1:

Thanks to a heads-up from some tweeters: Michael Lewis, in his 1986 book The English Verb, was also pointing to the primacy of meaning:

Update 2:

Nadav Sabar has pointed out that he looked for others in one direction only, i.e. following some/any, whereas I looked at occurrences of others both before and after some/any.
In addition, a new version of his paper uses a window size of 2 instead of 9.
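The effect of search direction and window size can be illustrated with a toy count. The sketch below is a hypothetical Python function, not how COCA implements its searches (COCA’s own span rules may differ): it tokenizes a text naively and, for each some or any, checks whether others appears within a given number of following tokens, or of tokens on either side. The sample sentence is loosely adapted from example 4 and is not corpus data.

```python
import re

def count_with_collocate(text, targets, collocate, window, both_directions):
    """Hypothetical helper: for each target word, count its occurrences that
    have `collocate` within `window` tokens. If both_directions is False, only
    the tokens after the target are checked (a one-direction search); if True,
    the tokens before it are checked as well."""
    tokens = re.findall(r"[a-z']+", text.lower())  # naive tokenizer, for illustration only
    counts = {target: 0 for target in targets}
    for i, token in enumerate(tokens):
        if token not in counts:
            continue
        after = tokens[i + 1 : i + 1 + window]
        before = tokens[max(0, i - window) : i] if both_directions else []
        if collocate in after or collocate in before:
            counts[token] += 1
    return counts

# Toy sentence loosely adapted from example 4; not corpus data.
sample = ("Some feds are heroes while others are a joke. "
          "Others doubt that any of them work.")

for window, both in [(9, True), (9, False), (2, False)]:
    result = count_with_collocate(sample, ["some", "any"], "others", window, both)
    print(f"window={window}, both_directions={both}: {result}")
```

In this toy sentence the any is counted only when the tokens before it are included, and the some drops out once the window shrinks to 2, which is the kind of shift in counts that direction and window size can produce.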

References:

Parrott, M. (2000). Grammar for English language teachers: with exercises and a key. Cambridge University Press.

Sabar, N. (2016). Using big data to test meaning hypotheses for any and some. In Otheguy, R., Stern, N., Reid, W. and Ruggles, J. (Eds.) Columbia School linguistics in the 21st century: advances in sign-based linguistics. Amsterdam/Philadelphia: John Benjamins. Retrieved from [https://www.academia.edu/33968803/Using_big_data_to_test_meaning_hypotheses_of_some_and_any]


Article use: from cognitive salience to discourse differentiation

The following borrows heavily from the original paper.

In a 1995 paper Elena Gorokhova describes the developmental stages of article use by Spanish L1 learners of English. She adopts a description of final-state article use formulated by William Diver, the founder of Columbia School linguistics, a sign-based functional linguistic account. A sign is a pairing of a signal with its meaning.

In Diver’s account the and a signal a need to differentiate referents in a piece of discourse, while the Ø zero article signals no such need. The signal the is used when there is enough information available to differentiate referents, and the signal a/an is used when there is insufficient information to do so. Four communicative reasons are given for the Ø zero article:

a) the referent is unimportant to the message, as the message is about an associated activity
He went to Ø bed early (went to sleep on whatever bed)

b) the referent is important but there is no chance of confusion
He went Ø home (his home)
He went to Ø school (his school)

c) there is only one possible referent
Ø Einstein died in Ø Princeton

d) no differentiation among instances is needed
Ø Water boils at 100°C (any and all water)

The above is represented in the figure below:

Gorokhova then postulates 4 stages, based on her longitudinal data, which culminate in the Standard English state shown above.

In stage I learners have only the, which is used with cognitively more salient referents. Hence important and visible referents are signaled by the:

In stage II the signal a is acquired. The, in addition to signaling importance, is now used to mark the large size of a visible referent, while a signals visible referents that are smaller and less important. Note that in stages I and II the and a are used very differently from end-state Standard English: in stages I and II they show degrees of attention, whereas in end-state Standard English they show degrees of differentiation of referents in discourse:

In stage III learners begin to pay attention to the larger discourse, although the value they give the articles is still based on cognitive salience. Stage III is a transitional stage:

In stage IV discourse plays a significant role in article use. Learners choose between the and a on the basis of differentiation of referents, using context from restrictive clauses, noun phrases or successive mentions of the same referent. The is also used with familiar referents such as bank, school, etc.:

In stage V students acquire the use of the Ø zero article. At this stage “frame anaphora” is also evoked by the use of the, e.g. Someone is driving and there are people in the back seat. The speaker relies on non-linguistic knowledge shared with the hearer (driving is usually done in a car, which usually has seats). This stage is hard to reach: of the seventy learners in Gorokhova’s study, only two showed stage V article usage.

Although this study suggests a particular order of acquisition (the > a > Ø zero article), there is no consensus in the literature: some studies support this order, others show a > the > Ø zero article, and still others show Ø zero article > the > a.

What is strongly implied, though, is that because of the discourse effects on article use, articles should be taught not in isolated sentences but within a piece of discourse, together with background information about the speaker and hearer.

I recently drew Figure 1 and Master’s Classification vs Identification figure with a student. She preferred Master’s figure, as she had trouble understanding the word differentiation. It should be noted that this focus on form and meaning was only a cursory look, in response to her question about using articles.

Thanks for reading, and do check out Columbia School linguistics, as I believe this approach has a lot of potential for use in class. Do also check some other thoughts on article use here:

  1. Articles and collocational effects
  2. Classified and Identified – A pedagogical grammar for article use
  3. A, an, the, definiteness and specificity

References:

Gorokhova, E. (1995). Acquisition of English articles by native speakers of Spanish. In Contini-Morava, E. & Goldberg, B. S. (Eds.) Meaning as explanation: Advances in linguistic sign theory (pp. 441-452). Berlin: Mouton de Gruyter.

Practice in second language learning – interview with the editor

I was working with an individual student at about A2 level a few weeks back. Her speaking skills are relatively weak compared with her listening skills, so I decided some job-related drilling would be appropriate. As she was going through the drill, I was unsure how much use it would be. Before the advent of the modern communicative approach, practice in language teaching was often associated with such mechanical activities, and such exercises have been criticized for using decontextualized and inauthentic language. On this point (decontextualized/inauthentic language) I was fairly confident, as the student was using example language related to her work; I was less sure about the value of the drilling itself, i.e. the repetitive production of language.

In Practice in second language learning, a new book edited by Christian Jones, practice is defined broadly as “specific activities in the second language, engaged in systematically, deliberately, with the goal of developing knowledge of and skills in the second language”. Although there is no explicit discussion of drilling, the chapters do cover many interesting issues related to practice.

Christian Jones kindly answered some questions about the book:

1. What made you decide there was a need for this book at this time?
Practice is a central part of second language teaching and learning in many contexts and yet remains somewhat under-researched. This seems something of a gap in the literature. Teachers and researchers need evidence about what seems to work and what doesn’t in various contexts and with different language areas/skills. There has not been a volume focused on this area since Robert DeKeyser’s book in 2007 and we wanted to add research to the field.

2. What would readers get from this book that they wouldn’t from DeKeyser 2007?
The DeKeyser book is, in my view, a very important contribution to our field. Robert DeKeyser was kind enough to add a foreword to this volume as we wanted to acknowledge his important work in this area. In our book, we have tried to explore practice as we might find it in classrooms, online and in periods of study abroad. We wanted to research practice in different second languages, contexts and using different research designs and we hope this will be of interest to a variety of teachers and researchers.

3. The definition given in the book for practice is described as “broadly defined”. What would a more narrowly defined version say?
A narrowly defined version of practice might view it as something tied to a particular framework such as PPP. In fact, practice forms a part of many types of methodology. For example, in the TBLT literature, task repetition is undoubtedly a form of practice. A narrowly defined version might view it as something connected to learner output. In fact, we can and do talk of receptive and productive practice. A narrow version of practice might view it as connected only to skill building theories of second language acquisition but we can link it to several others, including the noticing hypothesis and input processing.

4. What in your view is the most outstanding question on the topic of practice (both for teaching and research)?
There are several! But here is one. Chapter one by Mike McCarthy and Jeanne McCarten makes the point that practising conversation and speaking practice are not the same. CLT often features activities we can term ‘speaking practice’ but it is something of a stretch to think that typical activities such as information gaps etc (as helpful as they are in some ways) allow learners to practise conversations. In order to develop conversational skills, learners will need to practise aspects of conversation such as good listenership and linking their turn to another speaker. We need to investigate ways to practise these things. One way is to research the effectiveness of an Illustration-Interaction-Induction (III) framework which McCarthy and McCarten suggest can be useful for practising aspects of conversation. Such research might be undertaken by comparing III to other methodologies.

I have yet to form a definite opinion on drilling, but having read only the first two chapters of the book so far, I hope that any future opinion I form on drills, and on practice in general, will be better informed.

Thanks for reading and do note I was kindly sent a review copy of the book. But don’t hold your breath for a proper review : )

Beyond the symbolic violence dome of the native speaker teacher

An article titled How to end native speaker privilege was posted recently on the always readable site Language on the Move. It includes an intriguing historical account of teachers of Persian in India and England in the 18th and 19th centuries. It also frames the native and non-native (English) speaker (teacher) in a problematic way.

The first problem is the othering of native speaker teachers, who are implicitly depicted as a homogeneous, static, monolithic entity, an undifferentiated mass of native speaker teachers.

The second problem is seen in the symbolic violence of phrasing such as “Subordinating native speakers” and in the suggestion that the injustices suffered by non-native speaker teachers can be resolved by “replacing” native speaker teachers with non-native speaker teachers.

Research in France by Martine Derivry-Plard and Claire Griffin presents native speaker teachers and non-native speaker teachers in a more differentiated light. It also explores the question of going beyond the widespread symbolic violence that stems from a monolingual-monocultural world view.

Symbolic violence is a means by which social agents impose a social order. The social agents act to position themselves favorably in a field. In the present case the field is the foreign language teaching field, which is part of the language teaching field, which in turn is part of the linguistic field of teaching, which itself forms part of the linguistic field.

It is certainly the case that in the foreign language teaching field of English, non-native speaker teachers are subject to various forms of symbolic violence. The Language on the Move article notes in passing that certain aspects of this violence are being addressed, for example through legal prohibitions on discriminatory job adverts and growing discussion of the complementary strengths of non-native and native speaker teachers. Derivry-Plard and Griffin (2017) report on the symbolic violence present in the experiences of native (mainly English) speaker teachers working in France.

In the first study, 19 native English speaking teachers (NESTs) and 19 non-native English speaking teachers (NNESTs) teaching a BTS course (a 2-year course after the baccalaureate) were interviewed. The interviews revealed that NNESTs criticized the teaching skills of their native colleagues; that is, NESTs were seen more as speakers than as teachers of English:
“some had not the project of teaching English …I have seen native English-speaking teachers who did not do the job … but, it is just because they are not teachers, they turned up in a classroom … they delivered what they could, they thought that speaking English for two hours is enough! … but this is not having a conversation, speaking about this or that for an hour ? …And some do not know French enough, which is a problem .. Some do not teach!” (Derivry-Plard & Griffin, 2017:39)

Conversely, NNESTs are not recognized as legitimate speakers of English by their native colleagues, and consequently, in this view, NNESTs cannot be good teachers of English:
“well, it’s second language, it’s second-hand! … in this schoolbook written by French, there are a few mistakes … they make mistakes, with English vowels, their accent is not as good … Sometimes, her accent was awful and there were English teachers I could barely understand …She made so many mistakes .. and some pupils were as good as she was in English! …She could not give a precise meaning of a word with all the connotations… even if the dictionary gives that meaning, it has no longer that meaning…at a certain point, a non native teacher will be embarrassed, this is for sure because, at one point, he/she will apply a grammar rule that we no longer use …they will never get all the shades of meaning ...” (Derivry-Plard & Griffin, 2017:39)

These attitudes reflect the two teaching legitimacies that have developed in the foreign language (FL) teaching field of English in France, since the 19th century, from the spaces of the public education system (institutional) and the private educational system (non-institutional).
1. The professional legitimacy of non-native teachers in institutional spaces was based on the assumption that they were the best teachers as they went through the same learning process as their pupils, so they would be better able to explain the target language to learners sharing the same mother tongue. This is the legitimacy of the FL teacher as a learning model.
2. The professional legitimacy of native teachers in non-institutional spaces was based on the opposite assumption that they were the best teachers because they taught their own “mother tongue” and that they knew more about it. This is the legitimacy of the FL teachers as a language-culture model. (Derivry-Plard & Griffin, 2017:34)

For some time these two legitimacies were not challenged, but with the globalization and marketization of education the boundaries between institutional and non-institutional spaces are breaking down, and symbolic violence against both non-native and native speaker teachers is increasing as a result.

In the second study, a doctoral study, Claire Griffin interviewed 24 native speaker teachers: 21 were native English speakers from the UK and the Republic of Ireland, and 3 were native speakers of Italian, Greek and German who worked in the secondary education sector. These teachers’ experiences were grouped and analysed into various themes, for example: experiences of resentment at native speakers being able to take the national competitive exams; encounters in which NESTs are assumed not to be qualified, even when they in fact have more qualifications than their non-native colleagues:
“sometimes people assume that you’re only an English teacher because you’re English. “Well what else is she going to do, she’s married? What else is she going to do? She’s got children. What else can she do? She can speak English” (Derivry-Plard & Griffin, 2017:43);

the sense that NESTs are forever operating in the mode of a “learner”, as they were not socialized in the education system as children; and experiences of symbolic violence such as “but you never had to learn English like us, you just have to open your mouth” (Derivry-Plard & Griffin, 2017:46).

I remember that when I started teaching in France a student was impressed by what he described as an Oxbridge accent. His follow-up question about where I had studied left me embarrassed to reveal that I had not been educated at either Oxford or Cambridge, although, to be fair to him, he did not seem at all disappointed by my un-elite education. Also, back then, when new English friends and acquaintances found out that I teach English as a foreign language, they would joke that there would be a generation of French people speaking English with a Welsh accent. That joke, though, has not been heard for many a year.

Having described some of the issues faced by native English speaker teachers in France, we run the risk of moving from talking about who is the best teacher to who is the most discriminated-against teacher (Derivry-Plard, 2018). How, then, do we go beyond the symbolic violence? The embedded fields given earlier, i.e. the foreign language teaching field within the language teaching field, within the linguistic field of teaching, within the linguistic field, can help us see the multilingual, multicultural paradigm of today. The linguistic field of teaching involves all subject matter, since language is the medium used to deliver every subject; in other words, all teachers are to some extent language teachers (this is very evident in, say, CLIL contexts). Next, the field of language teaching can be divided into first language, second language and foreign language teaching. In this way the embedded model of fields takes into account language diversity, lingua cultures and cultural repertoires.

A French teacher of English in a recent Twitter chat on native and non-native speaker issues commented jokingly on teaching French teenagers:
“To tell the truth, I feel like speaking their native language doesn’t help either…. someone speaking the “teenager” language would be better off!!” [https://twitter.com/Pascalune12/status/1001905158719229959]

Can we say that the appearance of “teenager language” in this joke is an implicit acknowledgement of the pluricultural landscape of teaching? The native speaker paradox derives from a monolingual and monocultural assumption that is largely due to the centuries-old drive towards nation states, which culminated in the 19th century. The multilingual, pluricultural paradigm encompasses the monolingual-monocultural one. While in the old monolingual paradigm native speakers are included and non-native speakers excluded, in the multilingual paradigm the native speaker is not excluded as a way to right wrongs but is part of the plurilingual continuum.

As Derivry-Plard puts it:
“There are no longer any dichotomies but continua for defining languages, cultures, speakers, and teachers as social actors. In other words, the monolingual paradigm is restrictive and exclusive, whereas the multilingual paradigm is comprehensive and inclusive and accounts for a broader perspective and better understanding of the linguistic field and the linguistic markets.” (Derivry-Plard, 2018:143)

She does not deny that embracing this paradigm is a difficult task; however, she argues that ignoring the need to take up this challenge is unethical and counterproductive.

Thanks for reading.

References

Derivry-Plard, M. & Griffin, C. (2017). Beyond Symbolic Violence in ELT in France. In Agudo, J. D. D. M. (Ed.) Native and Non-native Teachers in English Language Classrooms: Professional Challenges and Teacher Education (Vol. 26) (pp. 33-51). Walter de Gruyter GmbH & Co KG.

Derivry-Plard, M. (2018). A Multilingual Paradigm in Language Education: What It Means for Language Teachers. In Houghton, S. A. & Hashimoto, K. (Eds.) Towards Post-Native-Speakerism (pp. 131-148). Springer, Singapore.

The Prime Machine – a new concordancer in town

One of the impulses behind The Prime Machine was to help students distinguish similar or synonymous words. Recently a student of mine asked about the difference between “occasion” and “opportunity”. I used the compare function on the BYU COCA to help the student induce some meaning from the listed collocations. It kinda, sorta, helped.

The features offered by The Prime Machine promise much better help with this kind of question. For example, in the screenshot below the (Neighbourhood) Label function shows the kind of semantic tags associated with the words “occasion” and “opportunity”. Having this info certainly reduces the time spent figuring out the differences between the words.

Neighbourhood Labels for the comparison of occasion and opportunity

One of the other sweet new features brought to the concordancer table is a card display system, as seen in the first screenshot below. Another is information based on Michael Hoey’s lexical priming theory, as shown in the second screenshot below.

Card display for comparison of words occasion and opportunity
Paragraph position of the words occasion and opportunity

The developer of the new concordancer, Stephen Jeaco, kindly answered some questions.

1. Can you speak a little about your background?

Well, I’m British but I’ve lived in China for 18 years now.  My first degree was in English Literature and then I did my MA Applied Linguistics/TESOL and my PhD was under the supervision of Michael Hoey with the University of Liverpool.

I took up programming as a hobby in my teens.  If I hadn’t got the grades to read English at York, I would have gone on to study Computer Science somewhere.  In those days the main thing was to choose a degree programme that you felt you would enjoy.  Over the years, though, I’ve kept a technical interest and produced a program here or there for MA projects and things like that.

I’ve worked at XJTLU for 12 years now.  I was the founding director of the English Language Centre, and set up and ran that for 6 years.  After rotating out of role, I moved into what is now called the Department of English where I lecture in linguistics to our undergraduate English majors and to our MA TESOL students.

2. What needs is The Prime Machine setting out to fill?

I started working on The Prime Machine in 2010, at the beginning of my part-time PhD.  At that time, I was interested in corpus linguistics but I found it hard to pass that enthusiasm on to my colleagues and students.  We had some excellent software and some good web tools, but internet access to sites outside China wasn’t always very reliable, and getting started with using corpora for language learning usually meant having to learn quite a lot about what to look for, how to look for it, and also how to understand what the data on-screen could mean.

Having taught EAP for about 10 years at that time, I felt that my Chinese learners of English needed a way to help them see some of the patterns of English which can be found through exploring examples, and in particular I wanted to help them see differences between synonyms and become familiar with how collocation information could help them improve their writing.

I’d read some of Michael Hoey’s work while doing my MA, and in his role of Pro Vice Chancellor for Internationalization I met him at our university in China.  His theory of lexical priming provided both a rationale for how patterns familiar in corpus linguistics relate to acquisition and it also gave me some specific aspects to focus on in terms of thinking about what to encourage students to notice in corpus lines. 

The main aim of The Prime Machine was to provide an easy start to corpus linguistic analysis – or rather an easy start to using corpus tools to explore examples.  Central to the concept were two main ideas: (1) that students would need some additional help finding what to look for and knowing what to compare and (2) that new or enhanced ways of displaying corpus lines and summary data could help draw their attention to different patterns.  Personally, I really like the “Card” display, and while KWIC is always going to be effective for most things, when it comes to trying to work out where specific examples come from and what the wider context might be, I think the cards go a long way towards helping students in their first experiences of DDL.

Practically speaking, another thing I wanted to do was to start with a search screen where they could get very quick feedback on anything that couldn’t be found and whether other corpora on the system would have some results. 

3. What kind of feedback have you got from students and staff on the corpus tool?

I’ve had a lot of feedback and development suggestions from my students at my own institution.  Up until a few weeks ago, The Prime Machine was only accessible to our own staff and students.  The majority of users have been students studying linguistics modules, mostly those who are taking or have taken a module introducing corpus linguistics. However, for several years now I have also had students using it as a research tool for their Final Year Project – a year-long undergraduate dissertation project where typically each of us has 4 to 5 students for one-to-one supervision.  They’ve done a range of projects with it including trying to apply some of Michaela Mahlberg’s approaches to another author, exploring synonyms, exploring the naturalness of student paraphrases or exam questions.  People often think of Chinese students as being shy and wanting to avoid direct criticism of the teacher, but our students certainly develop the skills for expressing their thoughts and give me suggestions!

In my own linguistics module on corpus linguistics, I’ve found the new version of The Prime Machine to be a much easier way to get students started at looking at their own English writing or transcripts of their speech and getting them to consider whether evidence about different synonyms and expressions from corpora can help them improve their English production.  Personally, I use it as a stepping stone to introducing features of WordSmith Tools and other resources.

In terms of staff input, I’ve had a couple of more formal projects getting feedback from colleagues on the ranking features and the Lines and Cards displays. I’ve also had feedback from running sessions introducing the tool as part of a professional development day and a symposium. Some of my colleagues have used it a bit with students, but I think that while it required on-campus access, and before I had the website up, it was a bit too tricky to use even on site.

On the other hand, I’ve given several conference papers introducing the software, and received some very useful comments and suggestions.

I need to balance my teaching workload, time spent working towards more concrete research outputs and family life, but if we can get over some of the connectivity issues and language teachers want to start using The Prime Machine with their students, I’m going to need as much feedback as possible.  I’d like to hope I could respond and build up or extend the tool, but at the same time there’s a need to try to keep things simple and suitable for beginners. 

4. You have some extra materials for students at your institution. Could you describe these?

There’s nothing really very special about these. But having two ways of accessing the server (off-site vs. on-site) means that if corpus resources come with access restrictions, or if a student wants to set up a larger DIY corpus for a research project, I’m able to limit access to them.

Other than additional corpora, there are a few simple wordlists which I use in my own teaching and some additional options for some of the research tools.

5. What developments are in the pipeline for future versions of The Prime Machine?

One of the main reasons I wanted The Prime Machine to be publicly available, and available for free, was so that others could see in action some of the features I’ve written about or presented at conferences. In some ways, my focus has shifted a bit towards smaller undergraduate projects in linguistics, but I still have interests and contacts in English language teaching. Given some of the complications of connecting from Europe to a server in China, unless someone finds it really interesting and wants to set up a mirror server or work more collaboratively, I don’t think I can hope to have a system as widely popular and reliable as the big names in online concordancing tools. But interviews like this and getting the message out about the software through social media mean that there is a lot more potential for suggestions and feature requests to help me develop it in ways I’ve not thought of.

But left to my own perceptions and perhaps through interactions with my MA TESOL students, local high schools and our language centre, I’m interested in adding to the capabilities of the search screen to help students find collocations when the expression they have in mind is wildly different from anything stored in the corpus.  At the moment, it can do quite a good job of suggesting different word forms, giving some collocation suggestions and using other resources to suggest words with a similar meaning.  But sometimes students use words together in ways that (unless they want to use language very creatively) would stump most information retrieval systems.

Another aspect which I could develop would be the DIY text tools, which currently start to slow down quite rapidly when reading more than 80,000 words or so.  That would need a change of underlying data management, even without changing any of the features that the user sees.  I added those features in the last month or two before my current cohort of students were to start their projects, and again, feedback on those tools and some of the experimental features would be really useful.  On the other hand, I point my own students to tools like WordSmith Tools and AntConc when it comes to handling larger amounts of text!

The other thing, of course, is that I’m looking forward to getting hold of the BNC 2014 and adding another corpus or two.  Again, I can’t compete with the enormous corpora available elsewhere, but since most of the features I’m trying to help students notice differ across genre, register and style, I am quite keen on moderately sized corpora which have clearly defined sub-corpora or plenty of metadata.

One thing I would like to explore is porting The Prime Machine to Mac OS, and possibly also to mobile devices and tablets. But as it stands, using The Prime Machine requires the kind of time commitment and concentration (and multiple searches and shuffling of results) that may not suit mobile phones so well. I sometimes think it is more like the way we’d hunt for a specialist item on Taobao or eBay when we’re not sure of a brand or even a product name, rather than the kind of apps we tend to expect on our smartphones, which provide instant, ready-made answers. Redesigning it for mobile use will need some thought.

Personally, I’m hoping to start one or two new projects, perhaps working with Chinese and English or looking more generally at Computer Assisted Language Teaching.  

Now that The Prime Machine is available, while of course it would be great if people used it and found it useful, beyond China I would more importantly hope that it could inspire others to try creating new tools. If someone says to the developer working on their new corpus web interface, “Do you think you could make a display that looks a bit like that?” or “Can you pull in other data resources so those kinds of suggestions will pop up?”, I think they wouldn’t find it difficult, and we’d probably have more web tools that are a bit more user-friendly to operate and more intuitive in supporting the interpretation of results.

6. What other corpus tools do you recommend for teachers and students?

Well, I love seeing the enhancements and new features we get with new versions of popular corpus tools.  And at conferences, I’m always really impressed by some of the new things people are doing with web-based tools.   But one thing that I would say is that for the students I work with, I think knowing a bit more about the corpus is more useful than having something billions of words in size; being able to explore a good proportion of concordance lines for a mid-frequency item is great.  I think having a list of collocations or lines from millions of different sources to look at isn’t going to help language learners become familiar with the idea that concordance lines and corpus data can help them understand, explore and remember more about how to use words effectively. 

Nevertheless, I think those of us outside Europe should be quite jealous of the Europe-wide university access to Sketch Engine that has just started and will run for the next 5 years. I also really like the way the BYU tool has developed. I was thrilled to get hold of the MAT software for multidimensional analysis. And I think I’ll always have my WordSmith Tools V4 on my home computer, and a link to our university network version of WordSmith Tools in my office and in the computer labs I use.

Thanks for reading. Do note that if you comment here, I need to forward your comments to Stephen (as he is behind the Great Firewall of China), so there may be a delay in any feedback. Alternatively, contact Stephen yourself via The Prime Machine’s main website.

Also note that the currently available version of The Prime Machine may not work at the moment; wait a few days for Stephen to apply a fix, then try again.

Explore some topics in applied linguistics

Thanks to a post by Jason Anderson, I read the paper “Research trends in applied linguistics from 2005 to 2016: A bibliometric analysis and its implications” (Lei & Liu, 2018).

The authors kindly sent me a file of the abstracts they had collected. I thought it would be interesting to do some topic modelling on the data. Topic modelling is a way to discern what a set of documents is “about” by getting a program to find clusters of words that tend to occur together. Lei & Liu (2018) used a different approach, based on n-grams, to find their topics.
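For readers curious about what happens under the hood, here is a minimal sketch of one common topic-modelling technique, latent Dirichlet allocation (LDA) fitted by collapsed Gibbs sampling, in plain Python. The post does not say which tool or settings produced the model, so this is not a reconstruction of it: the function names, the toy “abstracts”, the number of topics and the alpha and beta smoothing values are all assumptions for illustration. Each word token is repeatedly reassigned to a topic in proportion to how much its document already uses that topic and how much that topic already uses the word; a topic’s most frequent words then form the kind of word cluster quoted in this post.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy LDA fitted with collapsed Gibbs sampling (illustrative sketch).

    docs is a list of token lists. Returns (doc_topic, topic_word):
    doc_topic[d][k] counts tokens in document d assigned to topic k, and
    topic_word[k][w] counts how often word w is assigned to topic k.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [defaultdict(int) for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start by giving every token a random topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts before resampling it.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Weight = (document's use of topic) x (topic's use of word).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def top_words(topic_word, n=5):
    """The n most frequent words per topic, i.e. each topic's word cluster."""
    clusters = []
    for counts in topic_word:
        ranked = sorted(counts.items(), key=lambda wc: (-wc[1], wc[0]))
        clusters.append([w for w, c in ranked if c > 0][:n])
    return clusters


# Hypothetical toy "abstracts", already lowercased and tokenised.
abstracts = [
    "syntax phonology syntax grammar phonology",
    "grammar syntax phonology syntax",
    "identity social practices identity policy",
    "social identity policy practices social",
]
docs = [text.split() for text in abstracts]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2, iterations=100)
for k, words in enumerate(top_words(topic_word, n=3)):
    print(f"topic {k}: {' '.join(words)}")
```

With real abstracts you would normally also remove stopwords and run many more iterations; on a toy set this small, how cleanly the two topics separate can depend on the random seed.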

They found, for example, that between 2005 and 2016 there was a significant decrease in formal linguistic issues, such as phonology and syntax.

The topic modelling also shows this decrease (note that the corpus used in the topic modelling runs from 2000 to 2016):

By contrast, they found significant increases in topics related to sociocultural issues. The topic modelling also indicates this:

The topic model’s correlation matrix is interesting to look at. The screenshot below shows that the topic “english chinese paper” (full topic cluster is “english chinese paper hong use varieties kong world local language”) is significantly related to the topic “language social identity” (full cluster is “language social identity how practices literacy languages policy linguistic multilingual”):

I am not sure how to interpret the red blobs, though! If you do, let me know.
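The post does not say how the model measures correlation between topics, so the following is only an assumed, simple version: turn each document’s topic counts into shares, then take the Pearson correlation of two topics’ shares across all documents. A positive value means the two topics tend to be prominent in the same abstracts; a negative value means that where one is prominent the other tends not to be. The function names and the counts for four documents and three topics are hypothetical.

```python
import math


def topic_proportions(doc_topic):
    """Turn per-document topic counts into per-document topic shares."""
    props = []
    for row in doc_topic:
        total = sum(row)
        if total:  # skip empty documents
            props.append([c / total for c in row])
    return props


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is flat)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def topic_correlations(doc_topic):
    """Topic-by-topic matrix of correlations between topic shares."""
    props = topic_proportions(doc_topic)
    k = len(props[0])
    columns = [[row[t] for row in props] for t in range(k)]
    return [[pearson(columns[a], columns[b]) for b in range(k)] for a in range(k)]


# Hypothetical token counts: 4 documents (rows) x 3 topics (columns).
doc_topic = [
    [8, 1, 1],
    [6, 3, 1],
    [1, 7, 2],
    [0, 6, 4],
]
corr = topic_correlations(doc_topic)
for row in corr:
    print(" ".join(f"{r:+.2f}" for r in row))
```

One caveat with this particular measure: because each document’s shares add up to 1, topics are nudged towards negative correlation with each other, so small negative values should be read with care.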

Finally, the model indicates that topics related to child language development seem to be on the wane (full cluster is “children language age children’s development early study acquisition adults years”):

Have a play with the model, and if you spot anything interesting, do leave a comment. Note that running iterations can be a tad slow.
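Statements such as “formal linguistic issues decreased” or “child language development is on the wane” come down to how much of each year’s abstracts a topic accounts for. As an assumed, minimal way of reading such a trend off a topic model, the sketch below averages one topic’s share by publication year and fits a least-squares slope to those yearly averages. The function names, years and shares are invented for illustration; the slope is purely descriptive and does not test whether a change is significant.

```python
from collections import defaultdict


def yearly_topic_share(doc_years, doc_props, topic):
    """Average share of one topic across all documents from each year."""
    totals, counts = defaultdict(float), defaultdict(int)
    for year, props in zip(doc_years, doc_props):
        totals[year] += props[topic]
        counts[year] += 1
    return {year: totals[year] / counts[year] for year in sorted(totals)}


def trend_slope(series):
    """Least-squares slope of share against year: > 0 rising, < 0 falling."""
    years = list(series)
    shares = [series[y] for y in years]
    n = len(years)
    mean_year, mean_share = sum(years) / n, sum(shares) / n
    num = sum((y - mean_year) * (s - mean_share) for y, s in zip(years, shares))
    den = sum((y - mean_year) ** 2 for y in years)
    return num / den if den else 0.0


# Hypothetical data: publication year and topic shares for six abstracts.
doc_years = [2000, 2000, 2008, 2008, 2016, 2016]
doc_props = [[0.6, 0.4], [0.5, 0.5], [0.4, 0.6], [0.3, 0.7], [0.2, 0.8], [0.1, 0.9]]

for topic in (0, 1):
    series = yearly_topic_share(doc_years, doc_props, topic)
    rounded = {y: round(s, 2) for y, s in series.items()}
    print(f"topic {topic}: {rounded} slope={trend_slope(series):+.4f}")
```

Averaging per year first gives each year equal weight in the slope, whatever the number of abstracts published that year; weighting by document instead is an equally reasonable choice.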

Thanks for reading.

References:

Lei, L., & Liu, D. (2018). Research Trends in Applied Linguistics from 2005 to 2016: A Bibliometric Analysis and Its Implications. Applied Linguistics.

The paradox of re-usability in language materials

The title is adapted from a critique of learning objects (originally defined as digital resources used to aid learning) in the field of instructional design by David Wiley. What follows borrows heavily from, or paraphrases, his writings.

Author Julie Moore raises intellectual property and copyright issues with the idea of having editable materials from coursebooks. However, there is a deeper paradox in editable or re-usable material.

If we look at a typical unit in a coursebook, it may have sections such as a language focus and practice, input reading and/or listening, and output speaking and/or writing, all centered around the unit topic. We could describe this unit as having an internal context, that is, the elements which make up the unit: instructions on how to use a language point, practice exercises on this language point, a picture that goes with the reading text, a role play that goes with a speaking activity, etc. The more elements in the unit, the larger the internal context of the unit.

External context would be the other units in the coursebook. A learning object is said to have no external context independent of its instructional use; that is, external context exists for a learning object only for the purposes of some instructional procedure.

The number of external contexts into which a learning object will instructionally fit varies according to the internal context of that object. Instructional fit here means how effective the object is in a given context.

A large object (i.e. one with many elements) has a greater internal context than a small object. Larger objects fit into fewer external contexts than smaller objects.

To restate this, the fit of an object with other objects is a function of 1) its internal context and 2) its external context with other objects. The more internal context an object has (i.e. the more elements), the better its (pedagogical) effectiveness. But internal context is inversely related to the number of external contexts the object can fit into. So the paradox is that the effectiveness of a learning object and its potential for reuse (i.e. its ability to fit into other external contexts) pull in opposite directions.

So there is a trade-off between effectiveness and re-usability. For editable materials, the more re-usable you make them, the less effective they will be pedagogically.

Now one could argue that learning objects are concerned with conceptual knowledge (e.g. teaching someone how to develop web pages), whereas language goes beyond the limits of such knowledge. On this view, language avoids the re-usability paradox: it has both a lot of internal context (systems such as phonology and syntax) and a lot of external context (systems such as semantics and pragmatics).

However, as the world of language coursebooks currently exists, it could be said to follow the path set by other books that deal with conceptual knowledge. If this is the case, then the re-usability paradox applies.

Furthermore, the paradox stems from the author rights issues covered by Julie Moore: in the copyright context, “reuse” means more or less “use exactly as is”. So is there a way out of the re-usability paradox?

As David Wiley puts it:

The way to escape from the Reusability Paradox is simply by using an open license. If I publish my educational materials using an open license, I can produce something deeply contextualized and highly effective for my local context AND give you permission to revise and remix it until it is equally effective to reuse in your own local context. Poof! The paradox disappears. I’ve produced something with a strong internal context which you have permission to make fit into other external contexts.

How likely are we to see open content from commercial parties, judging by the current state of play in the ELT publishing world? Happily, individual teachers and grassroots organizations are already thinking about and working on this.

Thanks for reading.