Bill Louw – I intend to revive logical positivism

My last two posts (Locating collocation and Thin word lists and fat concordances) have used the ideas of Bill Louw, who kindly agreed to talk about his work. (Note: if you are reading this on a mobile device you may need to refresh a few times to get all the audio to load.)

The title of this post indicates his overall goal of reviving logical positivism1 (Schlick refers to Moritz Schlick, one of the founders of logical positivism):

Revive logical positivism

He describes how he is doing this by merging Firthian ideas with logical positivism via the shared idea of context of situation (semantic prosody is a type of contextual meaning):

Hand over to science

Louw claims that another of the founders of logical positivism, Rudolf Carnap, was prevented from continuing his work on induction and probability when he moved to the USA. Apparently this is evident from letters between Carnap and the American philosopher Willard V. O. Quine. The significance of induction was highlighted by Bertrand Russell, who stated that we can’t have science without induction. A very common representation of induction is the “All swans are white” example, or more generally “All A’s are B’s”; however, Moritz Schlick saw induction differently:

Schlick on induction

Louw goes on to add how Schlick describes the relation between thinking and reality:

Schlick on thinking

The above clip is important for understanding how Louw critiques the idea of collostruction. Collostruction is a way to measure collocation as it relates to grammar, and Louw points out the weakness of such an approach in terms of the “given”, i.e. reality/experience (Gries refers to Stefan Th. Gries, co-inventor of collostructional analysis):

Collostruction and the given

Another way Louw illustrates his project to revive logical positivism is how he derives the idea of subtext from Bertrand Russell’s idea of a perfectly logical natural language:

Subtext 1

He then describes how Firthian collocation needs to be brought in to augment subtext if languages like Chinese are to be studied:

Subtext 2

For some reason, until I started reading Louw I did not quite get the idea of progressive delexicalisation – that words acquire many meanings which differ from their literal meanings. Previously I had only been thinking of delexicalisation with respect to verbs such as ‘make’ and ‘do’. Further, many words we may think have mostly literal meanings in fact have mostly delexical meanings: Louw & Milojkovic (2016: 6) give the example of ‘ripple’, of which only one instance in ten occurred with ‘water’ and ‘surface’ in the Birmingham University corpus.

Louw describes how John Sinclair called this the blue-jeans principle:

Sinclair’s blue jeans

In the early 90s Louw tested Sinclair’s idea that every word has at least two meanings:

Lexical-Delexical

Going back to the start of the 80s, Louw recalls how he encountered the idea of a computer writing a dictionary:

Computer writing

Louw gives an example of how the computer can help, using US presidents Trump and Biden:

Computer reassurance

Louw is keen to distinguish collocation from colligation:

Deceptive colligation

Louw admits to being obsessed with the idea of bringing together Firth and the Vienna school:

Firth & Vienna

Louw’s conviction in his project reflects the certainty of the logical positivists, and although that stream of thought is no longer the force it was, Louw’s drive recalls Richard Rorty (without condoning the sexist language), as quoted in Goldsmith & Laks (2019: 443):

“The sort of optimistic faith which Russell and Carnap shared with Kant – that philosophy, its essence and right method discovered at last, had finally been placed upon the secure path of science – is not something to be mocked or deplored. Such optimism is possible only for men of high imagination and daring, the heroes of their times”

Thanks for reading & listening and many thanks to Bill Louw for taking time to chat with me.

Notes

  1. Wikipedia Logical Positivism https://en.wikipedia.org/wiki/Logical_positivism

References

Goldsmith, J. A., & Laks, B. (2019). Battle in the mind fields. University of Chicago Press.

Louw, B., & Milojkovic, M. (2016). Corpus stylistics as contextual prosodic theory and subtext (Vol. 23). John Benjamins Publishing Company.


Locating collocation

The Wikipedia entry on collocation says:
“…a collocation is a series of words or terms that co-occur more often than would be expected by chance”.1
This is the description of collocation that SketchEngine, a leader in corpus tools, links to in the syllabus for a new online course.2

Note the statistical aspect in the definition “more often than would be expected by chance”.
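To make the statistics concrete (a rough illustration with invented numbers): if strong occurs 1,000 times and tea 500 times in a 1-million-word corpus, we would expect them to land next to each other about 1,000 × 500 ÷ 1,000,000 = 0.5 times by chance alone; finding the pair, say, 50 times is what marks it out as a collocation.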

The wiki entry then reads: “There are about six main types of collocations: adjective + noun, noun + noun (such as collective nouns), verb + noun, adverb + adjective, verbs + prepositional phrase (phrasal verbs), and verb + adverb.”

Note the emphasis on the grammar aspect of collocation.

Bill Louw would place this wiki definition (alongside Goran Kjellmer’s definition of collocation – ‘sequence of words that occurs more than once in identical form…and which is grammatically well structured’) at the bottom of the diagram below:

(Louw & Milojkovic 2016: 53)

The diagram shows two dimensions. The vertical dimension is how restrictive a view of collocation is, with the most restrictive at the bottom and the least restrictive at the top. The horizontal dimension shows how much of the language a view of collocation covers: the top bulb of the diagram is larger than the bottom bulb.

Louw & Milojkovic (2016) argue that the link of collocation to context of situation is of great importance in applications of corpora in literature studies i.e. corpus stylistics.

Context of situation was illustrated by Firth in the following way:

“In his article ‘Personality and language in context’ Firth offers us what he calls a typical Cockney event in ‘one brief sentence’.
‘Ahng gunna gi’ wun fer Ber’. (I’m going to get one for Bert)
What is the minimum number of participants? Three? Four? Where might it happen? In a pub? Where is Bert? Outside? Or playing darts? What are the relevant objects? What is the effect of the sentence? ‘Obvious!’ you say. So is the convenience of the schematic construct called ‘context of situation’. It makes sure of the sociological component.” (Firth 1957: 182 as quoted in Louw & Milojkovic, 2016:61, emphasis added)

Awareness of the importance of context of situation is reflected in the following small Twitter poll where a majority of the 24 respondents opted for “meanings have words” over “words have meanings”:

Twitter poll

Although Louw concedes that a view of collocation such as n-grams can reveal contexts of situation, opportunities to do so will be much rarer than if collocation is located near the top of the diagram – “abstracted at the level of syntax”, as Firth put it.

Context of situation is also of great importance in language teaching and learning. For example, task-based teaching can be said to lay great weight on context of situation.

As Louw & Milojkovic (2016: 26) put it:

“The closer collocation’s classifications are to context of situation, the more successful and enduring will be the approach of the scholars who placed them there. The more the term is constrained by the notion of language ‘levels’ and the linearity and other constraints of syntax, the less such classifications and the theories perched upon them are likely to endure. The reason for this is, as we shall see, that collocation takes us directly to situational meaning and acts as what Sinclair refers to as the ‘control mechanism’ for meaning”

Thanks for reading.

Notes

  1. Wikipedia Collocation https://en.wikipedia.org/wiki/Collocation
  2. Boot Camp online https://www.sketchengine.eu/bootcamp/boot-camp-online/#toggle-id-2

References

Louw, B., & Milojkovic, M. (2016). Corpus stylistics as contextual prosodic theory and subtext (Vol. 23). John Benjamins Publishing Company.

Thin word lists and fat concordances

One of the aspects of the proposed changes to the GCSE modern foreign language (MFL) syllabus in the UK is the use of corpus-derived word lists.1 Word frequencies follow a power law. A familiar power law in economics is the Pareto principle – “Pareto showed that approximately 80% of the land in Italy was owned by 20% of the population”.2 Similarly, a large percentage of any piece of text comes from a relatively small number of words – the top 100 words in English account for about 50% of any text. The MFL review wants to use wordlists of the most frequent 2,000 words, which would cover about 80% of any text.
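As a rough way to sanity-check coverage figures like these on any text you have to hand, here is a minimal Python sketch (the filename is hypothetical, and the crude tokenisation means the output is indicative only – real coverage studies use lemmatised word families and large corpora):

from collections import Counter

def coverage(tokens, top_n):
    # Share of running text accounted for by the top_n most frequent word types
    counts = Counter(tokens)
    top = sum(freq for _, freq in counts.most_common(top_n))
    return top / len(tokens)

# Crude tokenisation: lowercase and split on whitespace
with open("sample_text.txt", encoding="utf-8") as f:   # hypothetical filename
    tokens = f.read().lower().split()

print(f"top 100 types cover {coverage(tokens, 100):.0%}")
print(f"top 2000 types cover {coverage(tokens, 2000):.0%}")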

Currently the MFL syllabus is topic based, so one issue here is that most words used for any particular topic will be limited to that topic. To put it another way, although a word may be frequent within a topic, it won’t have range, i.e. it won’t appear across other topics. The NCELP (National Centre for Excellence for Language Pedagogy), in Vocabulary lists: Rationales and uses, writes: “For example, many of the words for pets or hobbies will be low frequency words which are not useful beyond those particular topics.”3

Many critics of this wordlist-driven proposal have pointed out various weaknesses – see the AQA exam board,4 the ASCL (Association of School and College Leaders),5 Transform MFL,6 and the Linguistics in MFL Project.7

I want to take a different tack and argue that the wordlist-driven approach is a half-hearted version of what could be a full-blooded corpus approach to vocabulary content.

Corpus stylist Bill Louw writes that he “has become suspicious of decontextualised frequency lists” (Louw & Milojkovic, 2016: 32). He calls such lists thin lists because they tend to cover things rather than events (Louw 2010). Events are states of affairs, what J. R. Firth, one of the originators of the notion of meaning by collocation, called contexts of situation. Looking at collocates of things in concordance lines allows us to “chunk the context of situation and culture into facts” (Louw 2010).

A concordance brings together and displays instances of use of a particular word from the widely disparate contexts in which it occurs. To cover events one would need to examine collocates in concordances – hence the term fat concordances.

The most frequent words are often bleached of their literal meanings. Compare the word “take” on its own – most people would think of “the act of receiving, picking up or even stealing” (Louw & Milojkovic, 2016: 5) – with a collocation such as “take place”, where the meaning is distant from the literal meaning of “take”.8 When the NCELP says “Very high frequency words often have multiple meanings”, it is describing the notion of delexicalisation.

To demonstrate context of situation and context of culture, below is corpus linguist John Sinclair’s PhraseBite pamphlet, as reproduced in Louw (2008):

When she was – Phrasebite © John Sinclair, 2006.

  1. The first grammatical collocate of when is she
  2. The first grammatical collocate of when she is was
  3. The vocabulary collocates of when she was are hair-raising. On the first page:
    diagnosed, pregnant, divorced, raped, assaulted, attacked
    The diagnoses are not good, the pregnancies are all problematic.
  4. Select one that looks neutral: approached
  5. Look at the concordance, first page.
  6. Nos. 1, 4, 5, 8, 10 are of unpleasant physical attacks
  7. Nos. 2, 3, 6, 7, 9 are of excellent opportunities
  8. How can you tell the difference?
  9. the nasties are all of people out and about, while the nice ones are of people working somewhere.
  10. Get wider cotext and look at verb tenses in front of citation.
  11. In all the nasties the verb is past progressive, setting a foreground for the approach.
  12. In the nice ones, the verb is non-progressive, either simple past or past-in-past.

Data for para 4 above.
(1) walking in Burnfield Road , Mansewood , when she was approached by a man who grabbed her bag
(2) teamed up with her mother in business when she was approached by Neiman Marcus , the department store
(3) resolved itself after a few months , when she was approached by Breege Keenan , a nun who
(4) Bridge Road close to the Causeway Hospital when she was approached by three men who attacked her
(5) Drive , off Saughton Mains Street , when she was approached by a man . He began talking the original
(6) film of The Stepford Wives when she was approached by producer Scott Rudin to star as
(7) bony. ‘ ‘ Kidd was just 15 when she was approached to be a model . Posing on
(8) near her home with an 11-year-old friend when she was approached by the fiend . The man
(9) finished a storming set of jazz standards when she was approached by SIR SEAN CONNERY . And she
(10) on Douglas Street in Cork city centre when she was approached by the pervert . The man persuaded

As Louw (2008) puts it:

“The power of this publication, coming as it did so close to Sinclair’s death, is to be found in the detail of his method. By beginning with a single word, she, from the whole of the Bank of English, Sinclair simply requests the most frequent collocate from the Bank of English (approximately 500 million words of running text). The computer provides it: when. The results are then merged: when+she. A new search is initiated for the most frequent collocate of this two-word phrase. The computer provides it: was. The concordances are scrutinized and cultural insights are gathered.”
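The iterative step Louw describes can be sketched in a few lines of Python (my own illustration of the procedure, not Sinclair’s actual software; tokens stands for any tokenised corpus, and the collocation window of 4 is an assumed setting):

from collections import Counter

def most_frequent_collocate(tokens, phrase, span=4):
    # Most frequent word within `span` tokens either side of each occurrence of `phrase`
    n = len(phrase)
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == phrase:
            counts.update(tokens[max(0, i - span):i])     # left context
            counts.update(tokens[i + n:i + n + span])     # right context
    return counts.most_common(1)[0][0]

# On Bank of English-scale data this would return "when" for ["she"];
# merging gives ["when", "she"], whose top collocate is in turn "was".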

The ASCL quotes applied linguist Vivian Cook:

“While word frequency has some relevance to teaching, other factors are also important, such as the ease with which the meaning of an item can be demonstrated (’blue’ is easier to explain than ‘local’) and its appropriateness for what pupils want to say (‘plane’ is more useful than ‘system’ if you want to travel)”

Blue is easier to explain than local because most collocates of blue carry its literal colour meaning, e.g. “blue eyes”. Yet consider this from a children’s corpus:

“There, I feel better. I’ve been needing a good cry for some time, and
now I shall be all right. Never mind it, Polly, I’m nervous and tired;
I’ve danced too much lately, and dyspepsia makes me blue;” and Fanny
wiped her eyes and laughed. (An Old-fashioned Girl, by Louisa May Alcott)

So while it is true that blue is often associated with colour, it also collocates with mental states, where the colour meaning is delexicalised, or washed out.

To conclude, the MFL proposal to use corpus-derived word lists to drive content does not take full advantage of corpora. It promotes thin wordlists when it could also promote fat concordances.

Thanks for reading.

Notes

  1. MFL consultation – https://consult.education.gov.uk/ebacc-and-arts-and-humanities-team/gcse-mfl-subject-content-review/supporting_documents/GCSE%20MFL%20subject%20content%20consultation.pdf
  2. Pareto – https://en.wikipedia.org/wiki/Pareto_principle
  3. NCELP – https://resources.ncelp.org/concern/resources/t722h880z?locale=en
  4. AQA – https://filestore.aqa.org.uk/content/our-standards/AQA-GCSE-MFL-POLICY-BRIEFING-APRIL-2021.PDF
  5. ASCL – https://www.ascl.org.uk/ASCL/media/ASCL/Our%20view/Consultation%20responses/2021/Draft-response-Consultation-on-the-Revised-Subject-Content-for-GCSE-Modern-Foreign-Languages.pdf
  6. Transform MFL – https://transformmfl.wordpress.com/2021/02/15/should-we-learn-words-in-frequency-order/
  7. Linguistics in MFL Project – http://www.meits.org/opinion-articles/article/the-dfe-ofqual-consultation-on-revised-gcse-qualifications-in-modern-foreign-languages-a-view-from-linguistics
  8. Take place – https://eflnotes.wordpress.com/2013/05/06/what-to-teach-from-corpora-output-frequency-and-transparency/

References

Louw, B. (2008). Consolidating empirical method in data-assisted stylistics. Directions in Empirical Literary Studies: In Honor of Willie Van Peer, 5, 243.

Louw, B. (2010). Collocation as instrumentation for meaning: a scientific fact. In Literary education and digital learning: methods and technologies for humanities studies (pp. 79-101). IGI Global.

Louw, B., & Milojkovic, M. (2016). Corpus stylistics as contextual prosodic theory and subtext (Vol. 23). John Benjamins Publishing Company.

Collocations need not be arbitrary

“On the whole, delexicalized verbs are a good way of introducing the concept of collocation to learners of any L1 background. I usually start with make/do and show how one goes with homework while the other goes with mistake (I did my homework; I made a lot of mistakes). Why is it this way and not the other way around? Because words have collocations – they prefer the company of certain other words.” (Selivan, 2018: 28, emphasis added)

The quote above, from a book published in 2018, reflects a pervasive view in the literature that collocations are arbitrary – that is, there is no particular reason why words “prefer the company of certain other words”; they just do.

Liu (2010) identifies this view of collocation-as-arbitrary as widespread amongst scholars; he also demonstrates that it is a common assumption in published teaching materials. In the books, studies and websites on teaching collocations, he observes that collocation exercises mainly involve noticing and memorising fixed units – in other words, form-focused exercises.

Examples of such exercises are:

“identifying or marking collocations in a passage or in collocation dictionaries; reading passages with collocations highlighted or marked; filling in the blanks with the right word in a collocation; choosing or matching correct collocates; translating collocations from L2 back into L1 or vice versa; and memorization-type activities like repetition and rehearsal” (Liu, 2010:21)

There were fewer exercises on linking collocation forms to their meanings.

When materials overlook the motivated aspects of collocations, learners also miss the chance to generalise what they learn (Wray, 2000). That is, collocations also need to be analysed if students are to make the most of them in new situations of use.

To take the examples of “make” and “do”: the core meaning of “make” is creation, a process that is purposeful and/or effortful, while the core meaning of “do” is completion, the finishing of something, which focuses on the end result of an activity rather than on any effort in the process of that activity. Understanding these core meanings can throw light on the following use of “did a mistake”:

“But I did a mistake in talking about it, you know, the last time and recently”

The larger context of this is from a spoken news report:

weren’t there. Let me handle it. I said, " Yes, ma’am. " ROSEN: The rebuke of Mr. Clinton by his wife came after the former president revived the dormant issue of Mrs. Clinton’s own misstatements about her 1996 trip to Bosnia. You’ll recall Mrs. Clinton, in recent months, spoke of sniper fire jeopardizing her landing. But contemporaneous video and eyewitness account revealed there was no such threat, and the senator effectively if belatedly defused the story with an omission of error in late March. SEN-HILLARY-CLINTO: But I did a mistake in talking about it, you know, the last time and recently. ROSEN: But in Jasper, Indiana, Thursday, Mr. Clinton blamed the controversy on the biased news media. B-CLINTON: She took a terrible beating in the press for a few days because she was exhausted at 11:00 at night when she started talking about Bosnia. ROSEN: In fact, Mrs. Clinton related the false Bosnia story numerous times including in a prepared speech delivered freshly at mid morning. B-CLINTON: And then the president (COCA SPOK: FOX SPECIAL REPORT WITH BRIT HUME 6:00 PM EST, 2008, emphasis added)

We could speculate that in using “did a mistake” Hillary Clinton was implying that in her “exhausted” state the “misstatement” was the opposite of a purposeful lie. It was just one of many activities she did that day which happened to be an error.

This can also be seen in another example from COCA – “If I do a mistake, I’m cooked”.

The context is from a written publication this time, although the language in question is in reported form:

three minutes, sometimes the whole roll — eleven minutes. It has an advantage: It takes you to the real tempo of life. Most movies are shot rather quickly and in a way where you can manipulate your reality because of the amount of coverage ” — shooting a scene from many different angles so that the director can choose among them in the editing room. ” Here my manipulation is quite different. I have to build it in with the lighting, with the framing. It requires much more attention at this stage. If I do a mistake, I’m cooked, ” he says with a laugh. # Wings’ visual style may be old-fashioned at heart, but its sound is high-tech all the way. Besides the six channels of top-notch stereo sound broadcast through the theater speakers, Wings audiences will hear two channels of three-dimensional sound through a special headset called the Personal Sound Environment (PSE) distributed to each moviegoer. Developed by Imax affiliate Sonics Associates of Birmingham, Alabama, the PSE incorporates both IMAX 3-D glasses and tiny speakers mounted between (COCA MAG: Omni, 1994, emphasis added)

The person is talking about a number of steps in their work routine in shooting a movie. The use of “do” here signals that any disastrous mistake is not to be blamed on the person, considering all the other things he has to juggle.

Note that I could only find 3 uses of “do a mistake”, of which 2 are shown here (the third one I can’t offer any speculation on as I suspect more context needs to be chased up than that provided by COCA).

This post was inspired by a question from a student about why a text had “in many respects” rather than “in many aspects”. I went on to COCA to have a look but could not discern any useful explanation. I just told the student that “aspects” does not seem to prefer “in many” as much as “respects” does! Only later, when I thought about the shared root “spect” (meaning see), did an arguably useful explanation present itself – “in many respects” implies that the [re-seeings] have already been understood in some way, while with “in many aspects” the reader may not yet know what these [partial-seeings] may be. These meanings could match up with the observation that “in many respects” often comes at the end of a clause or sentence while “in many aspects” may tend to come at the beginning.

Thanks for reading.

References:

Davies, M. (2008). Corpus of contemporary American English online. Retrieved from https://www.english-corpora.org/coca/.

Liu, D. (2010). Going beyond patterns: Involving cognitive analysis in the learning of collocations. TESOL Quarterly, 44(1), 4-30.

Selivan, L. (2018). Lexical Grammar: Activities for Teaching Chunks and Exploring Patterns. Cambridge University Press.

Wray, A. (2000). Formulaic sequences in second language teaching: Principle and practice. Applied linguistics, 21(4), 463-489.

Signs o’ the times – some/any invariant meanings and COCA

I am glad to be writing this particular (rushed, see end) post as it involves corpus linguistics and I have not done such a post for a while. It is also about my current interest – Columbia School linguistics.

Over the years I have become less enamoured of the power of corpus linguistics for language teaching. It is certainly very useful for accessing descriptions of language, but that is not enough – explanations are also needed. Columbia School (CS) linguistics is about analysing invariant meanings that motivate choices in both grammar and lexis. It is about one-form-to-one-meaning mappings – an ideal aim when looking to help students.

Nadav Sabar (2016) analyses the use of some and any. The following borrows heavily from his paper.

Most pedagogical grammars state (formal) rules such as “any is used in negative sentences and not in affirmative statements”. Yet such rules cannot account for why some is used in contexts that are said to require any. Sabar gives the following attested example:

1) When Yvonne lived in Italy, where it seems like the whole country is married, people always wanted to know about her personal life. I remember her telling me that every time she’d come back from a great vacation, the first question from married friends was, “Did you meet anybody?” It was as if the whole point of going on vacation was to meet someone. That she had a great time and saw something new and interesting didn’t matter. The entire vacation was cancelled or a flop because she didn’t meet someone. (http://www.yvonneandyvettetiquette.com/2008_09_01_archive.html)

Formal accounts can only say that any would also be acceptable, as in she didn’t meet anyone; they are unconcerned with why the writer chose some in this case.

Formal accounts use the sentence as the unit of analysis and see meaning as compositional, i.e. the meanings of the individual words in a sentence add up to the whole. CS uses signs (pairings of signal and meaning) as the unit of analysis and sees meaning as instrumental rather than compositional. That is, the individual meanings of signals need not add up to the sentence meaning. There is a distinction between the linguistic code, which has an invariant meaning (one that always corresponds to a linguistic signal), and the interpretation of the code, which is the subjective outcome of messages. Meanings are very sparse in that they do not encode messages but only offer prompts that suggest message elements.

The meaning hypotheses of some and any are shown below:

That is, some as RESTRICTED suggests limits, internal divisions, boundaries, while any as UNRESTRICTED suggests no boundaries, limits or divisions. Note that this does not mean that the domain in question in reality has no divisions or boundaries – just that the reality is irrelevant to the message. Also note that in a pedagogical grammar such as Martin Parrott’s, this meaning division between restricted and unrestricted is only described for stressed SOME and ANY.

Sabar uses the following as examples:

2) If you see something, say something. (New York City public safety slogan)
3) No parking any time (street sign)

In 2) some is used because the message suggested is a restriction on the set of things people see and say. The context drives the inference as to the nature of the restriction – suspicious-looking things. Any could also have been used, but that would not have been as effective a message – any would have suggested no restriction, i.e. people should call no matter what they see.

Similarly in 3) any is used because there is no restriction on the domain of times of the day.

So now for 1) we can see that some is used because the message suggests a restriction of the set of people Yvonne did not meet, and the context shows this restriction to be people who might qualify as marriage potential.

Now the interesting corpus linguistics part.

The methodology of CS first involves a qualitative step in which some aspect of the sign in question is examined. So for some, which suggests restriction, another element suggesting the same thing is looked for:

4) Some Feds [Federal workers] are held up as national heroes while others are considered a national joke. (ABC Nightline: Income Tax)

Here others is used to refer to a different subset of people within the domain of Federal workers. This message element is also suggested by some – RESTRICTED. This does not mean there is only one reason for the choice of these forms; rather, the message feature of internal division is one out of many possible reasons motivating the choice of these two forms.

To test this claim generally, we can look at a corpus to see whether others co-occurs with some more often than would be expected by chance, compared with any.

We can do this in COCA by using these search terms:

COCA searches for others:

Favoured: some [up to 9 slots] others
Disfavoured: any [up to 9 slots] others

The following screenshot shows how to find some [up to 9 slots] others (do similar for any):

To find some occurring without others, see the next screenshot (i.e. use the minus sign -):

And tabulating the data in a contingency table:

         others present    others absent
         N        %        N           %
some     19,078   90       8,946,046   65
any      2,022    10       4,841,946   35
Total    21,100   100      13,787,992  100

p < .0001

The table percentages and significance test support the claim that there is one message feature motivating the use of both some and others. Note that the meaning hypothesis itself is not directly tested; it is only indirectly tested via the counts in COCA. Sabar goes on to test, both qualitatively and quantitatively, other signals that contribute to the meaning hypotheses of some – RESTRICTED and any – UNRESTRICTED.
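For anyone wanting to reproduce the significance test, here is a minimal Python sketch using a chi-square test of independence (the counts are those in the table above; whether this matches Sabar’s exact choice of test is an assumption on my part):

from scipy.stats import chi2_contingency

# Rows: some, any; columns: others present, others absent (COCA counts above)
table = [[19078, 8946046],
         [2022, 4841946]]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.0f}, p = {p:.2g}")   # p comes out far below .0001

The same call with the counts from the next table tests the distribution of singular other.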

I wondered how the singular other would distribute with any and some:

         other present     other absent
         N        %        N           %
any      39,244   52       4,811,937   35
some     35,175   48       8,930,621   65
Total    74,419   100      13,742,558  100

p < .0001

Can we say here that singular other contributes to a message meaning of UNRESTRICTED? I have no idea, as I have not had time to explore this further!

I hope, dear reader, you will forgive the rushed nature of this post, but I wanted to get something up before the holiday haze made me forget it!

Thanks for indulging.

Update 1:

Thanks to a heads-up from some tweeters: Michael Lewis, in his 1986 book The English Verb, was also pointing to the primacy of meaning.

Update 2:

Nadav Sabar has pointed out that he looked for others in one direction only, i.e. following some/any, whereas I looked at occurrences of others both following and preceding some/any.
Also, a new version of his paper uses a window size of 2 instead of 9.

References:

Parrott, M. (2000). Grammar for English language teachers: with exercises and a key. Cambridge University Press.

Sabar, N. (2016). Using big data to test meaning hypotheses for any and some. In Otheguy, R., Stern, N., Reid, W. and Ruggles, J. (Eds.), Columbia School linguistics in the 21st century: advances in sign-based linguistics. Amsterdam/Philadelphia: John Benjamins. Retrieved from https://www.academia.edu/33968803/Using_big_data_to_test_meaning_hypotheses_of_some_and_any

The Prime Machine – a new concordancer in town

One of the impulses behind The Prime Machine was to help students distinguish similar or synonymous words. Recently a student of mine asked about the difference between “occasion” and “opportunity”. I used the compare function on the BYU COCA to help the student induce some meaning from the listed collocations. It kinda, sorta, helped.

The features offered by The Prime Machine promise much better help for this kind of question. For example, in the screenshot below the (Neighbourhood) Label function shows the kind of semantic tags associated with the words “occasion” and “opportunity”. Having this info certainly helps reduce the time needed to figure out the differences between the words.

Neighbourhood Labels for the comparison of occasion and opportunity

One of the other sweet new features brought to the concordancer table is a card display system, as seen in the first screenshot below. Another is information based on Michael Hoey’s lexical priming theory, as shown in the second screenshot below.

Card display for comparison of words occasion and opportunity

Paragraph position of the words occasion and opportunity

The developer of the new concordancer, Stephen Jeaco, kindly answered some questions.

1. Can you speak a little about your background?

Well, I’m British but I’ve lived in China for 18 years now.  My first degree was in English Literature and then I did my MA Applied Linguistics/TESOL and my PhD was under the supervision of Michael Hoey with the University of Liverpool.

I took up programming as a hobby in my teens.  If I hadn’t got the grades to read English at York, I would have gone on to study Computer Science somewhere.  In those days the main thing was to choose a degree programme that you felt you would enjoy.  Over the years, though, I’ve kept a technical interest and produced a program here or there for MA projects and things like that.

I’ve worked at XJTLU for 12 years now.  I was the founding director of the English Language Centre, and set up and ran that for 6 years.  After rotating out of role, I moved into what is now called the Department of English where I lecture in linguistics to our undergraduate English majors and to our MA TESOL students.

2. What needs is The Prime Machine setting out to fill?

I started working on The Prime Machine in 2010, at the beginning of my part-time PhD.  At that time, I was interested in corpus linguistics but I found it hard to pass that enthusiasm on to my colleagues and students.  We had some excellent software and some good web tools, but internet access to sites outside China wasn’t always very reliable, and getting started with using corpora for language learning usually meant having to learn quite a lot about what to look for, how to look for it, and also how to understand what the data on-screen could mean.

Having taught EAP for about 10 years at that time, I felt that my Chinese learners of English needed a way to help them see some of the patterns of English which can be found through exploring examples, and in particular I wanted to help them see differences between synonyms and become familiar with how collocation information could help them improve their writing.

I’d read some of Michael Hoey’s work while doing my MA, and in his role of Pro Vice Chancellor for Internationalization I met him at our university in China.  His theory of lexical priming provided both a rationale for how patterns familiar in corpus linguistics relate to acquisition and it also gave me some specific aspects to focus on in terms of thinking about what to encourage students to notice in corpus lines. 

The main aim of The Prime Machine was to provide an easy start to corpus linguistic analysis – or rather an easy start to using corpus tools to explore examples. Central to the concept were two main ideas: (1) that students would need some additional help finding what to look for and knowing what to compare and (2) that new or enhanced ways of displaying corpus lines and summary data could help draw their attention to different patterns. Personally, I really like the “Card” display, and while KWIC is always going to be effective for most things, when it comes to trying to work out where specific examples come from and what the wider context might be, I think the cards go a long way towards helping students in their first experiences of DDL.

Practically speaking, another thing I wanted to do was to start with a search screen where they could get very quick feedback on anything that couldn’t be found and whether other corpora on the system would have some results. 

3. What kind of feedback have you got from students and staff on the corpus tool?

I’ve had a lot of feedback and development suggestions from my students at my own institution. Up until a few weeks ago, The Prime Machine was only accessible to our own staff and students. The majority of users have been students studying linguistics modules, mostly those who are taking or have taken a module introducing corpus linguistics. However, for several years now I have also had students using it as a research tool for their Final Year Project – a year-long undergraduate dissertation project where typically each of us has 4 to 5 students for one-to-one supervision. They’ve done a range of projects with it, including trying to apply some of Michaela Mahlberg’s approaches to another author, exploring synonyms, and exploring the naturalness of student paraphrases or exam questions. People often think of Chinese students as being shy and wanting to avoid direct criticism of the teacher, but our students certainly develop the skills for expressing their thoughts and give me suggestions!

In my own linguistics module on corpus linguistics, I’ve found the new version of The Prime Machine to be a much easier way to get students started at looking at their own English writing or transcripts of their speech and getting them to consider whether evidence about different synonyms and expressions from corpora can help them improve their English production.  Personally, I use it as a stepping stone to introducing features of WordSmith Tools and other resources.

In terms of staff input, I’ve had a couple of more formal projects, getting feedback from colleagues on the ranking features and the Lines and Cards displays.  I’ve also had feedback by running sessions introducing the tool as part of a professional development day and a symposium.  Some of my colleagues have used it a bit with students, but I think while it required access from campus and before I had the website up, it was a bit too tricky even on site. 

On the other hand, I’ve given several conference papers introducing the software, and received some very useful comments and suggestions.

I need to balance my teaching workload, time spent working towards more concrete research outputs and family life, but if we can get over some of the connectivity issues and language teachers want to start using The Prime Machine with their students, I’m going to need as much feedback as possible.  I’d like to hope I could respond and build up or extend the tool, but at the same time there’s a need to try to keep things simple and suitable for beginners. 

 4. You have some extra materials for students at your institution, could you describe these?

There’s nothing really very special about these.  But having the two ways of accessing the server (offsite vs. on-site) means if corpus resources come with access restrictions or if a student wants to set up a larger DIY corpus for a research project I’m able to limit access to these.

Other than additional corpora, there are a few simple wordlists which I use in my own teaching and some additional options for some of the research tools.

5. What developments are in the pipeline for future versions of The Prime Machine?

One of the main reasons I wanted The Prime Machine to be publicly available and available for free was so that others would be able to see some of the features I’ve written about or presented at conferences in action. In some ways, my focus has changed a bit towards smaller undergraduate projects for linguistics, but I still have interests and contacts in English language teaching. Given some of the complications of connecting from Europe to a server in China, unless someone finds it really interesting and wants to set up a mirror server or work more collaboratively, I don’t think I can hope to have a system as widely popular and reliable as the big names in online concordancing tools. But having interviews like this and getting the message out about the software through social media means that there is a lot more potential for suggestions and feature requests to help me develop in ways I’ve not thought of.

But left to my own perceptions and perhaps through interactions with my MA TESOL students, local high schools and our language centre, I’m interested in adding to the capabilities of the search screen to help students find collocations when the expression they have in mind is wildly different from anything stored in the corpus.  At the moment, it can do quite a good job of suggesting different word forms, giving some collocation suggestions and using other resources to suggest words with a similar meaning.  But sometimes students use words together in ways that (unless they want to use language very creatively) would stump most information retrieval systems.

Another aspect which I could develop would be the DIY text tools, which currently start to slow down quite rapidly when reading more than 80,000 words or so.  That would need a change of underlying data management, even without changing any of the features that the user sees.  I added those features in the last month or two before my current cohort of students were to start their projects, and again, feedback on those tools and some of the experimental features would be really useful.  On the other hand, I point my own students to tools like WordSmith Tools and AntConc when it comes to handling larger amounts of text!

The other thing, of course, is that I’m looking forward to getting hold of the BNC 2014 and adding another corpus or two.  Again, I can’t compete with the enormous corpora available elsewhere, but since most of the features I’m trying to help students notice differ across genre, register and style, I am quite keen on moderately sized corpora which have clearly defined sub-corpora or plenty of metadata.

One thing I would like to explore is porting The Prime Machine to Mac OS, and also possibly to mobile devices and tablets.  But as it stands, using The Prime Machine requires the kind of time commitment and concentration (and multiple searches and shuffling of results) that may not be so suitable for mobile phones.  I sometimes think it is more like the way we’d hunt for a specialist item on Taobao or Ebay when we’re not sure of a brand or even a product name, rather than the kind of Apps we tend to expect from our smart phones which provide instant ready-made answers.  Redesigning it for mobile use will need some thought.

Personally, I’m hoping to start one or two new projects, perhaps working with Chinese and English or looking more generally at Computer Assisted Language Teaching.  

Now that The Prime Machine is available, while of course it would be great if people use it and find it useful, more importantly beyond China I think I’d hope that it could inspire others to try creating new tools.  If someone says to the developer working on their new corpus web interface, “Do you think you could make a display that looks a bit like that?”, or “Can you pull in other data resources so those kinds of suggestions will pop up?”, I think they wouldn’t find it difficult, and we’d probably have more web tools which are a bit more user-friendly in terms of operation and more intuitive in terms of support for interpretation of the results. 

6. What other corpus tools do you recommend for teachers and students?

Well, I love seeing the enhancements and new features we get with new versions of popular corpus tools.  And at conferences, I’m always really impressed by some of the new things people are doing with web-based tools.   But one thing that I would say is that for the students I work with, I think knowing a bit more about the corpus is more useful than having something billions of words in size; being able to explore a good proportion of concordance lines for a mid-frequency item is great.  I think having a list of collocations or lines from millions of different sources to look at isn’t going to help language learners become familiar with the idea that concordance lines and corpus data can help them understand, explore and remember more about how to use words effectively. 

Nevertheless, I think those of us outside Europe should be quite jealous of the Europe-wide university access to Sketch Engine that’s just started for the next 5 years.  I also really like the way the BYU tool has developed.  I was thrilled to get hold of the MAT software for multidimensional analysis.  And I think I’ll always have my WordSmith Tools V4 on my home computer, and a link to our university network version of WordSmith Tools in my office and in the computer labs I use.

Thanks for reading. Do note that if you comment here I need to forward comments to Stephen (as he is behind the great firewall of China), so there may be a delay in any feedback. Alternatively, contact Stephen yourself via the main The Prime Machine website.

Also note that the currently available version of The Prime Machine may not work at the moment; wait a few days for a fix to be applied by Stephen and then try again.

Finding relative frequencies of tenses in the spoken BNC2014 corpus

Ginseng English (@ginsenglish) issued a poll on Twitter asking:

This is a good exercise to do with the new spoken BNC2014 corpus. See the instructions for getting access to the corpus.

You need to get your head around part-of-speech (POS) tags. The BNC2014 uses the CLAWS6 tagset. For the past tense we can use the past tense of lexical verbs and the past tense of DO. Using the past tenses of BE and HAVE would also pull in their uses as auxiliary verbs, which we don’t want. Figuring out how to filter those out could be a neat future exercise. Another time! On to this post.

Simple past:

[pos="VVD|VDD"]

pos = part of speech

VVD = past tense of lexical (main) verbs

VDD = past tense of DO

| = acts like an OR operator

So the above looks for parts of speech tagged as either past tense of lexical verbs or past tense of DO.

Simple present

The search term for the present simple is also relatively simple, to wit:

[pos="VVZ"]

VVZ = -s form of lexical verbs (e.g. gives, works)

Note the above captures third person forms, how can we also catch first and second person forms?

Present perfect

[pos="VH0|VHZ"] [pos="R.*|MD|XX" & pos!="RL"]{0,4} [pos="AT.*|APPGE"]? [pos="JJ.*|N.*"]? [pos="PPH1|PP.*S.*|PPY|NP.*|D.*|NN.*"]{0,2} [pos="R.*|MD|XX"]{0,4} [pos="V.*N"]

The search for the present perfect may seem daunting; don’t worry, the structure is fairly simple. The first search term [pos="VH0|VHZ"] says look for the present-tense forms of HAVE, and the last term [pos="V.*N"] says look for past participle forms.

The other terms look for optional adverbs and noun phrases that may come in between, namely:

“adverbs (e.g. quite, recently), negatives (not, n’t) or multiword adverbials (e.g. of course, in general); and noun phrases: pronouns or simple NPs consisting of optional premodifiers (such as determiners, adjectives) and nouns. These typically occur in the inverted word order of interrogative utterances (Has he arrived? Have the children eaten yet?)” – Hundt & Smith (2009).

Present progressive

[pos="VBD.*|VBM|VBR|VBZ"] [pos="R.*|MD|XX" & pos!="RL"]{0,4} [pos="AT.*|APPGE"]? [pos="JJ.*|N.*"]? [pos="PPH1|PP.*S.*|PPY|NP.*|D.*|NN.*"]{0,2} [pos="R.*|MD|XX"]{0,4} [pos="VVG"]

This has a similar structure to the present perfect search. The first term [pos="VBD.*|VBM|VBR|VBZ"] looks for past and present forms of BE (so strictly speaking it catches past progressives too), and the last term [pos="VVG"] for the -ing participle of lexical verbs. The terms in between are for optional adverbs, negatives and noun phrases.

Note that all these searches are approximate – manual checking will be needed for more accuracy.

So can you predict the order of these forms? Let me know in the comments the results of using these search terms, in frequency per million (see the sketch below).
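If it helps, converting raw hits to frequency per million is just hits ÷ corpus size × 1,000,000. A minimal Python sketch (the hit count is invented, and the figure of roughly 11.5 million tokens for the Spoken BNC2014 should be checked against the corpus documentation):

hits = 120_000                 # raw frequency from a search (hypothetical)
corpus_size = 11_500_000       # approximate token count of the Spoken BNC2014
print(f"{hits / corpus_size * 1_000_000:.0f} per million words")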

Thanks for reading.

Other search terms in spoken BNC2014 corpus.

Update:

Ginseng English blogs about the frequencies of forms found in one study. Do note that, as there are six inflectional categories in English – infinitive, first and second person present, third person singular present, progressive, past tense, and past participle – the opportunities to use the simple present form are greater due to the two categories of present.

References:

Hundt, M., & Smith, N. (2009). The present perfect in British and American English: Has there been any change, recently? ICAME Journal, 33(1), 45-64. (pdf) Available from http://clu.uni.no/icame/ij33/ij33-45-64.pdf

Successful Spoken English – interview with authors

The following is an email interview with Christian Jones, Shelley Byrne and Nicola Halenko, the authors of the recent Routledge publication Successful Spoken English: Findings from Learner Corpora. Note that I have not yet read it (I am waiting for a review copy!).

Successful Spoken English

1. Can you explain the origins of the book?

We wanted to explore what successful learners do when they speak and in particular learners from B1-C1 levels, which are, we feel, the most common and important levels. The CEFR gives “can do” statements at each level but these are often quite vague and thus open to interpretation. We wanted to discover what successful learners do in terms of their linguistic, strategic, discourse and pragmatic competence and how this differs from level to level.  

We realised it would be impossible to use data from all the interactions a successful speaker might have so we used interactive speaking tests at each level. We wanted to encourage learners and teachers to look at what successful speakers do and use that, at least in part, as a model to aim for as in many cases the native speaker model is an unrealistic target.

2. What corpora were used?

The main corpus we used was the UCLan Speaking Test Corpus (USTC). This contained data only from students, from a range of nationalities, who had been successful (based on holistic test scoring) at each level, B1-C1. As points of comparison, we also recorded native speakers undertaking each test. We also made some comparisons to the LINDSEI (Louvain International Database of Spoken English Interlanguage) corpus and, to a lesser extent, the spoken section of the BYU-BNC corpus.

Test data does not really provide much evidence of pragmatic competence so we constructed a Speech Act Corpus of English (SPACE) using recordings of computer-animated production tasks by B2 level learners  for requests and apologies in a variety of contexts. These were also rated holistically and we used only those which were rated as appropriate or very appropriate in each scenario. Native speakers also recorded responses and these were used as a point of comparison. 

3. What were the most surprising findings?

In terms of the language learners used, it was a little surprising that as levels increased, learners did not always display a greater range of vocabulary. In fact, at all levels (and in the native speaker data) there was a heavy reliance on the top two thousand words. Instead, it is the flexibility with which learners can use these words which changes as the levels increase, so they begin to use them in more collocations and chunks and with different functions. There was also a tendency across levels to favour chunks which can be used for a variety of functions. For example, although we can presume that learners may have been taught phrases such as ‘in my opinion’, these were infrequent; instead they favoured ‘I think’, which can be used to give opinions, to hedge, to buy time, etc.

In terms of discourse, the data showed that we really need to pay attention to what McCarthy has called ‘turn grammar’. A big difference as the levels increased was the increasing ability of learners to co-construct  conversations, developing ideas from and contributing to the turns of others. At B1 level, understandably, the focus was much more on the development of their own turns.

4. What findings would be most useful to language teachers?

Hopefully, in the lists of frequent words, keywords and chunks they have something which can inform their teaching at each of these levels. It would seem to be reasonable to use, as an example, the language of successful B2 level speakers to inform what we teach to B1 level speakers. Also, though tutors may present a variety of less frequent or ‘more difficult’ words and chunks to learners, successful speakers will ultimately employ lexis which is more common and more natural sounding in their speech, just as the native speakers in our data also did.

We hope the book will also give clearer guidance as to what the CEFR levels mean in terms of communicative competence and what learners can actually do at different levels. Finally, and related to the last point, we hope that teachers will see how successful speakers need to develop all aspects of communicative competence (linguistic, strategic, discourse and pragmatic competence) and that teaching should focus on each area rather than only one or two of these areas.

There has been some criticism, notably by Stefan Th. Gries and collaborators, that much learner corpus research restricts itself to single factors when explaining a linguistic phenomenon. Gries calls for a multifactorial approach, whose power can be seen in a study conducted with Sandra C. Deshors (Deshors & Gries, 2014) on the uses of may, can and pouvoir by native English users and French learners of English. Using nearly 4,000 examples from three corpora, annotated with over 20 morphosyntactic and semantic features, they found, for example, that French learners of English treat pouvoir as closer to can than to may.

The analysis for Successful Spoken English was described as follows:

“We examined the data with a mixture of quantitative and qualitative data analysis, using measures such as log-likelihood to check significance of frequency counts but then manual examination of concordance line to analyse the function of language.”

Hopefully, with the increasing use of multifactorial methods, learner corpus analysis can yield even more interesting and useful results than current approaches allow.

Chris and his colleagues kindly answered some follow-up questions:

5. How did you measure/assign CEFR level for students?  

Students were often already in classes where they had been given a proficiency test and placed in a level. We then gave them our speaking test and only took data from students who had been given a global pass score of 3.5 or 4 (on a scale of 0-5). The borderline pass mark was 2.5, so we only chose students who had clearly passed but were not at the very top of the level, and obviously then only those who gave us permission to do so. The speaking tests we used were based on Canale’s (1984) oral proficiency interview design and consisted of a warm-up phase, a paired interactive discussion task and a topic-specific conversation based on the discussion task. Each lasted between 10-15 minutes.

6. So most of the analysis was in relation to successful students who were measured holistically?  

Yes.

7. And could you explain what holistically means here?

Yes, we looked at successful learners at each CEFR level, according to the test marking criteria. They were graded for grammar, vocabulary, pronunciation, discourse management and interactive ability based on criteria such as the following (grade 3-3.5) for discourse management: ‘Contributions are normally relevant, coherent and of an appropriate length’. These scores were then amalgamated into a global score. These scales are holistic in that they try to assess what learners can do in terms of these competences to gain an overall picture of their spoken English rather than ticking off a list of items they can or cannot use.

8. Do I understand correctly that comparisons with native speaker corpora were not as much used as with successful vs unsuccessful students? 

No, we did not look at unsuccessful students at all. We were trying to compare successful students at B1-C1 levels and to draw some comparison to native speakers. We also compared our data to the LINDSEI spoken learner corpus to check the use of key words.

9. For the native speaker comparisons what kind of things were compared?

We compared each aspect of communicative competence – linguistic, strategic, discourse and pragmatic competences to some degree. The native speakers took exactly the same tests so we compared (as one example), the most frequent words they used.


Thanks for reading.


References:

Deshors, S. C., & Gries, S. T. (2014). A case for the multifactorial assessment of learner language. Human Cognitive Processing (HCP), 179. Retrieved from https://www.researchgate.net/publication/300655572_A_case_for_the_multifactorial_assessment_of_learner_language


CORE blimey – genre language

A #corpusmooc participant, answering a discussion question on what they would like to use corpora for, replied that they wanted a reference book showing common structures in various genres such as “letters of condolence, public service announcements, obituaries”.

The CORE (Corpus of Online Registers) corpus at BYU, along with its virtual corpora feature, offers a way to achieve this.

For example, the screenshot below shows the keywords of verbs & adjectives in the Reviews genre:

Before I briefly show how to make a virtual corpus, do note that the standard interface allows you to do a lot of things with the various registers. The CORE interface shows you examples of this. For example, the following shows the distribution of the present perfect across the genres:

Create virtual corpora

To create a virtual corpus first go to the CORE start page:

Then click on Texts/Virtual and get this screen:

Next press Create corpus to get this screen:

We want the Reviews Genre so choose it from the drop down box:

Then press Submit to get the following screen:

Here you can either accept these texts or, if say you wanted a corpus of film reviews only, manually look through the links and filter for film reviews. Give your corpus a name or add it to an already existing corpus. Here we give it the name “review”:

Then, after submitting, you will be taken to the following screen, which shows your whole collection of virtual corpora; we can see the corpus we just created at number 5:

Now you can list keywords.
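For anyone curious what a keyword list involves under the hood, one standard way to score keyness is Dunning’s log-likelihood, comparing each word’s frequency in your virtual corpus against a reference corpus. A minimal sketch follows; the frequencies and corpus sizes are invented for illustration, and the BYU interface does its own calculation for you:

```python
# Sketch of log-likelihood (G2) keyness for a single word, comparing a
# target (virtual) corpus against a reference corpus. Frequencies and
# corpus sizes below are invented purely for illustration.
import math

def log_likelihood(f_target, f_ref, n_target, n_ref):
    """Dunning's G2 for one word given its frequency in two corpora."""
    total = f_target + f_ref
    e_target = n_target * total / (n_target + n_ref)  # expected in target
    e_ref = n_ref * total / (n_target + n_ref)        # expected in reference
    g2 = 0.0
    if f_target:
        g2 += f_target * math.log(f_target / e_target)
    if f_ref:
        g2 += f_ref * math.log(f_ref / e_ref)
    return 2 * g2

# e.g. "gripping": 120 hits in a 1m-word review corpus vs
# 800 hits in a 100m-word reference corpus
print(round(log_likelihood(120, 800, 1_000_000, 100_000_000), 1))
```

The higher the score, the more distinctive the word is for the genre, which is why words like evaluative adjectives float to the top of a Reviews keyword list.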

Do note that the virtual corpora feature is available in most of the BYU collection, so if genre is not your thing the other corpora might be useful.

Thanks for reading and do let me know if anything appears unclear.

 

#TESOL2017 – Corpus related talks and posters

While IATEFL2017 may well have the razzle-dazzle, TESOL2017 is the big kahuna. Find below corpus-related talks and posters (program pdf). There are some well-known names here – Kiyomi Chujo, Randi Reppen, Diane Schmitt, Dilin Liu, Keith Folse.

Does TESOL record talks like IATEFL does? Otherwise I am putting my faith in some tweeters to get an inkling of what goes down. You know what to do, folks.

Tuesday 21 March
Developing Academic Discourse Competence Through Formulaic Sequences
Content Area: Vocabulary/Lexicon
The Academic Formulas List and Phrasal Expressions List include formulaic sequences that build on traditional lists, such as the Academic Word List, to better meet student proficiency needs at the discourse level. Participants investigate the lists; experience collaborative activities designed to assist students in acquisition, including online and corpus-based; and discuss considerations for adaptation and implementation. Step-by-step guides provided.
Alissa Nostas, Arizona State University, USA
Mariah Fairley, American University in Cairo, Egypt
Susanne Rizzo, American University in Cairo, Egypt

Wednesday 22 March
Engaging Students in Making Grammar Choices: An In‑Depth Approach
Content Area: Grammar
Appropriate use of grammar structures in academic writing can be a challenge even for advanced ESL writers. Drawing on corpus research on the characteristics of written discourse, the presenters demonstrate how to engage students in making effective grammar choices to improve their academic writing. Sample instructional materials are provided.
Wendy Wang, Eastern Michigan University, USA
Susan Ruellan, Eastern Michigan University, USA

Lexical Bundles in L1 and L2 University Student Argumentative Essays
Content Area: Second Language Writing/Composition
This presentation reports findings of a corpus-based analysis of the use, overuse, and misuse of lexical bundles in L2 university student argumentative essays. The presentation also provides ways ESL composition instructors can assist learners in using lexical bundles more appropriately.
Tetyana Bychkovska, Ohio University, USA

Teachers’ U.S. Corpus
Content Area: Research/Research Methodology
The presenters amassed a linguistic corpus (TUSC) representing approximately 4 million words based on over 50 K–12 content area textbooks. Findings of the corpus, including word lists representative of academic language, are offered. Participants are invited to discuss ways this corpus may assist K–12 teachers, especially teachers of ELLs.
Seyedjafar Ehsanzadehsorati, Florida International University, USA

And Furthermore
Content Area: Discourse and Pragmatics
Advanced learner materials offer few guidelines for the use of the expressions “moreover,” “furthermore,” “in fact,” “likewise,” “in turn,” and other additive connectors. Grounded in pragmatic theory and drawing on written corpus examples and experimental speaker judgement data, this talk defines optimal uses and paves a path to enlightened class instruction.
Howard Williams, Teachers College, Columbia University, USA

Teacher Electronic Feedback in ESL Writing Course Chats
Content Area: Second Language Writing/Composition
This corpus-based study analyzes the rhetorical moves, uptake, and student perceptions of the teacher-student chats from five freshman ESL writing courses taught by three expert teachers. Findings show that chats are useful for establishing rapport and clarifying feedback, but we suggest that longer chat sessions may be more effective.
Estela Ene, Indiana University Purdue University Indianapolis, USA
Thomas Upton, Indiana University Purdue University Indianapolis, USA

Using Corpus Linguistics in Teaching ESL Writing
Content Area: Applied Linguistics
This session explores the use of corpus linguistics in teaching L2 writing as an effective way to bring authentic language into the classroom. The presenters discuss ways of incorporating corpora in teaching L2 writing and demonstrate a sample activity of how to use a corpus to address discourse competence.
Gusztav Demeter, Case Western Reserve University, USA
Ana Codita, Case Western Reserve University, USA
Hee-Seung Kang, Case Western Reserve University, USA

How Technology Shapes Our Language and Feedback: Mode Matters
Content Area: Applied Linguistics
This presentation explores how the use of evaluative language differs between parallel corpora of text and screencast feedback and what this means for the role of feedback and position of instructor. In understanding the implications of technology choices, instructors can better match tools to their pedagogical purposes.
Kelly Cunningham, Iowa State University, USA

Posters
An Effective Bilingual Sentence Corpus for Low-Proficiency EFL Learners
Content Area: CALL/Computer-Assisted Language Learning/Technology in Education
Kiyomi Chujo, Nihon University, Japan

Propositional Precision in Learner Corpora: Turkish and Greek EFL Learners
Content Area: English as a Foreign Language
Jülide Inözü, Cukurova University, Turkey
Cem Can, Cukurova University, Turkey

Thursday 23 March
Corpus‑Based Learning of Reporting Verbs in L2 Academic Writing
Content Area: Higher Education
We present findings from our study on the effectiveness of corpus-based learning of reporting verbs during a multidraft literature review assignment. The results suggest corpus-based instruction can improve L2 students’ genre awareness and lexical variety without time-consuming training. Participants receive sample corpus-based teaching materials used in the revision workshop.
Ji-young Shin, Purdue University, USA
R. Scott Partridge, Purdue University, USA
Ashley J. Velázquez, Purdue University, USA
Aleksandra Swatek, Purdue University, USA
Shelley Staples, University of Arizona, USA

Providing EAP Listening Input: An Evaluation of Recorded Listening Passages
Content Area: Listening, Speaking/Speech
Are the recorded passages that accompany listening textbooks providing students with exposure to all the necessary elements of academic lecture language? The presenter shares results of a corpus-based study, illustrating what recorded passages do well, where they fall short, and providing activities designed to supplement EAP listening instruction.
Erin Schnur, Northern Arizona University, USA

Developing Learner Resources Using Corpus Linguistics
Randi Reppen, Northern Arizona University, USA

Applying Research Findings to L2 Writing Instruction
Content Area: Second Language Writing/Composition
Effective pedagogical practices have a strong research base and respond directly to students’ learning needs. Presenters share materials developed for such needs in EAP writing classrooms, drawing on grammar/vocabulary corpus research, integration of CBI principles with current L2 writing approaches, and research findings regarding assignment sequencing for larger end-products.
Margi Wald, UC Berkeley, USA
Jan Frodesen, UC Santa Barbara, USA
Diane Schmitt, Nottingham Trent University, United Kingdom (Great Britain)
Gena Bennett, Independent, USA

Teaching Students Self‑Editing in Writing With Interactive Online Corpus Tool
Content Area: CALL/Computer-Assisted Language Learning/Technology in Education
L2 academic writers often struggle with word choice and collocates when composing in academic English. In this teaching tip, the presenter uses http://www.wordandphrase.info, a free corpus-based online interactive tool, to show how to teach self-editing strategies to L2 writers and demonstrates activities that can be incorporated into EAP writing courses.
Aleksandra Swatek, Purdue University, USA

Corpus 101: Navigating the Corpus of Contemporary American English (COCA)
Content Area: Vocabulary/Lexicon
The Corpus of Contemporary American English (COCA) may look overwhelming at first, but it is in fact an easy-to-use resource. Presenters guide participants through step-by-step navigation of this valuable tool, sharing tips and ideas for teachers and tasks for students that relate to several of COCA’s search and analysis functions.
Heather Gregg Zitlau, Georgetown University, USA
Heather Weger, Georgetown University, USA
Kelly Hill Zirker, Diplomatic Language Services, USA

Using a Medical Research Corpus to Teach ESP Students
Content Area: English for Specific Purposes
The study discussed investigated how expert writers use lexical bundles in medical research articles. More than 200 bundles were identified using a corpus of more than 1 million words. A structural and functional analysis revealed patterns that can be used in developing materials for medical students in international ESP classes.
Ndeye Bineta Mbodj, Health Department Thies University, Senegal

Using Corpora for Engaging Language Teaching: Effective Techniques and Activities
Using concrete examples from their new book published by TESOL, the presenters introduce some common useful procedures and activities for using corpora to teach various aspects of English, including vocabulary, grammar, and writing. They also explain how to develop and use corpora to assess learner language and develop teaching materials.
Dilin Liu, University of Alabama, USA
Lei Lei, Huazhong University of Science and Technology, China

Flexible, Free, and Open Data‑Driven Learning for the Masses
Content Area: Media (Print, Broadcast, Video, and Digital)
This presentation shares findings from multisite research with the open-source FLAX (Flexible Language Acquisition) project. Open digital collections used in formal classroom-based language education and in non-formal online education (MOOCs) are presented to demonstrate how openly licensed linguistic content using data-driven methods can support learning, teaching, and materials development.
Alannah Fitzgerald, Concordia University, USA

Posters
Visualizing Vocabulary Across Cultures: Web Images as a Corpus
Content Area: Vocabulary/Lexicon
Cameron Romney, Doshisha University, Japan
John Campbell-Larsen, Kyoto Women’s University, Japan

Developing Autonomous Academic Writing Competence Through Corpus Linguistics
Content Area: CALL/Computer-Assisted Language Learning/Technology in Education
Chinger Zapata, Universidad Católica del Norte, Chile
Hugo Keith

Data-Driven Learning (DDL) for Teaching Vocabulary and Grammar
Content Area: Teaching Methodology and Strategy
Pramod Sah, University of British Columbia, Canada
Anu Upadhaya, Tribhuvan University, Nepal

Friday 24 March
16 Keys to Teaching ESL Grammar and Vocabulary
Content Area: Grammar
This session uses corpus linguistics data to examine not only which grammar points should be taught but which vocabulary should be taught with each key grammar point. Sample lessons for teaching vocabulary with grammar and tips for designing and teaching these activities are presented.
Keith Folse, University of Central Florida, USA

Beyond Word Lists: Approaching Verbal Complements Lexicogrammatically and Cognitively
Content Area: Grammar
Gerund and infinitive verbal complements are often taught back-to-back via the use of memorization and word lists. This presentation suggests varying lesson placement, approaching the subject from a position of conceptualization of components drawn from Conti’s rule, and incorporating corpus data in classroom materials to improve salience thereof.
Miranda Hartley, University of Alabama, USA

Corpus‑Based Comparison Between Two Lists of Academic English Words
Content Area: Vocabulary/Lexicon
The study discussed compares Coxhead’s Academic Word List and Gardner and Davies’ Academic Vocabulary List in an independently developed 72-million-token university academic corpus to reveal which list is more suitable for academic vocabulary education across different academic disciplines to improve the effectiveness of English‑medium instruction.
Huamin Qi, Western University, Canada

Fostering Effective Participation in L1 Discourse Communities Through Formulaic Sequences
Content Area: Vocabulary/Lexicon
While vocabulary lists contribute substantially to lexical knowledge, discourse-level proficiency remains a challenge. The Academic Formulas List and Phrasal Expressions List, sets of formulaic sequences, address this challenge, helping learners participate more effectively in L1 discourse communities. Facilitators share online and corpus-based activities for formulaic sequence acquisition.
Susanne Rizzo, American University in Cairo, Egypt
Alissa Nostas, Arizona State University, USA
Mariah Fairley, American University in Cairo, Egypt

Developing an Open Educational Resources EAP Corpus
Content Area: English for Specific Purposes
This presentation focuses on the development of an open educational resources EAP corpus. Presenters demonstrate how the corpus can be accessed and downloaded, reused in a variety of ways, revised, remixed, and redistributed to other interested teachers, researchers, and/or students.
Brent Green, Salt Lake Community College, USA
Dean Huber, Salt Lake Community College, USA
George Ellington, Salt Lake Community College, USA

The Emergence of Academic Language Among Advanced Learners
Content Area: Second Language Writing/Composition
This session addresses the gradual changes of academic language based on a pilot study of 35 students over a 16-week graduate course. Suggestions and practical activities, informed by these findings, are demonstrated, including academic discourse techniques and the use of corpora and other online tools for text analysis.
Cheryl Zimmerman, California State University, Fullerton, USA
Jun Li, California State University, Fullerton, USA