Locating collocation

The Wikipedia entry on collocation says:
“... a collocation is a series of words or terms that co-occur more often than would be expected by chance”.1
This is the description of collocation that Sketch Engine, a leading maker of corpus tools, links to in the syllabus for its new online course.2

Note the statistical aspect in the definition “more often than would be expected by chance”.
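
One common way to put a number on “more often than would be expected by chance” is pointwise mutual information (PMI), which compares how often a pair is observed with how often it would occur if the two words were independent. The sketch below is only an illustration: the Wikipedia definition does not name a particular measure, PMI is one of several in use, and all the counts are invented.

```python
import math


def expected_count(word1_count, word2_count, total_pairs):
    """Co-occurrences we would expect if the two words were independent."""
    return word1_count * word2_count / total_pairs


def pmi(pair_count, word1_count, word2_count, total_pairs):
    """Pointwise mutual information in bits: log2(observed / expected)."""
    return math.log2(pair_count / expected_count(word1_count, word2_count, total_pairs))


# Invented counts for illustration only (not taken from any real corpus).
TOTAL_PAIRS = 1_000_000   # adjacent word pairs in the imaginary corpus
DARK_COUNT = 1_000        # occurrences of "dark"
NIGHT_COUNT = 2_000       # occurrences of "night"
DARK_NIGHT_COUNT = 50     # occurrences of "dark night"

print(expected_count(DARK_COUNT, NIGHT_COUNT, TOTAL_PAIRS))
print(round(pmi(DARK_NIGHT_COUNT, DARK_COUNT, NIGHT_COUNT, TOTAL_PAIRS), 2))
```

With these made-up counts, chance alone predicts 2 occurrences of the pair while 50 are observed, so the score is log2(25), roughly 4.64 bits. A score near zero would mean the pair occurs about as often as chance predicts.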

The wiki entry then reads “There are about six main types of collocations: adjective + noun, noun + noun (such as collective nouns), verb + noun, adverb + adjective, verbs + prepositional phrase (phrasal verbs), and verb + adverb.”

Note the emphasis on the grammar aspect of collocation.

Bill Louw would place this wiki definition (alongside Goran Kjellmer’s definition of collocation – ‘sequence of words that occurs more than once in identical form…and which is grammatically well structured’) at the bottom of the diagram below:

(Louw & Milojkovic 2016: 53)

The diagram shows two dimensions. The vertical dimension is how restrictive a view of collocation is, with the most restrictive at the bottom and the least restrictive at the top. The horizontal dimension shows how much of the language a view of collocation covers: the top bulb of the diagram is larger than the bottom bulb.

Louw & Milojkovic (2016) argue that the link between collocation and context of situation is of great importance when corpora are applied to literature studies, i.e. corpus stylistics.

Context of situation was illustrated by Firth in the following way:

“In his article ‘Personality and language in context’ Firth offers us what he calls a typical Cockney event in ‘one brief sentence’.
‘Ahng gunna gi’ wun fer Ber’. (I’m going to get one for Bert)
What is the minimum number of participants? Three? Four? Where might it happen? In a pub? Where is Bert? Outside? Or playing darts? What are the relevant objects? What is the effect of the sentence? ‘Obvious!’ you say. So is the convenience of the schematic construct called ‘context of situation’. It makes sure of the sociological component.” (Firth 1957: 182 as quoted in Louw & Milojkovic, 2016:61, emphasis added)

Awareness of the importance of context of situation is reflected in the following small Twitter poll where a majority of the 24 respondents opted for “meanings have words” over “words have meanings”:

Twitter poll

Although Louw concedes that a view of collocation such as n-grams can reveal contexts of situation, opportunities to do so will be much rarer than if collocation is located near the top of the diagram – “abstracted at the level of syntax” as Firth put it.

Context of situation is also of great importance in language teaching and learning. For example, task-based teaching can be said to lay great weight on context of situation.

As Louw & Milojkovic (2016:26) put it:

“The closer collocation’s classifications are to context of situation, the more successful and enduring will be the approach of the scholars who placed them there. The more the term is constrained by the notion of language ‘levels’ and the linearity and other constraints of syntax, the less such classifications and the theories perched upon them are likely to endure. The reason for this is, as we shall see, that collocation takes us directly to situational meaning and acts as what Sinclair refers to as the ‘control mechanism’ for meaning”

Thanks for reading.

Notes

  1. Wikipedia Collocation https://en.wikipedia.org/wiki/Collocation
  2. Boot Camp online https://www.sketchengine.eu/bootcamp/boot-camp-online/#toggle-id-2

References

Louw, B., & Milojkovic, M. (2016). Corpus stylistics as contextual prosodic theory and subtext (Vol. 23). John Benjamins Publishing Company.

Why the pineapple?

This post can be considered a follow-on from the post Collocations need not be arbitrary.

One response that proponents of the lexical approach in language teaching could make to the issue of looking at meanings and collocations is simply to define collocation as one level of meaning. John Firth, as cited by Joseph (2003), put it thus:

“The statement of meaning by collocation and various collocabilities does not involve the definition of word meaning by means of further sentences in shifted terms. Meaning by collocation is an abstraction at the syntagmatic level and is not directly concerned with the conceptual or idea approach to the meaning of words. One of the meanings of night is its collocability with dark, and of dark, of course, collocation with night.”

Joseph, 2003: 130

Defining collocations as one level of meaning is reasonable, but it does not provide an explanation that may be pedagogically useful. Cognitive linguistics claims to provide one.

Let’s take the question of the difference between choosing highest mountain and tallest mountain that arose in a class recently. One explanation is based on the distribution of what collocates with tall, that is, living things (tall man, tall tree) and man-made objects (tall building, tall pole). Tall tends not to collocate with natural objects such as mountains.

That is where a Firthian (and by consequence a lexical) approach stops. A cognitive analysis by Dirven and Taylor (1988) showed that general cognition (in the form of concepts) can explain further.

Highest mountain is preferred as the concept HIGH includes both a meaning of vertical position (positional meaning) and a meaning of vertical length (extensional meaning), whereas the concept TALL only includes the meaning of vertical length. So although you can find tallest mountain, people often think of being at the top of a mountain, hence the vertical position is emphasised rather than vertical length (see figure below):

Figure 1. after Dirven & Taylor, 1988: 386

Thanks for reading. And do have a read of a less favourable view of cognitive linguistics at a recent Geoff Jordan blog Anybody seen a pineapple?

Update

Marc Jones writes about cueing as a way to learn chunks Pinneapples?

References

Dirven, R., & Taylor, J. R. (1988). The conceptualisation of vertical space in English: The case of tall. In Topics in cognitive linguistics, B. Rudzka-Ostyn (ed), 379. John Benjamins.

Joseph, J. (2003). Rethinking linguistic creativity. In Rethinking Linguistics, H. Davis & T.J. Taylor (eds), 121–150. London: Routledge.

Collocations need not be arbitrary

“On the whole, delexicalized verbs are a good way of introducing the concept of collocation to learners of any L1 background. I usually start with make/do and show how one goes with homework while the other goes with mistake (I did my homework; I made a lot of mistakes). Why is it this way and not the other way around? Because words have collocations – they prefer the company of certain other words.” (Selivan, 2018:28, emphasis added)

The quote above, from a book published in 2018, reflects a pervasive view in the literature that collocations are arbitrary, that is, there is no particular reason why words “prefer the company of certain other words”, they just do.

Liu (2010) identifies this view of collocation-as-arbitrary as widespread amongst scholars and demonstrates that it is also a common assumption in published teaching materials. In the books, studies and websites on teaching collocations that he examined, collocation exercises mainly involved noticing and memorising fixed units; in other words, they were form-focused exercises.

Examples of such exercises are:

“identifying or marking collocations in a passage or in collocation dictionaries; reading passages with collocations highlighted or marked; filling in the blanks with the right word in a collocation; choosing or matching correct collocates; translating collocations from L2 back into L1 or vice versa; and memorization-type activities like repetition and rehearsal” (Liu, 2010:21)

There were fewer exercises on linking collocation forms to their meanings.

When exercises overlook the motivated aspects of collocations, learners also miss the chance to generalise what they learn (Wray, 2000). That is, collocations also need to be analysed if students are to make the most of them in new situations of use.

Take the examples of “make” and “do”. The core meaning of “make” is to create, a process that is purposeful and/or effortful. The core meaning of “do” is completion, the finishing of something, which focuses on the end result of an activity rather than on any effort in the process of that activity. Understanding these core meanings can throw light on the following use of “did a mistake”:

“But I did a mistake in talking about it, you know, the last time and recently”

The larger context of this is from a spoken news report:

weren’t there. Let me handle it. I said, " Yes, ma’am. " ROSEN: The rebuke of Mr. Clinton by his wife came after the former president revived the dormant issue of Mrs. Clinton’s own misstatements about her 1996 trip to Bosnia. You’ll recall Mrs. Clinton, in recent months, spoke of sniper fire jeopardizing her landing. But contemporaneous video and eyewitness account revealed there was no such threat, and the senator effectively if belatedly defused the story with an omission of error in late March. SEN-HILLARY-CLINTO: But I did a mistake in talking about it, you know, the last time and recently. ROSEN: But in Jasper, Indiana, Thursday, Mr. Clinton blamed the controversy on the biased news media. B-CLINTON: She took a terrible beating in the press for a few days because she was exhausted at 11:00 at night when she started talking about Bosnia. ROSEN: In fact, Mrs. Clinton related the false Bosnia story numerous times including in a prepared speech delivered freshly at mid morning. B-CLINTON: And then the president (COCA SPOK: FOX SPECIAL REPORT WITH BRIT HUME 6:00 PM EST, 2008, emphasis added)

We could speculate that in using “did a mistake” Hillary Clinton was implying that in her “exhausted” state the “misstatement” was the opposite of a purposeful lie. It was just one of many activities she did that day which happened to be an error.

This can also be seen in another example from COCA – “If I do a mistake, I’m cooked”.

The context is from a written publication this time, although the language in question is in reported form:

three minutes, sometimes the whole roll — eleven minutes. It has an advantage: It takes you to the real tempo of life. Most movies are shot rather quickly and in a way where you can manipulate your reality because of the amount of coverage ” — shooting a scene from many different angles so that the director can choose among them in the editing room. ” Here my manipulation is quite different. I have to build it in with the lighting, with the framing. It requires much more attention at this stage. If I do a mistake, I’m cooked, ” he says with a laugh. # Wings’ visual style may be old-fashioned at heart, but its sound is high-tech all the way. Besides the six channels of top-notch stereo sound broadcast through the theater speakers, Wings audiences will hear two channels of three-dimensional sound through a special headset called the Personal Sound Environment (PSE) distributed to each moviegoer. Developed by Imax affiliate Sonics Associates of Birmingham, Alabama, the PSE incorporates both IMAX 3-D glasses and tiny speakers mounted between (COCA MAG: Omni, 1994, emphasis added)

The person is talking about a number of steps in their work routine in shooting a movie. The use of “do” here signals that any disastrous mistake is not to be blamed on the person, considering all the other things he has to juggle.

Note that I could only find 3 uses of “do a mistake”, of which 2 are shown here (the third one I can’t offer any speculation on as I suspect more context needs to be chased up than that provided by COCA).

This blog was inspired by a question from a student about why a text had “in many respects” rather than “in many aspects”. I went onto COCA to have a look but could not discern any useful explanation. I just told the student that “aspects” does not seem to prefer “in many” compared to “respects”! Only later, when I thought about the root the two words have in common, “spect” (meaning see), did an arguably useful explanation present itself: “in many respects” implies that the [re-seeings] have already been understood in some way, while with “in many aspects” the reader may not yet know what these [partial-seeings] may be. These meanings could match up with the observation that “in many respects” often comes at the end of a clause or sentence while “in many aspects” may tend to come at the beginning of a clause or sentence.

Thanks for reading.

References:

Davies, M. (2008). Corpus of contemporary American English online. Retrieved from https://www.english-corpora.org/coca/.

Liu, D. (2010). Going beyond patterns: Involving cognitive analysis in the learning of collocations. TESOL Quarterly, 44(1), 4-30.

Selivan, L. (2018). Lexical Grammar: Activities for Teaching Chunks and Exploring Patterns. Cambridge University Press.

Wray, A. (2000). Formulaic sequences in second language teaching: Principle and practice. Applied linguistics, 21(4), 463-489.

Corpus linguistics community news 8

First up is the news that there are more than 700 members. Nice.

An important date for your diaries is 25 September 2017, when another round of #corpusmooc launches. This time new sections are promised, and the most notable addition is a new version of LancsBox. Check out the following two cute vids being used to promote #corpusmooc 2017:

Also, if you use Twitter, you can follow the bot corpusmoocRT @corpusmoocFav.

Next up are some great plenary videos from this year’s Corpus Linguistics 2017 knees-up in Birmingham, plus related notes from the conference by John Williams.

Checking the distribution of the pair on the one hand/on the other hand in BYU-COCA sections.

A graphic trying to depict keywords as calculated in AntConc.

A possible way to find collocations suitable for various proficiency levels.

And finally, for a bit o’ fun, is this the longest term in ELT? And The Banbury Corpus Revisited by Michael Swan.

Thanks for reading, and for those coming off a summer break, much energy to you for the new teaching year.

Quick cup of COCA – lemma and POS

I was reading the following which is part of a forum discussion by a French poster:

This is clearly more complicated to port, but the benefit can be very important,

OpenPandora Boards comment

It caught my attention as I am interested in the uses French speakers of English make of the word important (e.g. see here). Often they use it instead of an appropriate size adjective, so in this case the forum poster could have written – the benefit can be large.

However, the construction still sounded a little odd to me, so I used COCA to look at the collocates of the noun forms of the lemma benefit: [benefit].[n*]. A lemma covers all forms of a word and is indicated by square brackets. The part of speech can be selected from the POS (part of speech) List drop-down box. To use a POS tag like this, you need to append it to the word you are looking at with a dot (full stop).

From the results of this search, the rank 6 collocate is potential. Of course! Duh! That’s why the benefit can be sounds odd, whereas potential benefits are would sound better.
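
For anyone curious about what a lemma plus part-of-speech query is doing, here is a minimal sketch on an invented mini-corpus. It is not how COCA works internally: the token list, the simplified tags and the `collocates` helper are all assumptions made up for illustration. The idea is simply to keep tokens whose lemma is benefit and whose tag starts with n, then count the lemmas of their neighbours.

```python
from collections import Counter

# (word form, lemma, simplified POS tag): an invented mini-corpus.
TOKENS = [
    ("the", "the", "at"),
    ("potential", "potential", "jj"),
    ("benefits", "benefit", "nn2"),   # plural noun: matches the query
    ("are", "be", "vb"),
    ("clear", "clear", "jj"),
    ("it", "it", "pp"),
    ("benefits", "benefit", "vvz"),   # verb: same lemma, wrong POS
    ("everyone", "everyone", "pn"),
    ("a", "a", "at"),
    ("potential", "potential", "jj"),
    ("benefit", "benefit", "nn1"),    # singular noun: matches the query
    ("of", "of", "io"),
    ("porting", "port", "vvg"),
]


def collocates(tokens, lemma, pos_prefix, window=1):
    """Count lemmas found within `window` tokens of each lemma/POS match."""
    counts = Counter()
    for i, (_, tok_lemma, tok_pos) in enumerate(tokens):
        if tok_lemma == lemma and tok_pos.startswith(pos_prefix):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[tokens[j][1]] += 1
    return counts


# Rough equivalent of searching [benefit].[n*] and listing collocates.
print(collocates(TOKENS, "benefit", "n").most_common())
```

Because the query works on the lemma, both benefit and benefits are found, and because of the POS filter the verb use (it benefits everyone) is ignored.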

Now you may be saying I did not need COCA to figure that out. Sure, I could have mulled it over all morning, but COCA allowed me to get on with other trivial things rather than puzzling over this particular one. 🙂

That’s it for another quick cup of COCA. And if you haven’t already you can read some more quick cup of COCA posts.

Just the Word – alternatives function or how to introduce concordances to your students.

This post may encourage those who have yet to try out concordances in class. Additionally, if you teach the TOEIC using the Cambridge Target Score book (Talcott & Tullis, 2007), you may find this post of interest. It takes advantage of the alternatives function in Just the Word, which replaces each word entered with a similar word and shows their connection strength.

In the last unit of the Cambridge Target Score book, Unit 12, there is a collocations exercise on page 118 focusing on adjective + noun and adverb + adjective patterns. A way to extend this exercise is to use the Just the Word alternatives function.

This works best with the adjective + noun patterns. The first such pattern given in the book is valuable lessons.

Entering valuable lessons then pressing the alternatives button, we get this screen:

valuable_lessons
There are three options when replacing the adjective in valuable lessons:
valuable lesson (36)
important lesson (61)
salutary lesson (23)

Ask students to rank order the above in terms of their frequency.

The blue bars under each alternative show how similar the replacement word is to the original.

An extract of the text in the exercise which illustrates the use of this collocation is shown below:

…as he gives valuable lessons in living and a fresh, first-hand view of American society…

(Talcott & Tullis, 2007, p.118)

Ask students what they notice about this use; elicit the verb give and the preposition in. Note that when working with the text from the exercise for the first time, I usually try to get them to see any interesting chunks, so in this case give lesson in; give first-hand view of.

Give students the concordance lines of valuable lessons (click on valuable lessons which is hyperlinked to the concordance lines) and ask them to note down any patterns, elicit the most common verb learn and the article a:

valuable_lessons-concordances
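
If you would like to produce similar concordance lines from a text of your own, the key-word-in-context (KWIC) layout can be sketched in a few lines of Python. This is only an illustration and has nothing to do with how Just the Word builds its concordances; the `kwic` function and the sample sentences are made up.

```python
import re


def kwic(text, phrase, width=30):
    """Return key-word-in-context lines with the phrase aligned in a column."""
    lines = []
    for match in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Pad both sides to a fixed width so the key phrase lines up.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines


# Invented sample text for illustration only.
SAMPLE = (
    "She learned a valuable lesson about patience. "
    "He gives valuable lessons in living. "
    "We all learnt a valuable lesson that day."
)

for line in kwic(SAMPLE, "valuable lesson"):
    print(line)
```

Lining the phrase up in one column is what lets students scan the words immediately to its left and right for patterns such as the verb learn and the article a.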

You can do something similar with the other patterns given in the book exercise or give it as a task for students to do for the following class.

Thanks for reading.

References:

Talcott, C. & Tullis, G. (2007). Target Score: A communicative course for TOEIC Test preparation. (2nd ed.). Cambridge: Cambridge University Press.

Building your own corpus – Such as example

Although the power of a corpus is in discovering horizontal relations between lexis, such as collocation and colligation, the more familiar vertical relations that teachers are used to can also be explored at the same time. Such vertical relations, or semantic preference, have been used traditionally in ELT coursebooks. See the post by Leo Selivan critiquing this use. This post describes using concordance output to look at both collocation (adjective + noun) and semantic preference (hyponymy, or the general category/specific example relation).

I adapted an activity by Tribble (1997) who in turn based it on an example from Tim Johns who used the keyword “such as”, e.g.:

encompass many _______frequently labeled HTML5 such as the Geolocation API, offline storage API a

Make sure to see the worksheet (in odt format) to understand the following. The first question, A, in the worksheet is a traditional pre-learning task using the specific examples of the semantic categories, so in the line above the examples of geolocation api and offline storage api are instances of the category of features (the gapped word). The next questions, B and C, ask students to find the category that the words in bold on the right are examples of (these bold words are the ones in question A). Question D asks students to identify the types of words in the lists and in the italicised words, to get them to notice the adjective + noun structure, e.g. important features. Finally, question E asks them to identify adjective + noun structures in a single text taken from the website that the corpus is based on.
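
If you want to make gapped lines like the one above from your own concordance output, something along these lines could work. It is a rough sketch rather than the way the worksheet was actually produced: the `gap_line` helper is hypothetical, and you still have to supply the category word to blank out for each line yourself.

```python
import re


def gap_line(line, category, keyword="such as", blank="_______"):
    """Blank out the category word that appears before the keyword."""
    key_pos = line.lower().find(keyword)
    if key_pos == -1:
        return None  # not a line containing the keyword
    head, tail = line[:key_pos], line[key_pos:]
    pattern = re.compile(r"\b" + re.escape(category) + r"\b", flags=re.IGNORECASE)
    gapped_head, replaced = pattern.subn(blank, head, count=1)
    # Return None when the category word does not occur before the keyword.
    return gapped_head + tail if replaced else None


# The example line from this post, with the gapped word (features) restored.
LINE = (
    "encompass many features frequently labeled HTML5 "
    "such as the Geolocation API, offline storage API"
)

print(gap_line(LINE, "features"))
```

Only the part of the line before such as is searched, so the examples on the right stay intact for students to use as clues.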

Less experienced first year multimedia students struggled more with the exercises than more experienced second year students. I assume this is because they were less familiar with the lexis? Though both seemed more at ease with the last task of identifying structures in the text.

I’ll survey some corpora classroom task types in a later post.

Thanks for reading.

References:

Tribble, C. (1997). Improvising corpora for ELT: Quick-and-dirty ways of developing corpora for language teaching. In J. Melia. & B. Lewandowska-Tomaszczyk (Eds.), PALC ’97 Proceedings, Practical Applications in Language Corpora (pp. 106-117). Lodz: Lodz University Press.