My last two posts (Locating collocation and Thin word lists and fat concordances) have used the ideas of Bill Louw, who kindly agreed to talk about his work. (Note: if you are reading this on a mobile device, you may need to refresh a few times to get all the audio to load.)

The title of this post indicates his overall goal, to revive logical positivism 1 (Schlick refers to Moritz Schlick, one of the founders of logical positivism):

Revive logical positivism

He describes how he is doing this by merging Firthian ideas with logical positivism via the shared idea of context of situation (semantic prosody is a type of contextual meaning):

Hand over to science

Louw claims that another of the founders of logical positivism, Rudolf Carnap, was prevented from continuing his work on induction and probability when he moved to the USA. Apparently this is evident from letters between Carnap and the American philosopher Willard V. O. Quine. The significance of induction was highlighted by Bertrand Russell, who stated that we can’t have science without induction. A very common representation of induction is the “All swans are white” example or, more generally, “All A’s are B’s”. However, Moritz Schlick saw induction differently:

Schlick on induction

Louw goes on to add how Schlick describes the relation between thinking and reality:

Schlick on thinking

The above clip is important for understanding how Louw critiques the idea of collostruction. Collostruction is a way to measure collocation as it relates to grammar, and Louw points out the weakness of such an approach in terms of the “given”, i.e. reality/experience (Gries refers to Stefan Th. Gries, co-inventor of collostruction):

Collostruction and the given

Another way Louw illustrates his project to revive logical positivism is how he derives the idea of subtext from Bertrand Russell’s idea of a perfectly logical natural language:

Subtext 1

He then describes how Firthian collocation needs to be brought in to augment subtext if languages like Chinese are to be studied:

Subtext 2

For some reason, until I started reading Louw I did not quite get the idea of progressive delexicalisation: that words have lots of meanings that differ from their literal meanings. Previously I was only thinking of delexicalisation with respect to verbs such as ‘make’ and ‘do’. Nor had I realised that many words we may think have mostly literal meanings in fact have mostly delexical meanings. Louw & Milojkovic (2016: 6) give the example of ‘ripple’, where only one form in ten occurred with ‘water’ and ‘surface’ in the Birmingham University corpus.
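To make the ‘ripple’ figure concrete, here is a minimal Python sketch of how one might count the share of occurrences of a word that have a literal collocate nearby. The function name `literal_share`, the four-word span and the toy sentence are all my own assumptions for illustration; this is not the procedure or data Louw & Milojkovic used.

```python
def literal_share(tokens, node, literal_collocates, span=4):
    """Fraction of node occurrences with a literal collocate within +/- span words."""
    hits = total = 0
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        total += 1
        # Words to the left and right of the node, excluding the node itself.
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        if any(w in literal_collocates for w in window):
            hits += 1
    return hits / total if total else 0.0

# Toy text, invented for illustration only (not corpus data).
text = ("a ripple spread across the water and later that same evening "
        "a ripple of applause ran round the hall before "
        "a ripple of laughter followed")
tokens = text.split()

share = literal_share(tokens, "ripple", {"water", "surface"})
print(round(share, 2))
```

In this toy text only one of the three instances of ‘ripple’ sits near ‘water’; the other two are the delexical ‘ripple of applause’ and ‘ripple of laughter’.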

Louw describes how John Sinclair described this as the blue-jeans principle:

Sinclair’s blue jeans

In the early 1990s Louw tested Sinclair’s idea that every word has at least two meanings:


Louw recalls how, at the start of the 1980s, he encountered the idea of a computer writing a dictionary:

Computer writing

Louw gives an example of how the computer can help using US presidents Trump & Biden:

Computer reassurance

Louw is keen to distinguish collocation from colligation:

Deceptive colligation

Louw admits to being obsessed with the idea of bringing together Firth and the Vienna school:

Firth & Vienna

Louw’s conviction in his project reflects the certainty of the logical positivists. Although that stream of thought is no longer the force it was, Louw’s drive recalls Richard Rorty (without condoning the sexist language), as quoted in Goldsmith & Laks (2019: 443):

“The sort of optimistic faith which Russell and Carnap shared with Kant – that philosophy, its essence and right method discovered at last, had finally been placed upon the secure path of science – is not something to be mocked or deplored. Such optimism is possible only for men of high imagination and daring, the heroes of their times”

Thanks for reading & listening and many thanks to Bill Louw for taking time to chat with me.


Thin word lists and fat concordances

One aspect of the proposed changes to the GCSE modern foreign language (MFL) syllabus in the UK is the use of corpus-derived word lists 1. Word frequencies, when counted, follow a power law. A common power law is the Pareto distribution in economics: “Pareto showed that approximately 80% of the land in Italy was owned by 20% of the population” 2. Similarly, in any piece of text a large percentage of the running words comes from a relatively small number of distinct words: the top 100 words in English account for about 50% of any text. The MFL review wants to use word lists of the most frequent 2000 words, which would cover about 80% of any text.
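The coverage idea can be sketched in a few lines of Python. This is a minimal illustration, assuming a text has already been split into lower-cased words; the function name `coverage` and the toy sentence are my own and have nothing to do with the actual MFL lists.

```python
from collections import Counter

def coverage(tokens, top_n):
    """Share of running words accounted for by the top_n most frequent word types."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    covered = sum(freq for _, freq in counts.most_common(top_n))
    return covered / len(tokens)

# Toy text, invented for illustration only.
text = "the cat sat on the mat and the dog sat on the rug"
tokens = text.lower().split()

for n in (1, 3, 5):
    print(n, round(coverage(tokens, n), 2))
```

Even in this tiny text the single most frequent word (‘the’) accounts for 4 of the 13 running words, and the top three word types for 8 of 13. On a real corpus the same calculation gives the kind of coverage figures quoted above.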

Currently the MFL syllabus is topic-based, so one issue is that most words one can use for any particular topic will be limited to that topic. Put another way, although a word may be frequent within a topic, it won’t have range, i.e. it won’t appear in other topics. The NCELP (National Centre for Excellence for Language Pedagogy), in Vocabulary lists: Rationales and Uses, writes: “For example, many of the words for pets or hobbies will be low frequency words which are not useful beyond those particular topics.” 3
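The difference between frequency and range can be shown with a small Python sketch. It assumes each topic is available as a plain string of text; the topic names, the sentences and the function name `frequency_and_range` are invented for illustration.

```python
from collections import Counter

def frequency_and_range(topic_texts):
    """Return {word: (total frequency, number of topics the word occurs in)}."""
    freq = Counter()
    topic_range = Counter()
    for text in topic_texts.values():
        tokens = text.lower().split()
        freq.update(tokens)              # every occurrence counts towards frequency
        topic_range.update(set(tokens))  # each topic counts once towards range
    return {word: (freq[word], topic_range[word]) for word in freq}

# Toy topic texts, invented for illustration only.
topics = {
    "pets": "i have a hamster and a rabbit and the hamster is small",
    "hobbies": "i like chess and i have a chess club",
    "school": "i have a lesson and the lesson is long",
}
stats = frequency_and_range(topics)
print(stats["hamster"])  # (2, 1): repeated, but only within one topic
print(stats["have"])     # (3, 3): occurs in every topic
```

A word like ‘hamster’ can look reasonably frequent inside its own topic while having a range of just one.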

There have been many critics of this wordlist-driven proposal who have pointed out various weaknesses; see AQA Exam board 4, ASCL (Association of School and College Leaders) 5, Transform MFL 6, Linguistics in MFL Project 7.

I want to take a different tack and argue that the wordlist-driven approach is a half-hearted version of what could be a full-blooded corpus approach to vocabulary content.

Corpus stylistician Bill Louw writes that he “has become suspicious of decontextualised frequency lists” (Louw & Milojkovic, 2016: 32). He calls such lists thin lists because they tend to cover things rather than events (Louw 2010). Events are states of affairs, what J. R. Firth, one of the originators of the notion of meaning by collocation, called contexts of situation. Looking at collocates of things in concordance lines allows us to “chunk the context of situation and culture into facts” (Louw 2010).

A concordance brings together and displays instances of use of a particular word from the widely disparate contexts in which it occurs. To cover events one would need to examine collocates in concordances, hence the term fat concordances.
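For readers who have not met a concordance before, here is a minimal keyword-in-context sketch in Python. The function name `concordance`, the two-word context width and the toy sentence are assumptions of mine; real concordancers work on large corpora and offer sorting and wider context.

```python
def concordance(tokens, node, width=2):
    """Return one (left context, node, right context) tuple per occurrence of node."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Toy text, invented for illustration only.
text = "they take place every year and we take a bus to take part in the event"
tokens = text.split()

lines = concordance(tokens, "take")
for left, node, right in lines:
    # Right-align the left context so the node word lines up in a column.
    print(f"{left:>10}  {node}  {right}")
```

Lining the node word up in a column is what lets the eye run down the collocates on either side, which is where the events start to show.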

The most frequent words are often bleached of their literal meanings. Take the word “take”: on its own, most people would think of the meanings of “the act of receiving, picking up or even stealing” (Louw & Milojkovic, 2016: 5). In a collocation such as “take place”, however, the meaning is distant from the literal meaning of “take” 8. When the NCELP say “Very high frequency words often have multiple meanings.” they are describing the notion of delexicalisation.

To demonstrate context of situation and context of culture, here is corpus linguist John Sinclair’s PhraseBite pamphlet, as reproduced in Louw (2008):

When she was- – – – – Phrasebite© John Sinclair, 2006.

  1. The first grammatical collocate of when is she
  2. The first grammatical collocate of when she is was
  3. The vocabulary collocates of when she was are hair-raising. On the first page:
    diagnosed, pregnant, divorced, raped, assaulted, attacked
    The diagnoses are not good, the pregnancies are all problematic.
  4. Select one that looks neutral: approached
  5. Look at the concordance, first page.
  6. Nos 1, 4, 5, 8,10 are of unpleasant physical attacks
  7. Nos 2, 3, 6, 7, 9 are of excellent opportunities
  8. How can you tell the difference?
  9. the nasties are all of people out and about, while the nice ones are of people working somewhere.
  10. Get wider cotext and look at verb tenses in front of citation.
  11. In all the nasties the verb is past progressive, setting a foreground for the approach.
  12. In the nice ones, the verb is non-progressive, either simple past or past-in-past.

Data for para 4 above.
(1) walking in Burnfield Road , Mansewood , when she was approached by a man who grabbed her bag
(2) teamed up with her mother in business when she was approached by Neiman Marcus , the department store
(3) resolved itself after a few months , when she was approached by Breege Keenan , a nun who
(4) Bridge Road close to the Causeway Hospital when she was approached by three men who attacked her
(5) Drive , off Saughton Mains Street , when she was approached by a man . He began talking the original
(6) film of The Stepford Wives when she was approached by producer Scott Rudin to star as
(7) bony. ‘ ‘ Kidd was just 15 when she was approached to be a model . Posing on
(8) near her home with an 11-year-old friend when she was approached by the fiend . The man
(9) finished a storming set of jazz standards when she was approached by SIR SEAN CONNERY . And she
(10) on Douglas Street in Cork city centre when she was approached by the pervert . The man persuaded

As Louw (2008) puts it:

“The power of this publication, coming as it did so close to Sinclair’s death, is to be found in the detail of his method. By beginning with a single word, she, from the whole of the Bank of English, Sinclair simply requests the most frequent collocate from the Bank of English (approximately 500 million words of running text). The computer provides it: when. The results are then merged: when+she. A new search is initiated for the most frequent collocate of this two-word phrase. The computer provides it: was. The concordances are scrutinized and cultural insights are gathered.”
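The step-by-step method Louw describes can be approximated with a short Python sketch. It is a simplification under stated assumptions: it only looks at the word immediately to the left or right of the phrase and picks the most frequent by raw count, whereas the Bank of English software may well use a wider span or a statistical measure. The function names and the six toy sentences are hypothetical.

```python
from collections import Counter

def most_frequent_neighbour(sentences, phrase):
    """Return (side, word) for the most frequent word directly next to the phrase."""
    n = len(phrase)
    counts = Counter()
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == phrase:
                if i > 0:
                    counts[("left", tokens[i - 1])] += 1
                if i + n < len(tokens):
                    counts[("right", tokens[i + n])] += 1
    return counts.most_common(1)[0][0] if counts else None

def grow_phrase(sentences, seed, steps=2):
    """Merge the most frequent neighbour into the phrase, then search again."""
    phrase = [seed]
    for _ in range(steps):
        best = most_frequent_neighbour(sentences, phrase)
        if best is None:
            break
        side, word = best
        phrase = [word] + phrase if side == "left" else phrase + [word]
    return phrase

# Toy sentences, invented for illustration only (not Bank of English data).
sentences = [s.split() for s in [
    "when she was approached",
    "when she was pregnant",
    "when she left",
    "then she was diagnosed",
    "when she smiled",
    "when she was attacked",
]]

phrase = grow_phrase(sentences, "she", steps=2)
print(" ".join(phrase))
```

On these toy sentences the seed ‘she’ first picks up ‘when’ on its left and then ‘was’ on its right, giving ‘when she was’. The cultural insights, of course, still have to come from a person reading the concordance lines.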

The ASCL quotes applied linguist Vivian Cook:

“While word frequency has some relevance to teaching, other factors are also important, such as the ease with which the meaning of an item can be demonstrated (’blue’ is easier to explain than ‘local’) and its appropriateness for what pupils want to say (‘plane’ is more useful than ‘system’ if you want to travel)”

Blue is easier to explain than local because most collocates of blue reflect its literal colour meaning, e.g. “blue eyes”. Yet consider this from a children’s corpus:

“There, I feel better. I’ve been needing a good cry for some time, and
now I shall be all right. Never mind it, Polly, I’m nervous and tired;
I’ve danced too much lately, and dyspepsia makes me blue;” and Fanny
wiped her eyes and laughed.” (An Old-fashioned Girl, by Louisa May Alcott)

So while it is true that blue is often associated with colour, it also associates with mental states, where the colour meaning is delexicalised, or washed out.

To conclude, the MFL proposals on using corpus-derived word lists to drive content are not taking full advantage of corpora. They are promoting thin word lists when they could also be promoting fat concordances.

Thanks for reading.


