Quick cup of COCA – compound words

A new quick cup of COCA post, whayhay! Thanks to Mike Harrison (@harrisonmike) on Twitter, who was asking about how to find compound adjectives.

Here we can use the wildcard asterisk (*) together with a part-of-speech (POS) tag.

So, say we were looking for adjectives starting with well-: we could use [well-*].[j*] to get the following top ten results:

WELL-KNOWN, WELL-ESTABLISHED, WELL-MEANING, WELL-DEFINED, WELL-EDUCATED, WELL-TO-DO, WELL-TRAINED, WELL-DRESSED, WELL-DOCUMENTED, WELL-INTENTIONED

To find all compound adjectives, we simply replace the first part of the compound with another wildcard asterisk, like so:

[*-*].[j*]

which gives us the following top 10 results:

LONG-TERM, SO-CALLED, AFRICAN-AMERICAN, FULL-TIME, SHORT-TERM, WELL-KNOWN, HIGH-TECH, OLD-FASHIONED, MIDDLE-CLASS, PART-TIME

Similarly, if you are looking for noun, adverb or verb compounds, simply use the appropriate POS tag, i.e. [n*], [r*] or [v*] respectively.
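To make the wildcard-plus-POS logic concrete, here is a minimal Python sketch that imitates queries like [well-*].[j*] and [*-*].[j*] on a tiny, invented list of (word, tag) pairs. It is not how COCA works internally: the toy tokens are made up, the lowercase tags only loosely follow the CLAWS-style tags COCA displays (j… = adjective, n… = noun), and the standard library's fnmatch `*` stands in for COCA's wildcard.

```python
from collections import Counter
from fnmatch import fnmatchcase


def top_matches(tokens, word_pattern, tag_pattern, n=10):
    """Count tokens whose word matches word_pattern and whose tag matches tag_pattern.

    Both patterns use * as a wildcard, so ("well-*", "j*") mimics the COCA query [well-*].[j*].
    """
    counts = Counter(
        word.upper()
        for word, tag in tokens
        if fnmatchcase(word.lower(), word_pattern) and fnmatchcase(tag.lower(), tag_pattern)
    )
    return counts.most_common(n)


# Invented toy data: (word, tag) pairs with CLAWS-like tags (jj = adjective, nn1 = singular noun).
tokens = [
    ("a", "at1"), ("well-known", "jj"), ("long-term", "jj"), ("plan", "nn1"),
    ("a", "at1"), ("long-term", "jj"), ("so-called", "jj"), ("mother-in-law", "nn1"),
    ("well-known", "jj"), ("long-term", "jj"),
]

print(top_matches(tokens, "well-*", "j*"))  # [('WELL-KNOWN', 2)]
print(top_matches(tokens, "*-*", "j*"))     # [('LONG-TERM', 3), ('WELL-KNOWN', 2), ('SO-CALLED', 1)]
print(top_matches(tokens, "*-*", "n*"))     # [('MOTHER-IN-LAW', 1)]
```

Note how mother-in-law matches the word pattern *-* but only turns up once the tag pattern is switched to n*, which is exactly what the POS part of the query is doing.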

Note: do double-check the results in the concordance lines, as the POS tagging is sometimes off.

As an interesting aside, a historical search for compound adjectives in COHA (the Corpus of Historical American English) gives a very nice ascending curve. I wonder what the significance of that is.

Compound adjectives over time in COHA

Finally, do check out the previous quick cup of COCA posts if you want help with searching in COCA.

Corpus linguistics community news 4

Another installment of the Google+ Corpus Linguistics Community news. In addition, I include links to some visual aids designed to answer questions posed by people in the second round of #corpusmooc.

Bigger is not necessarily better – Here I talk about some minor aspects of the paper rather than its more interesting ones, namely that the paper compiled a list of approximately 14,000 collocation pairs worth teaching. The list is not yet available. The other main finding is that frequency matters more than dispersion or chronological data when identifying collocations, with human judgement remaining a key factor in deciding which collocations are useful.

Google as a corpus with students – Some interesting recent developments in using Google as a corpus, with a short list of relevant online reading.

How to develop effective concordance materials using online corpus – My interpretation of a slideshow by a Korean researcher on using data-driven learning materials.

#corpusmooc Visual Aids:

Tokens, Types, Lemmas, Word families

Genre Register

Collocations and Colligations

Multi-word expressions

Do check out the other corpus linguistics community news posts if you haven’t already.

Thanks.

Fav the PHaVE Pedagogical List for the New Year

Great New Year news for teachers: a new word list of phrasal verbs, the PHaVE List (Garnier & Schmitt, 2014), shows that the 150 most common phrasal verbs have only 288 meanings in total. That is, on average, about 2 meanings per phrasal verb. Consider that some estimates put the total number of phrasal verbs at nearly 9,000.

You can try out the PHaVE Dictionary yourself.

What you will see are the 150 phrasal verbs, ranked from 1 to 150, and their most common meanings.

The study used the following criteria to include verbs and their meanings:

Each of the top 150 phrasal verbs occurs at least 10 times per million words. Meanings were included until they covered 75 percent of a verb’s occurrences in COCA-BYU: if the primary meaning did not reach this, secondary meanings with at least 10 percent coverage were added until either 75 percent was reached or all meanings of 10 percent or more had been used.

Thus 6 verbs have 4 meanings, 34 verbs have 3 meanings, 52 verbs have two meanings and 58 have one meaning.
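The selection rule can be sketched in a few lines of Python. This is my reading of the rule as summarised above, not Garnier and Schmitt’s own procedure or code, and the sense shares in the example are invented. The last lines simply check that the breakdown adds up to the totals quoted at the start of the post (150 phrasal verbs, 288 meanings).

```python
def select_meanings(sense_shares, target=0.75, min_share=0.10):
    """Choose meaning senses for one phrasal verb following the coverage rule described above.

    sense_shares maps each sense to the share (0 to 1) of the verb's corpus occurrences it covers.
    The most frequent sense is always kept; further senses are added in frequency order while
    total coverage is below `target`, and only if each covers at least `min_share`.
    """
    ranked = sorted(sense_shares.items(), key=lambda item: item[1], reverse=True)
    selected = [ranked[0][0]]
    coverage = ranked[0][1]
    for sense, share in ranked[1:]:
        if coverage >= target or share < min_share:
            break  # 75 percent reached, or every remaining sense is under 10 percent
        selected.append(sense)
        coverage += share
    return selected, coverage


# Hypothetical sense shares for an imaginary phrasal verb (not data from the PHaVE List).
senses = {"lift": 0.45, "collect": 0.20, "learn": 0.15, "improve": 0.07, "answer": 0.05}
chosen, covered = select_meanings(senses)
print(chosen, round(covered, 2))  # ['lift', 'collect', 'learn'] 0.8

# Breakdown reported above: {meanings per phrasal verb: number of phrasal verbs}
breakdown = {4: 6, 3: 34, 2: 52, 1: 58}
verb_count = sum(breakdown.values())
meaning_count = sum(meanings * verbs for meanings, verbs in breakdown.items())
print(verb_count, meaning_count, round(meaning_count / verb_count, 2))  # 150 288 1.92
```

Because the senses are processed from most to least frequent, the loop can stop at the first sense under 10 percent: every sense after it is smaller still.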

As the user manual for the list notes, some of the phrasal verbs may well be easier to understand than others, i.e. more semantically transparent. This is a reminder that the list is a general guide and that teachers, as ever, need to exercise their judgement.

If you want the raw lists, go check out the G+ Corpus Linguistics community.

So do go on and set about exploring the PHaVE pedagogical list for the new year.

A huge thanks to all readers for your support of the blog these past couple of years. Here’s to more and better in 2015.

References:

Garnier, M., & Schmitt, N. (2014). The PHaVE List: A pedagogical list of phrasal verbs and their most frequent meaning senses. Language Teaching Research, 1362168814559798. Retrieved January 15, 2016, from http://www.norbertschmitt.co.uk/uploads/pdf-(418-kb).pdf

Corpus linguistics community news 2

Another round of links (here’s the first round) to posts on the G+ Corpus Linguistics community, for peeps interested in this sort of thing:

Using the BAWE corpus in SketchEngine as a learner corpus (see the first comment on the post)

The Desolation of Smaug and the BYU COCA

Dogme concordancing

The Disabled Access Friendly Campaign, corpus use literacy and wordandphrase.info

Quick cup of COCA – lemma and POS

I was reading the following comment by a French poster in a forum discussion:

This is clearly more complicated to port, but the benefit can be very important,

OpenPandora Boards comment

It caught my attention because I am interested in how French speakers of English use the word important (e.g. see here). They often use it instead of a more appropriate size adjective, so in this case the forum poster could have written: the benefit can be large.

However, the construction still sounded a little odd to me, so I used COCA to look at the collocates of the noun lemma benefit: [benefit].[n*]. A lemma covers all the forms of a word (here benefit and benefits) and is indicated by square brackets. The part of speech can be selected from the POS List drop-down box. To use a POS tag like this, you need to append it to the word you are looking at with a dot (full stop).

From the results of this search, the sixth-ranked collocate is potential. Of course! Duh! That’s why the benefit can be sounds odd, whereas potential benefits are would sound better.
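For readers curious about what a lemma-plus-POS query and a collocate list involve, here is a minimal Python sketch over a few invented, tagged words. The tiny lemma table, the CLAWS-like tags and the window size are all assumptions for illustration. COCA does its own lemmatisation and tagging, lets you set the span, and ranks collocates in its own way, whereas this sketch just counts neighbouring words.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Hypothetical lemma table: each word form points to its lemma.
LEMMAS = {"benefit": "benefit", "benefits": "benefit", "benefited": "benefit"}


def lemma_pos_hits(tokens, lemma, tag_pattern):
    """Positions of tokens whose lemma is `lemma` and whose tag matches `tag_pattern`, like [benefit].[n*]."""
    return [
        i for i, (word, tag) in enumerate(tokens)
        if LEMMAS.get(word.lower()) == lemma and fnmatchcase(tag, tag_pattern)
    ]


def collocates(tokens, hits, span=4):
    """Count the words within `span` tokens to the left and right of each hit."""
    counts = Counter()
    for i in hits:
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        counts.update(word.lower() for word, _ in window)
    return counts


# Invented example with CLAWS-like tags (nn1/nn2 = singular/plural noun, vvd = past-tense verb).
tokens = [
    ("the", "at"), ("potential", "jj"), ("benefits", "nn2"), ("are", "vbr"), ("large", "jj"),
    ("we", "ppis2"), ("benefited", "vvd"), ("from", "ii"), ("the", "at"), ("potential", "jj"),
    ("benefit", "nn1"),
]

hits = lemma_pos_hits(tokens, "benefit", "n*")  # the nouns benefits and benefit, not the verb benefited
print(hits)  # [2, 10]
print(collocates(tokens, hits, span=2)["potential"])  # 2
```

The square brackets in [benefit] correspond to the lemma lookup (both benefit and benefits are caught), while .[n*] corresponds to the tag filter that drops the verb form.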

Now you may be saying I did not need COCA to figure that out. Sure, I could have mulled it over all morning, but COCA allowed me to get on with other trivial things instead of puzzling over this particular one. 🙂

That’s it for another quick cup of COCA. If you haven’t already, you can read some more quick cup of COCA posts.

The Tinkerer – a corpus informed video activity

This post is a response to the ELT blog carnival hosted by Vicki Hollett on the theme of teaching and learning using video. I had sent in an earlier video activity, but as that one was a bit old, a new one was in order. The video is best suited to engineering/technical students.

This video activity is also a little corpus-informed. The lead-in uses words taken from COCA with its synonym function; in this case the search term was [=tinker]. I have included my transcription so that variations/extensions can be done, such as gap-fills for detailed listening or noticing spoken grammar. The jumbled text was made with Textivate.
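For teachers who want to build gap-fill or jumbled-text variations themselves, here is a rough Python sketch using only the standard library. It is a do-it-yourself stand-in, not how Textivate works; the sentence splitter simply assumes sentences end in ., ! or ?, and the excerpt comes from JJ’s last answer in the transcript below.

```python
import random
import re


def gap_fill(text, targets):
    """Replace each whole-word occurrence of a target word with a gap of the same length."""
    pattern = r"\b(" + "|".join(map(re.escape, targets)) + r")\b"
    return re.sub(pattern, lambda m: "_" * len(m.group(0)), text, flags=re.IGNORECASE)


def jumble(text, seed=0):
    """Split text into sentences (naively, at . ! or ?) and shuffle them for a re-ordering task."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    shuffled = sentences[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed so the same jumble can be reprinted
    return shuffled


excerpt = ("It's too easy now. You don't hafta learn how it works, you just use it. "
           "So, stuff's too easy nowadays.")

print(gap_fill(excerpt, ["easy", "learn"]))
# It's too ____ now. You don't hafta _____ how it works, you just use it. So, stuff's too ____ nowadays.
for line in jumble(excerpt):
    print(line)
```

Keeping the gap the same length as the missing word gives learners a small extra clue; replace the underscores with a fixed-width blank if you would rather not.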

1. Dictate the following words to the class (the numbers are each word’s frequency rank in COCA):

tamper(7), fix(2), toy(4), fiddle(6), mend(8), play(1), interfere(5), repair(3).

2. These words are synonyms of this word: T_ _K_R. //write the gapped word on the board: Tinker// What’s the word?

3. What do you call a person who does this? //Tinkerer; check that they understand the word; there is room here to personalise, e.g. do you like tinkering?//

4. Re-arrange the text (that goes with the picture) into the correct order:

Original text from Hackaday; scrambled text from Textivate.

5. Watch the video. //approx 8 mins//

6. Which word from the list do you think is the best synonym for what JJ, the tinkerer, does? //if most students pick play as the best synonym, you could comment on the frequency ranks of the words//

7. Why do you think JJ says things are too easy now?

8. Do you agree with him? Why/why not?
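If you want to prepare a teacher’s note for step 6, a few lines of Python can put the dictated words in frequency order and compare them with how the class voted. The ranks are the ones given in step 1; the votes are invented for illustration.

```python
from collections import Counter

# Synonyms of "tinker" from step 1, with the COCA frequency ranks given there (1 = most frequent).
ranks = {"tamper": 7, "fix": 2, "toy": 4, "fiddle": 6, "mend": 8, "play": 1, "interfere": 5, "repair": 3}

by_frequency = sorted(ranks, key=ranks.get)
print(by_frequency)  # ['play', 'fix', 'repair', 'toy', 'interfere', 'fiddle', 'tamper', 'mend']

# Hypothetical class votes for step 6 (invented for illustration).
votes = ["play", "fiddle", "play", "tamper", "play", "fix"]
favourite, n_votes = Counter(votes).most_common(1)[0]
print(favourite, n_votes, ranks[favourite])  # play 3 1
```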

Transcript:

When I was growing up, we grew, uh we grew up in the country. I didn’t have a whole lot. Uhm, my dad is very mechanical, uhm he owned a motorcycle shop when I was growing up. So a lot of what I worked on was with engines. Yeah if a go-kart breaks I would have to fix it myself. And sometimes it was held together with baling twine and stuff just so that I could ride it, but.

It was in West Virginia [laughs] and I picked up a runt bicycle, a bicycle with little tiny wheels. Monkey bikes or whatever they’re called. Picked one up at a yard sale for about five bucks. And I put an engine on it. And I left the bike the way it was. So it was still a pretty big size. And then I thought to myself I’m gonna make it smaller. And then I cut the frame in half. And then I welded a bunch of stuff on there, a little tiny swing arm and used the wheels off a go-ped, uh the sprocket and chains off a go-ped, the engine’s off a weed eater. It’s a micro-bike I like that. Everyone’s just lingered I, I’ve had that a long time now and it just keeps going.

Yeah, yeah I do a lot of just research on the internet, or uh random stuff. I’ll get on tangents on scientific topics or, or on something engine related or on some sort of hacking thing. I’ll just absorb knowledge I suppose. I normally, I’ll have some sort of inspiration or see a video or something that, I’m like I gotta do that. Or I’ll do something similar or beat it or something like that. In fact I gotta an idea. You got, you got a rolly chair and there’s a leaf blower right there. Do we want to interrupt this interview, and? [laughs] See if it works. Nope. Oh well. [laughs] It was stupid. But now we know.

And sometimes I feel like tinkering with engines and sometimes it’s that and I keep focus all my attention on that. And sometimes it’s something electronic. And sometimes it’s sumthin else. It’s just that, it varies. Right now it’s the Tesla coil ’cause I’ve been working on it a while all week. That’s, that’s my top priority. That’s what I’ve been researching. I dunno I saw a Tesla coil video, I think, on the internet when I was a teenager. And I just thought I gotta build one of those. I got my son now, and slowed, slowed down my projects. But that’s okay. He is a project, he’s a good project. I’m forming him, in, into what I want [laughs]. Did he do it? Yeah he did.

I, I’ve always had a knack for finding really good deals and stuff, like I’m good at negotiations, I’m good at spotting things that are worth money at thrift stores. It started at thrift stores. Uhm, go there and I would just see stuff that other people wouldn’t recognize. And I clean them up makes sure they work. Go through it, just resell it on eBay.

You heard that? They shake their body. And hiss like that to sound like a rattlesnake. But you see there’s no rattle. I’ve always, I, I’ve always been a really really curious person. Hafta explore things if I see sumthin I sometimes have to just pull over and hafta look. I’d be the guy you want in a zombie apocalypse that’s for sure. [laughs] Cause I’m very uh, I’m very resourceful. I can pretty much make anything happen with whatever I’ve got on hand.

It’s too easy now. Like back in the day when you wanted a radio. Like you wanted a transmitter or something, you build it. People don’t build them now, you just go out ‘an buy it. You don’t hafta learn how it works, you just use it. Same with computers, back in the eighties and stuff you hadta know how the computer worked before you could just use one. So, stuff’s too easy nowadays.

Thanks to the ELT blog carnival for the inspiration.

Teaching writing with the aid of COCA – guest post by Monika Sobejko

I am very pleased that Monika Sobejko agreed to share some of her experiences of using corpora in class. If you have used corpora in your teaching and would like to write about it, get in touch or send me a link to a post you have written or will write. And don’t forget the Google+ group, where you can link up with others interested in the language teaching and learning aspects of corpora. Over to you, Monika.

I wish to begin with a word of warning: COCA is a highly addictive tool. I’ve spent many happy hours using it both in and out of the classroom. I think it’s been a good choice, though there are some limitations, which I’ll say a few words about later. COCA is freely available and it’s got a user-friendly interface (with lots of help menus whenever and wherever you need one). It’s also linked to related corpora, including the British National Corpus (BNC), and other resources such as WordAndPhrase.

Initially, I explored it mostly for fun, but I quickly realized that it can considerably speed up the process of preparing exercises and tests for my students. It offered plenty of authentic examples of lexical items and grammatical structures – COCA is huge (currently, 450 million words). I only had to sift through concordance lines, eliminating (or slightly adapting) those that would be too difficult for a particular group of students.

Still, I was somewhat reluctant to try any ‘hands-on’ activities using COCA directly in class with my students. Our syllabi simply do not allow us any leeway, so to speak – no fooling around and wasting our students’ time with ‘experimental tools’. However, I was teaching an academic writing class, and I thought that COCA could be a valuable tool for my students, side by side with dictionaries. After all, I’d been using it myself to confirm my poor non-native-speaker intuitions about language… If I found it valuable, so might they.

I also did a quick literature review and found some support in Ken Hyland’s positive approach to the use of corpora in the writing classroom. He claims that “the use of corpora and concordancing offers one of the most exciting applications of new technologies to the writing classroom” (Hyland, 2010: 167), and – now, having used it myself with an academic writing class – I couldn’t agree more.

I was going to use the corpus as “a reference tool” (Hyland, 2010: 170) and was hoping it could be more or less seamlessly integrated into what we were doing in class. Ana Frankenberg-Garcia (2012, pp. 41–42) describes a similar approach – she even claims that there is no need to formally train students in how to use a corpus. Speaking of integration, you can find very interesting practical ideas about integrating corpora into production activities (writing and speaking) in this post by eflnotes.

Initially, I showed students how to do the most basic and useful searches. While writing collaboratively or individually, they were often hesitant about using a particular word or phrase, or wanted an alternative to what they already knew. At first, I was doing most of the searches for them, but soon they started performing their own – with or without my help. All we needed was a computer with internet access. They were learning a new skill ‘on the fly’ – exactly when they needed it and as much as they needed at a particular moment.

Basically, four types of searches were the most useful, or the most often performed by my students: 1) frequency search across different registers; 2) collocations search; 3) synonyms search; 4) word comparison search. I will give some examples of searches 1) and 4) done by my students.

Frequency searches

Frequency searches across different registers of COCA were particularly useful when students wanted to use a word, but were not sure whether it was ‘formal enough’ for the academic register. Here’s an example of a search for the phrasal verb “boil down”. The student who was doing the search wasn’t sure whether or not he could use the verb in question – he’d been told in the past that in a more formal style, the use of phrasal verbs should be avoided. The settings for the search looked like this:

[Screenshot: COCA search settings for “boil down”]

A quick search revealed that “boil down” does occur in the academic register, though not very frequently (only 138 tokens, a frequency of 1.52 per million words), and far less frequently than in other registers, as you can see here:

[Chart: frequency of “boil down” across COCA registers]

However, while interpreting this, I think we must remember that, as Hunston puts it, “a corpus will not give information about whether something is possible or not, only whether it is frequent or not” (2010: 22). Ultimately, we have to rely on native speaker intuition to decide whether something is acceptable English or not. Still, I would argue that a lot of helpful information can be collected from frequency searches to help student writers make well-informed choices.
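The per-million figure quoted for “boil down” is what makes registers of different sizes comparable: raw hits divided by the number of words in that section, scaled to one million. A minimal Python sketch of the arithmetic follows. The academic-section size used here is not an official COCA figure but an approximation back-calculated from the two numbers quoted above (138 / 1.52 ≈ 90.8 million words).

```python
def per_million(hits, section_words):
    """Normalised frequency: occurrences per million words of running text."""
    return hits / section_words * 1_000_000


# Back-calculate the section size (in millions of words) implied by 138 hits at 1.52 per million.
implied_millions = 138 / 1.52
print(round(implied_millions, 1))  # 90.8

# Assumed academic-section size of about 91 million words, for illustration only.
print(round(per_million(138, 91_000_000), 2))  # 1.52
```

Applying the same calculation to each register’s raw hits and section size is what lets you compare registers fairly, even though a larger section would naturally produce more raw hits.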

The synonym search was sometimes a life-saver, though I’m really not sure whether a thesaurus wouldn’t have been enough for those particular searches. Often, however, these searches were followed by other queries, when a student needed to look at more concordance lines to get a better ‘feel’ for the newly discovered synonym, or to better understand the differences between two synonyms. Do look here at eflnotes for an excellent example of such a life-saving search for a synonym.

Word comparisons

A very interesting option offered by COCA is the word comparison search. Below are the settings for such a search, comparing two noun + preposition combinations, “change of” and “change in”, as well as the results of that search.

[Screenshots: COCA word comparison settings for “change of” vs “change in”, followed by the collocate lists for each phrase]

On the basis of this information, my students concluded that “change of” meant that one thing was substituted for another, and “change in” implied a difference occurring in something, and hence that it was better to write ‘a change in temperature’ rather than ‘a change of temperature’. No dictionary could help us with that problem.
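COCA’s comparison search lists the collocates of each word side by side so you can see which word each collocate prefers. The Python sketch below imitates that idea with a simple ratio of raw counts. The counts are invented for illustration (they are not COCA results), and the add-one smoothing that avoids dividing by zero is my own choice rather than necessarily the score COCA computes.

```python
def compare_collocates(counts_a, counts_b, smoothing=1):
    """Rank collocates by how much more often they occur with phrase A than with phrase B.

    counts_a and counts_b map each collocate to its frequency next to each phrase. Adding
    `smoothing` to both counts avoids dividing by zero when a word never occurs with phrase B.
    """
    words = set(counts_a) | set(counts_b)
    ratios = {
        w: (counts_a.get(w, 0) + smoothing) / (counts_b.get(w, 0) + smoothing)
        for w in words
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Invented counts of the noun following each phrase (illustration only, not COCA results).
change_of = {"heart": 40, "direction": 30, "policy": 10, "temperature": 2}
change_in = {"temperature": 50, "behavior": 35, "policy": 25, "heart": 1}

for word, ratio in compare_collocates(change_of, change_in):
    print(f"{word:12} {ratio:6.2f}")  # above 1 favours "change of", below 1 favours "change in"
```

With these toy numbers, direction and heart lean towards “change of” and temperature and behavior towards “change in”, which is the kind of pattern the students read off the real comparison lists.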

Student thoughts

Finally, I’ll just give voice to my students. Most of those who responded to a short survey at the end of the course found the corpus useful (9 out of 12 students), and only one found it not useful. And, surprisingly, most of them (eight students) admitted to using COCA both in and out of class. Many commented on the use of COCA in a positive way. Were they just being nice to the teacher while filling in the survey? I wonder.

Here’s a sample of their comments:

(…) I suggest organizing two or even three lessons only for learning how to exactly use COCA

It seems to be a great and a very helpful tool in writing articles. I have to admit that sometimes I had got problems with using all its applications (…)

(…) on the plus side I’d mention the methods and tools used during the course. Especially original was the usage of the Corpus of American English, a tool I have not been aware of before attending the course (…)

an interesting tool, but you need to have effective methods of working with it, so you must have some experience. I also think that the database of discipline-specific texts (for example, physics) is not developed well enough to reliably show how some rare words are used

The last comment reflects my own experience of using COCA – namely, it is a general corpus. If your students are interested in a specific field or in writing specific types of texts (genres), building a small, specialist corpus might actually be a better option.

Thanks for reading.

References

Frankenberg-Garcia, A. (2012). Integrating corpora with everyday language teaching. In J. Thomas & A. Boulton (Eds.), Input, process and product: Developments in teaching and language corpora (pp. 33–50). Brno: Masaryk University Press. Retrieved from http://www.academia.edu/3368339/Integrating_corpora_with_everyday_language_teaching

Hunston, S. (2010). Corpora in applied linguistics (7th ed.). Cambridge: Cambridge University Press.

Hyland, K. (2010). Second language writing (8th ed.). Cambridge: Cambridge University Press.

Monika is a graduate of the Jagiellonian University in Krakow, Poland, where she teaches EFL and ESP to students of archaeology and computer science. She also teaches an academic writing course to graduate students of the university. Her main interests include exploring effective ways to teach writing in a foreign language and using language corpora in EFL. She tweets @SobejM.