A major hurdle with corpora is understanding concordance output. Current interfaces don’t help much here, as Alannah Fitzgerald has documented, e.g. in Re-using Oxford OpenSpires content in podcast corpora. Geoff Jordan, who worked on an early concordancer, remarked:
…we found, not surprisingly, that there was a pay-off between simplicity and allowing for “sophisticated” searches.
Geoff Jordan (blog post comment)
So there is a big need to address interface design in concordance tools. Recently I stumbled across Paper Machines, a visualisation plugin for the Zotero program (the GitHub site is faster), which can help with the interface problem.
Using the Phrase Net function I can get a visualisation of several fixed phrase patterns, e.g. binomials of the form x and y:
The size of a word reflects the number of occurrences of that word in the pattern, and the thickness of an arrow reflects how often that phrase occurs. I can of course check these binomials using AntConc.
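To make the two measures concrete, here is a minimal Python sketch of the kind of counting a phrase net relies on. It is not how Paper Machines itself is implemented; the function name phrase_net, the connector parameter and the sample sentence are all made up for illustration, and the tokenising is deliberately crude (it ignores sentence boundaries and only handles a one-word connector).

```python
import re
from collections import Counter


def phrase_net(text, connector="and"):
    """Count 'x <connector> y' pairs and how often each word appears in one.

    Hypothetical helper for illustration only, not the Paper Machines API.
    """
    # Crude tokeniser: lowercase runs of letters/apostrophes, punctuation dropped.
    tokens = re.findall(r"[a-z']+", text.lower())

    # Arrow thickness: how often each (x, y) pair occurs around the connector.
    pairs = Counter()
    for x, mid, y in zip(tokens, tokens[1:], tokens[2:]):
        if mid == connector:
            pairs[(x, y)] += 1

    # Word size: how often each word takes part in the pattern.
    words = Counter()
    for (x, y), n in pairs.items():
        words[x] += n
        words[y] += n
    return pairs, words


sample = "Copy and paste the code. Drag and drop the file, then copy and paste again."
pairs, words = phrase_net(sample)
for (x, y), n in pairs.most_common():
    print(f"{x} -> {y}: {n}")
```

Swapping the connector (for example connector="or") gives the counts for another single-word pattern; the pair counts would drive the arrows and the word counts the font sizes.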
Here’s another example using x the y:
Finding interesting patterns like this is much more difficult (impossible?) in AntConc. Paper Machines comes with the following default phrases (you can also run your own custom searches):
x and y
x or y
x of the y
x a y
x the y
x at y
x is y
x [space] y, which is equivalent to collocations, i.e.:
In addition to showing interesting phrases, the visualisation helps you get to know your corpus much better. For example, in a couple of the diagrams I noticed spf and dkim; looking those up using AntConc, I discovered they were abbreviations for sender policy framework and domain keys identified mail respectively. I have no idea what these are, but my students may find them important to know.
There are other functions in Paper Machines, such as Ngram, but I could not get that working. For fun, the image below is a Heatmap of the place names mentioned in my webmonkey corpus:
Thanks for reading.