Paper Machines – look ma no concordance lines

A major hurdle with corpora is understanding concordance outputs. Current interfaces don’t help much in this regard as documented by Alannah Fitzgerald e.g. Re-using Oxford OpenSpires content in podcast corpora. Geoff Jordan who worked on an early concordancer remarked:

…we found, not surprisingly, that there was a pay-off between simpliciity and allowing for “sophisticated” searches.

Geoff Jordan (blog post comment)

So there is a big need to address the question of interface design with concordance tools. Recently, I stumbled across a visualisation plugin for the Zotero program called Paper Machines (the github site is faster) which can help with the interface problem.

Using the Phrase Net function I can get a visualisation of several fixed phrases e.g. binomials i.e. x and y:


The size of the word is related to number of occurences of that word in the phrase and the thickness of the arrow how often that phrase occurs. I can of course check these binomials using AntConc.

Here’s another example using x the y:


Trying to get interesting patterns like this is much more difficult (impossible?) in AntConc. The default phrases in Paper Machines are (you can do your own custom searches):

x and y

x or y

x of the y

x a y

x the y

x at y

x is y

x [space] y which is equivalent to collocations i.e.:


In addition to seeing interesting phrases the visualisation helps you to get to know your corpus much better, e.g. in a couple of the diagrams I noticed spk and dkim, looking those up using AntConc I discovered they were abbreviations for sender policy framework and domain keys identified mail respectively. I have no idea what these are but my students may find them important to know.

There are other functions on Paper Machines like Ngram but I could not get that working. For fun, the image below is a Heatmap of the place names mentioned in my webmonkey corpus:


Thanks for reading.