I had the chance to try out the new Corpus of Global Web-based English (GloWbE) in my TOEIC class the other day. The corpus claims to have 1.9 billion words; compare that with COCA (the Corpus of Contemporary American English), which has 450 million words.
The student's question was about the sentence “That will suit you perfectly” (which arose from dialogues they had been inventing between a customer and a salesperson). In fact the query was about the word order of adverbs, and after answering that (but see the StringNet post) I took the excuse to consult GloWbE and COCA.
Comparing the search term /will suit you/ between the two corpora:
COCA (will suit you) – 17 hits
GloWbE (will suit you) – 322 hits
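Since the two corpora differ so much in size, raw hit counts are easier to compare once they are normalised. Here is a minimal sketch of a per-million-words calculation, assuming the corpus sizes quoted above (1.9 billion and 450 million words) are accurate; the function name `per_million` is my own.

```python
def per_million(hits: int, corpus_words: int) -> float:
    """Return how often a search term occurs per million words of corpus."""
    return hits / corpus_words * 1_000_000

# Corpus sizes as quoted in this post (assumed, not checked against the corpora).
COCA_WORDS = 450_000_000
GLOWBE_WORDS = 1_900_000_000

coca_rate = per_million(17, COCA_WORDS)       # /will suit you/ in COCA
glowbe_rate = per_million(322, GLOWBE_WORDS)  # /will suit you/ in GloWbE

print(f"COCA:   {coca_rate:.3f} per million words")
print(f"GloWbE: {glowbe_rate:.3f} per million words")
print(f"GloWbE / COCA ratio: {glowbe_rate / coca_rate:.1f}")
```

On these figures the phrase comes out at roughly 0.04 per million words in COCA and 0.17 per million in GloWbE, so it is about four and a half times as frequent in GloWbE even after allowing for corpus size.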
In GloWbE the most common collocate was “best”, while “perfectly” ranked fifth (will suit you results).
As a bonus, one of my students was most impressed with the tool and noted down the URL.
Of course, the real power of the new corpus lies in analyzing differences between countries and, for me, web-related lexis.
I decided to look at the results for the search term /responsive design/ (a hot topic in web design). On the face of it the numbers are impressive: 1 hit in COCA compared to 966 hits in GloWbE. However, further inspection reveals duplicate text, e.g. with the collocate HTML5 (responsive design results).
The corpus indicates duplicates by a number in brackets and the authors promise that these will be cleaned up over time. So the future looks rosy for the usefulness of this massive new corpus.
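In the meantime, anyone who copies concordance lines out of the interface could collapse the repeats themselves before counting. The following is only a rough sketch: the sample lines are invented placeholders, not real GloWbE output, and the function `collapse_duplicates` is my own name for the idea of treating lines as duplicates when they match after lowercasing and squeezing whitespace.

```python
from collections import Counter

def collapse_duplicates(lines):
    """Count concordance lines, treating lines as identical when they
    match after lowercasing and collapsing runs of whitespace."""
    return Counter(" ".join(line.lower().split()) for line in lines)

# Invented placeholder lines, NOT real corpus output.
sample = [
    "Responsive design with HTML5 is placeholder text A",
    "responsive  design with HTML5 is placeholder text A",
    "Responsive design is placeholder text B",
]

unique = collapse_duplicates(sample)
raw_hits = len(sample)
distinct_hits = len(unique)

print(f"{raw_hits} raw hits, {distinct_hits} distinct after collapsing duplicates")
for text, count in unique.items():
    print(f"({count}) {text}")
```

This only catches near-exact repeats; text that has been lightly edited between pages would still be counted separately.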