There have been some recent developments in helping people build their own corpora. One of these is the Wikipedia corpus builder from BYU-COCA developer Mark Davies. The advantage of this system is its handy keyword extraction tool (noun, verb, adjective, adverb, noun + noun, adjective + noun), alongside the usual functions from the BYU suite of corpus tools.
The disadvantage is that it is limited to texts from Wikipedia.
An alternative is Google’s Custom Search Engine (CSE), which lets you add the websites you are interested in. More recently (well, recent for me) Google has added a tool for creating a search engine from keywords.
I previously used the CSE to build a Mission to Mars search engine containing URLs with information related to the NASA mission to Mars. This, however, required some effort in first finding relevant URLs.
Now with the keyword tool this is a much easier process.
I will describe how to do this using the example of setting up a Bridge Building corpus that I want to use with students. The task was to watch a video on a mega-bridge, take notes, then write a summary. I want students to use the CSE to look up “incorrect” language use in their summaries and revise it using information from the CSE.
As mentioned, I have done this previously with the Mission to Mars CSE, but since it took some fiddling to build I never really followed up on it. Now, with the relative speed of keyword search, this is something I want to try again.
Note that I will explain how to use the CSE in detail in another post; suffice it to say for now that simple Google search operators such as double quotation marks and the wildcard asterisk go a long way.
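As a rough illustration of those two operators, here is a minimal Python sketch that builds the kind of query strings a student might type into the CSE search box. The function names and the example phrases are my own assumptions for illustration; nothing here calls Google, it only formats text.

```python
def exact_phrase(phrase: str) -> str:
    """Wrap a phrase in double quotes so it is searched as an exact string."""
    return f'"{phrase.strip()}"'


def wildcard_phrase(before: str, after: str) -> str:
    """Build a quoted phrase with * standing in for the unknown word(s)."""
    return f'"{before.strip()} * {after.strip()}"'


# Hypothetical phrases a student might want to check in a bridge summary.
queries = [
    exact_phrase("the bridge spans"),
    wildcard_phrase("the bridge", "the river"),
]

for query in queries:
    print(query)
```

The first query checks whether a phrase occurs as written; the second lets the search results suggest which word usually fills the gap.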
Note you do need a Google account.
When you create a new search engine you get this screen:
The “Use the CSE creation from keywords tool” link is circled in red.
Clicking on that takes you to this screen which asks you to name your Search Engine:
Here is the screen with keywords I entered for my Bridge Building CSE:
Then you press Expand keywords and you get a screen like this:
These are the words Google will use to retrieve relevant URLs. At this step it is advisable to check the keywords and, if any are not relevant, use the add negative items feature (circled in red).
As you can see, my initial keyword center span brought up words related to HTML web programming and design, so I needed to eliminate such words in a cycle of adding negative terms and checking the resulting search words.
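The logic of that add-and-check cycle can be sketched in a few lines of Python. This is only an illustration of the idea, not how Google's tool works internally: the function name, the keyword list, and the negative terms below are all invented for the example.

```python
def filter_keywords(expanded, negative_terms):
    """Keep only the expanded keywords that contain none of the negative terms."""
    negatives = [term.lower() for term in negative_terms]
    return [
        keyword
        for keyword in expanded
        if not any(term in keyword.lower() for term in negatives)
    ]


# Invented sample of expanded keywords, mixing bridge and web-design senses of "span".
expanded = [
    "center span bridge",
    "html span tag",
    "css center div",
    "suspension bridge span",
]

# Negative terms added after spotting the irrelevant web-design keywords.
negative_terms = ["html", "css"]

remaining = filter_keywords(expanded, negative_terms)
print(remaining)
```

In practice you repeat this by eye in the tool itself: look at what remains, add another negative term if something irrelevant is still there, and expand again.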
After this step you get a list of URLs that you can check for relevance before adding them to your custom search engine:
Each CSE you build has a public URL you can share with students; here is mine for the Bridge Building CSE.
Thanks for reading, and feel free to fire off any questions or comments below.
@RudyLoock found that if you are signed into Google with non-English settings, you may need to change your language settings to English in order to see the “Use the CSE creation from keywords tool” link.
Here is a link to an article about using Google Search and Google Scholar in the Humanising Language Teaching magazine – Google Giveth and Google Taketh, Developments in Google as a Corpus.