Using BYU-Wikipedia corpus to answer genre related questions

A link was posted recently on Twitter to an IELTS site looking at writing processes and describing graphs.
The following caught my eye:

…natural processes are often described using the active voice, whereas man-made or manufacturing processes are usually described using the passive.
(http://iamielts.com/2016/02/descriptive-report-process-descriptions-and-proofreading/)

The claim seems to go back to 2011 online (http://ielts-simon.com/ielts-help-and-english-pr/2011/02/ielts-writing-task-1-describe-a-process-1.html).

This is an interesting claim. It has been shown that passives are more common in abstract, technical and formal writing (Biber, 1988 as cited by McEnery & Xiao, 2005). Here the claim is about specific written texts on natural processes and man-made processes.

Well we can simplify this by asking are there more passives used when writing about man-made processes than when writing about natural processes? Since if you use passive clauses then you don’t use active clauses and we can come to a conclusion by deduction.

BYU-Wikipedia corpus can be used to get approximations of natural process writing and man-made process writing. The keywords I used (for the title word) were ecology and manufacturing. Filtering out unwanted texts took longer than expected especially for the manufacturing corpus. In the end I had an ecology corpus of 77 articles and  153,621 words and a manufacturing corpus of 116 articles and 98,195 words.

The search term I used to look for passives was are|were [v?n*]. This gave me a total of 293 passives for ecology and 304 passives for manufacturing. According to the Lancaster LL calculator this showed a significant overuse of passives in manufacturing compared to ecology. According to the log ratio score this is about 2 times as common (if I understand this statistic correctly). Now this does not mean much as a lot of the texts in the wikipedia corpora won’t be specifically about processes but still it is interesting.

What is more interesting are the types of verbs used in passives in ecology and manufacturing. The top ten in each case:

Ecology:

ARE FOUND

ARE CONSIDERED

ARE KNOWN

ARE CALLED

ARE COMPOSED

ARE ADAPTED

ARE USED

ARE DOMINATED

ARE INFLUENCED

ARE DEFINED

Manufacturing:

ARE USED

ARE MADE

ARE KNOWN

ARE PRODUCED

ARE CREATED

WERE MADE

ARE DESIGNED

ARE CALLED

ARE PERFORMED

ARE PLACED

 

Thanks for reading.

References:

Biber, D. (1988) Variation Across Speech and Writing(Cambridge: Cambridge University Press).

McEnery, A. M. and Xiao, R. Z. (2005) Passive constructions in English and Chinese: A corpus-based contrastive study . Proceedings from the Corpus Linguistics Conference Series, 1 (1). ISSN 1747-9398 Retrieved from http://eprints.lancs.ac.uk/63/1/CL2005_(22)_%2D_passive_paper_%2D_McEnery_and_Xiao.pdf

2 thoughts on “Using BYU-Wikipedia corpus to answer genre related questions

Penny for your thoughts

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s