
Prelims Paper 2 - Early Medieval Literature c.650-1350: Corpora

Introduction to Corpora

A corpus is a collection of texts or text extracts that have been put together to be used as a sample of a language or language variety. It consists of texts produced in 'natural contexts' (published books, ordinary conversation, letters, newspapers, lectures, etc.), so it reflects how language is actually used. A well-composed corpus can be used to answer questions about language use, such as:

Does 'wicked' generally mean 'good' or 'bad'? Has this meaning changed over time? Does the use differ between different kinds of text? Do different (kinds of) speakers use the word in the same way?
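To answer questions like these, corpus tools usually display a concordance (keyword in context, or KWIC). This lists every occurrence of a word with some text on either side, so you can judge each use by reading it. Below is a minimal Python sketch of the idea. It is not how any particular corpus interface works. The function name kwic, the 30-character window and the sample sentences are all invented for illustration.

```python
import re

def kwic(texts, word, width=30):
    """Return keyword-in-context lines for every whole-word match of `word`."""
    # \b marks word boundaries, so 'wicked' does not match inside 'wickedness'
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    lines = []
    for text in texts:
        for m in pattern.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            # Right-align the left context so the keywords line up in a column
            lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

# Invented example sentences, not real corpus data
sample = [
    "The witch in the tale was truly wicked and cruel.",
    "That new track is wicked, I love it!",
]
for line in kwic(sample, "wicked"):
    print(line)
```

Reading the lines one after another makes it easy to sort the uses into 'bad' and 'good' senses. With metadata attached to each text, you could then compare those senses across periods or kinds of text.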

A reference corpus (created to be a balanced sample of a language variety) can be used as the basis of comparison between a text/genre and 'standard language'.
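Because a single text and a reference corpus differ greatly in size, raw counts cannot be compared directly. They are usually normalised, for example to occurrences per million words. The Python sketch below shows only this simple relative-frequency step, not a statistical keyness test such as log-likelihood. The tokeniser is deliberately crude, and the function names and miniature 'corpora' are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Crude tokeniser: lower-cased runs of word characters and apostrophes.
    Python's \\w is Unicode-aware, so letters such as þ and æ are kept."""
    return re.findall(r"[\w']+", text.lower())

def per_million(tokens, word):
    """Occurrences of `word` per million tokens (0.0 for an empty list)."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return counts[word.lower()] / len(tokens) * 1_000_000

# Invented miniature "corpora", for illustration only
text_tokens = tokenize("The wicked king was wicked to the end.")
reference_tokens = tokenize("The king rode out. The people cheered the king.")

for w in ("wicked", "king"):
    print(w, per_million(text_tokens, w), per_million(reference_tokens, w))
```

A word whose normalised rate in the text is far above its rate in the reference corpus is a candidate for being characteristic of that text or genre. Real corpus tools apply significance tests before calling it 'key'.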

Specialised corpora can be used to examine or compare different language varieties, such as the language of a particular area, of a particular genre or text type, or of particular groups of language users.

Corpora can be synchronic (covering a single period) or diachronic (covering several periods), can contain written or spoken language, and can include one language or several.

Annotated corpora have extra information added: usually linguistic information (part of speech, lemmata) or metadata (information about the material in the corpus, its speakers or authors, the situation, other extra-linguistic information, etc.).
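One way to picture an annotated corpus is as documents that carry metadata, each holding a list of tokens with their own linguistic tags. The Python sketch below is a simplified, hypothetical model: the class names, the tag labels and the metadata keys are invented, and real corpora use their own formats and tagsets. It illustrates why lemmatisation helps. A search by lemma finds every inflected form of a word at once, shown here with Old English forms of cyning ('king').

```python
from dataclasses import dataclass, field

@dataclass
class Token:
    word: str   # the form as it appears in the text
    lemma: str  # the headword (lemma) it belongs to
    pos: str    # part-of-speech tag (labels here are invented)

@dataclass
class Document:
    tokens: list
    metadata: dict = field(default_factory=dict)  # e.g. period, genre, author

def find_lemma(docs, lemma):
    """Return (metadata, word form) for every token whose lemma matches."""
    return [(doc.metadata, tok.word)
            for doc in docs
            for tok in doc.tokens
            if tok.lemma == lemma]

# Hypothetical annotated documents
docs = [
    Document([Token("se", "se", "DET"), Token("cyning", "cyning", "NOUN")],
             {"period": "Old English", "genre": "prose"}),
    Document([Token("þæs", "se", "DET"), Token("cyninges", "cyning", "NOUN")],
             {"period": "Old English", "genre": "verse"}),
]

for meta, form in find_lemma(docs, "cyning"):
    print(meta["genre"], form)
```

A search on the word form cyning alone would miss the genitive cyninges. Restricting results by metadata (for instance, verse only) works the same way, by filtering on doc.metadata.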

There are corpora that can be consulted online, via a custom-built interface, and ones that you explore with stand-alone tools that you install on your computer.

Useful Corpora