Welcome to the training page for Prelims Paper 1. This is Part 2: Text Analysis.
In the left column you'll find some introductory material and links to all the key resources.
Many websites and software programs allow you to analyse your chosen texts. Text analysis tools let you explore a text quantitatively, e.g. by counting the instances of one particular word, and systematically, e.g. by looking at the types of words and phrases used. This is particularly useful for finding all instances of a specific word within a text. Some tools can also list all the words in your chosen text by type, e.g. adjective or plural noun.
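If you are curious about what a tool is doing when it finds every instance of a word, the Python sketch below builds a simple keyword-in-context (KWIC) concordance. It is a minimal illustration, not how Voyant or any other tool is actually implemented: the function names, the letters-only tokenizer and the sample sentence are all hypothetical.

```python
import re

def tokenize(text):
    # Simplifying assumption: a word is a lower-cased run of letters,
    # optionally with an internal apostrophe (e.g. "don't").
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def concordance(text, keyword, width=3):
    """Return every occurrence of keyword with `width` words of context either side."""
    tokens = tokenize(text)
    keyword = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Mark the keyword with brackets; strip() tidies edges at the start/end of the text.
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Hypothetical sample sentence, not a quotation from any text in this guide.
sample = "The noise grew louder. I heard the noise again, and the noise would not stop."
for line in concordance(sample, "noise"):
    print(line)
```

Each printed line shows the keyword in brackets with three words of context either side; changing `width` widens or narrows that context.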
Text analysis tools also allow you to compare two or more texts and to gather key features of the language used. You can search for the occurrences of a single word, or for a more complex pattern, e.g. pairs of words that appear within the same context.
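Searching for pairs of words within one context usually means looking for words that occur within a few words of each other (a "collocation window"). Here is a minimal Python sketch of that idea; the function name, the five-word default window and the sample sentence are illustrative assumptions, not the settings of any particular tool.

```python
import re

def cooccurrences(text, word_a, word_b, window=5):
    """Count occurrences of word_a that have word_b within `window` words on either side."""
    # Simplifying assumption: a word is a lower-cased run of letters.
    tokens = re.findall(r"[a-z]+", text.lower())
    word_a, word_b = word_a.lower(), word_b.lower()
    positions_b = [j for j, tok in enumerate(tokens) if tok == word_b]
    count = 0
    for i, tok in enumerate(tokens):
        # 0 < distance excludes the word itself (relevant only if word_a == word_b).
        if tok == word_a and any(0 < abs(i - j) <= window for j in positions_b):
            count += 1
    return count

# Hypothetical sample sentence.
sample = "I heard it louder. It grew louder and louder, and still I heard it."
print(cooccurrences(sample, "louder", "heard"))            # wide window
print(cooccurrences(sample, "louder", "heard", window=2))  # narrow window
```

Narrowing the window makes the search stricter, so fewer pairs are counted; tools that offer this kind of search generally let you choose the window size.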
These tools are good for looking at the different ways authors write across genres or text types, e.g. fiction and non-fiction.
Researchers also use them to examine questions of authorship. With the tools available, you can search your own chosen texts, and you can also use established corpora like the British National Corpus to find out how common particular words and phrases are.
A good place to start is to get some statistics about your chosen texts, to find out a bit more about them. There are many free online tools that will give you statistics about a text, but one we recommend is Voyant Tools.
Voyant Tools is a web-based text reading and analysis environment. It is a scholarly project that is designed to facilitate reading and interpretive practices. Do the exercise below to learn how to use a tool like Voyant and to see what kind of information it can give you.
If you found some texts in Part 1 of this training programme, you can copy and paste those to use in this exercise, or you can choose something else. We have chosen an online text of The Tell-Tale Heart by Edgar Allan Poe.
You should be presented with something that looks like this:
Let's look at each part in a bit more detail to see what information it contains.
In the bottom right corner, look at the summary:
This tells you how many words are in your text and how many of them are unique words. What does this tell us about Poe's use of language? To answer that, you may need to paste in other texts and compare them, to get an idea of how other authors tend to write in comparison with Poe. With this tool you can compare two or more different authors, or multiple texts by the same author.
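The two summary figures are usually called tokens (total running words) and types (distinct word forms). The Python sketch below shows how such counts can be computed, plus the ratio between them; it assumes a simple letters-only tokenizer, so Voyant's own totals may differ slightly, and the function name and sample sentence are hypothetical.

```python
import re

def summary(text):
    """Count total words (tokens) and unique words (types) in text."""
    # Simplifying assumption: a word is a lower-cased run of letters, with an
    # optional apostrophe (so "don't" counts as one word). Real tools may
    # tokenize differently, so their totals can differ slightly.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    types = set(tokens)
    return {
        "words": len(tokens),
        "unique_words": len(types),
        # Unique words divided by total words; 0.0 for an empty text.
        "type_token_ratio": len(types) / len(tokens) if tokens else 0.0,
    }

# Hypothetical sample sentence.
sample = "The old man heard the noise, and the noise grew louder."
print(summary(sample))
```

Because longer texts inevitably repeat common words, the ratio of unique words to total words tends to fall as a text gets longer, so compare texts of similar length (or equal-sized extracts) when you use this figure.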
Now have a look at the most frequent words. In this Poe extract, the most frequent words are Louder, Increased and Noise. Later, in Exercise Two, we will use this information to find out how often these words appear in the English language.
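A frequency list like this is only informative once very common function words ("the", "and", "I") are set aside, otherwise they dominate every text. The Python sketch below counts words while ignoring a stopword list; the tiny list, the function name and the sample text are hypothetical, and real tools use their own, much longer lists.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list. Real tools use their own,
# much longer lists of common function words.
STOPWORDS = {"the", "a", "and", "i", "it", "of", "to", "was"}

def most_frequent(text, n=3):
    """Return the n most frequent words in text, ignoring stopwords."""
    # Simplifying assumption: a word is a lower-cased run of letters.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)

# Hypothetical sample text.
sample = ("The noise grew louder and louder. I heard the noise. "
          "It was louder still, and the noise increased.")
print(most_frequent(sample, 2))
```

Try removing words from `STOPWORDS` to see how quickly function words take over the top of the list.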
Next, have a look at the graph in the top right corner. This shows where those frequent words appear throughout the text, so you can see at a glance which ones appear in the same parts of it.
In the Poe example, we can see that the word Sound is used a lot at the beginning of the text but then stops appearing, and later the words Heard and Louder appear together very often. Is this pattern typical of the English language in general? Keep working down to Exercise Two to find out.
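A graph like this can be produced by dividing the text into equal segments and counting each word in every segment. The Python sketch below assumes ten segments by default, a common choice for this kind of chart that may not match the tool's exact settings; the function name and sample text are hypothetical.

```python
import re

def trend(text, word, segments=10):
    """Split the text into `segments` roughly equal parts and count `word` in each."""
    # Simplifying assumption: a word is a lower-cased run of letters.
    tokens = re.findall(r"[a-z]+", text.lower())
    word = word.lower()
    counts = [0] * segments
    for i, tok in enumerate(tokens):
        if tok == word:
            # Map token position i (0 .. len-1) to a segment index (0 .. segments-1).
            counts[i * segments // len(tokens)] += 1
    return counts

# Hypothetical sample text: "sound" early on, "heard" and "louder" later.
sample = ("A sound, a low sound, a dull sound. Then silence. "
          "Later I heard it louder, and I heard it louder again.")
for w in ("sound", "heard", "louder"):
    print(w, trend(sample, w, segments=2))
```

With two segments, "sound" falls entirely in the first half and "heard" and "louder" entirely in the second, which is the same kind of pattern the graph lets you spot visually.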
We've seen with the above tools how you can compare texts with each other, but if you want to compare a text with a sample of a whole language, you will need to use a corpus.
A corpus is a collection of texts or text extracts that have been put together to be used as a sample of a language or language variety. It consists of texts that have been produced in 'natural contexts' (published books, ordinary conversation, letters, newspapers, lectures etc), which means it mirrors natural language. A well-composed corpus can be used to answer questions about language use, such as:
Does 'wicked' generally mean 'good' or 'bad'? Has this meaning changed over time? Does the use differ between different kinds of text? Do different (kinds of) speakers use the word in the same way?
A reference corpus (created to be a balanced sample of a language variety) can be used as a basis for comparing a text or genre with 'standard language'.
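Because a short story and a reference corpus differ hugely in size, raw counts cannot be compared directly. The Python sketch below shows two common calculations: frequency per million words, and the log-likelihood "keyness" statistic often used to judge whether a difference is larger than chance. The figures in the usage example are invented placeholders (the reference size is merely similar in scale to the BNC's roughly 100 million words), not real counts from the BNC or any text.

```python
import math

def per_million(count, total_words):
    """Relative frequency of a word, normalised to occurrences per million words."""
    return count * 1_000_000 / total_words

def log_likelihood(count_text, total_text, count_ref, total_ref):
    """Log-likelihood (G2) keyness score for one word: text vs reference corpus.

    Higher scores mean the difference in relative frequency is less likely
    to be due to chance; 3.84 corresponds to p < 0.05 (one degree of freedom).
    """
    # Expected counts if the word were equally common in both samples.
    overall_rate = (count_text + count_ref) / (total_text + total_ref)
    expected_text = total_text * overall_rate
    expected_ref = total_ref * overall_rate
    ll = 0.0
    for observed, expected in ((count_text, expected_text), (count_ref, expected_ref)):
        if observed > 0:  # a zero count contributes 0 (0 * log 0 is taken as 0)
            ll += observed * math.log(observed / expected)
    return 2 * ll

# Invented placeholder figures, NOT real counts from the BNC or any text:
# a 2,000-word story using a word 10 times, and a 100-million-word
# reference corpus containing it 5,000 times.
story_words, story_count = 2_000, 10
ref_words, ref_count = 100_000_000, 5_000
print(per_million(story_count, story_words))  # frequency per million in the story
print(per_million(ref_count, ref_words))      # frequency per million in the reference corpus
print(round(log_likelihood(story_count, story_words, ref_count, ref_words), 2))
```

Per-million figures tell you how much more (or less) often a word is used; the log-likelihood score tells you how confident you can be that the difference is not just chance, which matters most for short texts.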
Specialised corpora can be used to examine or compare different language varieties, such as the language of a particular area, of a certain genre or text type, or of particular language users.
Corpora can be synchronic (covering one time period) or diachronic (covering several time periods), can consist of different media (written or spoken language) and can include texts in different languages.
Annotated corpora have extra information added, usually linguistic information (part-of-speech tags, lemmata) or metadata (information about the material in the corpus, the speakers or authors, the situation, other extra-linguistic information, etc.).
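To see why annotation matters, the Python sketch below represents a few words of text with a part-of-speech tag and a lemma attached to each token. With that extra information you can ask for all forms of a word (heard, hear) or separate a noun from a verb with the same spelling (sound, sounded), which a plain word search cannot do. The data, tag names and function names are hypothetical; real annotated corpora such as the BNC use their own tagsets and query interfaces.

```python
from collections import Counter

# A tiny hand-made annotated corpus: each token is (word, part-of-speech tag, lemma).
# The tags are simplified placeholders, not the tagset of the BNC or any real corpus.
ANNOTATED = [
    ("I", "PRON", "i"), ("heard", "VERB", "hear"), ("a", "DET", "a"),
    ("low", "ADJ", "low"), ("sound", "NOUN", "sound"), ("and", "CONJ", "and"),
    ("it", "PRON", "it"), ("sounded", "VERB", "sound"), ("louder", "ADJ", "loud"),
    ("as", "CONJ", "as"), ("I", "PRON", "i"), ("hear", "VERB", "hear"),
    ("it", "PRON", "it"),
]

def lemma_forms(lemma, pos=None):
    """Count every word form of a lemma, optionally only for one part of speech."""
    return Counter(
        word.lower()
        for word, tag, lem in ANNOTATED
        if lem == lemma and (pos is None or tag == pos)
    )

def words_with_tag(tag):
    """List the words carrying a given part-of-speech tag, in text order."""
    return [word for word, t, _ in ANNOTATED if t == tag]

print(lemma_forms("hear"))           # both "heard" and "hear"
print(lemma_forms("sound", "VERB"))  # only the verb "sounded", not the noun "sound"
print(words_with_tag("ADJ"))
```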
Some corpora can be consulted online via a custom-built interface; others you explore with stand-alone tools that you install on your computer.
One of the most commonly used corpora for this paper is the British National Corpus (BNC), a large corpus of British English from the late 20th century.
When you use a corpus, it's really important that what you're saying about language is relevant to the corpus you're using. So if we were writing about how Wilfred Owen's use of language compared with that of his contemporaries or predecessors, this corpus would be no good. But if we were saying something about how his use of language differs from that of later generations, it would be a useful resource. So bear that in mind when you're choosing a corpus: there are lots of them available, so make sure the one you use is relevant to what you're trying to say.
You can find links to more corpora here.
We're going to show you some of the basic functions of the British National Corpus, to guide you through how it works and what it can show you.
The BNC has two main interfaces that we're going to mention here: BNCWeb and the BNC-BYU interface. Both use exactly the same set of data; only the interface is different. We will demonstrate BNCWeb here, as it is a bit more straightforward for demonstration purposes, but there is guidance on the BYU interface below as well.
This is an interesting exercise to try with authors whose use of language you think is innovative, to see whether that is really the case.
You can also use the BNC-BYU interface, which allows more complex queries. To use it you must first register; registering gives you access to 200 queries per day. To register, you need either to use a computer on campus or to be connected remotely via a proxy server (VPN; see https://www.it.ox.ac.uk/work-remotely). After registering, you will be able to access BNC-BYU remotely, but you will need to re-authenticate every 365 days by logging on again from campus.
Language of the Internet
Now that you've worked through the training session, you can scroll back to the top and look through the different tabs. You'll find sections on recommended eBooks, eJournals, Dictionaries, Primary Texts Online, Newspapers & Ephemera, Web Resources, Text Analysis Tools and Corpora.
If you need help finding or using online resources, you can contact us at email@example.com. If you are having technical difficulties with any online resource, please contact firstname.lastname@example.org.