Maintained by: Special Collections, Bodleian Libraries
Content: The Bodleian Libraries’ Web Archive was created in 2011 to collect and preserve web material for posterity. As of 2018, the archive collects web material relating to the following eight public collections:
Scope: The Bodleian Libraries’ Web Archive works within the remit of the Bodleian Libraries’ Special Collections Collection Development Policy to collect web materials which relate to the University of Oxford or the collections of the Bodleian Libraries. As the Bodleian Libraries’ Web Archive works on a permission basis, permission is required from the owner before the web materials are added to the archive. Members of the public can nominate websites for inclusion in the BLWA, using our nomination form.
Maintained by: UK Legal Deposit Libraries (The British Library, The National Libraries of Scotland and Wales, the Bodleian Libraries, Cambridge University Libraries, Trinity College, Dublin).
Content: The UK Web Archive collects millions of websites each year. As of 2017 approximately 500TB of data has been collected. An annual UK domain crawl aims to capture as much of the UK's (freely available) web presence as possible. Selected websites are crawled on a more frequent basis, such as some news sites which are collected daily. The UK Web Archive also curates collections of websites relating to topics and themes such as:
Scope: UK websites only.
Access: Most content in the UK Web Archive is collected under the regulations of the Legal Deposit Libraries Act (2003), which was extended in 2013 to permit the Legal Deposit Libraries to collect UK-based websites. Websites collected under the regulations are only available to view on Library premises unless permission has been received from the website publisher to make the content more widely available. All content in the UK Web Archive, however, can be searched online from anywhere.
A list of web archiving initiatives is maintained on Wikipedia. A sample of these is listed below.
UK Government Web Archive began its web archiving initiative in 1996. It aims to capture, preserve, and make accessible UK central government information published on the web.
The Parliament Web Archive began its web archiving programme in 2009. It aims to capture, preserve and make accessible UK Parliament information published on the web.
The Internet Memory Foundation (formerly known as the European Archive) is a non-profit organisation, based in Paris and Amsterdam. Since 2004, it has worked with cultural and government organisations to capture and preserve web archives. The collections of some of the organisations they work with are hosted on their website, including those of the UK Government and the UK Parliament.
The Icelandic Web Archive has collected snapshots of Icelandic websites since 2004 in accordance with the Icelandic law on legal deposit from 2002. The collection is limited to .is-domains and a hand-picked selection of Icelandic websites within other top-level domains.
The Bibliothèque nationale de France has collected websites under French legal deposit legislation since 2006. The collection is only available on BNF premises.
Under Danish Legal Deposit Law, the two legal deposit libraries in Denmark, State and University Library and The Royal Library, have been archiving Danish websites since 2005. The archive is only accessible to researchers who have requested and been granted special permission to use the collection for specific research purposes.
The Japanese Web Archiving Research Project has been archiving websites since 2002. The National Diet Library Law, revised in 2009, allows the National Diet Library to archive Japanese official institutions’ websites. Websites of cultural and international events held in Japan, and those related to electronic magazines, are also archived based on the permission of their webmasters. Parts of the collection are available online.
Collections of archived websites selected by subject specialists. Subjects include September 11, 2001; United States Election 2008; and Iraq War 2003.
Since 2005 Library and Archives Canada (LAC) has collected a representative sample of Canadian websites. The collection contains over 170 million digital objects and more than 7 terabytes of data.
Maintained by: Internet Archive
Content: Harvested websites collected since the establishment of the Internet Archive in 1996. Over 364 billion webpages included so far.
Scope: international. Aims to collect as much as possible.
Access: search and access are freely available online via the Wayback Machine.
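For readers who want to link to an archived copy directly, the following is a minimal sketch of how a Wayback Machine address can be put together. It assumes the commonly used address pattern of the archive prefix, a 14-digit timestamp (YYYYMMDDhhmmss), and the original page address; the function name `wayback_url` and the example date are illustrative only and are not part of this guide. The Wayback Machine normally redirects such an address to the capture closest to the requested date, so the page shown may carry a different timestamp.

```python
from datetime import datetime

# Assumed address prefix for Wayback Machine captures.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(page_url: str, when: datetime) -> str:
    """Build a Wayback Machine address for page_url near the given date."""
    # The timestamp is written as YYYYMMDDhhmmss with no separators.
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{timestamp}/{page_url}"


if __name__ == "__main__":
    # Illustrative request: the Bodleian Libraries home page around 1 June 2015.
    print(wayback_url("https://www.bodleian.ox.ac.uk/", datetime(2015, 6, 1)))
```

The function only builds the address and does not contact the Internet Archive, so it cannot confirm that a capture exists for the chosen page and date.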
Web archives equip researchers with a new and largely untapped resource from which to conduct research. They provide unique cultural insights into online and offline societies and can show how websites have transformed over time. Publications such as The Web as History (2017) show that the value of web archives is gaining increased appreciation and awareness.
- ‘As part of the project, ten arts and humanities researchers were invited to use this web archive dataset to conduct cutting-edge research’.
A few examples of the research projects conducted using the BUDDAH project’s dataset are:
1. Cowls, Josh. “Cultures of the UK.” The Web as History: Using Web Archives to Understand the Past and the Present, edited by Niels Brügger and Ralph Schroeder, UCL Press, 2017, p. 220.
How to cite a Web Archive
Note: The recommended citations are based on the Bodleian’s preferred form for citing special collections material/archives. If desired, the date accessed and the date archived can be added, but this is not standard across citation guides. The following format gives users the flexibility to make any amendments required by the citation/reference style they are using.
When you use the web archive in your work, cite it as you would any other resource, using the preferred forms of citation below.
Citing a collection (BLWA)
Individual seeds or web pages (BLWA)
Other citation / reference style examples
1. MLA Format
The webpage is cited as normal, adding (in < and >) the name of the archive (italicized) and the archive URL.