
Introduction to Digital Preservation: Identification

Subjects: Digital Library

Identifying file formats

File format identification is an important part of digital preservation. Knowing which file format you have, and which version, will assist with preservation planning for that digital object. It also tells you which software programs can open and render the digital object. Note that a program may be able to open a particular file format without rendering it correctly: the look and feel could be altered, sometimes slightly and sometimes so severely that the content is difficult to interpret. This is particularly true for older file formats that were created with legacy software programs. Be aware that "legacy" can mean software that is only 10 years old!

Knowing the file format and version of a digital object also means you can plan for its future. Does it need to be normalised on ingest? Does it need to be migrated to a new file format? Would emulation be a better fit? This is all part of preservation planning.

File format identification tools and methods are constantly improving and developing. File format identification should not be seen as a one-off activity that is run only when a digital object is first given to a repository; it is good practice to run identification software over collections regularly to benefit from new tool developments.

DROID

DROID (Digital Record Object Identification) is a tool for automated batch identification of file formats. DROID uses the PRONOM registry to identify file formats based on file format signature, file extension and other technical information contained in PRONOM. It can export reports as CSV files, which can then be queried and used to produce statistics.
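As an illustration of the kind of statistics a CSV report makes possible, here is a minimal Python sketch that counts files per identified format. The column names used (TYPE, PUID, FORMAT_NAME, FORMAT_VERSION) and the sample rows are assumptions made for this example; check them against the header row of your own export, which will contain more columns than shown here.

```python
import csv
import io
from collections import Counter

# Illustrative rows only (an assumption for this sketch); a real export
# contains more columns and should be read from the exported file instead.
SAMPLE_REPORT = """NAME,TYPE,EXT,PUID,FORMAT_NAME,FORMAT_VERSION
reports,Folder,,,,
annual.pdf,File,pdf,fmt/18,Acrobat PDF 1.4 - Portable Document Format,1.4
minutes.pdf,File,pdf,fmt/18,Acrobat PDF 1.4 - Portable Document Format,1.4
logo.png,File,png,fmt/11,Portable Network Graphics,1.0
notes.xyz,File,xyz,,,
"""


def tally_formats(csv_text):
    """Count files per (PUID, format name, format version)."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Folders are listed in the report but carry no identification.
        if row["TYPE"] == "Folder":
            continue
        # An empty PUID means the file was not matched to a known format.
        puid = row["PUID"] or "unidentified"
        counts[(puid, row["FORMAT_NAME"], row["FORMAT_VERSION"])] += 1
    return counts


if __name__ == "__main__":
    for (puid, name, version), n in tally_formats(SAMPLE_REPORT).most_common():
        print(n, puid, name, version)
```

To use this on a real report, read the exported file's text and pass it to tally_formats. The "unidentified" count is often the most useful figure, since those files are the ones worth re-checking when identification tools are updated.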

DROID is a free and open source digital preservation tool. The newest version can be downloaded from The National Archives website.

PRONOM

PRONOM is a technical registry of file formats created and maintained by The National Archives (UK). It contains information about file formats and the software products or technical components that support them. It is a resource to support ingest and long-term digital preservation.

It is regularly updated by The National Archives. It is not a comprehensive list of file formats, so submissions are encouraged. Researchers working with rare and proprietary file formats, as well as research data managers and archivists, have made submissions to PRONOM. Information on how to submit can be found on the PRONOM website.

Other identification tools

Siegfried is a file format identification tool that uses the PRONOM registry. It can be used in a web browser or downloaded and installed.

FIDO is available from the Open Preservation Foundation and also uses the PRONOM registry.
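The tools above all draw on signature information from PRONOM. To give a feel for what signature-based identification means, here is a deliberately simplified Python sketch that compares the first bytes of a file against a handful of well-known "magic numbers". The signature table and function names are assumptions made for this example; real PRONOM signatures are far more detailed (byte patterns at various offsets, container signatures and so on), and this is not how DROID, Siegfried or FIDO are implemented.

```python
import re

# A tiny, hand-picked table of well-known leading byte sequences.
# This is an illustrative assumption, not data taken from PRONOM.
MAGIC_SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify_bytes(data):
    """Return a format name if the data starts with a known signature."""
    for magic, name in MAGIC_SIGNATURES:
        if data.startswith(magic):
            return name
    return None


def identify_file(path):
    """Identify a file on disk from its first few bytes."""
    with open(path, "rb") as handle:
        return identify_bytes(handle.read(16))


def pdf_version(data):
    """Read the version from a PDF header such as b'%PDF-1.4'."""
    match = re.match(rb"%PDF-(\d\.\d)", data)
    return match.group(1).decode("ascii") if match else None


if __name__ == "__main__":
    header = b"%PDF-1.4\n"
    print(identify_bytes(header), pdf_version(header))
```

Note that the file extension plays no part here: a PDF renamed to "report.doc" would still be recognised from its opening bytes, which is why signature matching is generally more reliable than trusting the extension alone.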