
Introduction to Digital Preservation: Fixity


What is Fixity?

Fixity is a term commonly used in digital preservation when talking about digital files and bitstreams. Fixity means the state of being unchanged or permanent. Confirming a digital file's fixity means confirming that it has remained unchanged over time. This confirmation process is often called fixity checking or integrity checking, and it verifies that a digital object has not been altered or corrupted.

The most common way to confirm the fixity of a digital object is to create a checksum (also called a hash) for each individual file or, in some cases, each bitstream (mainly for audiovisual works). A checksum is a string of letters and numbers generated by running the file's contents through a mathematical algorithm. It works like a digital fingerprint for the file: in practice it is unique to the file's exact contents, so changing even a single bit produces a different checksum.
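The following minimal Python sketch uses the standard library's `hashlib` to illustrate the fingerprint idea. It hashes two made-up byte strings that differ by a single character and prints both SHA-256 checksums. The sample text is invented for illustration and does not come from any real file.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a 64-character hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

# Two invented inputs that differ by exactly one character.
original = b"Minutes of the board meeting, 3 May"
altered = b"Minutes of the board meeting, 8 May"

print(sha256_hex(original))
print(sha256_hex(altered))
print(sha256_hex(original) == sha256_hex(altered))  # False
```

The two printed checksums share no visible pattern. Strictly speaking, two different inputs could in theory produce the same digest (a collision), so "unique" is best read as "unique in practice".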

The most common checksum algorithms used in digital preservation are MD5, SHA-1 and SHA-256. Others exist, and algorithms go in and out of use over time. You need to know which algorithm generated a file's checksum, because checksums from different algorithms are not interoperable: an MD5 checksum cannot be compared with a SHA-256 checksum of the same file.
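One practical consequence is that the algorithm name should always be stored alongside the checksum. The sketch below defines a hypothetical `Checksum` record, which is not part of any particular preservation tool. The record carries both values and refuses to compare checksums made by different algorithms.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Checksum:
    """A digest value paired with the algorithm that produced it (hypothetical record)."""
    algorithm: str  # a hashlib name such as "md5", "sha1" or "sha256"
    value: str      # hexadecimal digest

    @classmethod
    def of(cls, data: bytes, algorithm: str) -> "Checksum":
        """Compute a checksum of data with the named algorithm."""
        return cls(algorithm, hashlib.new(algorithm, data).hexdigest())

    def matches(self, other: "Checksum") -> bool:
        """Compare two checksums, but only if they come from the same algorithm."""
        if self.algorithm != other.algorithm:
            raise ValueError(f"cannot compare {self.algorithm} with {other.algorithm}")
        return self.value.lower() == other.value.lower()

data = b"example file contents"
for name in ("md5", "sha1", "sha256"):
    # Each algorithm yields a different digest, even its length differs.
    print(name, len(Checksum.of(data, name).value))  # md5 32, sha1 40, sha256 64
```

With this design, a call such as `Checksum.of(data, "md5").matches(Checksum.of(data, "sha256"))` raises `ValueError` instead of quietly reporting a mismatch that would look like corruption.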

If you monitor a file's integrity from as early as possible, any later loss or corruption of that file can be detected. However, a checksum has its limits. A mismatch during fixity checking flags that the file has changed since the checksum was created, but it cannot diagnose what the problem is or where in the file it lies. It can only tell you that there is a problem, and it is up to you to investigate further.
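A fixity check recomputes a file's checksum and compares it with a value recorded earlier. The Python sketch below assumes that the expected digest and its algorithm were already stored, for example when the file entered storage. The file name in the demo is a placeholder, and the "pass"/"fail" labels are illustrative.

```python
import hashlib
import tempfile
from pathlib import Path

def compute_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks (1 MiB by default) so large files need not fit in memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def fixity_check(path: Path, expected: str, algorithm: str = "sha256") -> str:
    """Recompute the digest and compare it with the stored value.

    "fail" only says that the bytes differ from when the checksum was made;
    it cannot say which bytes changed, how, or why.
    """
    actual = compute_digest(path, algorithm)
    return "pass" if actual == expected.strip().lower() else "fail"

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = Path(d) / "example.txt"
        p.write_bytes(b"original contents")
        stored = compute_digest(p)           # recorded when the file enters storage
        print(fixity_check(p, stored))       # pass
        p.write_bytes(b"original contentz")  # simulate silent corruption
        print(fixity_check(p, stored))       # fail
```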

For more on fixity and checksums, please read the DPC Handbook section on Fixity.

Image source: Jørgen Stamp, CC-BY 2.5

Fixity checking

There are a number of programs listed in the COPTR tool registry that can generate checksums and verify file fixity. Some of the common tools are:

  • Fixity 1.0 by AVP: this tool has a GUI. It lets you set how often checks run and select the directories you want to check. It does not supply you with checksums, but it is good at checking files that may still be in use. It might not be practical at scale, but it is a good solution for personal or work files. It is especially useful for researchers who want to manage their research data and outputs during the life of their project.
  • Unix commands: md5sum, shasum and others. These can be run from the command line on Unix machines and can also be written into scripts. They have no GUI and come as standard on most Unix computers.
  • Md5summer: a Windows application that generates and verifies MD5 checksums.
  • md5deep and hashdeep: a set of cross-platform tools for computing checksums on any number of files. They can recursively process directory structures, which is harder to do with the regular Unix checksum commands. Once checksums exist, they can also audit the files and report any that do not match.
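The recursive "compute, then audit" workflow described for md5deep and hashdeep can be sketched in Python. This is not those tools or their output format. It is only an illustration of the idea: walk a directory tree, record a digest per file, and later report which files changed, went missing, or appeared.

```python
import hashlib
import os
from pathlib import Path

def file_digest(path: Path, algorithm: str = "sha256") -> str:
    """Hash one file in 1 MiB chunks."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root: Path, algorithm: str = "sha256") -> dict[str, str]:
    """Walk root recursively and map each file's relative path to its digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            manifest[full.relative_to(root).as_posix()] = file_digest(full, algorithm)
    return manifest

def audit(root: Path, manifest: dict[str, str], algorithm: str = "sha256") -> dict[str, list[str]]:
    """Compare the current contents of root against an earlier manifest."""
    current = build_manifest(root, algorithm)
    return {
        # Files present both times whose digest no longer matches.
        "changed": sorted(p for p in manifest.keys() & current.keys() if manifest[p] != current[p]),
        # Files listed in the manifest that are no longer on disk.
        "missing": sorted(manifest.keys() - current.keys()),
        # Files on disk that the manifest does not know about.
        "new": sorted(current.keys() - manifest.keys()),
    }
```

In use, you would save the output of `build_manifest` for a folder, such as a hypothetical `project_data` directory, and later pass the saved manifest to `audit` to see what has changed since.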

Checksums

Aside from verifying that file fixity has been maintained while the file is being stored, checksums have three other main uses:

  • To confirm that a file has been correctly transferred from the content owner to preservation storage and backups
  • To verify that any backup copies of a file are complete and correct
  • To give to future users of the file, so they can verify that they have received the correct file from storage
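For transfers and for handing checksums to future users, a common approach is to send a plain-text manifest alongside the files. The Python sketch below writes one `<digest>  <relative path>` line per file, which is the two-space layout used by the GNU `sha256sum` command. It then re-hashes the files on the receiving side. The manifest file name is hypothetical. Unusual filenames (for example ones containing newlines or backslashes) are not handled here.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash one file in 1 MiB chunks with SHA-256."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(files: list[Path], base: Path, manifest: Path) -> None:
    """Write '<digest>  <path relative to base>' lines, one per file."""
    lines = [f"{sha256_of(p)}  {p.relative_to(base).as_posix()}\n" for p in files]
    manifest.write_text("".join(lines), encoding="utf-8")

def verify_manifest(manifest: Path, base: Path) -> dict[str, bool]:
    """Re-hash each listed file under base; True means it arrived intact."""
    results = {}
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        digest, name = line.split("  ", 1)
        target = base / name
        # A missing file counts as a failed check, not an error.
        results[name] = target.is_file() and sha256_of(target) == digest.lower()
    return results
```

Because the layout matches `sha256sum`, a recipient without Python could check the same manifest with `sha256sum -c` run from the directory that holds the files.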

Below are some examples of what various checksums look like for the following image.

Image By Walter Heubach (German, 1865–1923) (Upload: User:Jarlhelm) [Public domain], via Wikimedia Commons

File name: Heubach_cat.jpg

MD5: 6d5b04d33455ac13a2291216e5b552a2

SHA-1: 1a26f9ce33857a5c742877aa8de982968d87f67b

SHA-256: 06a67229b29321064ab6b83cd3fce40bc8079666a1197d324e8f2ce28dd24dff
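A listing like the one above can be produced in a single read of the file by feeding each chunk to several hash objects at once. This Python sketch is only an illustration. If you run it on a local copy of the image, for example `multi_digest(Path("Heubach_cat.jpg"))`, the results will match the values above only if your copy is byte-for-byte identical to the file that was hashed for this guide.

```python
import hashlib
from pathlib import Path

def multi_digest(path: Path, algorithms: tuple[str, ...] = ("md5", "sha1", "sha256")) -> dict[str, str]:
    """Compute several digests while reading the file only once."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            # Every hash object sees exactly the same bytes.
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}
```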

Fixity and data integrity

Data integrity is essential for digital objects. It means ensuring that data stays accurate and consistent throughout its lifecycle. Maintaining fixity is a critical part of data integrity.

Other aspects include managing relationships between data and maintaining metadata for contextual purposes.

Fixity and PREMIS

Fixity can be recorded using the PREMIS metadata standard, which refers to a checksum as a message digest. PREMIS can record not only the checksum but also the algorithm that created it, as well as the software and version used. Any subsequent fixity checks can also be recorded in PREMIS as events, including the outcome of each check.
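The following sketch uses Python's standard `xml.etree.ElementTree` to show roughly what this looks like. It builds two fragments: a `fixity` element holding the algorithm, digest and originator, and an `event` recording one fixity check and its outcome. It assumes the PREMIS 3 namespace `http://www.loc.gov/premis/v3`. The fragments are not a complete PREMIS record, since a real object also needs identifiers, format information and so on, so check anything you generate against the PREMIS schema. The originator string and the "pass" outcome are illustrative values.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)  # serialise with a "premis:" prefix

def q(tag: str) -> str:
    """Qualify a tag name with the PREMIS namespace."""
    return f"{{{PREMIS_NS}}}{tag}"

def fixity_element(algorithm: str, digest: str, originator: str) -> ET.Element:
    """Build a premis:fixity fragment (intended to sit inside premis:objectCharacteristics)."""
    fixity = ET.Element(q("fixity"))
    ET.SubElement(fixity, q("messageDigestAlgorithm")).text = algorithm
    ET.SubElement(fixity, q("messageDigest")).text = digest
    ET.SubElement(fixity, q("messageDigestOriginator")).text = originator
    return fixity

def fixity_check_event(outcome: str) -> ET.Element:
    """Build a premis:event recording one fixity check and its outcome."""
    event = ET.Element(q("event"))
    ident = ET.SubElement(event, q("eventIdentifier"))
    ET.SubElement(ident, q("eventIdentifierType")).text = "UUID"
    ET.SubElement(ident, q("eventIdentifierValue")).text = str(uuid.uuid4())
    ET.SubElement(event, q("eventType")).text = "fixity check"
    ET.SubElement(event, q("eventDateTime")).text = datetime.now(timezone.utc).isoformat()
    info = ET.SubElement(event, q("eventOutcomeInformation"))
    ET.SubElement(info, q("eventOutcome")).text = outcome
    return event

# Example: the SHA-256 value listed for Heubach_cat.jpg, with a placeholder originator.
sha = "06a67229b29321064ab6b83cd3fce40bc8079666a1197d324e8f2ce28dd24dff"
print(ET.tostring(fixity_element("SHA-256", sha, "example-tool 1.0"), encoding="unicode"))
print(ET.tostring(fixity_check_event("pass"), encoding="unicode"))
```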

Recording this type of preservation metadata is crucial for confirming and establishing a digital object's "chain of custody".