File format validation performs a number of functions that help to confirm a file is well-formed and valid against its format. Validation will:
For these reasons, file format validation is important. It is an especially useful tool for digitization workflows as it will ensure that digital objects are being created correctly. When you are in control of creating a digital object, validation is an important step. However, it is important to know that file format validation has the following limitations:
This is why fixity is equally important in digital preservation. It can help detect visual corruption of those files through the early generation of what is known as a checksum. The section on fixity goes into greater detail on creating and confirming checksums and their uses in digital preservation.
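As a rough illustration of what generating and confirming a checksum can look like, the sketch below uses Python's standard `hashlib` module. The function names `sha256_checksum` and `fixity_matches` are hypothetical, and SHA-256 is assumed only for the example; the text above does not prescribe an algorithm.

```python
import hashlib


def sha256_checksum(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_matches(path, recorded_checksum):
    """Compare a freshly computed checksum with one recorded earlier."""
    return sha256_checksum(path) == recorded_checksum.strip().lower()
```

Recording the checksum as soon as a file is created, and recomputing it later, shows whether the file's bits have changed, which validation alone cannot tell you.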
The most common validation tool is JHOVE, maintained by the Open Preservation Foundation. It is an open source validation tool that can validate the following file formats:
JHOVE stands for JSTOR/Harvard Object Validation Environment. It was a joint project between JSTOR and Harvard University to create a tool to validate files and extract metadata. In 2015, the maintenance of the software was transferred to the Open Preservation Foundation.
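JHOVE is usually run from the command line, so a digitization workflow might wrap it in a small script. The sketch below is one possible approach, assuming the `jhove` launcher is on the system path and using its `-m` (module), `-h` (output handler) and `-o` (output file) options. The helper names `build_jhove_command` and `run_jhove` are hypothetical and not part of JHOVE itself.

```python
import subprocess


def build_jhove_command(file_path, module=None, output_path=None):
    """Build an argument list for a JHOVE run that reports in XML.

    Assumes the launcher is called `jhove` and is on the PATH.
    """
    command = ["jhove", "-h", "XML"]
    if module:
        # For example "PDF-hul" or "TIFF-hul"; if omitted, JHOVE tries
        # its configured modules in turn.
        command += ["-m", module]
    if output_path:
        command += ["-o", str(output_path)]
    command.append(str(file_path))
    return command


def run_jhove(file_path, module=None):
    """Run JHOVE on one file and return whatever it wrote to standard output."""
    result = subprocess.run(
        build_jhove_command(file_path, module=module),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Keeping the command construction separate from the call that runs it makes the command easy to log or inspect before it is executed in a batch.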
MediaConch is an implementation checker, policy checker, reporter, and fixer that targets preservation-level audiovisual files (specifically Matroska, Linear Pulse Code Modulation (LPCM) and FF Video Codec 1 (FFV1)) for use in memory institutions, providing detailed and batch-level conformance checking. It can be used from the command line, through a graphical user interface, or through a web interface. While it validates several audiovisual file types, it does not validate every file format.
The policy checker part of the tool is useful, but it is complex and requires a certain level of knowledge about the different file formats.
Jpylyzer is a validation tool for JPEG2000 (JP2) images. It also reports on the image's technical characteristics or technical metadata (called feature extraction). It is an open source tool maintained by the Open Preservation Foundation. The creation of this validation tool was made possible by partial funding from the EU FP7 project known as SCAPE. It is a richer validation tool for JPEG2000 images than JHOVE and is therefore preferred for validating this file type. It is commonly used in digitization workflows where TIFF files are migrated to JPEG2000 for storage and access reasons.
Unlike JHOVE, Jpylyzer will only validate one file format, but it has a richer validation rule set for JPEG2000 than JHOVE.
EpubCheck validates EPUB files and will extract technical and other embedded metadata. It checks things such as:
It was largely developed by Adobe Systems and is currently supported by the International Digital Publishing Forum (IDPF).
An online version of EpubCheck is available at: http://validator.idpf.org/
veraPDF validates all PDF/A parts and conformance levels. PDF/A is a version of PDF intended for long-term preservation and archiving of electronic documents. PDF/A is meant to prohibit features that are not suitable for long-term preservation, including font linking (instead, the font file is embedded in the document), encryption and annotations. However, it does not work for every document, and creating a valid PDF/A can be labour intensive. Conformance levels include A (Accessible), B (Basic) and U (Unicode). U was created to deal with specialized fonts and characters such as Greek, Arabic and Chinese. On top of conformance levels, there are also three versions of PDF/A, which means a PDF/A document has both a version number and a conformance level associated with it.
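Because a PDF/A document carries both a version number and a conformance level, the pair is often written as a short label such as "PDF/A-2b". The sketch below shows one way to split and sanity-check such a label before passing it to a validator. The names `PdfaFlavour` and `parse_pdfa_flavour` are hypothetical, and the table of allowed combinations is an assumption based on the PDF/A standards, in which level U is only available from version 2 onwards.

```python
import re
from typing import NamedTuple

# Assumed combinations for the three versions discussed above:
# version 1 offers levels A and B; versions 2 and 3 add level U.
ALLOWED_LEVELS = {
    1: {"a", "b"},
    2: {"a", "b", "u"},
    3: {"a", "b", "u"},
}


class PdfaFlavour(NamedTuple):
    part: int   # PDF/A version number
    level: str  # conformance level, lower case


def parse_pdfa_flavour(label):
    """Parse labels such as 'PDF/A-2b' or '3u' into (part, level)."""
    match = re.fullmatch(
        r"(?:PDF/A-)?([0-9])([A-Za-z])", label.strip(), flags=re.IGNORECASE
    )
    if match is None:
        raise ValueError(f"Not a PDF/A label: {label!r}")
    part, level = int(match.group(1)), match.group(2).lower()
    if level not in ALLOWED_LEVELS.get(part, set()):
        raise ValueError(f"Unsupported combination: PDF/A-{part}{level}")
    return PdfaFlavour(part, level)
```

For example, `parse_pdfa_flavour("PDF/A-2b")` yields part 2 with level "b", while a label such as "1u" is rejected because level U does not exist for version 1.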
veraPDF will help to validate the various versions and conformance levels of PDF/A, but it cannot validate any other version of PDF; JHOVE is required for that. It is good practice to validate a PDF/A file using both veraPDF and JHOVE, as each validates different aspects of the PDF file.
There are several other file format validation tools available. These include, but are not limited to:
The COPTR registry of digital preservation tools has a list of further file format validation tools.