During my final semester as a graduate student in Library and Information Science at the University of Illinois, I pursued a practicum at the Champaign County Historical Archives (CCHA). I was interested in gaining experience with digital preservation, which the Society of American Archivists (SAA) defines as "the management and protection of digital information to ensure authenticity, integrity, reliability, and long-term accessibility." One of the overarching principles of archival work is accessibility, the essential question being: Without access, why do we preserve? This question was my guide through the 14 weeks I spent at CCHA, ingesting and describing a number of CDs, DVDs, floppy disks, and cassettes for future digital consumption.

A stock photograph of cassette tapes.

The work I did relied heavily on the students and archivists who worked on this project before me, whose guidelines provided direction on how to preserve digitized and born-digital materials. The Digital Preservation Plan at CCHA consists of five major steps to ensure the integrity of the materials and their description, as well as the safety of the Archives’ computer system. Each of these steps is characterized by its reliance on one of five software tools: MalwareBytes, DROID, BitCurator, OxygenXML Editor, and BagIt.

  1. MalwareBytes is a cybersecurity protection software that can scan files for potential malware and viruses.
  2. DROID stands for Digital Record Object IDentification. This software was created by The National Archives (UK) in 2005 to help archivists profile a variety of file formats in a short amount of time. DROID collects important file information, such as size and date of creation, and provides each file with a "checksum" or "hash" value (here, MD5). The value is calculated from the file's contents at the time it is profiled and is, for practical purposes, unique to them, which means that any change to the file will produce a different hash value. To ensure the authenticity of the file and its provenance, it is important that this value does not change throughout this process.
  3. BitCurator is an open source digital forensics tool created for collecting institutions, like libraries and archives. The first iteration of the project was completed in 2014, made possible by the collaboration of the School of Information and Library Science at the University of North Carolina, Chapel Hill (SILS) and the Maryland Institute for Technology in the Humanities (MITH). BitCurator does a lot of tech-sounding things, like pre-imaging data triage, forensic disk imaging, file system analysis, export of metadata, and identification of potentially sensitive information. It also generates reports and exports disc contents in Archival Information Packages (AIPs) and Dissemination Information Packages (DIPs). CCHA primarily uses BitCurator for its ability to preserve the provenance or historical journey of ingested files.
  4. OxygenXML Editor is software that supports the composition of XML records. Archives may create XML records according to one of several standards, but CCHA relies on the Dublin Core Metadata Element Set. Record elements, such as "Contributor," "Format," and "Title," are completed as they apply to the file being described. It is important to include as many of the 15 elements as possible so that users may quickly understand the content, extent, and format of the digital materials.
  5. BagIt is a file-packaging format, supported by accompanying software tools, that arranges files in a hierarchical folder structure. It was established through a collaboration between the Library of Congress and the California Digital Library in 2008 as part of the National Digital Information Infrastructure & Preservation Program (NDIIPP). In a blog post celebrating BagIt's 10-year anniversary, Meghan Ferriter (Library of Congress) very helpfully identifies BagIt's three basic functions:
    1. "It outlines file naming structures that are optimized to work as data moves across operating systems and file systems; 
    2. "It provides a manifest of all the files included in a delivery and provides a correct checksum value for each; 
    3. "It allows the senders and recipients to include additional provenance information so that anyone viewing the package can identify immediately what the package contains and who the contacts for it are."
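
For readers curious about what a checksum comparison looks like in practice, here is a minimal Python sketch of the idea behind steps 2 and 5: record an MD5 value for every file in a folder, then check later whether any file has changed. It is only an illustration and is not how DROID or BagIt are implemented. The function names (`md5_of_file`, `build_manifest`, `changed_files`) are made up for this example.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to folder) to its MD5 digest."""
    return {
        p.relative_to(folder).as_posix(): md5_of_file(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def changed_files(folder: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or whose digest no longer matches."""
    changed = []
    for relative_path, recorded in manifest.items():
        target = folder / relative_path
        if not target.is_file() or md5_of_file(target) != recorded:
            changed.append(relative_path)
    return changed
```

Calling `build_manifest` at ingest and `changed_files` at a later date mirrors the principle described above: if even one byte of a file changes, its hash value no longer matches the recorded one, and the file is flagged for review.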

CCHA uses these tools to facilitate the digital accessibility of born-digital and digitized archival materials. At present, the XML records for the materials I worked with are not publicly available, but there are hopes to make them available in the future. Many institutions use open-source software to link this metadata to their digital collections, including Samvera, AtoM (Access to Memory), and RODA. Archives staff are exploring options for the next steps in the digital preservation workflow, including patron access.
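
To give a sense of what one of these XML records might contain, here is a small Python sketch that builds a Dublin Core record with the standard library. It is an assumption-laden illustration, not CCHA's actual template: the `record` wrapper element, the `dublin_core_record` function, and any values passed to it are hypothetical. Only the namespace URI and the 15 element names come from the Dublin Core Metadata Element Set.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements of the set.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def dublin_core_record(fields: dict[str, str]) -> str:
    """Build an XML string with one dc: element per supplied field."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"Not a Dublin Core element: {name}")
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(record, encoding="unicode")
```

For example, `dublin_core_record({"title": "Example title", "format": "audio cassette"})` returns a record with a `dc:title` and a `dc:format` element, using placeholder values. Rejecting unknown element names is a design choice made here to catch typos early.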

Stay tuned for future updates on CCHA’s digital preservation journey!

-Dana Miller, Former Practicum Student