Our Biggest Challenge - Digital Preservation
Posted on behalf of Bill Rueter
In 1995, in a seminal article on digital preservation in Scientific American, Jeff Rothenberg presented this hypothetical scenario:
The year is 2045, and my grandchildren (as yet unborn) are exploring the attic of my house (as yet unbought). They find a letter dated 1995 and a CD-ROM (compact disk). The letter claims that the disk contains a document that provides the key to obtaining my fortune (as yet unearned). My grandchildren are understandably excited, but they have never seen a CD before – except in old movies – and even if they can somehow find a suitable disk drive, how will they run the software necessary to interpret the information on the disk? How can they read my obsolete digital document? (Rothenberg, 1995)
If it seems funny to imagine CDs as unreadable antiques in 50 years, consider the storage media of the 1980s and early 1990s, which only 20-25 years later look almost Mesozoic:
Legacy formats from HSP’s Institutional Archives.
In Ghostbusters, released in 1984, when Janine Melnitz said to Dr. Egon Spengler, “You’re very handy, I can tell. I bet you like to read a lot, too,” Spengler famously, monotonously responded, “Print is dead.” And while predictions of paperless offices have proved premature, the papers that document an individual’s or an organization’s history are, increasingly, not actually created on paper, but rather digitally, via software programs. Digital preservation, therefore, is a pressing issue in archives: it affects not only the integrity of the material that is already part of collections, but also decisions about the types of digital materials and file formats that institutions will collect in the future. The obsolescence of file formats, software, media and hardware presents complex problems that are difficult to predict, so any preservation strategy must be designed to adapt to unknown changes. Even if, for example, the 3.5” disks found in HSP’s collections have not been damaged (their data neither erased nor compromised), there is no guarantee that the newest version of Microsoft Word will open documents that were created with WordPerfect, or even with an earlier version of Word.
Emory University’s work with Salman Rushdie’s archive material has brought to light many of the issues involved with preserving legacy digital materials. Not only did the Emory archivists collect all of his printed material, but they also took every computer, hard drive, CD, and diskette in Rushdie’s apartment. Erika Farr, Emory’s director of born-digital initiatives, noted: “Rushdie’s archive is pretty remarkable and high profile. It’s a perfect one to start with. Much of his archival material after the 1980s, including daily calendars, virtual sticky notes, email correspondence and first drafts of novels, never existed on paper. We have close to his entire digital life up to 2006” (Naughton, 2011).
A Macintosh Performa 5400 like the one used by Salman Rushdie
If most of Rushdie’s archival material since 1990 never existed on paper, we can imagine how little material will be created on paper in the future.
My internship at HSP consists of two primary projects that will hopefully contribute to planning a digital preservation strategy: 1) Identifying materials within the collections that are stored on digital formats, such as CDs and DVDs, and migrating the files to a separate, secure location, as well as identifying materials that are stored on legacy formats, such as 3.5” and 5.25” floppy disks, WANG disks, audio cassette tapes, VHS tapes, open reel tapes, etc., and researching migration and/or emulation solutions to ensure their preservation; and 2) Interviewing the staff of HSP to determine the types of digital files that are being created during the course of business, how and where they are saved, and what is being done with them.
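For readers curious about what "migrating the files to a separate, secure location" could look like in practice, here is a minimal Python sketch. It is only an illustration under my own assumptions, not a description of HSP's actual workflow: the function names (`sha256_of`, `migrate_with_manifest`) and the folder layout are hypothetical. The idea is to copy each file and compare a checksum of the copy against the original, so that any corruption during migration is caught immediately and the resulting list of checksums can be kept for later integrity checks.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Dict


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_with_manifest(source_dir: Path, target_dir: Path) -> Dict[str, str]:
    """Copy every file under source_dir into target_dir and verify each copy.

    Returns a manifest mapping each relative path to its checksum.
    Raises IOError if a copied file does not match its original.
    (Hypothetical helper for illustration only.)
    """
    manifest: Dict[str, str] = {}
    for source in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        relative = source.relative_to(source_dir)
        target = target_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copies contents and, where possible, timestamps
        checksum = sha256_of(source)
        if sha256_of(target) != checksum:
            raise IOError(f"Checksum mismatch after copying {relative}")
        manifest[relative.as_posix()] = checksum
    return manifest
```

Calling `migrate_with_manifest(Path("cd_rom_copy"), Path("secure_storage"))` (both folder names are made up) would return a dictionary of relative file paths and checksums. Saving that dictionary alongside the files would let someone recompute the checksums years later and confirm that nothing has silently changed. Note that this only protects the bits themselves; it does nothing about whether future software can still interpret the file formats.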
The goal of any digital preservation strategy is to provide long-term access to digital information, and that access is dependent on the integrity of each of the digital items. The challenge for archives is to preserve the integrity of the digital information that has already been collected and to have a plan in place for collecting and managing digital materials in the future. By failing to commit to digital preservation, institutions risk having Jeff Rothenberg’s hypothetical scenario become reality, and contributing to the “Digital Dark Age” – “the idea that historians of the future will look back to our present age as another Dark Ages since so much important information documenting our current civilization is recorded digitally and will have vanished” (Simons, 2004).
Naughton, J. (2011). If you have lofty ambitions for your legacy, head for the attic. The Observer.
Rothenberg, J. (1995). Ensuring the longevity of digital information. Scientific American, Vol. 272, No. 1.
Simons, G.F. (2004). Ensuring that digital data last: the priority of archival form over working form and presentation form. Presented at the E-MELD Symposium on “Endangered Data vs. Enduring Practice,” Linguistic Society of America Annual Meeting, January 8-11, 2004, Boston, MA.
For further reading:
Emory University Archives: http://www.emory.edu/home/academics/libraries/salman-rushdie.html
New York Times article: http://www.nytimes.com/2010/03/16/books/16archive.html?pagewanted=1&_r=1