Remember Stereo 8 (better known as 8-Track), Betamax, or LaserDisc? If you have never heard these terms, you are not alone. Now think ahead to what your granddaughter may remember of PDF (Portable Document Format), JPEG (Joint Photographic Experts Group), Blu-ray, or MPEG-1 (Moving Picture Experts Group) Audio Layer 3, better known as MP3, 20 years from now. Purchase a laptop tomorrow, and three months from now there will be a newer model with fancier bells and whistles that makes the top-of-the-line, specially configured laptop you are still paying for look like a Commodore 64. Should you be worried? Yes. Why? Because, as Ray Kurzweil stated in his Foreword to Bryan Bergeron’s book, “[A]ccessing information stored in digital form decades (and sometimes even just years) later is extremely difficult if not impossible” (Bergeron, 2002, p. xiii). Remember that brilliant final research paper in English 1101? It took hours to write using WordStar on a PC running MS-DOS, and you still have the 5¼-inch floppy you saved it on, but you would be hard-pressed to find a working computer with a 5¼-inch floppy drive, let alone one with your old version of WordStar installed. Emulators that run on current operating systems can open the file itself, but getting the file off that 5¼-inch floppy in the first place is an entirely different proposition.
The preceding example is a rather simple one, but it illustrates Bergeron’s caution about writing our precious heritage in "disappearing ink" (Bergeron, 2002, p. xiv). Project that scenario onto the vast stores of information being digitized in special projects by Fortune 500 companies, university libraries all over the world, local municipalities and state agencies, and the Library of Congress, to name a few. The prospect of catastrophic losses of data is very real, and with it the loss of the information those data provide, the knowledge that information supports, and the understanding that may be gleaned from that knowledge (Bergeron defines data, information, knowledge, and understanding on page 9 of his book). According to Stuart Kelly, author of The Book of Lost Books: An Incomplete History of all the Great Books You’ll Never Read, "Loss is not an anomaly or a deviation or an exception. It is the norm. It is the rule. It is inescapable" (as quoted in Zorich, 2007, para. 2). While loss may be inescapable, loss of the magnitude considered by Bergeron should be mitigated to the greatest extent possible by “intelligent action through explicit, conscious decisions on how to proceed with the complex socio-technical phenomena that we call the digital revolution” (Bergeron, 2002, p. xxi).
One organization that has given some thought to the mitigation of data loss is the Koninklijke Bibliotheek (KB), the National Library of the Netherlands. Through its e-Depot, the KB currently archives over 10 million international e-publications as part of an agreement signed with Elsevier Science and Kluwer Academic. The majority of these articles are in various PDF formats ranging from version 1.0 to version 1.6; however, over the next five years the KB expects to add more and more articles in a wider variety of formats, which has forced it to reconsider its digital preservation strategy. Currently the two most viable strategies considered for long-term preservation are migration and emulation (Rog and van Wijk, 2008, pp. 1-2). Whichever one is chosen, the ultimate goal is sustainable access to digital objects. That goal must also take into account the ability to “manage and maintain records of change, original formats, and relationship and version information to describe the processes that led to the current form,” which is an issue of both documentation and authenticity (Bradley, 2007, p. 157), as well as one of data integrity (Bergeron, 2002, p. xix).
The KB has developed a file format risk assessment that rates file formats on a scale of 0 to 100. Formats with higher scores are more suitable for long-term preservation, and a format's score can vary over time. The assessment is based on seven criteria: Openness, broken down into standardization, restrictions on the interpretation of the file format, and reader with freely available source; Adoption, broken down into worldwide usage and usage in the cultural heritage sector as an archival format; Complexity, broken down into human readability, compression, and variety of features; Technical Protection Mechanism (DRM), broken down into password protection, copy protection, digital signature, printing protection, and content extraction protection; Self-Documentation, broken down into metadata and embedded technical description of the format; Robustness, broken down into robustness against a single point of failure, support for file corruption detection, file format stability, backward compatibility, and forward compatibility; and finally Dependencies, broken down into not dependent on specific hardware, not dependent on specific operating systems, not dependent on one specific reader, and not dependent on other external resources (Rog and van Wijk, 2008, pp. 3-4). This approach by the KB illustrates the theme of the eighth annual WebWise conference, "Stewardship in the Digital Age," which considered how "to maximize our preservation successes in the digital era" (Zorich, 2007, para. 3).
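To make the idea of a 0-to-100 format score more concrete, here is a rough Python sketch of how the seven criteria and their sub-criteria could be combined into a single number. The criteria names come from the list above; everything else is an assumption made for illustration and is not taken from the KB's method: the 0-to-2 rating scale per sub-criterion, the equal weights, and the names format_score, uniform_ratings, MAX_RATING, and EQUAL_WEIGHTS are all hypothetical.

```python
# Criteria and sub-criteria as listed above (Rog and van Wijk, 2008).
CRITERIA = {
    "Openness": ["standardization", "restrictions on interpretation",
                 "reader with freely available source"],
    "Adoption": ["worldwide usage", "usage as archival format"],
    "Complexity": ["human readability", "compression", "variety of features"],
    "Technical Protection Mechanism (DRM)": [
        "password protection", "copy protection", "digital signature",
        "printing protection", "content extraction protection"],
    "Self-Documentation": ["metadata", "technical description embedded"],
    "Robustness": ["single point of failure", "corruption detection",
                   "format stability", "backward compatibility",
                   "forward compatibility"],
    "Dependencies": ["specific hardware", "specific operating systems",
                     "one specific reader", "other external resources"],
}

# ASSUMPTION: each sub-criterion is rated 0 (worst) to 2 (best).
MAX_RATING = 2

# ASSUMPTION: every criterion carries the same weight. An archive would
# choose its own weights to reflect its own preservation policy.
EQUAL_WEIGHTS = {name: 1 for name in CRITERIA}


def format_score(ratings, weights):
    """Combine per-sub-criterion ratings into a weighted 0-100 score."""
    total = 0.0
    best_possible = 0.0
    for criterion, sub_criteria in CRITERIA.items():
        weight = weights[criterion]
        for sub in sub_criteria:
            rating = ratings[criterion][sub]
            if not 0 <= rating <= MAX_RATING:
                raise ValueError(f"rating out of range for {criterion}/{sub}")
            total += weight * rating
            best_possible += weight * MAX_RATING
    # Normalize so that a format with top ratings everywhere scores 100.
    return 100.0 * total / best_possible


def uniform_ratings(value):
    """Helper for experiments: give every sub-criterion the same rating."""
    return {c: {s: value for s in subs} for c, subs in CRITERIA.items()}
```

Under these assumptions, format_score(uniform_ratings(2), EQUAL_WEIGHTS) returns 100.0 and a format rated 0 everywhere scores 0.0. Because the weight is applied to each sub-criterion, criteria with more sub-criteria (such as Robustness) count for more in this sketch; averaging within each criterion first would be an equally reasonable design.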
Bradley, K. (2007, Summer). Defining digital sustainability. In M. V. Cloonan & R. Harvey (Eds.), Preserving cultural heritage [Special issue]. Library Trends, 56(1), 148-163. Retrieved March 30, 2008, from Library Literature & Information Science Full Text database.
Rog, J., & van Wijk, C. (2008, February 27). Evaluating file formats for long-term preservation. Retrieved March 30, 2008, from National Library of the Netherlands Web site: http://www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/KB_file_format_evaluation_method_27022008.pdf
Zorich, D. M. (2007, July 2). Defining stewardship in the digital age. First Monday, 12(7). Retrieved March 30, 2008, from Library Literature & Information Science Full Text database.
1 comment:
Loss of data due to loss of the ability to read the format it is saved in is definitely the greatest threat to digitization. We assume that things stored on the web will be available forever; after all, nothing short of a tremendous disaster will make the web obsolete. But even the web evolves. Many tags that I used to write my first web page are now deprecated. That web page doesn't display properly anymore. We have to be careful of this, lest we lose all the labor we put into digital reformatting.