Friday, July 11, 2008

Web sites of interest

Association for Preservation Technology International


Digital Library Federation


International Internet Preservation Consortium


Library of Congress: Digital Preservation


Library of Congress: Preservation


The National Archives: Preservation and Archives Professionals


National Center for Preservation Technology & Training


National Film Preservation Foundation


National Preservation Institute


Northeast Document Conservation Center


Preservation Resources - Digital Library SunSITE

Loss

Remember Stereo 8 (better known as 8-Track), Betamax, or LaserDiscs? If you have never heard these terms, you are not alone. Now think ahead 20 years to what your granddaughter may remember of PDF (Portable Document Format), JPEG (Joint Photographic Experts Group), Blu-ray, or MPEG-1 (Moving Picture Experts Group) Audio Layer 3, better known as MP3. Purchase a laptop tomorrow, and three months from now there will be a newer model with fancier bells and whistles that makes the top-of-the-line, specially configured laptop you are still paying for look like a Commodore 64. Should you be worried? Yes. Why? Because, as Ray Kurzweil stated in his foreword to Bryan Bergeron’s book, “[A]ccessing information stored in digital form decades (and sometimes even just years) later is extremely difficult if not impossible” (Bergeron, 2002, p. xiii). Remember that brilliant final research paper in English 1101? It took hours to write using WordStar on a PC running MS-DOS, and you still have the 5¼-inch floppy disk you saved it on. You would be hard-pressed, however, to find a working computer with a 5¼-inch floppy drive, let alone one with your old version of WordStar installed. Although it is possible to find emulators that run on current operating systems and will open that file, getting the file off that 5¼-inch floppy is an entirely different proposition.



The preceding example is a rather simple one, but it illustrates Bergeron’s caution against writing our precious heritage in "disappearing ink" (Bergeron, 2002, p. xiv). Project that scenario onto the vast stores of information being digitized in special projects by Fortune 500 companies, university libraries all over the world, local municipalities and state agencies, and the Library of Congress, to name a few. The prospect is very real: catastrophic losses of data, and with them the loss of the information that data provides, which in turn supports a storehouse of knowledge from which understanding may be gleaned (refer to Bergeron’s definitions of data, information, knowledge, and understanding on page 9 of his book). According to Stuart Kelly, author of The Book of Lost Books: An Incomplete History of All the Great Books You’ll Never Read, "Loss is not an anomaly or a deviation or an exception. It is the norm. It is the rule. It is inescapable" (as quoted in Zorich, 2007, para. 2). While loss may be inescapable, loss on the scale Bergeron considers should be mitigated to the greatest extent possible by “intelligent action through explicit, conscious decisions on how to proceed with the complex socio-technical phenomena that we call the digital revolution” (Bergeron, 2002, p. xxi).



One organization that has given some thought to mitigating data loss is the Koninklijke Bibliotheek (KB), the National Library of the Netherlands. Through its e-Depot, the KB currently archives over 10 million international e-publications under an agreement signed with Elsevier Science and Kluwer Academic. Most of these articles are PDF files ranging from version 1.0 to version 1.6; however, over the next five years the KB expects to add a growing number of articles in a wider variety of formats, which has forced it to reconsider its digital preservation strategy. The two strategies currently considered most viable for long-term preservation are migration and emulation (Rog and van Wijk, 2008, pp. 1-2). Whichever is chosen, the ultimate goal is sustainable access to digital objects, which must also take into account the ability to “manage and maintain records of change, original formats, and relationship and version information to describe the processes that led to the current form.” This is an issue of both documentation and authenticity (Bradley, 2007, p. 157), as well as one of data integrity (Bergeron, 2002, p. xix).



The KB has developed a file format risk assessment that scores file formats from 0 to 100; formats with higher scores are more suitable for long-term preservation, and a format's score can change over time. The assessment is based on seven criteria, each broken down into sub-criteria: Openness (standardization, restrictions on the interpretation of the file format, and a reader with freely available source code); Adoption (worldwide usage, and usage as an archival format in the cultural heritage sector); Complexity (human readability, compression, and variety of features); Technical Protection Mechanism, or DRM (password protection, copy protection, digital signature, printing protection, and content extraction protection); Self-Documentation (embedded metadata and an embedded technical description of the format); Robustness (robustness against a single point of failure, support for file corruption detection, file format stability, backward compatibility, and forward compatibility); and Dependencies (no dependence on specific hardware, on a specific operating system, on one specific reader, or on other external resources) (Rog and van Wijk, 2008, pp. 3-4). The KB's approach illustrates the theme of the eighth annual WebWise conference, "Stewardship in the Digital Age," which considered how "to maximize our preservation successes in the digital era" (Zorich, 2007, para. 3).
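To make the structure of the assessment a little more concrete, here is a minimal Python sketch of how a checklist like the KB's could be rolled up into a 0 to 100 score. This post does not give the KB's actual weights or per-criterion scoring values, so the sketch assumes that every criterion counts equally, that every sub-criterion within a criterion counts equally, and that each sub-criterion is a yes/no question phrased so that "yes" is the preservation-friendly answer (for example, "no copy protection"). The dictionary keys and the function name risk_score are hypothetical labels for illustration, not names taken from Rog and van Wijk.

```python
from typing import Dict, List

# Hypothetical encoding of the seven criteria and their sub-criteria.
# Each sub-criterion is phrased so that True is the preservation-friendly
# answer; this phrasing (and the equal weighting below) is an assumption.
CRITERIA: Dict[str, List[str]] = {
    "openness": ["standardized", "no_interpretation_restrictions",
                 "open_source_reader"],
    "adoption": ["worldwide_usage", "archival_use_in_cultural_heritage"],
    "complexity": ["human_readable", "no_compression", "limited_features"],
    "technical_protection": ["no_password_protection", "no_copy_protection",
                             "no_digital_signature", "no_printing_protection",
                             "no_content_extraction_protection"],
    "self_documentation": ["embedded_metadata",
                           "embedded_technical_description"],
    "robustness": ["no_single_point_of_failure", "corruption_detection",
                   "format_stability", "backward_compatible",
                   "forward_compatible"],
    "dependencies": ["hardware_independent", "os_independent",
                     "reader_independent", "no_external_resources"],
}


def risk_score(answers: Dict[str, bool]) -> float:
    """Return a 0-100 score; higher means more suitable for preservation.

    Each criterion scores the fraction of its sub-criteria answered True,
    and the overall score is the unweighted mean of the seven criterion
    scores, scaled to 100. Sub-criteria missing from `answers` count as
    unfavorable.
    """
    criterion_scores = []
    for subs in CRITERIA.values():
        satisfied = sum(1 for s in subs if answers.get(s, False))
        criterion_scores.append(satisfied / len(subs))
    return round(100 * sum(criterion_scores) / len(criterion_scores), 1)


if __name__ == "__main__":
    # A hypothetical format that meets every sub-criterion except that it
    # uses DRM, so all five technical protection items are unfavorable.
    drm_format = {s: True for subs in CRITERIA.values() for s in subs}
    for s in CRITERIA["technical_protection"]:
        drm_format[s] = False
    print(risk_score(drm_format))  # 85.7
```

Under these assumptions, a format that is ideal in every respect except DRM loses one full criterion out of seven and lands at about 85.7. Equal weighting is only a placeholder; a faithful version would substitute the weights and scores defined in Rog and van Wijk's report.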



Works Cited
Bergeron, B. (2002). Dark ages II: When the digital data die. Upper Saddle River, NJ: Prentice Hall.

Bradley, K. (2007, Summer). Defining digital sustainability. In M. V. Cloonan & R. Harvey (Eds.), Preserving cultural heritage [Special issue]. Library Trends, 56(1), 148-163. Retrieved March 30, 2008, from Library Literature & Information Science Full Text database.

Rog, J., & van Wijk, C. (2008, February 27). Evaluating file formats for long-term preservation. Retrieved March 30, 2008, from National Library of the Netherlands Web site: http://www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/KB_file_format_evaluation_method_27022008.pdf

Zorich, D. M. (2007, July 2). Defining stewardship in the digital age. First Monday, 12(7). Retrieved March 30, 2008, from Library Literature & Information Science Full Text database.

Ruminations (couldn't think of a better title)

Certain periods in human history are known by specific names. One such period is the Dark Ages, a name that refers not to a lack of sunlight but to a paucity of knowledge and significant intellectual discourse. In his book Dark Ages II: When the Digital Data Die, Bryan Bergeron made a compelling case that “[t]he United States, one of the most technically advanced nations on the planet, is poised to enter a second Dark Ages—a time when what we leave behind will be viewed as negligible compared to the previous centuries.” Furthermore, he argued, this second Dark Ages would not require a catastrophic event such as dramatic climate change or nuclear Armageddon; it could instead stem from “the natural progression of systems and processes already in place” (2002, p. xxvi). The United States, however, is not the only nation that could suffer such a setback. As we embrace the convenience, richness, versatility, and flexibility that come with life in a digital world, we are moving further away from the traditional, time-tested means of recording the fruits of human endeavor for posterity. This is partly due to the "newer is better" syndrome, but space restrictions also make keeping hard-copy records less and less practical.



There are many ways to preserve something. The choice comes down to the nature of the item being preserved, how the preserved item will be accessed, and how long one expects the item to remain useful. Preservation in general is a very complicated proposition when you consider everything involved: not only infrastructure costs and human labor and intellectual costs, but also the choice of what to preserve. It can all really be a big coin toss, especially when it comes to the more esoteric or less practical material (I'm not sure that term really conveys what I mean). I think one can make a clear case for preserving practical knowledge that is essential to the survival of humanity and that provides our human comforts, but then again our ancestors did OK without it, so who's to say our descendants couldn't find their way back to it if it came to that. I strongly believe in the resilience of humanity and its ability to bounce back from disaster. The preservation of culture and heritage is a much harder sell, I think. Yet culture and heritage can play as important a role as the practicalities of physics, chemistry, medicine, and engineering in humanity's ability to recover from catastrophe, so it is difficult to say which should be the greater focus of preservation efforts. When the chips are down, one never knows what bit of knowledge, however arcane and irrelevant it may seem now, may be the thing that sees us through.



In a perfect world we would be able to preserve everything, but physical space is limited, resources are limited, and storage costs money (perhaps not as much as it did 10 years ago, but it still costs). Given those constraints, choices must be made and priorities set for what gets preserved and what does not. However distasteful it may seem, someone has to make that call. Some bias will always creep into the process, and there will be agendas and attempts to manipulate it toward specific ends, but that should not surprise anyone. The best we can hope for is that the decision-makers make more good and valid choices than bad ones. Only our descendants will be able to say how well, or how poorly, we made those tough calls.


Works Cited
Bergeron, B. (2002). Dark ages II: When the digital data die. Upper Saddle River, NJ: Prentice Hall.