Unlike the plays of William Shakespeare or the scientific papers of Albert Einstein, the newest works in literature and science are now usually written and stored on a computer.
But whether kept on discs, computer hard drives or old-style magnetic tape, digital works come with preservation and storage issues that are only just being realized.
Lisa Goddard, the librarian responsible for digital scholarship at the University of Victoria, says many of the problems stem from a simple lack of experience: we have yet to grasp the how, or even the why, of digital preservation.
“We’ve had this digital world since about 1995, when broadband first started to come into people’s homes,” said Goddard. “So we have about 20 years’ experience preserving digital objects and work.
“Meanwhile, we have had about 2,000 years of preserving physical objects, scrolls and paper and have got pretty good at that,” she said. “But we are rank beginners when it comes to the digital world.”
Librarians such as Goddard, along with computer specialists, are now beginning to tackle the preservation of digitally produced and recorded material. But the sheer scale of the task is only now becoming clear, as it encompasses everything from government emails to written reports, websites, research material and personal correspondence.
“This is a problem for our age,” said Goddard.
Consider how digital data has taken over the entire world of written human expression and recorded experience.
For example, more people store family snapshots on their cellular phones than keep printed photographs. Relatives and friends are far more likely to send an email than a paper letter.
But 10 years from now, how many people will still own the same cellular phone? How many will still own the same personal computer holding those emails?
Meanwhile, governments, corporations, universities and just about every other institution in the world now communicate, internally and externally, through electronic data, whether email or digital files.
But where will those files or that correspondence be stored? Will they be accessible or even usable in 10 years or longer?
Digital files degrade. Lose just one or two bits of data and an entire file can become useless.
So digital material must be copied and re-copied and sent to long-term storage. Special computer programs must be designed and enlisted to check through preserved data to make sure no corruption has arisen.
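The integrity checks described above are typically done with checksums: record a mathematical "fingerprint" of each file when it enters storage, then recompute and compare it later. A minimal sketch in Python (the sample text and the choice of SHA-256 are illustrative assumptions, not details from the article):

```python
import hashlib

def checksum(data: bytes) -> str:
    """Return a SHA-256 digest acting as a fingerprint of the data."""
    return hashlib.sha256(data).hexdigest()

def verify(stored: bytes, expected: str) -> bool:
    """True only if the stored copy is still bit-for-bit identical."""
    return checksum(stored) == expected

# When a file enters the archive, record its fingerprint.
original = b"Chapter one of a digitally stored manuscript."
recorded = checksum(original)

# Years later, re-read the stored copy and compare fingerprints.
assert verify(original, recorded)            # an intact copy passes
corrupted = b"Chapter one of a digitally stored manuscript?"
assert not verify(corrupted, recorded)       # even a one-byte change is caught
```

Archival systems run comparisons like this on a schedule, flagging any file whose fingerprint has drifted so a clean copy can replace it.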
And if you’re dealing with digital material stored on computer discs, will those discs even be readable decades later? Just like music CDs, computer discs are perishable. So they must be managed, stored and copied.
That’s not even considering the Internet, which Goddard likens to a fast-flowing river: it is never precisely the same from one moment to the next.
“Anything on the web to my mind is transitory,” said Goddard. “Putting something on the web is not a good way to preserve it.”
Some effort is now being made to freeze-frame certain websites at intervals, or to “harvest” portions of the Internet from time to time. But the sheer volume of data online is without precedent and poses an enormous challenge to archivists.
Furthermore, once any data is stored, it must be catalogued and made accessible. There is no point in storing vast amounts of data unless it can be retrieved.
So librarians, research universities and supercomputing organizations are now coming together to tackle the issue.
Groups such as the Canadian Association of Research Libraries are working with high-performance computer outfits such as Compute Canada, a government-supported agency.
Compute Canada was originally formed to offer cutting-edge researchers fast, powerful computing services capable of handling vast amounts of data. Now it is also looking at preserving and managing the vast amounts of material being created and stored digitally.
John Simpson, Compute Canada’s humanities and social sciences specialist, said an effort is underway to link up regional computer networks across Canada.
The goal is a system that lets a user reach any participating data centre across Canada. Users won’t actually be working on a single data system, but the experience should feel that way.
“We need to be able to find data, work with it, then put that new knowledge back for others to use,” said Simpson.
“Whether it’s something to do with gene sequencing, environmental simulation, weather forecasts or new translations of Shakespearean texts, it won’t matter.”
At UVic, Goddard says the copy of record for all research papers or treatises is now the digital version. A paper copy is likely kept by the original scholar. But UVic relies on its own data centres to record the work and make it available for other scholars.
But those research works can be incredibly complex, going far beyond the written conclusions.
Much scientific research is the result of data generated by whole arrays of software programs, each one custom-written and all working together. All that software must be preserved so the work can be replicated by other scientists.
That doesn’t even take into account the possibility of power outages or natural disasters such as earthquakes, especially on the West Coast.
So UVic, along with other universities, backs up its collections with copies stored in each other’s data centres. It’s another reason for the Compute Canada effort. Yet most of this work has barely started.
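The cross-institution backup arrangement amounts to geographic replication: every object is copied to several independent sites, so no single outage or disaster can destroy the only copy. A toy sketch of the idea (the site names and the three-copy rule are invented for illustration; they are not details from the article):

```python
# Toy model of geographic replication: each object is written to
# several independent sites so one disaster cannot destroy it.
SITES = ["uvic", "ubc", "sfu"]  # hypothetical partner data centres

def replicate(obj_id: str, data: bytes, stores: dict, copies: int = 3):
    """Write `data` to the first `copies` distinct sites."""
    for site in SITES[:copies]:
        stores.setdefault(site, {})[obj_id] = data

def recover(obj_id: str, stores: dict):
    """Read the object back from any surviving site."""
    for site in SITES:
        if obj_id in stores.get(site, {}):
            return stores[site][obj_id]
    return None

stores = {}
replicate("thesis-001", b"digital copy of record", stores)
del stores["uvic"]  # simulate losing one data centre entirely
assert recover("thesis-001", stores) == b"digital copy of record"
```

Real archival networks add checksumming and automatic re-copying on top of this, but the core hedge is the same one Goddard describes: many copies, in many places.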
“We are only just beginning to build the kind of infrastructure and knowledge and tools that we will need,” said Goddard.
“Right now, we do our best to create lots of different copies and formats and we put them in lots of different places and we think that’s the best way to hedge our bets.
“But this is a very complex challenge and it is not a challenge that a single institution or a single company is going to solve. It is a network challenge that is going to require a lot of co-operation and duplication and tool-building together.”