Email storage to ensure long term access and preservation


Hello everyone,

This is the first time I have posted to this forum – apologies if this topic has been discussed previously. If it has, perhaps someone could send me a link to the discussion?

I am looking for advice on best practice for storing emails that contain attachments, so that the email content and its attachments can be preserved together as a single record over the long term. If anyone has any recommended ways of doing that I would love to hear from you!

Many thanks

Rolls-Royce Submarines Limited. Raynesway, Derby, DE21 7BE

Are you looking to do records “retention” or actual “archiving” (i.e. retention for the life of a long-term asset, the life of an employee, indefinitely, 50 years, 99 years, etc.)?

For these particular records, the retention period will be 50 years, and we’d like to retain them digitally. Our starting assumption is that we will at least need to convert them from their native file format to PDF/A to give us any chance of being able to open and access them in the future, but if anyone knows of better methods I’d be very happy to hear about them.

Many thanks

Thanks for the extra info, Julie.
I would highly recommend you look at Preservica. Tell them I sent you.

All – Preservica certainly looks robust. A quick perusal of their website doesn’t yield an answer to what I see as the fundamental question, however: “Many years from now, if my organization’s archival content is in Preservica but Preservica itself no longer exists, how do I know that I’ll be able to get it back?”

There’s a reference to OAIS ISO 14721 (digital preservation standards), but we all know from painful experience that standards don’t necessarily guarantee real-world interoperability or access.

Assuming that any of our organizations still exist at the end of a long-term archival timeframe, how can we be confident today that our successors will be able to retrieve the archived content if Preservica itself is no longer in operation?

RecordPoint Software

I don’t think there IS an answer to that, which is why Preservica didn’t answer it ;). Seems to me that what you’re asking is getting into “crystal ball gazing” territory. For instance, 30 years ago I think it would have been difficult to predict that Enron (a $60B+ company at the time) would collapse like a house of cards in a 50 mph wind, or, for that matter, that someone would invent solid state drives (SSDs). I suggested Preservica because I think they are the most realistic option given what we know, what technology we have and what standards we have, TODAY.

Good point, Lorne. I just finished reading The Ancestor’s Tale, by Richard Dawkins, which covers the evolutionary history of life on Earth. DNA and RNA figure prominently in that discussion as the mechanism whereby information is transmitted across the eons with relative consistency.

‘Relative’ being the operative word, since evolution itself happens because of mutations in the sequence that would be unacceptable to an archivist.

Your choice of illustrative examples makes me feel like Forrest Gump. I did a number of projects for Enron ‘before the fall’. I was working for Intel when the SSD (and its successor NVRAM technologies) came on the scene. And I was at the opening event for the Computer Museum West, which is referenced in this article about ‘hard core data preservation techniques’ as a candidate source of methods for reading outmoded media.

I suppose it’s similar to the problem faced by the designers of the Pioneer plaques on Pioneer 10 and 11: how to successfully convey information to recipients who might not even be carbon-based, much less conversant in terrestrial languages, digital or otherwise!

Short of carving the information in analog form on a piece of granite, accompanied by the modern equivalent of the Rosetta Stone for translation purposes, I suppose the only viable answer is to use a service like Preservica, protected by contractual guarantees that archive transfers will always be supported by the provider as long as the provider exists. (Along with a software escrow backstop.) Then test the effectiveness of the transfers before you leave things in the hands of your successors!
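Testing a transfer can start with something as simple as comparing checksums. Here is a minimal Python sketch, assuming both the archive and the exported copy are available as ordinary directory trees (the layout and the function names are hypothetical, not any vendor’s export format). It builds a SHA-256 manifest of each tree and reports anything missing, unexpected or altered.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(original: Dict[str, str],
                      exported: Dict[str, str]) -> Dict[str, List[str]]:
    """Report files missing from the export, unexpected extras, and content changes."""
    return {
        "missing": sorted(set(original) - set(exported)),
        "extra": sorted(set(exported) - set(original)),
        "changed": sorted(
            name for name in set(original) & set(exported)
            if original[name] != exported[name]
        ),
    }
```

Usage would be along the lines of `compare_manifests(build_manifest(Path("archive")), build_manifest(Path("export")))`. Three empty lists mean every file came back bit-for-bit. Note that this only proves the bits survived the round trip. It says nothing about whether anyone can still render them.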

Thought-provoking discussion. Shades of the 10,000-year clock and the Deep Time Project. Of course, none of us will be around to observe the ultimate efficacy of whatever scheme we cook up!

RecordPoint Software

I would also suggest you look into Arkivum, especially their Perpetua solution. Arkivum has a clear exit plan for its clients, including the tools needed to capture your records should your organization decide to end its contract, or should Arkivum cease to exist. We selected Arkivum over Preservica for this and many other reasons.

Arkivum is currently developing an interesting email preservation system that will work in concert with Archivematica (which is included in Perpetua). The system will retain the structure of mailboxes, retain the connection between attachments and the original message, and support normalization of the attachments. It is one of the more sophisticated email preservation tools I’ve seen so far.

The Archives of the Episcopal Church

Preservica does a few things with email which may be of interest. First, we unpack the transfer file (PST or MBOX, for example) into individual emails and extract search metadata. We then detach the attachments so they can be preserved separately. Next, we parse the bitstreams to identify the format of every item in the record stack and extract any relevant technical metadata about the objects. At that point we save the emails, attachments, metadata and structure, and index them ready for use. We keep multiple checksummed copies in multiple places and check them regularly for corruption, self-healing if needed.
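For anyone wanting to see roughly what the unpack-and-detach step involves, here is a sketch using only Python’s standard library (`mailbox` and `email`). It is not Preservica’s implementation. It assumes an MBOX file (PST needs a third-party reader and isn’t covered), and the record layout is hypothetical. Each message keeps its full original bytes, a few search fields, and a list of detached attachments that point back to their parent message by the message’s SHA-256 checksum.

```python
import email
import hashlib
import mailbox
from email import policy
from typing import List


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def unpack_mbox(mbox_path: str) -> List[dict]:
    """Split an mbox into per-message records with detached attachments.

    The record layout here is a hypothetical illustration, not a standard.
    """
    records = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for key in box.iterkeys():
            raw = box.get_bytes(key)  # original message bytes, without the mbox "From " line
            msg = email.message_from_bytes(raw, policy=policy.default)
            parent_id = sha256_hex(raw)  # checksum doubles as the link to attachments
            attachments = []
            for part in msg.iter_attachments():
                # Nested messages/multiparts decode to None and are not unpacked here.
                payload = part.get_payload(decode=True) or b""
                attachments.append({
                    "parent": parent_id,
                    "filename": part.get_filename(),
                    "content_type": part.get_content_type(),
                    "sha256": sha256_hex(payload),
                    "data": payload,
                })
            records.append({
                "sha256": parent_id,
                "message_id": msg.get("Message-ID"),
                "from": msg.get("From"),
                "date": msg.get("Date"),
                "subject": msg.get("Subject"),
                "raw": raw,  # keep the untouched original alongside the detached copies
                "attachments": attachments,
            })
    finally:
        box.close()
    return records
```

Keeping `raw` next to the detached attachments means the email and its attachments can always be reassembled exactly as received. The detached copies are what you would then run format identification and migration against.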

We maintain a library of migration tools so that, if formats start to become unreadable, content can be migrated to a newer format. This can create either a new master copy (e.g. a newer version of the Word format) or an access copy (e.g. PDF/A), which is not necessarily 100% accurate but is easy to use. All generations of the files are retained, with an audit trail linking them.
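The “keep every generation, linked by an audit trail” idea can be modelled with a small data structure. The sketch below is hypothetical: the class names, role labels and tool names are made up for illustration, and it performs no conversion itself. The caller supplies the converted bytes, and the object records which generation each copy came from, which tool produced it, and a checksum.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Generation:
    """One stored version of a file: the original or a migrated copy."""
    gen_id: int
    format_name: str
    role: str                    # hypothetical labels: "original", "master", "access"
    sha256: str
    derived_from: Optional[int]  # gen_id of the source generation, None for the original
    tool: Optional[str]          # migration tool name (hypothetical), None for the original
    created: str                 # UTC timestamp in ISO 8601 form


@dataclass
class PreservedObject:
    """Holds every generation of one file; nothing is overwritten or deleted."""
    generations: List[Generation] = field(default_factory=list)

    def add_original(self, data: bytes, format_name: str) -> Generation:
        return self._append(data, format_name, "original", None, None)

    def migrate(self, source_id: int, data: bytes, format_name: str,
                role: str, tool: str) -> Generation:
        """Record a converted copy; `data` is produced by some external tool."""
        if not any(g.gen_id == source_id for g in self.generations):
            raise KeyError(f"unknown generation {source_id}")
        return self._append(data, format_name, role, source_id, tool)

    def lineage(self, gen_id: int) -> List[Generation]:
        """Follow derived_from links from a generation back to the original."""
        by_id = {g.gen_id: g for g in self.generations}
        chain = [by_id[gen_id]]
        while chain[-1].derived_from is not None:
            chain.append(by_id[chain[-1].derived_from])
        return chain

    def _append(self, data: bytes, format_name: str, role: str,
                derived_from: Optional[int], tool: Optional[str]) -> Generation:
        gen = Generation(
            gen_id=len(self.generations),
            format_name=format_name,
            role=role,
            sha256=hashlib.sha256(data).hexdigest(),
            derived_from=derived_from,
            tool=tool,
            created=datetime.now(timezone.utc).isoformat(),
        )
        self.generations.append(gen)
        return gen
```

For example, you might record an original `.doc`, a `.docx` master migrated from it, and a PDF/A access copy migrated from the master. Calling `lineage()` on the access copy then walks back through all three, which is the audit trail in miniature.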

We have easy exit mechanisms to make sure you can extract content and metadata whenever you want to along with easy search and browse tools.

This conforms to the ISO reference model (OAIS) and can be done at scale.

No one can promise they will be around forever, but we can say that this is what we do, and we are trusted by national and state libraries and archives to look after their content.

