Here's How Historians Are Making Sure Emails Will Last Forever

By Rich Firth-Godbehere

Your emails contain the kind of information modern-day historians can only dream about. To find anything about the past, historians usually have to descend into dusty archives, searching through scraps of paper, old letters, documents and diaries, in the desperate hope they can shed some light on things that happened long ago. Much of history — such as the history of anybody who wasn’t rich, royalty, world leaders, influential thinkers, or dangerous criminals (often all at once) — is even more difficult to find; the ordinary folk didn’t leave much behind.

On the rare occasions ordinary folk do leave something behind, it can change our view of history. Take something as supposedly well-understood as the Roman army. Most military histories of the Romans tell of great generals leading their troops into battle against the barbarians. They talk of senators and emperors changing the world by will alone. Of course, that’s mostly because these histories were written by those self-same generals, senators and emperors, or by people close to them. We had almost no way of knowing what life was like for the average soldier on the ground. Then, on 22nd June 1992, at the remains of the old Roman fort of Vindolanda, just south of Hadrian’s Wall in northern England, some small, fragile, postcard-sized pieces of wood covered in writing were discovered.

These Vindolanda Letters come from the rank and file, not from some powerful leader. They are postcards sent between common soldiers, the families who had come with them, and the people they left behind in Gaul and southern Britain. They aren’t written in the polished Latin of a Roman scholar; they contain spelling and grammatical errors, even shorthand and what is thought to be slang. The letters include requests for underwear, party invitations, military dispatches and applications for leave. They are windows on real life. They showed us that behind the statue-like image of the Roman soldier were real people: people whose socks got holes in them, who sometimes badly needed a holiday, and who liked to party now and again. The letters humanised them, told us something of their lives, and made them more relatable than tales of Julius Caesar’s conquests will ever be.

Unfortunately, this sort of record is very rare. The result is that most of history is the history of great men, and it is almost always men. Even if you dispute whether the West is patriarchal and elitist today, anyone who studies the historical record for any length of time will find it impossible to deny that it once was, at least not without tying their intellect in knots. Women, the poor, the working classes, people of colour: these groups are all very difficult to study. There’s an entire field, called ‘history from below’, whose practitioners scour old contracts, birth certificates, gravestone inscriptions and any other clues they can find to give lost communities a voice. They examine things like the party invitation found among the Vindolanda Letters. That invitation may be the earliest known writing in Latin by a woman, and it surely wasn’t the only thing a woman wrote at the time. It is, however, all historians have to go on. The truth is that history isn’t really just the story of rich white men; it only appears that way.

Thankfully, the historians of the future won’t have those sorts of problems, because everybody (you, me, them) sends emails. Lots and lots of emails. From the wealthiest billionaires to bedsitters on benefits, we’ve been sharing information about ourselves electronically, and storing it, for over 20 years. A lot of information.

Despite the rise of Facebook, Twitter, WhatsApp, Snapchat and the myriad other supposedly email-killing messaging systems that have come and gone, 2.6 billion of us still rely on good old-fashioned email. We send each other a whopping 125 billion messages every single day. With all those testimonies, tantrums, seductions and spam, a future historian from below will find that their cup runneth over. The problem is that all of that useful data is at risk.

Emails will not survive on their own. A lot of them are kept secret, and a lot are destroyed. Some archival systems keep attachments, while others do not. If future historians are to be able to make use of the data, it has to be stored somewhere, and that could take a lot of resources. Hard drives, with the best will in the world, don’t last forever, and nor does any other technology we could currently use to preserve them. Any chance to understand important historical information, from how ordinary workers spent their days to what was actually in all of Hillary Clinton’s emails (I’m guessing not orders for pizza), could disappear forever. This means that future historians will have to hope that someone in the here and now has been thinking about them. Thankfully, somebody is/has/will have been.

The Archive of Future’s Past

In November 2016, the Task Force on Technical Approaches for Email Archives was set up by the Andrew W. Mellon Foundation and the Digital Preservation Coalition. The Task Force is made up of some of the best and brightest archivists from around the world: think of the Mission: Impossible team, but with more cardigans and tea. Their job, amongst other things, is to work out how to preserve your emails. They recently released a report, The Future of Email Archives, which explores ways they might fulfil their mission, and, thankfully, none of them involve climbing tall buildings or hanging off the sides of aeroplanes.

The Task Force has identified three approaches to preserving the data. The first is Bit-Level Preservation, which leaves the emails as they are while protecting them from bit rot, accidental alteration and deletion. It makes sure that every bit of data in your emails is present and correct using ‘checksum creation’ (calculating a digital fingerprint for each file when it enters the archive) and ‘fixity checks’ (regularly recalculating those fingerprints to confirm that nothing has changed). Added to this would be regular archival backups. The main idea is to have three copies, all in different parts of the world, all with these checks running. The thought is that to lose one archive would be unfortunate, to lose two would be unlucky, but to lose all three would mean someone somewhere has far too much time on their hands.
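To make those two checks concrete, here is a minimal sketch in Python. In digital preservation, checksum creation usually means computing a fingerprint of each file, and a fixity check means recomputing it later and comparing. The sketch assumes an archive is simply a folder of files and uses SHA-256 from the standard library’s hashlib; the function names are hypothetical, and the Task Force report does not prescribe this code.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical helper names for illustration; not taken from the Task Force report.

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a SHA-256 checksum (the file's fingerprint), reading in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def create_manifest(archive_dir: Path) -> dict[str, str]:
    """Checksum creation: record a checksum for every file in the archive."""
    return {
        str(p.relative_to(archive_dir)): sha256_of(p)
        for p in sorted(archive_dir.rglob("*"))
        if p.is_file()
    }

def fixity_check(archive_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Fixity check: recompute checksums and report missing or altered files."""
    problems = []
    for rel_path, expected in manifest.items():
        path = archive_dir / rel_path
        if not path.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_of(path) != expected:
            problems.append(f"altered: {rel_path}")
    return problems

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp)
        (archive / "inbox.mbox").write_text("From: a soldier\n\nPlease send socks.\n")
        manifest = create_manifest(archive)
        print(fixity_check(archive, manifest))  # []
        # Simulate bit rot: a single character changes.
        (archive / "inbox.mbox").write_text("From: a soldier\n\nPlease send s0cks.\n")
        print(fixity_check(archive, manifest))  # ['altered: inbox.mbox']
```

Run as a script, the demo prints an empty list for the untouched archive, then flags inbox.mbox as altered once a single character changes. With three copies in different places, a failed check at one site could be repaired from one of the others.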

The next question is where to store that data. Should it be on something physical and fixed, like hard disks, solid-state drives or magnetic tape? Should it be in the cloud (which just brings us back to hard disks, SSDs or tape)? Should it be put on portable handheld devices? The most likely answer is a mixture of all of these across the three copies of the archive, but sadly none of them can be guaranteed to be as future-proof as it needs to be. The idea is that these records will last for hundreds, possibly thousands, of years. Not on magnetic tape, they won’t.

The second method of preservation the Task Force has identified is Migration. This involves moving emails from the many formats they come in to the most stable format available at any given time, and repeating that migration as more stable formats appear. Because the information is moved to the latest format, and presumably the latest storage medium, every so often, this method avoids both the problems an obsolete format might cause and Bit-Level Preservation’s storage problem. But it brings problems of its own.

Migrating not only emails but their attachments would be a big and difficult job. Emails themselves are already stored in a variety of formats (MBOX, EML, XML, etc.), and few of those preserve the attachments. Even if they did, migrating every .os, .pdf, .txt, .doc, .docx and .docwhatevermicrosoftdoesnext file into a single format would be extremely hard, and that’s just text. Add the trillions of images, audio samples, metadata strings, video formats and the rest, and the process becomes extraordinarily time-consuming.
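As a rough illustration of what even the simplest container-to-container migration looks like, this sketch uses Python’s standard-library mailbox module to split a single MBOX file into one EML file per message. The function name and the numbered file naming are hypothetical choices, not anything the Task Force prescribes.

```python
import mailbox
import tempfile
from pathlib import Path

def mbox_to_eml(mbox_path: str, out_dir: str) -> list[Path]:
    """Hypothetical helper: write each message in an MBOX file to its own .eml file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    box = mailbox.mbox(mbox_path)
    written = []
    try:
        for key, msg in box.iteritems():
            # as_bytes() serialises the full MIME message, including any
            # embedded attachments, but omits the mbox-only "From " separator.
            target = out / f"{key:06d}.eml"
            target.write_bytes(msg.as_bytes())
            written.append(target)
    finally:
        box.close()
    return written

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "inbox.mbox"
        src.write_text(
            "From legionary@example.com Mon Jan  1 00:00:00 2018\n"
            "Subject: socks\n\nPlease send socks.\n\n"
            "From claudia@example.com Mon Jan  1 00:00:00 2018\n"
            "Subject: party\n\nYou are invited.\n"
        )
        for path in mbox_to_eml(str(src), str(Path(tmp) / "eml")):
            print(path.name)
```

This only changes the container. It does nothing to convert the attachments themselves into more stable formats, which is where most of the effort described above would lie.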

More worrying is that, according to the report, the people who make migration tools can’t agree on exactly what should be migrated. The report suggests that “it would be helpful to know whether particular tools include information such as whether an email was read and if they preserve metadata such as flags or keywords.” We have no idea what will interest future historians. Some might want to build keyword databases to work out what was important at any given time; some might want to know which emails we flagged as ‘important’. Some may even want to know what we thought was spam and what we didn’t. For all we know, there may be a whole future PhD thesis on how African princes struggled to get their money out of oppressive banking regimes. The difficulty lies in predicting what might be needed and finding ways to migrate it, possibly hundreds of times over the next thousand years or so. Quite soon, the whole thing starts to look a little too expensive and labour-intensive.
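Whether a tool keeps read and flagged status is exactly the kind of detail that varies. As a hedged sketch, here is how Python’s standard mailbox module can move messages from MBOX, which records status in Status and X-Status headers, to Maildir, which encodes it in each file’s name. The function name is hypothetical; the flag translation is the one the mailbox documentation describes for turning an mboxMessage into a MaildirMessage.

```python
import mailbox
import tempfile
from pathlib import Path

def migrate_mbox_to_maildir(mbox_path: str, maildir_path: str) -> int:
    """Hypothetical helper: copy every message, keeping read/flagged status."""
    source = mailbox.mbox(mbox_path)
    target = mailbox.Maildir(maildir_path, create=True)
    count = 0
    try:
        for msg in source:
            # Wrapping in MaildirMessage translates the mbox Status/X-Status
            # flags: R (read) -> S (seen), F (flagged) -> F, A (answered) -> R,
            # D (deleted) -> T (trashed), and O (old) -> the "cur" subdirectory.
            # Passing the mboxMessage straight to add() would file it under
            # "new" with no Maildir flags at all.
            target.add(mailbox.MaildirMessage(msg))
            count += 1
    finally:
        source.close()
        target.close()
    return count

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "inbox.mbox"
        src.write_text(
            "From legionary@example.com Mon Jan  1 00:00:00 2018\n"
            "Status: RO\nX-Status: F\nSubject: socks\n\nPlease send socks.\n"
        )
        dest = str(Path(tmp) / "Maildir")
        migrate_mbox_to_maildir(str(src), dest)
        for msg in mailbox.Maildir(dest, create=False):
            print(msg.get_subdir(), msg.get_flags())  # cur FS
```

Skip the explicit conversion and the messages still arrive, but each one lands in Maildir’s new folder looking unread: precisely the kind of quiet metadata loss the report wants tool makers to document.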

The third way is Emulation. With a bit of searching, those of us nostalgic for Llamasoft’s classic game Gridrunner, in its original glorious 8-bit Commodore VIC-20 format, can download both the game and a program that tricks a PC into thinking it’s a VIC-20. That second piece of software is an emulator. Under this method, your emails would be preserved exactly as they are, in their current formats and systems, albeit with a little migration to new drives as the old ones wear out. Emulation software would then be developed so that in the future you could use your Enhanced-Reality 5D Holographic Projection Platform, also known as Windows 52, to open them in their original formats. The advantage is that future historians could analyse the emails as originally intended, without the changes in interpretation that viewing them in 5D holography might cause. There are a few issues. One is that future historians will have to learn how to use older versions of software. I had to learn to read Latin and sixteenth-century handwriting, so I have no sympathy for my future colleagues there. The bigger problem is that the legacy software will itself need preserving, which takes us right back to the original problem.

There is also the problem of copyright. Any single email can belong to a large number of rights holders: the writer, who holds the copyright in what they wrote; the software creator, whose licensing rights might still be in force should emulation be used; the company the writer worked for when writing the email; the person they were responding to; and so on. Copyright law differs from country to country, and emails cross continents every second of every day. The result is a legal nightmare. Any modern historian will tell you how hard it is to publish a collection of letters, or a work that quotes from them. Doing the same with emails will be harder still.

Creating an E-Archive

Before anyone in 2235 can even think about publishing a collection of emails, there’s a more immediate challenge: how do we get the data in the first place? Most organisations have strict privacy policies when it comes to emails. Quite often, your employer will keep hold of your emails for legal reasons long after you’ve left, but almost all companies have a policy of deleting them after a set period. The reason is simple: businesses, large and small, don’t have anywhere to keep them. The report suggests finding ways to get companies to filter out unwanted emails and store the important ones, but who knows what will be important to future historians? I’m sure a wet and cold Roman soldier, sitting at the edge of the known world around 2,000 years ago, wouldn’t have thought we’d be fascinated by the holes in his socks.

Let’s set work emails aside, think about your private emails, and assume you own the copyright to all of them. The report suggests asking individuals to donate their personal email archives to a historical association of some kind, but that assumes we never delete our old messages. Most people do, and their email providers often encourage them to. Even Google doesn’t have infinite storage. The result might be that future historians know a great deal about what people wrote in personal emails in the last few years of their lives, but little more.

The report also suggests that people working on these archives “might consider identifying writers, scientists, politicians, and others at an early stage in their careers and build a productive working relationship with them over time”. That may reduce the storage issues, but it takes us right back where we started: the histories of the mighty, not the histories of a partying foot soldier’s wife.

Sadly, the report doesn’t settle which route to go down. Instead, the Task Force has planned a series of actions: getting people to archive emails now, in whatever format they can; training people in, and advocating for, the importance of keeping all this data; testing the software that has been created to store and analyse emails; and supporting communication between the programmers who are already taking on the challenge.

The report does make one thing perfectly clear: the future of making history into our history, not just the continuing history of the 1%, starts now. We might just have to work out how to do it as we go.