Every Book Ever Published Could Conceivably Fit On To A Single Storage Device

By James O Malley

Imagine the scenario: An asteroid is heading for Earth. When it hits, it will wipe out everything - and take with it not just seven billion lives, but human civilisation. It will have all been for nothing, with not a trace of human achievement to show for it.

Luckily, NASA has a plan. It has built a rocket that will take a select few off to start a new civilisation on Mars. But what can we take to help kick things off? Will a Picasso or a Renoir be more useful on Mars? Is Bach really more important than The Beatles? How can we make sure that Piers Morgan doesn't manage to sneak aboard?

There is one thing we might not have to worry about though: preserving our books. Books are perhaps uniquely important relics of our civilisation, as together they are the sum total of our knowledge. And thanks to modern technology, we might not have to choose whether at least a little bit of Dan Brown is more important than Shakespeare's really obscure stuff. This is thanks to the seemingly endless growth in storage capacity.

Extra Capacity

Since 2011, the average capacity of a hard disk sold by the likes of Seagate and Western Digital has more than doubled, meaning that we can carry around ever more data with us - and keep ever more backed up.

This breathtaking growth can also be illustrated by this well-known image of MicroSD cards a few years apart. The one on the right can hold one thousand times as much data as the one on the left.

Sadly I can't find the original source to credit here as the image is all over the place.

This is put into even sharper relief when you look at a bookshelf and try to conceive how much information is contained within. If all of the books on your shelf were digitised and stored as plain text, at one byte per character, they would probably fit dozens, if not hundreds, of times over on to the memory card on the left, let alone the one on the right.

And this is what made me curious: How much space would we need to store every book ever? What if we needed to prepare a hard disk for humanity's survivors?

Staring at my bookshelf, I started to wonder how much storage it would take to fit every book ever written. Say we digitised every book: how many hard disks would it require? Could you fit them on to the back of a truck? Heck... could you fit them into a bag?

Could it really be possible to carry on your person everything from The Bible to Donald Trump's The Art of the Deal?

Doing the Maths

So how have I figured this out? First, we need to know the collective file size of all of the books ever. To find this out, I got in touch with Professor Mark Davies, a Professor of Linguistics at Brigham Young University in the US, who deals with enormous databases (or corpuses) of books. His work involves mining the enormous Google Books corpus. So I asked him to do a back-of-the-envelope calculation for me.

He says that Google has 5 million books digitised. The average length of a book in the Corpus of Historical American English is around 70,000 words. At an average of 5 characters a word, plus one extra character for a space, that would result in a total file size of... 2.1TB.
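The arithmetic behind that figure can be sketched in a few lines of Python. This is only an illustration of the back-of-the-envelope sum, not Professor Davies's own method: the function name `corpus_size_tb` is made up for this example, and it assumes one byte per character and decimal terabytes (1TB = 10^12 bytes).

```python
# Back-of-the-envelope estimate using the figures quoted in the article.
# Assumptions (not from the source): one byte per character,
# and decimal terabytes (1 TB = 10**12 bytes).
BYTES_PER_TB = 10**12


def corpus_size_tb(books, words_per_book=70_000, chars_per_word=5, space_chars=1):
    """Estimate the plain-text size of a collection of books, in terabytes."""
    bytes_per_word = chars_per_word + space_chars  # 5 letters + 1 space = 6 bytes
    total_bytes = books * words_per_book * bytes_per_word
    return total_bytes / BYTES_PER_TB


google_books_tb = corpus_size_tb(5_000_000)
print(f"Google's 5 million books: {google_books_tb:.2f} TB")  # 2.10 TB
```

Swapping in a different book count, or a different average word length, shows how sensitive the total is to each guess.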

Yep - 2.1 terabytes. That's pretty much the same as a standard hard drive you could buy in a shop today. You could carry around everything that Google has spent millions of dollars digitising in your pocket.

That really isn't that much. Of course, if you really did want to store every book like this it would probably be larger - because of formatting, and the need to store metadata and so on - but as a baseline, you could store all of it on, say, this £110 2.5TB Western Digital disk drive.

But that's only the number of books stored by Google. What if humanity were able to digitise every book ever? Could they be packed into Tim Peake's carry-on luggage?

In 2010, Google's engineers tried to estimate the total number of books ever published, and (with a huge number of caveats and no doubt large error bars) arrived at a figure of 129m - significantly more.

If we apply the same calculation here - with an average of 70,000 words per book and six bytes for each word (five characters plus a space) - that would mean that, in theory, every book ever would be around 54.18TB in total.

That's much larger, but it isn't that large by modern standards. This year the Large Hadron Collider is expected to produce 60PB of data - and averaged out over a year, that works out at about 164TB a day, meaning the LHC is producing roughly three times as much data as every book ever written, every single day.
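For anyone who wants to check the LHC comparison, here is a rough sketch of the sum. It assumes decimal units (1TB = 10^12 bytes, 1PB = 10^15 bytes), a 365-day year, and that the 60PB is spread evenly across it; the variable names are just for illustration.

```python
# Compare the LHC's daily data output with the "every book ever" estimate.
# Assumptions (not from the source): decimal units, a 365-day year,
# and data production spread evenly over the year.
BYTES_PER_TB = 10**12
BYTES_PER_PB = 10**15

# 129 million books x 70,000 words x 6 bytes per word (5 letters + 1 space)
all_books_bytes = 129_000_000 * 70_000 * 6

lhc_bytes_per_year = 60 * BYTES_PER_PB
lhc_bytes_per_day = lhc_bytes_per_year / 365

libraries_per_day = lhc_bytes_per_day / all_books_bytes

print(f"Every book ever: {all_books_bytes / BYTES_PER_TB:.2f} TB")       # 54.18 TB
print(f"LHC output per day: {lhc_bytes_per_day / BYTES_PER_TB:.0f} TB")  # 164 TB
print(f"'Every book ever' per day: {libraries_per_day:.2f}")             # 3.03
```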

But more importantly, 54.18TB is still an amount of data that could conceivably be packed on to one storage device. Perhaps something like this: last August Seagate unveiled a 60TB SSD, which would have enough room to comfortably hold the lot. Literally everything. Every classic, every textbook, even all of those awful ghostwritten celebrity autobiographies.
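How comfortable is "comfortably"? A quick sketch of the headroom, assuming the drive's full nominal 60TB is usable (in practice a file system would claim some of it) and the same decimal units and six bytes per word as before:

```python
# How much spare room would a 60 TB drive leave for formatting and metadata?
# Assumptions (not from the source): the full nominal 60 TB is usable,
# decimal terabytes, and 6 bytes per word of plain text.
BYTES_PER_TB = 10**12

books = 129_000_000
text_bytes = books * 70_000 * 6   # plain text of every book ever
drive_bytes = 60 * BYTES_PER_TB   # nominal capacity of the SSD

spare_bytes = drive_bytes - text_bytes
spare_per_book = spare_bytes / books
overhead_fraction = spare_bytes / text_bytes

print(f"Spare capacity: {spare_bytes / BYTES_PER_TB:.2f} TB")  # 5.82 TB
print(f"Spare bytes per book: {spare_per_book:,.0f}")          # 45,116
print(f"Room for overhead: {overhead_fraction:.1%}")           # 10.7%
```

On these assumptions there would be roughly 45KB per book left over for formatting and metadata, or about a tenth on top of the raw text.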

And why does this matter? Not only does it demonstrate how incredible human engineering is, but it also demonstrates that if the worst does happen, and we are facing our doom, perhaps there's a chance that all of that human effort won't have been a waste of time, and that we'll be able to save our collective knowledge for future species and civilisations after all.