Mega is here, and you’ve been hearing a lot about its encryption, as well as about how it isn’t really working too well just yet. But maybe the most important thing is Mega’s promise of being less of a lawsuit magnet. A lot of steps have been taken there, but there’s one that stands out as the biggest: Mega doesn’t use de-duplication.
Let’s talk de-duplication. It’s a pretty simple idea, with some widespread consequences. “De-duplication” basically means that a file storage system—in this case Mega or Megaupload—scans files as they come in. If they are recognised as something that’s been uploaded previously, the system will not store the new files, and instead reference back to the version already on the servers. In addition to being a great space-saver, this can be an easy way to wipe out all versions of a copyright-infringing file in one swoop. In fact, not doing that will get you in trouble, if the option is available. Which is why Mega takes that option off the table.
Megaupload did use de-duplication, which also saves on cost. But when it received a takedown request for copyright violations (you know, stemming from any of the zillion message boards and blogs posting illegal download links), Megaupload would only disable the one reported link, instead of every link associated with that file, and the file itself. So copyright holders, who already want to ritualistically disembowel people like Dotcom, didn’t take it well when they found out that, systemically, Megaupload had to know that copyright-protected files were being left up. For all the conspicuous consumption and willful ignorance involved with the Megaupload case, that was as big a factor as any.
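To make the mechanics concrete, here is a minimal sketch of a de-duplicating locker. It is an illustration only, not Megaupload’s actual code: the class `DedupStore`, its method names and the `link-N` identifiers are all hypothetical. It shows one stored copy serving several links, and the difference between disabling a single reported link and removing the file itself.

```python
import hashlib


class DedupStore:
    """Toy file locker that stores each distinct file once (hypothetical)."""

    def __init__(self):
        self.blobs = {}   # content hash -> file bytes (one copy per distinct file)
        self.links = {}   # public link id -> content hash
        self._next_id = 0

    def upload(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Store the bytes only if this content has not been seen before.
        self.blobs.setdefault(digest, data)
        self._next_id += 1
        link = f"link-{self._next_id}"
        self.links[link] = digest
        return link

    def disable_link(self, link: str) -> None:
        """Takedown of one reported link; the file and other links survive."""
        self.links.pop(link, None)

    def remove_file(self, link: str) -> None:
        """Takedown of the file itself: drop the blob and every link to it."""
        digest = self.links.get(link)
        if digest is None:
            return
        self.blobs.pop(digest, None)
        self.links = {l: d for l, d in self.links.items() if d != digest}

    def download(self, link: str):
        digest = self.links.get(link)
        return self.blobs.get(digest) if digest else None


store = DedupStore()
a = store.upload(b"same movie bytes")
b = store.upload(b"same movie bytes")   # duplicate: no second copy is stored
c = store.upload(b"holiday photos")

print(len(store.blobs))      # 2 distinct files for 3 uploads
store.disable_link(a)
print(store.download(b))     # still served through the other link
store.remove_file(b)
print(store.download(b))     # None: every link to that content is gone
```

The second takedown style is the “one swoop” option: because all links point at the same stored copy, removing that copy removes it for everyone, including anyone who uploaded the same bytes for their own private use.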
Now, there are legitimate reasons for a system like Megaupload to not just nuke all versions of a file. Plenty of people use these lockers for legitimate storage of music and movies, and never share them with anyone, only using them to transfer data from one machine to another. Nuke all associated links to a file, and you wipe out all the legal users’ music too. It’s a complicated problem to solve, and would require a lot of traffic and usage analytics that would compromise anonymity and probably raise more than a few privacy issues. Megaupload’s problem, though, was that it basically just ignored the problem completely. Mega’s solution to this tricky situation is that, since you encrypt files as you upload them, the service can honestly say it has no idea what you’re storing on its drives.
The cost and overhead of storing a separate file for every upload is significant, but other services have done it for a while. Rapidshare never ran into the kinds of link-based problems that Megaupload did, despite a huge number of lawsuits of its own. Combined with Mega’s considerable encryption, this should be as good a shield against piracy hawks as any for a site that’s basically entirely about piracy. Much more so than the flimsy buck-passing Terms of Service, at least. There will be other threats to the service (a group is already trying to shut down Mega’s finances), but no de-duping is one more finger in the dam.

What a ‘Mega’ load of bullshite.
You know you should probably take that Christmas hat off now. I’m sure you don’t want to accept it, but I’m afraid Christmas is over.
I’ve been thinking about this a lot. And now someone has pointed it out I feel the time has come. I will see you all on the other side.
Section 8 of the TOS:
“8. Our service may automatically delete a piece of data you upload or give someone else access to where it determines that that data is an exact duplicate of original data already on our service. In that case, you will access that original data.”
… or maybe they put that into the TOS as a legal bulwark, when in reality it’s impossible to enforce due to their encryption methods.
It isn’t impossible to enforce; it would actually be relatively easy. If many people all have the same file (the Ubuntu 10.04 release, for example) and all upload it to Mega, Mega will be able to identify that they are the same because of the hashed encryption key. As the base files are identical images on the servers, the hash will be the same; although they can’t view the content, they can still see the match and presume the file is the same.
(This may also cause a hash collision depending on how they are storing and indexing the files) – I don’t know encryption very well, nor do I pretend to, but I can see a possible way they implement it.
No. AFAIK that is not how this encryption works – identical files will appear completely different apart from their length, as the encryption keys are different for each user. Mega will have NO WAY of knowing what the users’ files contain. That’s the point.
This.
Ok, I haven’t seen much about the encryption they use. I know they don’t know what the files contain; I only meant the hash may give them an idea.
Each person has two encryption keys, a private one and a public one. If two people share the same file with their public key, then the system will match them and one copy will be deleted; if you use the private key, then no matching will be done.
Surely the file itself is still encrypted differently though? I don’t think there’s any way to know that two files are the same unless you decrypt them.
Qwan, I think what you’re thinking of is a checksum, not the encryption hash. Again though, since each file is encrypted differently, checksums won’t match.
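A small sketch of the checksum point made above, under loud assumptions: `toy_encrypt` is a hypothetical stand-in (XOR with a SHA-256-derived keystream), chosen only so the example runs on the standard library. It is not secure and it is not the cipher Mega uses. The only thing it is meant to show is that the same plaintext under two different per-user keys yields stored blobs whose checksums differ while their lengths match.

```python
import hashlib
import secrets


def keystream(key: bytes, length: int) -> bytes:
    """Derive a pseudo-random byte stream from a key (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream. Illustration only, not secure.

    Applying it twice with the same key returns the original data.
    """
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


plaintext = b"identical ISO image contents"
alice_key = secrets.token_bytes(16)   # each user has their own random key
bob_key = secrets.token_bytes(16)

alice_blob = toy_encrypt(alice_key, plaintext)
bob_blob = toy_encrypt(bob_key, plaintext)

# What the server could compare: checksums of the stored, encrypted blobs.
print(checksum(alice_blob) == checksum(bob_blob))   # differ, barring an astronomically unlikely key collision
print(len(alice_blob) == len(bob_blob))             # lengths still match
print(toy_encrypt(alice_key, alice_blob) == plaintext)  # only the key holder recovers the file
```

A server holding only `alice_blob` and `bob_blob` has nothing to match on except size, which is the argument made earlier in the thread.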
Once you have shared the public link, they can be decrypted by using the public key; each file has its own public key.
So I upload a file, and it uses my private key; somebody else uploads the file with their private key. I share it, and it no longer uses my private key but uses a new public key that only that file uses.
If they then share their copy, it won’t make a new public key; it will instead switch to use the file I uploaded and that file’s own public key.
Isn’t the point of the article that Mega doesn’t do that?
It only does that once both people share the file; if you just keep it in your own private area, it doesn’t.
But each file is encrypted differently, even though the original file is identical. The same public key can’t be used to decrypt two separately encrypted versions of the same file.
When you make a file public, it decrypts it and re-encrypts it with a new key. If a second person makes the same file public, then once it has decrypted it, it sees that it already has a copy of this file (as it remembers the unencrypted checksum of the first file), deletes the second person’s copy and gives them access to the first person’s. Nobody is told this has happened.
If you keep the file private, then only you will ever be able to decrypt it and know its contents. Once it is made public, then anyone with the download link can get it.
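For what it is worth, the share-time scheme described in the last few comments can be sketched as follows. This models the commenter’s description only; nothing here is confirmed Mega behaviour. `ShareTimeDedup`, `make_public` and `toy_cipher` are hypothetical names, and the cipher is an insecure stand-in used purely so the example runs.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256-derived keystream. Illustration only, not secure.

    The same call both encrypts and decrypts.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class ShareTimeDedup:
    """Hypothetical service that de-duplicates only when a file is made public."""

    def __init__(self):
        self.public = {}  # plaintext checksum -> (file key, re-encrypted blob)

    def make_public(self, private_key: bytes, private_blob: bytes):
        # Assumption: the service is given the user's key at share time.
        plaintext = toy_cipher(private_key, private_blob)
        digest = hashlib.sha256(plaintext).hexdigest()
        if digest not in self.public:
            # First share of this content: re-encrypt under a new per-file key.
            file_key = secrets.token_bytes(16)
            self.public[digest] = (file_key, toy_cipher(file_key, plaintext))
        # Later shares of the same content reuse the first copy and its key.
        return digest, self.public[digest][0]

    def download(self, digest: str, file_key: bytes) -> bytes:
        _, blob = self.public[digest]
        return toy_cipher(file_key, blob)


service = ShareTimeDedup()
data = b"same file"
k1, k2 = secrets.token_bytes(16), secrets.token_bytes(16)

link1 = service.make_public(k1, toy_cipher(k1, data))
link2 = service.make_public(k2, toy_cipher(k2, data))

print(link1 == link2)                     # True: the second share reuses the first copy
print(len(service.public))                # 1
print(service.download(*link1) == data)   # True
```

The sketch only works because `make_public` is handed the uploader’s key and sees the plaintext, which is exactly the step the other commenters doubt the service can perform.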