Dropbox Refuses to Explain its Mysterious Child Porn Detection Software

By Kate Knibbs

Recently a US Army reservist was arrested for sharing child pornography. Here’s what makes his story different from dozens of others: he’d been turned in by Dropbox.

As it turns out, Dropbox has a habit of turning in paedophiles. Why it reports people who share and hoard abusive images that exploit children is obvious, but how the company sniffs out those images is not, and I started wondering.

The Dropbox detail struck me as strange not because there’s something objectionable about companies trying to stop paedophiles exploiting children, but because I wondered what else Dropbox could proactively search my files for: could it look for pirated movies? Could it look for evidence of drug dealing, illegal sex work, illegal gambling? Short answer: Yep!

Dropbox's Terms of Service state that the company can search through your files to check that they comply with those terms and with its Acceptable Use Policy. The company can look for much more than just vile child exploitation images: it can search for hate speech, any illegal porn, and anything that infringes on someone else's privacy.

I asked how the company went about discovering child porn hidden inside the personal folders of its users, but Dropbox wouldn't tell me how it found the images. Instead, a spokesperson sent me a statement:

“Whenever law enforcement agencies, child safety organizations or private individuals alert us of suspected child exploitation imagery, we act quickly to report it to the National Center for Missing & Exploited Children (NCMEC). NCMEC reviews and refers our reports to the appropriate authorities.”

Not at all an answer to my question. This doesn't explain how Dropbox foils paedophiles exploiting children without outside tips. But I have a strong suspicion of how it does it: I think the company uses PhotoDNA, software Microsoft developed in 2009 with Dartmouth College to help companies sniff out child porn on their servers, or something very similar. Microsoft donated use of this technology to the NCMEC, and uses it with Bing and OneDrive.


PhotoDNA takes known child abuse images from the National Center for Missing & Exploited Children and creates a numerical value for each known image using hashing, a technique that creates a “digital fingerprint” for each known image. The horror trove of exploitation porn that serves as the source library for PhotoDNA is compiled from images previously reported to the NCMEC’s Cyber Tip Line, as well as images found by the companies who do the reporting.

John Shehan, the vice president of the NCMEC, talked to me about how the program uses PhotoDNA to investigate reports from its tip line. “It comes down to a math problem,” Shehan explained, which was not what I expected to hear about such awful subject matter.

The PhotoDNA software takes each image in the database and divides it into a grid, giving each portion of the grid a computational value, in a process called “hashing”. It does the same thing for every single photo that gets uploaded to the services that use it, assigning numerical values to each portion of a photo, as well as a unique identifier for the entire photo. So every time someone uploads a photo, it gets compared against every single image of exploitation in the database.
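PhotoDNA's actual algorithm is proprietary, but the grid-and-hash process described above can be illustrated with a toy sketch. Everything here (the grid size, using average brightness as the per-cell value, the distance threshold) is a simplification for illustration, not the real technique:

```python
# Toy sketch of grid-based image hashing, illustrating the general idea
# behind tools like PhotoDNA. The real algorithm is proprietary and far
# more robust; this is purely illustrative.

def grid_hash(pixels, grid=4):
    """Divide a square greyscale image (a 2D list of 0-255 ints) into a
    grid x grid layout and record the average brightness of each cell.
    The resulting list of numbers is the image's 'digital fingerprint'."""
    size = len(pixels)
    cell = size // grid
    values = []
    for gy in range(grid):
        for gx in range(grid):
            total = 0
            for y in range(gy * cell, (gy + 1) * cell):
                for x in range(gx * cell, (gx + 1) * cell):
                    total += pixels[y][x]
            values.append(total // (cell * cell))
    return values

def distance(h1, h2):
    """Sum of per-cell differences; a small distance suggests the two
    images are the same picture, even after minor edits."""
    return sum(abs(a - b) for a, b in zip(h1, h2))
```

A service using this scheme would precompute `grid_hash` for every known abuse image, then compare each newly uploaded photo's hash against that database. Unlike an exact file checksum, a grid hash of this kind tolerates small changes such as recompression or slight brightness shifts.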

It’s a system for hunting the world’s most taboo, upsetting, and obscene images with freakish accuracy. False positives are extremely rare: just “one in ten billion,” according to Shehan.

Companies that use PhotoDNA scan all of the images uploaded to their services against this database of numerical values. If they get a hit, they review and remove the photos, and report the user to the NCMEC.

By law, companies using PhotoDNA are required to make a report if they find a match. But the NCMEC isn’t a law enforcement agency: it acts as a clearinghouse for these reports, sending them on to the appropriate local or federal law enforcement agencies so they can investigate. From there, arrests like that of the US Army reservist are made.

Many companies aren’t shy about using PhotoDNA. Facebook is a major user, as are Google and Twitter. Even smaller services like Flipboard and Kik publicly use it. It’s no secret that companies like Facebook scan every single one of the photos uploaded to their servers against the PhotoDNA database to minimise the chances of exploitation imagery slipping through.

This technology is extremely useful for catching people sharing child porn. In 2014, the NCMEC received 1.1 million reports, but as more companies have adopted PhotoDNA (Microsoft released a cloud version this year), the number has shot up drastically: Shehan told me they’ve received 2.7 million reports so far this year.

I don’t know why Dropbox is so reluctant to acknowledge that it uses either PhotoDNA or a similar service. Perhaps the company is worried about blowback from people who had the same questions I did about what else Dropbox might be actively looking for within its customers’ accounts, or it’s simply worried about negative press from being associated with the storage of child porn.

Oddly enough, Dropbox has already admitted to using a hashing system to detect illegal content in its users’ files, but for detecting copyrighted files, not child porn. If you try to share a pirated movie using Dropbox, you may receive a DMCA takedown notice. That’s because the company assigns hash values to certain pirated content and checks the files you share against its database of frequently pirated files. In that case, Dropbox doesn’t check private folders, only shared ones. But it’s unclear whether the company is checking private as well as shared folders for child exploitation images, since it won’t disclose it.
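The copyright check described above can be sketched as an exact-match lookup. Dropbox hasn't published the details, so the hash function and blocklist here are assumptions for illustration only:

```python
import hashlib

# Hypothetical sketch of exact-match file checking, the kind of scheme
# the DMCA takedown process described above could use. The choice of
# SHA-256 and the blocklist contents are illustrative assumptions.

def file_fingerprint(data: bytes) -> str:
    """An exact 'digital fingerprint' of a file's bytes."""
    return hashlib.sha256(data).hexdigest()

def is_blocked(data: bytes, blocklist: set[str]) -> bool:
    """Check a shared file's fingerprint against known flagged hashes."""
    return file_fingerprint(data) in blocklist

# Build a blocklist from previously flagged files, then check new shares:
blocklist = {file_fingerprint(b"<bytes of a previously flagged file>")}
```

Note the contrast with the grid-based image hashing used for exploitation imagery: an exact file hash changes completely if even one byte of the file differs, which is adequate for flagging identical copies of a pirated movie but useless against edited or re-encoded images.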

No matter the reason for Dropbox’s reluctance, it’s a shame, because PhotoDNA and services like it deserve more publicity for their good work. Dropbox should be more transparent about how it trawls users’ files.

Image by Jim Cooke.