Will My Data Be Online Forever?

By Daniel Kolitz

We’ve all pretty much reconciled ourselves to the fact that a handful of unaccountable technology executives have, with our help, generated the largest repository of personal information ever assembled, housed in vast fortified complexes around the globe and sifted continually for the benefit of corporations, federal agencies, political campaigns, etc. Less clear is the lifespan of everything they’ve gleaned. Are they really going to hold on to this stuff forever? And if they are, and if we’d rather they didn’t, is there anything we can do about it? For this week’s Giz Asks, we reached out to a number of experts to find out.


Meg Leta Jones

Associate Professor, Communication, Culture & Technology, Georgetown University, who researches rules and technological change with a focus on privacy, data protection, and automation in digital information and computing technologies

Nothing is “forever.” The popular, commercial web is only 25 years old, the post-dot-com-crash web and the Y2K crisis are only 20 years old, and Facebook is only 15 years old. Most of what was online in the 1980s and 1990s is gone, as is most of the early web. Sites, forums, and platforms come and go. Link rot and bit rot plague archives, historical records, and everyday inquiries. Digital data is incredibly fragile, and a lot of things need to go right on a number of layers for my computer to be able to request and receive a file from your computer. Access requires maintenance, and few people (e.g., dedicated hobbyists), organisations (e.g., news groups), or institutions (e.g., archives) have the motivation and resources to maintain digital data for long periods of time.

How long “my” data stays online also depends on who I am. If I am a famous person, I may have attracted the interest of many data maintainers, which increases the likelihood of long-term access. As a public figure, I may also have a hard time exercising my right to be forgotten, a right that an increasing number of individuals around the world can use. If I can effectively exercise a right to be forgotten, my data may not be online for long at all. My data may also be considered someone else’s data or speech. In that case I am probably American, and my best hope is that the tech giant maintaining my data will get eaten by another tech giant, who will kill the site/content/database.

Whether my data is “online” might also change, even if it is not erased from the source. Today there may be a number of hurdles that keep someone from easy access to my data, even though it is online somewhere. Many Europeans exercised their right to be forgotten after a 2014 European Union court case by requesting that Google de-index certain search results. Unless the data subject also successfully sought erasure at the actual source, the data was still online – but how would anyone know? News organisations and people-search sites have brought back paywalls and subscriptions to access old content, and social networking sites have layers of privacy settings that may reveal data to some but not others.

We should not be scared of permanent records. We should be scared of informational power dynamics that bring immediate, harmful consequences and a serious lack of preservation infrastructure for contemporary culture.


“How long ‘my’ data stays online also depends on who I am.”


Fred H. Cate

Vice President for Research, Professor of Law and Senior Fellow at the Center for Applied Cybersecurity Research at Indiana University

Companies (and government agencies) collect extraordinary amounts of personal information about individuals all the time. This collection takes place through a wide variety of means: portable devices, triangulation on cell phones, video cameras, apps, email, web browsing, loyalty programs, online transactions, payment tools, etc. – far too many to name. The vast majority of this information really is not “online” in the sense that people could look up their own or others’ information. In fact, one might argue that more should be available “online,” so you could see (and possibly even correct) what has been collected about you, and so that data monopolisation would not keep out new entrants to important markets.

But that collected data, which is often shared with other companies or third-party data brokers, will exist as close to forever as matters. There are really no legal limits in the United States applicable to the vast majority of data, and so-called legal limits elsewhere don’t tend to amount to much because there is often a legitimate use for the data – for research, for training AI tools, or for security – that has the effect of tolling limits on data storage. Moreover, much of the data, even in countries with apparent storage limits, is held by people who just don’t care about those limits. How often do most users go through their contacts or their email or their photos to delete data that is outdated or has no legitimate use?

To me, this suggests that attention to the collection or storage of data is likely misplaced and may, in many settings, be totally unworkable in practice. It’s like asking people to sort through air or water in the ocean. Rather than focus on data (as in the term “data protection”), shouldn’t we be focused on people and communities and the good and the harm that can be done to them with data? I would argue it is far more useful and more practical to focus on what can be done with data, no matter how old or how collected – how can that data be used? We could then identify uses that are harmful or objectionable or likely to cause offence, and either prohibit them outright or require explicit, opt-in consent. Other uses, perhaps for research, we might permit outright, so long as reasonable security precautions are employed.

We do something similar in other areas that we care about as a society, so we might look for examples of tools that work well and are scalable. For example, almost all research on humans in the U.S. and Europe is done under the oversight of Institutional Review Boards or Ethical Review Boards that in some cases require individual consent and in some cases say that consent isn’t practical or isn’t necessary. Why not require the use of “Data Review Boards” to provide similar oversight and accountability?


“Rather than focus on data (as in the term ‘data protection’), shouldn’t we be focused on people and communities and the good and the harm that can be done to them with data?”


Anu Bradford

Professor of Law and International Organization at Columbia Law School, Senior Scholar at Columbia Business School, and the author of The Brussels Effect: How the European Union Rules the World

Not necessarily, says the European Union. The EU’s General Data Protection Regulation (GDPR) vests individuals with the right to ask internet platforms to permanently erase certain data about themselves in instances where that data is inaccurate or no longer relevant. This concept of data erasure – known as “the right to be forgotten” – was first established by the EU’s highest court, the Court of Justice of the European Union, in the Google Spain case. In that case, a user in Spain asked Google to remove search results that linked him to old newspaper articles detailing his financial troubles. According to this user, the information, while accurate, was no longer relevant because all his debts had been resolved. Google refused to delink the information. In the end, the court ordered Google to permanently de-link the requested information and ensure that it no longer appeared in its search results. Since then, this right has been codified in the GDPR. It has also been adopted by numerous countries around the world as they have enacted privacy laws modelled after the GDPR.

The right to be forgotten has been both controversial and effective. Its critics claim that the erasure of information from platforms undermines free speech and stifles public debate. For instance, US courts have flatly rejected the right to be forgotten, and favoured free speech considerations over individual privacy. To the dismay of its critics, the EU’s right to be forgotten is also effective: it leads to significant delisting because of the asymmetrical incentives that the GDPR imposes on search engines. While individual companies such as Google retain the authority to make decisions in individual cases as to whether to erase information, any borderline case is likely to result in the removal of the information from search results. Failure to erase the information can lead to heavy fines – up to 4% of the company’s annual global turnover – whereas excessive delinking carries no penalty, incentivising data erasure. As evidence of the company’s responsiveness to delinking requests, Google has agreed to remove about 44% of the 2.8 million requests it has received since the May 2014 ruling, according to its transparency report from May 2019.

The right to be forgotten is one of many examples of the EU exerting its regulatory authority in the digital economy. While the US has relegated the regulation of data privacy largely to the private sector, the EU has moved ahead with extensive regulations that are shaping the business practices of multinational companies. Today, most big technology companies draft their global data privacy policies with the EU in mind. For example, Facebook, Google and Microsoft have one global privacy policy, which closely follows the GDPR. Similarly, Facebook, Twitter and YouTube have adopted the EU’s definition of hate speech worldwide when deciding which type of content to take down from their platforms. As a result, it is often Brussels that decides how your data is stored, processed, shared, transferred, or erased – and therefore whether it will be online forever.


“While individual companies such as Google retain the authority to make decisions in individual cases as to whether to erase information, any borderline case is likely to result in the removal of the information from search results.”


Sandra Wachter

Associate Professor and Senior Research Fellow in Law and Ethics of AI, Big Data, and Robotics as well as Internet Regulation at the Oxford Internet Institute at the University of Oxford, and a Visiting Associate Professor of Law at Harvard University

The EU’s GDPR framework is a fantastic first step in terms of trying to guarantee – across Europe, and maybe even beyond the boundaries of the European Union – basic privacy protections for personal data. Unfortunately, the GDPR, and data protection law in general, is more focused on the input stage than on the output stage. It doesn’t regulate what can be, or should be, inferred from a person’s information. A company may have to ask you for consent to collect your geolocation data, but you have no idea what’s being inferred from it. And this is important, because privacy-invasive harms don’t necessarily occur at the input stage, where you volunteer information to a company. The interesting stage comes afterwards, once machine learning and AI are applied to that data, a process that can derive a lot of potentially very intimate information: your sexual orientation, your housing status, your religion, your political beliefs, potential disabilities, your gender identity. The user often has no idea that the data they’ve surrendered can actually disclose those things.

Part of the difficulty in regulating this aspect of the situation is that people might argue that these processes are protected by trade secrets. They might argue that the resources put into collecting and analysing the data turn that into the property of the company, or the public sector. There is an interesting battle on the horizon when it comes to who should have power over inferred data derived from neutral-seeming, voluntarily-surrendered data, and whether or not the user should have some control over that.

Take, for example, Apple Card, Apple’s credit card. Is a credit score inferred by Apple the personal data of the customer, or of the company? If it’s personal data, should you be able to rectify that score? And what implications does that have for the company, and for the individual? Should you be allowed to delete your credit score?

I’m currently working on a research project called AI and the Right to Reasonable Inferences, which will run for the next couple of years. In it, I argue that we need to look at ethically acceptable, normative standards of inferential analytics, because at the moment it’s ungoverned, and we don’t have any standards for how inferences should be used responsibly. You have to find a very good balance between the data protection rights of the individual and the interests of the business.


“Privacy-invasive harms don’t necessarily occur at the input stage, where you volunteer information to a company. The interesting stage comes afterwards, once machine learning and AI are applied to that data, a process that can derive a lot of potentially very intimate information...”


Featured image: Elena Scotti (Photos: Getty Images, AP, Shutterstock)