By the 1960s, the US government was using powerful mainframe computers to store and process an enormous amount of data on nearly every American. Corporations also used the machines to analyze sensitive information including consumer purchasing habits. There were no laws dictating what kind of data they could collect. Worries over supercharged surveillance soon emerged, especially after the publication of Vance Packard’s 1964 book, The Naked Society, which argued that technological change was causing the unprecedented erosion of privacy.

The Trackers Tracking You

Online trackers can be divided into two main categories: same-site and cross-site. The former are mostly benign, while the latter are more invasive. A quick taxonomy:

  • Traditional Cookies
    Facebook, Google, and other companies use these extremely popular cross-site trackers to follow users from website to website. They work by depositing a small piece of data into the browser, which users then unwittingly carry with them as they surf the web.

  • Super Cookies
    Supercharged cookies can be difficult or impossible to clear from your browser. They were most famously used by Verizon, which had to pay a $1.35 million fine to the FCC as a result of the practice.

  • Fingerprinters
    These cross-site trackers follow users by creating a unique profile of their device. They collect things like the person’s IP address, their screen resolution, and what type of computer they have.

  • Identity trackers
    Instead of using a cookie, these rare trackers follow people using personally identifiable information, such as their email address. They collect this data by hiding on login pages where people enter their credentials.

  • Session cookies
    Some trackers are good! These helpful same-site scripts keep you logged in to websites and remember what’s in your shopping cart—often even if you close your browser window.

  • Session replay scripts
    Some same-site scripts can be incredibly invasive. These record everything you do on a website, such as which products you clicked on and sometimes even the password you entered.
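
    To make the fingerprinting idea above concrete, here is a minimal sketch of how a tracker might turn device attributes into a stable identifier. It assumes a simple hash over three attributes (IP address, screen resolution, platform). The function name, attribute keys, and sample values are hypothetical, and real fingerprinters likely combine many more signals.

```python
import hashlib


def device_fingerprint(attributes: dict[str, str]) -> str:
    """Return a short, stable identifier derived from device attributes."""
    # Sort the keys so the same attributes always produce the same string,
    # regardless of the order in which they were collected.
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    # Hash the canonical string; the first 16 hex characters serve as the ID.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical attribute sets (illustrative values, not real user data).
visit_on_site_a = {"ip": "203.0.113.7", "screen": "1920x1080", "platform": "MacIntel"}
visit_on_site_b = {"platform": "MacIntel", "ip": "203.0.113.7", "screen": "1920x1080"}
other_device = {"ip": "203.0.113.7", "screen": "1366x768", "platform": "Win32"}

# The same device yields the same ID on two different sites, with no cookie.
same_device = device_fingerprint(visit_on_site_a) == device_fingerprint(visit_on_site_b)
# A device with a different configuration yields a different ID.
distinct_device = device_fingerprint(visit_on_site_a) != device_fingerprint(other_device)
print(same_device, distinct_device)
```

    Because nothing is stored in the browser, clearing cookies does not reset an identifier built this way. It changes only when the underlying attributes change.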

In 1965, the year after Packard’s book appeared, President Lyndon Johnson’s administration proposed merging hundreds of federal databases into one centralized National Data Bank. Congress, concerned about possible surveillance, pushed back and organized a Special Subcommittee on the Invasion of Privacy. Lawmakers worried the data bank, which would “pool statistics on millions of Americans,” could “possibly violate their secret lives,” The New York Times reported at the time. The project was never realized. Instead, Congress passed a series of laws governing the use of personal data, including the Fair Credit Reporting Act in 1970 and the Privacy Act in 1974. The regulations mandated transparency but did nothing to prevent the government and corporations from collecting information in the first place, argues technology historian Margaret O’Mara.

Toward the end of the 1960s, some scholars, including MIT political scientist Ithiel de Sola Pool, predicted that new computer technologies would continue to facilitate even more invasive personal data collection. The reality they envisioned began to take shape in the mid-1990s, when many Americans started using the internet. By the time most everyone was online, though, one of the first privacy battles over digital data brokers had already been fought: In 1990, Lotus Corporation and the credit bureau Equifax teamed up to create Lotus MarketPlace: Households, a CD-ROM marketing product that was advertised to contain names, income ranges, addresses, and other information about more than 120 million Americans. It quickly caused an uproar among privacy advocates on digital forums like Usenet; over 30,000 people contacted Lotus to opt out of the database. It was ultimately canceled before it was even released. But the scandal didn’t stop other companies from creating massive data sets of consumer information in the future.

Several years later, ads began permeating the web. In the beginning, online advertising remained largely anonymous. While you may have seen ads for skiing if you looked up winter sports, websites couldn’t connect you to your real identity. (HotWired.com, the online version of WIRED, was the first website to run a banner ad in 1994, as part of a campaign for AT&T.) Then, in 1999, digital ad giant DoubleClick ignited a privacy scandal when it tried to de-anonymize its ads by merging with the enormous data broker Abacus Direct.

Privacy groups argued that DoubleClick could have used personal information collected by the data broker to target ads based on people’s real names. They petitioned the Federal Trade Commission, arguing that the practice would amount to unlawful tracking. As a result, DoubleClick sold the data broker at a loss in 2006. The scandal also led to the creation of the Network Advertising Initiative, a trade group that developed standards for online advertising, including a requirement that companies notify users when their personal data is being collected.

But privacy advocates’ concerns eventually came true. In 2008, Google officially acquired DoubleClick, and in 2016 it revised its privacy policy to permit personally identifiable web tracking. Before then, Google kept its DoubleClick browsing data separate from personal information it collected from services like Gmail. Today, Google and Facebook can target ads based on your name, which is exactly what people feared DoubleClick would do two decades ago. And that’s not all: Because most people carry tracking devices in their pockets in the form of smartphones, these companies, and many others, can also follow us wherever we go.


The Future of Personal Data Collection

Personal information is currently collected primarily through screens, when people use computers and smartphones. The coming years will bring the widespread adoption of new data-guzzling devices, like smart speakers, sensor-embedded clothing, and wearable health monitors. Even those who refrain from using these devices will likely have their data gathered, by things like facial recognition-enabled surveillance cameras installed on street corners. In many ways, this future has already begun: Taylor Swift fans have had their face data collected, and Amazon Echos are listening in on millions of homes.

We haven’t decided, though, how to navigate this new data-filled reality. Should colleges be permitted to digitally track their teenage applicants? Do we really want health insurance companies monitoring our Instagram posts? Governments, artists, academics, and citizens will think about these questions and plenty more.

And as scientists push the boundaries of what’s possible with artificial intelligence, we will also need to learn to make sense of personal data that isn’t even real, at least in that it didn’t come from humans. For example, algorithms are already generating “fake” data for other algorithms to train on. So-called deepfake technology allows propagandists and hoaxers to leverage social media photos to make videos depicting events that never happened. AI can now create millions of synthetic faces that don’t belong to anyone, altering the meaning of stolen identity. This fraudulent data could further distort social media and other parts of the internet. Imagine trying to discern whether a Tinder match or the person you followed on Instagram actually exists.