When you search for something online – whether it is a new pair of shoes or a holiday to Vegas – an army of online trackers springs into action. Often, the first sign that this has happened is that you see adverts for shoes or cheap flights on every site you visit for the next few days. Few people, however, actually understand how this technology works, or what is happening behind the scenes.
When you load up an article on the Guardian, the New York Times or the Washington Post, it looks like you are seeing a digital version of a paper newspaper. It has text, graphics and adverts. In fact, your article has been assembled from dozens of different sources, and these sources can come from hundreds of different companies. When you are viewing a web page, the content that is visible to you is just the tip of the iceberg. Beneath the surface, a great deal of invisible code is running in the background, tracking what you are doing.
The hidden network of trackers
Since the early days of the Internet, companies have run ads online. The first banner ad was run by AT&T in 1994! However, online advertising has grown much more clever in recent years. The ads that you see today are the product of highly refined digital stalking: as customers do more and more of their browsing online, the surveillance architecture used to track customer behaviour has grown ever smarter too. Even some of the publishers who make the loudest noises about online tracking are at it too!
Although there has been a lot of publicity recently about Facebook and Cambridge Analytica, the online tracking that happens during our ordinary Internet browsing is really much more pervasive. Advertising technology software follows you from site to site, compiling all of your actions into a database called a DMP (or “Data Management Platform”) under a pseudonymous online identifier. Using this technology, an ad network knows who you are and when you come back, and decides which advertisements it is going to show you.
All of this takes place in the shadows. Advertising technology companies don’t like talking about the technology they use to track you. The default for many years has been to collect as much as possible, and decide what to use it for later. Behind the scenes, a network of “data brokers” trades in data that has been collected about you, as advertisers try to build as comprehensive a picture as they can of the roughly 4 billion connected consumers.
The information that is most useful to these companies is your browsing and search history. This can be compiled and profiled into different behavioral categories, such as “Hobbyist Photographer”, “Used Car Shopper” or “Luxury Goods Shopper”. AdTech people like to tell themselves that this data is used to “improve the customer experience” and have “conversations with customers”, but few people would consent if they knew the extent to which they were being tracked. Typically you, the consumer, have been offered no choice in this transaction either.
It’s not just tracking, though. Companies are mining this data, using machine learning and AI to see what they can infer about your behaviour, your preferences, even the other devices you own. Sometimes this information may even be used to show you a different price from the one someone else sees! It has been shown that using a VPN (Virtual Private Network) to pretend that you are in a poor Eastern European country in some cases allows you to get cheaper flight prices. In 2012, it was reported that Orbitz was showing Mac users more expensive hotel options than Windows users.
How does tracking work?
Because every device has subtly different characteristics (device ID, font setup, OS version), it is possible for a script running on a page to combine them into a unique “device fingerprint” that identifies your device, in a way that is completely invisible to the user. Site owners or third parties can use this information to track your browsing even further. This might sound like something that you would only find on shady websites, but techniques like these are absolutely commonplace, even on websites that are making the loudest noises about the Facebook and Cambridge Analytica leak. The Guardian homepage alone loads up 29 separate advertising trackers. The New York Times loads up 11. Business Insider loads up 45.
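To make the fingerprinting idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the code any real tracker runs: the function name device_fingerprint, the choice of SHA-256 and every attribute value are invented for the example. The point is only that a handful of ordinary characteristics, hashed together, give an identifier that survives clearing your cookies.

```python
import hashlib
import json


def device_fingerprint(attributes: dict) -> str:
    """Collapse a set of device characteristics into one short identifier."""
    # Serialise with sorted keys so identical attributes always hash identically.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical attribute values, invented for illustration.
first_visit = {
    "os_version": "Windows 10 build 16299",
    "fonts": ["Arial", "Calibri", "Comic Sans MS"],
    "screen": "1920x1080",
    "timezone": "Europe/London",
}
# The same device later, after the user has deleted every cookie.
return_visit = dict(first_visit)
# A different device that differs only in its installed fonts.
other_device = dict(first_visit, fonts=["Arial", "Calibri"])

same_device_matches = device_fingerprint(first_visit) == device_fingerprint(return_visit)
other_device_matches = device_fingerprint(first_visit) == device_fingerprint(other_device)
print(same_device_matches, other_device_matches)
```

The returning device produces the same identifier with no cookie involved, while a device with one fewer font produces a different one. Real fingerprinting scripts are assumed to draw on many more signals than the four shown here.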
Most of us have some awareness of what cookies are and how they work, but few are aware of the “cookie syncing” that takes place behind the scenes. Cookie syncing allows the companies that are tracking you online to compare notes with each other, and build a better profile of you. All of this takes place silently, in the background, and without your knowledge. And once your data has “leaked”, they have it forever. Even if you delete your cookies, an advertiser only has to see you again once (say, if you follow a link in a social media app, whose built-in browser often has no ad blocker turned on) for the whole process to kick off again.
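Cookie syncing is easier to follow as a toy simulation. The sketch below assumes one common pattern, in which one tracker passes its own ID for you to a partner in the query string of a request; the Tracker class, the tracker names, the .example domains and the cookie values are all hypothetical and exist only for this illustration.

```python
from urllib.parse import parse_qs, urlencode, urlparse


class Tracker:
    """A toy tracking company that knows users only by its own cookie ID."""

    def __init__(self, name):
        self.name = name
        self.profiles = {}     # own cookie ID -> set of observed interests
        self.partner_ids = {}  # own cookie ID -> {partner name: partner's ID}

    def observe(self, cookie_id, interest):
        self.profiles.setdefault(cookie_id, set()).add(interest)

    def sync_url(self, own_cookie_id, partner):
        # This tracker's ID for the user rides along in the query string
        # of a request sent to the partner's domain.
        query = urlencode({"partner": self.name, "partner_uid": own_cookie_id})
        return f"https://{partner.name}.example/sync?{query}"

    def handle_sync(self, url, own_cookie_id):
        # The partner reads its own cookie from the request and stores the
        # mapping between the two pseudonymous IDs.
        params = parse_qs(urlparse(url).query)
        mapping = self.partner_ids.setdefault(own_cookie_id, {})
        mapping[params["partner"][0]] = params["partner_uid"][0]


def combined_profile(receiver, sender, receiver_cookie):
    """Merge what both trackers know, using the synced ID mapping."""
    sender_cookie = receiver.partner_ids[receiver_cookie][sender.name]
    return receiver.profiles[receiver_cookie] | sender.profiles[sender_cookie]


adnet = Tracker("adnet")
broker = Tracker("broker")
adnet.observe("A-123", "running shoes")
broker.observe("B-987", "flights to Las Vegas")

url = adnet.sync_url("A-123", broker)
broker.handle_sync(url, own_cookie_id="B-987")
merged = combined_profile(broker, adnet, "B-987")
print(sorted(merged))
```

Before the sync, each company holds one fragment of your behaviour under an ID the other cannot read. After a single request, the two fragments can be joined into one profile.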
It is even possible for advertisers to link the different devices you own. An advertiser might notice, for instance, that your iPad is often connected to the same wireless network as your smartphone, and that the two are often in close physical proximity to each other. The chances of this happening by coincidence are quite low, so they can now deduce that the devices belong to the same person, or to the same household. Now they can track and target you across both of the devices that you own. Or an advertiser might come to an AdTech company and say “I have these cookies that belong to these customers, show me the other devices that they own.”
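The reasoning about shared networks can be sketched in a few lines of Python. This is a deliberately crude illustration: the link_devices function, the threshold of three shared sightings and the sighting data are assumptions made up for the example, and real cross-device matching is presumably far more elaborate.

```python
from collections import defaultdict
from itertools import combinations


def link_devices(sightings, min_shared=3):
    """Pair up devices that keep turning up on the same network on the same day.

    sightings is an iterable of (device_id, network_id, day) tuples.
    """
    devices_at = defaultdict(set)
    for device, network, day in sightings:
        devices_at[(network, day)].add(device)

    shared = defaultdict(int)
    for devices in devices_at.values():
        # Count every pair of devices seen together at this place and time.
        for pair in combinations(sorted(devices), 2):
            shared[pair] += 1

    # One shared sighting could be a coincidence; repeated ones rarely are.
    return {pair for pair, count in shared.items() if count >= min_shared}


# Hypothetical sightings, invented for illustration.
sightings = [
    ("ipad", "home-wifi", 1), ("phone", "home-wifi", 1),
    ("ipad", "home-wifi", 2), ("phone", "home-wifi", 2),
    ("ipad", "home-wifi", 3), ("phone", "home-wifi", 3),
    ("phone", "cafe-wifi", 2), ("stranger-laptop", "cafe-wifi", 2),
]
linked = link_devices(sightings)
print(linked)
```

The iPad and the phone are linked because they share a network on three separate days. The stranger's laptop, which shared a café network with the phone once, is not.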
The anonymity fallacy
For a long time, advertisers have got around concerns about this by saying that the data is “anonymous”. This is a convenient line for advertisers to take, because it is in many ways completely irrelevant to them whether they have your real name. Whether your name is Mike, Stephen, Jenny or Mohammed really makes very little difference to them.
Nor does the fact that the data is anonymized really keep you any safer. Every field in the database acts as a “primary key” that one day, under the right conditions, could be linked to a dataset that is not anonymized. It just takes one breached dataset that has your name attached (or another field, like your phone number), and this data can be traced back to you. A lot of AdTech companies don’t deal in smoking guns, but they are quite open about selling the bullets.
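The linking step described above amounts to a simple join on a shared field. The sketch below is a hypothetical illustration: the reidentify function, the pseudonyms, the name and the phone numbers are all invented, and the only thing it is meant to show is how little work is needed once a named dataset shares one field with an “anonymous” one.

```python
def reidentify(profiles, leaked, key):
    """Attach names from a leaked dataset to 'anonymous' profiles via one shared field."""
    leaked_by_key = {row[key]: row for row in leaked}
    return [
        {**profile, "name": leaked_by_key[profile[key]]["name"]}
        for profile in profiles
        if profile[key] in leaked_by_key
    ]


# Hypothetical "anonymous" advertising profiles: no names, only pseudonyms.
ad_profiles = [
    {"pseudonym": "u-7f3a", "phone": "+44 7700 900123", "segment": "Used Car Shopper"},
    {"pseudonym": "u-19bc", "phone": "+44 7700 900456", "segment": "Luxury Goods Shopper"},
]
# A hypothetical breached dataset that does carry a name.
breached_records = [
    {"name": "Jenny Example", "phone": "+44 7700 900123"},
]

matches = reidentify(ad_profiles, breached_records, key="phone")
print(matches)
```

One of the two pseudonymous profiles now has a name attached. The other stays pseudonymous only until some dataset containing its phone number, or any other shared field, turns up.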
There’s also a growing industry of companies that promise to link, or “onboard”, this data. Companies like LiveRamp offer ways to link together different datasets (often starting with an email address, because we as consumers are so willy-nilly about handing it out, and because our email addresses often contain our real name). If a store asks you for an email address “so they can send a receipt” when you make a purchase, it is likely that they are looking for a way to tie your in-store activity back to your online activity and attach a real name to your data.
There isn’t a magic bullet for safeguarding privacy online. But if you’re willing to invest a little time, it is possible to take some steps toward protecting your privacy. I’d recommend starting with Ghostery and uBlock.
Beyond that, it comes down to researching the technology: learning about the privacy implications of the products you are using, the privacy tools that are out there, and the right way to use them. If you’re not fully aware, you’re not going to be making a fully informed choice.