unModified()

Break stuff. Now.

How Tracking Happens On Your Browser

The inner workings of trackers that follow us on the internet

November 18, 2018

Every move you make on the internet is tracked. Whether it's for legit website analytics or shady user profiling, websites know who you are. It's creepy, I know. But have you ever wondered how all of this happens in your browser? In my other article, I outlined the things involved in online tracking. In this article, I'll try to explain how tracking happens in your browser.

Not your old school newspaper

To most people, reading a page on the internet is like reading a newspaper. To read the news in a newspaper, you pick it up and start flipping. To read the news on your phone, you pick up your phone and start swiping. Both work in pretty much the same way, so they must be the same thing, right? Well, not really.

The web is powered by HTTP, which you can think of as a request-response protocol: to get content, you first have to ask for it. For instance, the web browser you're using right now had to request this page, and every single thing on it, just to show you the content you see right now.
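To make the request half of that exchange concrete, here is a minimal sketch that builds the raw text of an HTTP/1.1 GET request by hand. The helper name buildGetRequest, the host, and the header values are hypothetical; in practice the browser assembles this message for you.

```javascript
// Build the raw text of an HTTP/1.1 GET request (hypothetical helper).
// The first line says what is being asked for; each header follows on its own line.
function buildGetRequest(host, path, headers = {}) {
  const lines = [`GET ${path} HTTP/1.1`, `Host: ${host}`];
  for (const [name, value] of Object.entries(headers)) {
    lines.push(`${name}: ${value}`);
  }
  // HTTP uses CRLF line endings; an empty line marks the end of the headers.
  return lines.join('\r\n') + '\r\n\r\n';
}

const raw = buildGetRequest('www.example.com', '/index.html', {
  'User-Agent': 'ExampleBrowser/1.0',
  Accept: 'text/html',
});
console.log(raw);
```

The server reads this message and answers with a response; nothing arrives unless something asked for it first.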

It's more than just a request

The interesting thing about HTTP requests is that they're more than just asking for content. They also send a lot of metadata. Take the following snippet, for example. I got it from Firefox's Developer Tools when I searched for "cats" in the search box on Google's home page (potentially identifying values replaced with xyz's).

GET /search?source=hp&ei=xyz&q=cats&btnK=Google+Search&oq=cats&gs_l=xyz
Host: www.google.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://www.google.com/
DNT: 1
Connection: keep-alive
Cookie: 1P_JAR=xyz; NID=xyz; OGPC=xyz
Upgrade-Insecure-Requests: 1
Pragma: no-cache
Cache-Control: no-cache
TE: Trailers

It's definitely more than just cats.

By default, the browser sends a set of standard headers: which browser and OS it is (User-Agent), which page the request came from (Referer), whether the user prefers not to be tracked (DNT), how caches should treat the request (Cache-Control), which content types, languages, and encodings it accepts (Accept-*), and how the connection should be handled (Connection).
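On the receiving end, these headers are just "Name: value" lines. As a minimal sketch, assuming the request text has already been captured, the hypothetical parseHeaders function below turns a header block like the one above into a lookup table keyed by lowercase name (HTTP header names are case-insensitive).

```javascript
// Parse raw "Name: value" header lines into an object keyed by lowercase name.
// Lines that don't look like headers (the request line, blank lines) are skipped.
// Simplification: a repeated header overwrites the earlier value.
function parseHeaders(rawText) {
  const headers = {};
  for (const line of rawText.split(/\r?\n/)) {
    const match = /^([\w-]+):\s*(.*)$/.exec(line);
    if (!match) continue;
    headers[match[1].toLowerCase()] = match[2].trim();
  }
  return headers;
}

const headers = parseHeaders([
  'GET /search?q=cats',
  'Host: www.google.com',
  'Referer: https://www.google.com/',
  'DNT: 1',
].join('\n'));

console.log(headers.referer); // https://www.google.com/
console.log(headers.dnt);     // 1
```

Every value arrives as a string, so a server or tracker can read any of them with a single lookup.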

Then there are the extras, like cookies, query strings, custom headers, and request bodies. These are added to the request by values hard-coded in the page, by forms and scripts on the page, by installed software and extensions, or even by web servers (cookies, for example, are set by a server's Set-Cookie response header and sent back by the browser on later requests).
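The query string and the Cookie header from the Google request above are easy to pick apart. This sketch uses the standard URLSearchParams class for the query string and a small hypothetical parseCookies helper for the cookies; the helper is simplified and ignores quoting and percent-decoding.

```javascript
// Pull the search terms out of the query string from the Google request above.
const params = new URLSearchParams('source=hp&ei=xyz&q=cats&btnK=Google+Search&oq=cats&gs_l=xyz');
console.log(params.get('q'));    // cats
console.log(params.get('btnK')); // Google Search ("+" decodes to a space)

// Split a Cookie header into name/value pairs (hypothetical, simplified helper).
function parseCookies(cookieHeader) {
  const cookies = {};
  for (const pair of cookieHeader.split(';')) {
    const eq = pair.indexOf('=');
    if (eq === -1) continue; // skip malformed entries
    cookies[pair.slice(0, eq).trim()] = pair.slice(eq + 1).trim();
  }
  return cookies;
}

const cookies = parseCookies('1P_JAR=xyz; NID=xyz; OGPC=xyz');
console.log(Object.keys(cookies)); // [ '1P_JAR', 'NID', 'OGPC' ]
```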

Getting to know all about you

Tracking starts with the tracking server giving your browser a universally unique ID. The ID arrives in a cookie on the response to the very first request you make to that server. From then on, the browser sends that cookie along with every request to the tracking server, so any tracking data in those requests arrives together with the ID. Basically, the data is tied to an ID, which is tied to your browser and, by extension, to you.
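Here is a minimal sketch of that first exchange, written as a plain function so it runs without a real server. It assumes a made-up cookie name, tid, and uses Node's crypto.randomUUID() to mint the ID; real trackers pick their own cookie names, ID formats, and lifetimes.

```javascript
const { randomUUID } = require('crypto');

// Hypothetical cookie name used by this sketch's tracker.
const COOKIE_NAME = 'tid';

// Given the incoming Cookie header, return the visitor ID and, for
// first-time visitors only, a Set-Cookie header to send back.
function identifyVisitor(cookieHeader = '') {
  const match = cookieHeader
    .split(';')
    .map((pair) => pair.trim())
    .find((pair) => pair.startsWith(COOKIE_NAME + '='));

  if (match) {
    // Returning visitor: the browser sent the ID back on its own.
    return { id: match.slice(COOKIE_NAME.length + 1), setCookie: null };
  }

  // First visit: mint a new ID and ask the browser to keep it for a year.
  // SameSite=None lets it be sent from third-party pages; it requires Secure.
  const id = randomUUID();
  const setCookie = `${COOKIE_NAME}=${id}; Max-Age=31536000; Path=/; Secure; SameSite=None`;
  return { id, setCookie };
}

const first = identifyVisitor();                  // no cookie yet
const again = identifyVisitor(`tid=${first.id}`); // the browser sends it back
console.log(first.id === again.id);               // true
```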

At the bare minimum, trackers want to know how you move across the internet. This information comes from the Referer header. Browser and OS usage comes from the User-Agent header. IP addresses come from the connection itself, or from the X-Forwarded-For header when the request has passed through a proxy or load balancer.
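A sketch of how a tracker might collect this "free" data from a single request follows. The function name is hypothetical, and it assumes lowercase header names (as Node's http module provides them) plus the remote address of the connection.

```javascript
// Collect the data a tracker gets from every request without any extra work.
// `headers` uses lowercase names; `remoteAddress` is whoever opened the connection.
function basicTrackingData(headers, remoteAddress) {
  // X-Forwarded-For is a comma-separated list added by proxies; the first entry
  // is the original client as they report it (a client can also spoof it).
  const forwarded = headers['x-forwarded-for'];
  const ip = forwarded ? forwarded.split(',')[0].trim() : remoteAddress;
  return {
    page: headers['referer'] || null,         // which page made the request
    userAgent: headers['user-agent'] || null, // browser and OS
    ip,
  };
}

const viaProxy = basicTrackingData({
  referer: 'https://www.example.com/cats',
  'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0',
  'x-forwarded-for': '203.0.113.7, 10.0.0.2',
}, '10.0.0.1');
console.log(viaProxy.ip); // 203.0.113.7
```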

Some trackers provide APIs to allow developers to programmatically report custom data. For instance, Google Analytics allows the reporting of "pages" and "events". Page Tracking is used to report page visits or views on sections of an app. Event Tracking is used to report user interactions.
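In Google Analytics' analytics.js library, for instance, these are calls like ga('send', 'pageview') and ga('send', 'event', category, action, label). To show the shape of such an API without depending on a real library, here is a hypothetical tracker with pageview and event methods; the names and payload fields are made up for illustration. Each call only records a "hit" that would later be sent to the tracking server.

```javascript
// A hypothetical, minimal tracker API: each call queues a "hit" object.
function createTracker(visitorId) {
  const queue = [];
  return {
    // Report a page (or app section) view.
    pageview(path) {
      queue.push({ type: 'pageview', id: visitorId, path, time: Date.now() });
    },
    // Report a user interaction, e.g. a click on a "play" button.
    event(category, action, label) {
      queue.push({ type: 'event', id: visitorId, category, action, label, time: Date.now() });
    },
    // Hand over (and clear) everything recorded so far.
    flush() {
      return queue.splice(0, queue.length);
    },
  };
}

const tracker = createTracker('visitor-123');
tracker.pageview('/videos');
tracker.event('Videos', 'play', 'Cat compilation');
console.log(tracker.flush().map((hit) => hit.type)); // [ 'pageview', 'event' ]
```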

Of course, anything can be taken to the extreme. The EFF has a tool called Panopticlick that demonstrates browser fingerprinting. It gathers several pieces of data from your browser, such as the user agent, screen size, time zone, and installed fonts, and combines them into a fingerprint. The list of data it gathers shows just how much information browser APIs reveal about your device.
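The core idea of fingerprinting can be sketched in a few lines: collect attributes that tend to differ between devices, serialize them in a fixed order, and hash the result. In a browser the values would come from APIs such as navigator.userAgent, navigator.language, screen.width, and Intl.DateTimeFormat().resolvedOptions().timeZone; here they are passed in as a plain object so the sketch runs in Node. This is a simplification, not how Panopticlick itself computes its results.

```javascript
const { createHash } = require('crypto');

// Combine device attributes into a fingerprint.
// Keys are sorted so the same attributes always produce the same hash.
function fingerprint(attributes) {
  const canonical = Object.keys(attributes)
    .sort()
    .map((key) => `${key}=${attributes[key]}`)
    .join('|');
  return createHash('sha256').update(canonical).digest('hex');
}

const a = fingerprint({ userAgent: 'Firefox/63.0', language: 'en-US', screen: '1920x1080', timeZone: 'Europe/Berlin' });
const b = fingerprint({ timeZone: 'Europe/Berlin', screen: '1920x1080', language: 'en-US', userAgent: 'Firefox/63.0' });
const c = fingerprint({ userAgent: 'Firefox/63.0', language: 'en-US', screen: '1366x768', timeZone: 'Europe/Berlin' });
console.log(a === b); // true: same attributes, same fingerprint
console.log(a === c); // false: one different attribute changes the hash
```

No cookie is involved: the same device produces the same hash on every site running the same script, which is why clearing cookies doesn't reset it.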

E.T. phone home

The last leg of the race involves sending all of this gathered data to a tracking server via a request. As mentioned earlier, every resource on a page is fetched with a request, so any of them can carry data. However, some resource types are less than ideal for the job.
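Before that request goes out, the gathered data has to be packed into it, and for a simple GET request the natural place is the query string. The sketch below uses the standard URL and URLSearchParams classes to encode a data object safely; the https://tracking-domain.com/ endpoint, the buildBeaconUrl name, and the field names are placeholders.

```javascript
// Pack tracking data into a URL's query string.
// searchParams percent-encodes values, so spaces, "&" and "=" can't break the URL.
function buildBeaconUrl(endpoint, data) {
  const url = new URL(endpoint);
  for (const [key, value] of Object.entries(data)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

const beaconUrl = buildBeaconUrl('https://tracking-domain.com/', {
  id: 'visitor-123',
  page: 'https://www.example.com/cats?sort=new',
  event: 'play',
});
console.log(beaconUrl);
```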

Pop-up windows and iframes are annoying and bad for user experience. Scripts and styles require parsing and can break if errors are thrown. Fonts, audio, and video are heavy. They're also fairly recent tech, so browser support varies. XHR is subject to the Same-Origin Policy unless you make the extra effort to set up Cross-Origin Resource Sharing (CORS). That leaves us with images.

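// All it takes is this one line to send a loaded request.
// encodeURIComponent keeps characters like "&" and "=" in the data from breaking the URL.
(new Image()).src = 'https://tracking-domain.com/?d=' + encodeURIComponent(stringLoadedWithData)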

Images work on all browsers (except maybe text-based ones like Lynx). All it takes to make an image request is one line of JavaScript, with the tracking data bolted onto the image URL. To keep its success response as small as possible, the server sends back a 1x1 image. Fun fact: this is why trackers are called "pixels".
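On the server side, that response can be a hard-coded 1x1 GIF. The sketch below uses a widely used 42-byte transparent GIF (base64-encoded) and a hypothetical handlePixelRequest function that records the d parameter before replying. With Node's http module you would call it from a createServer callback and write the returned status, headers, and body to the response.

```javascript
// A widely used 1x1 transparent GIF, 42 bytes once decoded.
const PIXEL = Buffer.from('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7', 'base64');

// Hypothetical handler: record the data carried in the URL, then answer with the pixel.
function handlePixelRequest(requestUrl, log) {
  // requestUrl is a path such as "/?d=..." (like Node's req.url); the base only resolves it.
  const data = new URL(requestUrl, 'http://localhost').searchParams.get('d');
  log.push(data);
  return {
    status: 200,
    headers: {
      'Content-Type': 'image/gif',
      'Content-Length': PIXEL.length,
      // Forbid caching so a repeated URL still reaches the server.
      'Cache-Control': 'no-store',
    },
    body: PIXEL,
  };
}

const log = [];
const res = handlePixelRequest('/?d=' + encodeURIComponent('page=/cats'), log);
console.log(res.status, res.headers['Content-Type'], log); // 200 image/gif [ 'page=/cats' ]
```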

Conclusion

The concept of tracking is simple: a request loaded with metadata, sent to a tracking server. Do this for every user on the internet on every single page they visit, and you have yourself a data collection monster. Pixels, anyone? Anyone?

Hopefully this article gave you some insight into how tracking happens in your browser. As always, if you have comments or suggestions, feel free to drop a line.