Google’s Quest to Kill the Cookie Is Creating a Privacy Shitshow



Photo: David Ramos (Getty Images), Graphic: Shoshana Wodinsky (Gizmodo)

For the past few months, millions of Chrome users have been roped into Google’s origin trials for the tech meant to replace the quickly crumbling third-party tracking cookie. Federated Learning of Cohorts (or FLoC, for short) is a new kind of tracking technique that’s meant to be a friendlier, more privacy-protective alternative to the trackers we all know and loathe, and one that Google seems determined to fully implement by 2022.

As you might expect from a Google privacy push, people had concerns. A lot of them. The Electronic Frontier Foundation pointed out that FLoC’s design seems tailor-made for predatory targeting. Browsers like Firefox and Brave announced they wouldn’t support the tech, while DuckDuckGo literally made an extension to block FLoC entirely. While this trial keeps chugging along, academics and activists keep on finding loopholes that contradict FLoC’s privacy-preserving promises.

They aren’t the only ones. Digiday reported this week that some major players in the adtech industry have started drawing up plans to turn FLoC into something just as invasive as the cookies it’s supposed to quash. In some cases, this means companies amalgamating any data scraps they can get from Google with their own catalogs of user info, turning FLoC from an anonymous identifier into just another piece of personal data for shady companies to compile. Others have begun pitching FLoC as a great tool for fingerprinting, an especially underhanded tracking technique that can keep pinpointing you no matter how many times you go incognito or flush your cache.

In the middle of all this, the most popular browser in the world, Chrome, is just… looking the other way.

“Even if Google didn’t think about these things when it was designing this technology, as soon as they put this stuff out in public back in 2019, this is exactly what advocates were saying,” said Bennett Cyphers, a technologist with the EFF who focuses on adtech. “You could take one look at this thing and immediately know it’ll just turn into another tool for fingerprinting and profiling that advertisers can use.”

What is FLoC supposed to be and how’s it different from cookies?

Google’s pitch for FLoC actually sounds pretty privacy-forward at first glance. The third-party cookies FLoC is meant to replace are an objective scourge to the web writ large; they map out every click and scroll made while browsing to create countless unique profiles, and spam those profiles with targeted ads across multiple sites. FLoC nixes that individualized tracking and targeting, instead plunking people into massive anonymous cohorts based on their browsing behavior. These cohorts are thousands of people deep, and get wiped every week, meaning that (in a perfect world) your assigned cohort can’t be used to pick you out of a crowd, and can’t be used to target you in the long term. At least, that’s how it’s being sold.

On top of this, your ever-shifting FLoC ID is labeled with a meaningless jumble of letters and numbers that only Google can decipher, and that jumble is held locally on your browser, rather than in the hands of some third-party company you’ve never heard of. Altogether, FLoC’s meant to turn you into a nameless drop in an inky sea of data, where everything about you (your name, your web history, what you ordered for lunch) is buried deep beneath the surface.

At the start of this year, Google announced that some of these FLoC cohorts would be available for advertisers that want to see them in action through the company’s upcoming origin trials, with plans to start serving the first FLoC-targeted ads in the second quarter of this year. So far, the company reports that there have been a whopping 33,872 different cohorts, and each cohort holds data from at least 2,000 Chrome customers who were opted in to the program literally overnight.
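For a sense of scale, here’s a quick back-of-the-envelope check of those two reported figures, sketched in Python. It assumes every cohort sits at exactly the 2,000-user minimum, so the result is a floor rather than a headcount; the function and constant names are made up for illustration.

```python
# Figures as reported in the article.
COHORT_COUNT = 33_872
MIN_USERS_PER_COHORT = 2_000


def minimum_trial_population(cohorts: int, min_users: int) -> int:
    """Lower bound on browsers in the trial if every cohort is at its minimum size."""
    return cohorts * min_users


floor = minimum_trial_population(COHORT_COUNT, MIN_USERS_PER_COHORT)
print(f"At least {floor:,} browsers")  # At least 67,744,000 browsers
```

In other words, the floor is a little under 68 million browsers, and the real number is higher if any cohort is bigger than the minimum.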

Google not only forgot to give these millions of users a basic heads up, but it didn’t give users any way to see if they’d become unwitting guinea pigs in this global experiment (thankfully, the EFF did). And if you do want to pull your browser from the trial, you’re going to need to jump through way too many hoops to do so.

What are the rules around FLoC? Haha… rules…

This early in the trials, there are literally no rules surrounding what advertisers, adtech companies, or anyone else in these trials can do with this data. That means at minimum there’s a grand total of nearly 68 million Chrome users (33,872 cohorts of at least 2,000 people each) having their cohort data hoovered up, parsed apart, and potentially passed around for massive profits right now. (We’ve reached out to Google for comment on these trials.)

It’s gone as well as you’d expect. One of the adtech giants that’s part of this trial, Xaxis, told Digiday that it’s currently conducting an analysis to see how FLoC IDs could be incorporated into its own cookie alternative, which it calls “mookies.” Yes, really. Nishant Desai, one of the directors overseeing Xaxis’s tech operations, plainly said that those strings of numbers that FLoC spits out “are an additional dimension of how you resolve [a person’s] identity.”

Desai compared it to the IP addresses that marketers have used to target you since the ’90s. Like an IP address, someone’s FLoC ID can be pulled from a webpage without any input on the user’s part, making it an easier grab than email addresses and phone numbers that usually require a user to manually hand over the information. Like an IP address, these IDs are strings of numbers that don’t disclose anything about a person until they’re lumped in with a buffet of other data points. And like (some) IP addresses, FLoC IDs aren’t entirely static (they’re technically reset every week, after all), but once you get assigned one specific cohort, chances are you’ll be stuck with it for a while.

“If your behavior doesn’t change, the algorithm will keep assigning you in that same cohort, so some users will have a persistent FLoC ID associated with them, or could,” Desai told Digiday.

Google software engineer Deepak Ravichandran put this more bluntly during a recent call with the World Wide Web Consortium (or W3C for short). When asked how stable someone’s FLoC ID was expected to be, Ravichandran replied that an average user visits between three and seven domains on an average day, and that those domains tend to be fairly stable over time.

Ravichandran noted that even if a person jumps from cohort to cohort every other week, if you take a bird’s-eye view of their web browsing behavior, it all looks pretty similar. That means even with the reset after seven days, you’re likely going to be assigned the same ID you had before, rendering the reset meaningless.
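To see why a weekly reset may not buy much, here’s a toy sketch in Python. It assumes the cohort ID is computed from nothing but the set of domains a browser visited, using a SimHash-style locality-sensitive hash. That scheme, the 16-bit width, the function names, and the example domains are all assumptions made for illustration; this is not Chrome’s actual code.

```python
import hashlib


def domain_bits(domain: str, width: int = 16) -> list[int]:
    """Derive `width` pseudo-random bits from a domain name (toy stand-in for a feature hash)."""
    digest = hashlib.sha256(domain.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big")
    return [(value >> i) & 1 for i in range(width)]


def toy_cohort_id(domains: list[str], width: int = 16) -> int:
    """SimHash-style ID: each distinct domain votes +1 or -1 on every bit position."""
    totals = [0] * width
    for domain in set(domains):  # only which domains were visited matters, not how often
        for i, bit in enumerate(domain_bits(domain, width)):
            totals[i] += 1 if bit else -1
    # A bit is set in the ID when the positive votes win at that position.
    return sum(1 << i for i, total in enumerate(totals) if total > 0)


# Hypothetical browsing: same sites two weeks running, in a different order.
week_one = ["news.example", "recipes.example", "dogs.example"]
week_two = ["dogs.example", "news.example", "recipes.example", "news.example"]

print(toy_cohort_id(week_one) == toy_cohort_id(week_two))  # True
```

Under these assumptions, wiping the ID changes nothing as long as the inputs stay the same: the same handful of domains hashes straight back to the same number.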

Who is using these FLoC IDs?

Xaxis is just one of the many, many (many) companies in the adtech space with these sorts of plans. MightyHive, a San Francisco-based data firm, told Digiday that it’s lumping users into specific buckets to see if the FLoC ID their browser’s been branded with is associated with certain actions, like buying particular products. Adtech middleman Mediavine has gone on record saying it’s currently slurping any FLoC IDs from people visiting the 11,000-ish sites plugged into its tech, and then passing that data on to other partners responsible for parsing apart which IDs visit which specific webpages.

These so-called demand-side platforms (DSPs, for those in the biz) are the ones tasked with figuring out which jumbled identifier corresponds to a new mom, a teenage TikToker, or a guy that just really, really likes dogs.

Right now, it’s worth guessing these labels will be pretty broad; in that same W3C call, Ravichandran explained that these first sets of cohorts are exclusively generated using data about the domain name a person lands on, and nothing else. Different pages on a site, or the actual content on a particular page, aren’t being considered in FLoC’s algorithm, though he hinted that might change later this year.

If you’re wondering how hard it is for these DSPs to decode these cryptic cohort codes, the answer is “not very.” Last month, Mozilla alum Don Marti (who now works for the ad firm CafeMedia) published a blog post laying out how he roughly decoded some of the major FLoC categories that were visiting sites his company worked with. After boiling down the 33,000-ish different cohorts Google generated into 33 “mega-horts,” he mapped out keywords associated with the websites these “horts” frequented.

After filtering out some of the more mundane keywords (to make the results more meaningful), he ended up with… this:

Table showing the types of FLoC cohorts.

1 kFLoC cohort = 1,000 FLoC cohorts. Sorry if this is giving anyone flashbacks to high school chemistry class.
Screenshot: CafeMedia (Gizmodo)
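The bucketing described above can be roughly sketched in Python. This is a guess at the general shape of the approach, not Marti’s actual code: the visit data, the list of “mundane” keywords, and every name below are invented for illustration. The only detail taken from the caption is that 1,000 consecutive cohort IDs roll up into one bucket.

```python
from collections import Counter, defaultdict

COHORTS_PER_BUCKET = 1_000  # "1 kFLoC cohort = 1,000 FLoC cohorts"
MUNDANE_KEYWORDS = {"home", "about", "contact"}  # hypothetical filter list


def bucket_of(cohort_id: int) -> int:
    """Map a cohort ID to its thousand-wide bucket, e.g. 32,010 -> 32."""
    return cohort_id // COHORTS_PER_BUCKET


def top_keywords(visits, limit=3):
    """Count site keywords per bucket and keep the most frequent ones."""
    counts = defaultdict(Counter)
    for cohort_id, keywords in visits:
        for keyword in keywords:
            if keyword not in MUNDANE_KEYWORDS:
                counts[bucket_of(cohort_id)][keyword] += 1
    return {
        bucket: [keyword for keyword, _ in counter.most_common(limit)]
        for bucket, counter in counts.items()
    }


# Invented sample: (cohort ID seen on a page, keywords describing that page).
sample_visits = [
    (32_010, ["healthy", "beans", "home"]),
    (32_500, ["beans", "tomato"]),
    (20_123, ["crochet", "pattern"]),
]

print(top_keywords(sample_visits)[32][0])  # beans
```

Nothing here needs Google’s help. Anyone who sees cohort IDs arrive alongside page views on their own sites can build a table like this.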

In broad strokes, you can probably tell what kind of person each of these FLoCs represents. Number 32, featuring words like “healthy” and “tomato” and “apple” and (my personal favorite) “beans,” might be someone that’s really into eating organic and cooking from home. Number 20 (“crochet,” “pattern,” “writing”) sounds like a chill person that could make you a comfy scarf. Number 15 (“codes,” “printable,” “eggs”) sounds… well, I’m honestly not sure about that one. A tech bro that likes a good shakshuka?

You probably wouldn’t learn much about someone if you matched one of these cohorts with whatever data a major broker already had on them. Sure, you might learn that this guy’s really into magic/casseroles/dogs, but if my past experiences with magic-casserole-dog guys are any indication, you likely already knew this about them.

But what if that guy regularly visits websites centered around queer or trans topics? What if he’s trying to get access to food stamps online? This kind of web browsing, just like all web browsing, gets slurped into FLoC’s algorithm, potentially tipping off countless obscure adtech operators about a person’s sexuality or financial situation. And because the world of data sharing is still a (mostly) lawless wasteland in spite of lawmakers’ best intentions, there’s not much stopping a DSP from passing off that data to the highest bidder.

Google knows this is a problem. It even published a white paper detailing how it plans to keep FLoC’s underlying tech from accidentally conjuring cohorts based on a predefined list of sensitive categories, like a person’s race, religion, or medical condition. Not long after that paper dropped, Cyphers dropped a blog post of his own arguing, among other things, that the paper’s approach was infuriatingly half-assed.

“I mean, yeah, they tried. That’s better than not trying,” Cyphers said. “But I think their solution dodges that hard problem that they’re trying to solve.”

That hard problem he’s talking about is admittedly a really hard one to solve: How do you keep your most vulnerable users safe from being profiled in ways that range from life-threatening to economically devastating while still scooping up troves of data about them so other people can make money?

Google, for its part, decided to tackle this problem by combing through the browsing history of some users that are part of these trials to see if they’ve visited sites in different sensitive categories. A website for a hospital might be labeled “medical,” for example, or a site for a person’s church might be labeled “religion.” If a cohort frequents sites within these Forbidden Categories particularly often, Google will block that group from being targeted.
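As a rough illustration of that kind of check, here’s a minimal Python sketch. It assumes a simple rule: block a cohort when the share of its visits to sensitive-labeled sites crosses a fixed threshold. The threshold, the site labels, and the function names are hypothetical, and Google’s actual test may well work differently.

```python
# Hypothetical labels for sensitive sites; real category lists would be far larger.
SENSITIVE_LABELS = {
    "hospital.example": "medical",
    "church.example": "religion",
}
BLOCK_THRESHOLD = 0.10  # hypothetical cutoff: more than 10% of visits


def sensitive_share(visited_domains: list[str]) -> float:
    """Fraction of a cohort's visits that land on sensitive-labeled sites."""
    if not visited_domains:
        return 0.0
    hits = sum(1 for domain in visited_domains if domain in SENSITIVE_LABELS)
    return hits / len(visited_domains)


def is_blocked(visited_domains: list[str], threshold: float = BLOCK_THRESHOLD) -> bool:
    """Block the cohort from targeting when its sensitive share exceeds the threshold."""
    return sensitive_share(visited_domains) > threshold


# Invented cohorts: one heavy on hospital visits, one that never touches a labeled site.
medical_heavy = ["hospital.example"] * 3 + ["news.example"] * 7
unlabeled_only = ["news.example"] * 6 + ["forum.example"] * 4

print(is_blocked(medical_heavy))   # True
print(is_blocked(unlabeled_only))  # False
```

A rule like this only ever trips on visits to sites that somebody has already labeled. A cohort whose members share a sensitive trait but browse unlabeled sites passes the check untouched.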

In other words, Google’s proposal assumes that people in a certain sensitive category are visiting specific sensitive websites en masse. But this just… isn’t how people browse the web; people with depression probably don’t hang out on psychiatry dot org every day, and a person who identifies as LGBT+ might not be lurking around whatever Google’s assuming a “gay website” might look like. Sure, people in these categories might show off similar browsing behavior, but Google’s proposal reads like a fix for a world where people browse the web like robots instead of like, well, people.

At the end of the day though, Google’s on track to fully roll out FLoC by mid-2022, whether it’s ready for us or not. “If you go and look at the public FLoC GitHub page, there’s pages of back-and-forths between the people who designed FLoC and privacy advocates pointing out why this is such a bad idea,” Cyphers said. “And every time, the designers are just like ‘Good to know! We still think we’re right.’”


