This comes from discussions and lectures from people working on engineering at some ad aggregators. At least one said that in the end, they have a table with about 300 million IDs, one for each person in the country, and they key all the data they can link to that person with this ID. In principle this data is anonymized. But does that make a difference? At least in health care people worry about HIPAA and do audits to minimize reidentification risk, but I'm not sure if adtech companies do anything like that. So yes, a good data scientist can find any person they want from the data, but even otherwise I think these companies can work on a fairly meaningless definition of anonymization to get away with all this crap.
It's basically impossible to anonymize data. There are numerous papers about how little data is enough to uniquely identify people. Things like the ZIP code where you start your commute and the ZIP code where your commute ends are enough to identify the vast majority of people.
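To make that concrete, here is a rough sketch of how you could check it on any table: group the rows by a set of "quasi-identifier" columns and see how many rows end up alone in their group. The function name `uniqueness`, the column names, and the five toy records are all made up for illustration; none of it is real data or anyone's actual pipeline.

```python
from collections import Counter

def uniqueness(records, quasi_identifiers):
    """Return (k, fraction_unique) for the chosen columns.

    k is the size of the smallest group of records sharing the same
    values (the "k" in k-anonymity); fraction_unique is the share of
    records that are the only member of their group.
    """
    keys = [tuple(r[c] for c in quasi_identifiers) for r in records]
    sizes = Counter(keys)
    k = min(sizes.values())
    unique = sum(1 for key in keys if sizes[key] == 1)
    return k, unique / len(keys)

# Hypothetical toy records, invented purely for this example.
records = [
    {"home_zip": "10001", "work_zip": "10016", "age_band": "30-39"},
    {"home_zip": "10001", "work_zip": "10016", "age_band": "40-49"},
    {"home_zip": "10001", "work_zip": "10022", "age_band": "30-39"},
    {"home_zip": "10002", "work_zip": "10016", "age_band": "30-39"},
    {"home_zip": "10002", "work_zip": "10016", "age_band": "30-39"},
]

print(uniqueness(records, ["home_zip"]))                          # (2, 0.0)
print(uniqueness(records, ["home_zip", "work_zip"]))              # (1, 0.2)
print(uniqueness(records, ["home_zip", "work_zip", "age_band"]))  # (1, 0.6)
```

Even in this tiny example, each added column pushes more rows into a group of one. With k = 1, at least one person is singled out by those columns alone, no name required.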
In our case, the postal code (Canadian) plus almost any other piece of data is uniquely identifying. Through a quirk in the layout of our street, my wife and I have the only house in our postal code. Add age, gender, birth month, hair colour, t-shirt size... pretty much anything, and you’ve reduced from 2 possibilities to 1.
I still want to try dropping a letter in a mailbox from a different city with just our postal code written on it and see if it arrives.
It's impossible to anonymize some data. If you're including demographics and locations then yeah, it's going to be hard or impossible to anonymize. But if you're using surveys on emotional state, or perhaps newsgroup comments? That's not so hard.