Do you do any ML for Big Tech? Because it's actually a lot simpler than that: the input is the sum total of your activity, and the output is the likelihood that you'll click on an ad or buy a product on a specific surface. You certainly can predict demographic information like sexual orientation, education level, income, political party, and with a fair degree of accuracy, but all it does is add noise to the calculation you really want, which is optimizing the amount of money you'll make. To the extent that demographics are computed, it's to make advertisers feel better about themselves. They would almost always be better off with a blanket "optimize my sales" campaign, but it's hard for ad agencies and digital marketers to justify their existence that way.
> You certainly can predict demographic information like sexual orientation, education level, income, political party, and with a fair degree of accuracy, but all it does is add noise to the calculation you really want, which is optimizing the amount of money you'll make.
How are all those data points noise? They're crucial information used for targeting ads to a specific audience. Advertisers pay extra for it, because it leads to more sales. This is not just a gimmick, but a proven tactic that has made the web a more lucrative ad platform than any other. Adtech wouldn't be the behemoth it is without targeting, and the companies that do this well are some of the richest on the planet.
They're noise in the sense that they are imperfect human categories that we superimpose on reality. The alternative is not knowing nothing about the user, it's knowing everything.
Take this simplified example. Say that you want to predict whether a driver will cause a car accident. You could run the stats and say that poorer, older, less educated, alcohol-impaired, sleep-deprived drivers statistically cause more crashes, and then take an 80-year-old high school graduate with an income of $20K/year and say "He's in three of those five categories, that makes his risk higher." Or you could observe footage of every minute of him driving, count the number of times he strays out of lane, turns without his blinker, doesn't look at the road, speeds, runs a red light, etc. Which is going to give you a more accurate picture?
Marketers build up demographic profiles because historically, that's all the information they have had available to them. The detailed record of everything their customer has ever done has been impossible to collect, or illegal for privacy reasons. Big Tech has that record. And they can use it to make much more accurate machine predictions about what a person will do than demographics alone can predict.
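To make the driving example concrete, here is a minimal Python sketch of the two approaches. Everything in it is an assumption made up for illustration: the class and function names, the category cutoffs (income under $30K, age 70 or over, no college degree), and the incident counts are hypothetical, not real actuarial or ad-platform logic. It only shows that two drivers with the same demographic profile can look very different once you count what they actually do.

```python
from dataclasses import dataclass


@dataclass
class Driver:
    """Demographic profile. The field names are hypothetical."""
    age: int
    income: int                  # dollars per year
    college_degree: bool
    drinks_before_driving: bool
    sleep_deprived: bool


@dataclass
class DrivingLog:
    """Counts of observed behavior over some number of hours on the road."""
    hours_observed: float
    lane_departures: int
    unsignaled_turns: int
    red_lights_run: int
    speeding_events: int


def demographic_score(d: Driver) -> int:
    """Count how many of the five risk categories the driver falls into.

    The cutoffs below are illustrative assumptions, not real thresholds.
    """
    flags = [
        d.income < 30_000,        # "poorer"
        d.age >= 70,              # "older"
        not d.college_degree,     # "less educated"
        d.drinks_before_driving,  # "alcohol-impaired"
        d.sleep_deprived,         # "sleep-deprived"
    ]
    return sum(flags)


def behavioral_score(log: DrivingLog) -> float:
    """Observed incidents per hour of driving."""
    incidents = (
        log.lane_departures
        + log.unsignaled_turns
        + log.red_lights_run
        + log.speeding_events
    )
    return incidents / log.hours_observed


# The 80-year-old high school graduate earning $20K/year from the example.
profile = Driver(age=80, income=20_000, college_degree=False,
                 drinks_before_driving=False, sleep_deprived=False)

# Two hypothetical people who share that exact profile.
careful = DrivingLog(hours_observed=100.0, lane_departures=1,
                     unsignaled_turns=0, red_lights_run=0, speeding_events=1)
reckless = DrivingLog(hours_observed=100.0, lane_departures=30,
                      unsignaled_turns=12, red_lights_run=3, speeding_events=15)

print("demographic score (same for both):", demographic_score(profile))
print("careful driver, incidents/hour:", behavioral_score(careful))
print("reckless driver, incidents/hour:", behavioral_score(reckless))
```

The demographic score cannot tell these two drivers apart, while the behavioral score separates them by a wide margin. A real system would feed far more signals into a trained model rather than a hand-written sum, but the contrast is the same.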
In your first post you said that computed statistics are there "to make advertisers feel better about themselves". I pointed out that those computed statistics are still very valuable, even if they're based on probabilities and not on tangible data. Of course, with more real-world data the statistics are more accurate, but the reality is that real data is likely unavailable for most users. If the only available data are a few pictures and behavioral records (what they liked, who they follow, etc.), then those computed statistics are still much better than nothing.
Besides, advertisers mostly care about demographics, since that's how companies define their target markets. And most of this information can be gathered from just a few sources, so the type of advanced data analysis in your example is not even required in practice. Whether someone is at risk of having a car accident is more valuable to an insurance company than to an advertiser deciding what product to show them.
I understand that tech companies simply care about whether the user will click on the ad or video, or like the next song or show. But can this also be used to change a user's preferences or thought process?
> and the output is the likelihood that you'll click on an ad or buy a product on a specific surface.
Surveillance capitalism isn't really about ads. Increasingly that data is being used to impact your life offline. It influences how much companies charge you for their products and services. It determines what version of their policies companies will inform you of and hold you to. It determines very big things like whether or not you get a job offer or a rental agreement, but it's also being used to determine even small things like how long a company keeps you on hold when you call them. It's being used to make people suspects for crimes. It's being used against people in criminal trials and custody battles. It informs decisions on whether or not your health insurer covers your medical treatments. Activists and extremists use it to target and harass people they perceive as being their enemies.
The data you hand over to companies is being used to build dossiers stuffed with inaccuracies and assumptions that will be used against you in countless ways, yet you aren't even allowed to know who has it, what they're using it for, when they use it, or who they share it with.
Nobody really cares about what ads they get shown when they use the internet, so companies like to pretend that that's what their data collection is all about. They absolutely do use it for marketing, but the truth is that digital marketing is a smokescreen for everything else that your data is being used for and will later be used for.