The politician's use of "shared" to describe this is misleading in a way that HN headline-only readers are going to jump all over. What these sites were doing was abusing Google Analytics to store private user data in custom parameters. That is, in one example, if a user of a tax prep site said that they had 1 dependent, the site would set a parameter like "cd17:1" in their Analytics request. I don't really see how that amounts to Google's problem.
If a company uses or abuses software platforms to improperly store private user data, that is their error and not the fault of the platform. I assume you would not blame Google if Intuit put everyone's AGI in a Google Sheet.
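To make the "cd17:1" example concrete, here is a minimal sketch of what such a hit could look like, assuming the Universal Analytics Measurement Protocol style of query parameters (`cdN` for custom dimension N, `dt` for the page title). The tracking ID, client ID, URL, and the `build_hit` helper are placeholders made up for illustration, not anything taken from the report. Nothing is actually sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_hit(tracking_id, client_id, page_url, page_title, custom_dimensions):
    """Build (but do not send) a pageview hit URL with custom dimensions."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # property ID (placeholder below)
        "cid": client_id,    # pseudonymous client ID (placeholder below)
        "t": "pageview",     # hit type
        "dl": page_url,      # document location
        "dt": page_title,    # document title
    }
    # Whatever the site chooses to put here travels with the hit.
    for index, value in custom_dimensions.items():
        params[f"cd{index}"] = str(value)
    return "https://www.google-analytics.com/collect?" + urlencode(params)


# Hypothetical tax prep page recording "1 dependent" in custom dimension 17.
hit = build_hit(
    "UA-XXXXX-Y",
    "555",
    "https://taxprep.example/dependents",
    "Dependents",
    {17: 1},
)
query = parse_qs(urlsplit(hit).query)
print(query["cd17"])
```

The point is only that the value is chosen and attached by the site, not by the analytics vendor.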
It doesn't seem like anyone is saying this is the problem of Google Analytics or Pixel, but rather the problem of tax prep firms for storing the data in the first place and perhaps Meta for using this sensitive data.
The Senators demanding investigation and the article itself are both clearly implying Facebook has done something wrong. The accusation seems to be that Facebook used the data to target ads. I suspect that's true only in the sense that it went into the same generalized funnel all the URLs do and not that they're specifically parsing out this information, but I guess those details don't matter if your reputation stinks.
I can easily imagine Facebook's machine learning algorithms picking a signal out of this data and using it to target ads. They may not have had to tailor a query to search for "size of tax deduction". The machine learning algorithms did that on their own. In a completely opaque manner.
I doubt it is anything anywhere near that fancy. Look at the primary source of this complaint, which is available in GitHub. One of the sites was putting sensitive data, such as income amounts, in the page title. The Facebook analytics thing sends the page title to Facebook. Google Analytics also takes the page title. You can easily see how any old targeting engine, machine inference or otherwise, would take the titles of visited pages as a signal.
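For what it's worth, the page-title leak is the kind of thing a site could scrub before any tracker sees it. A rough sketch, assuming the site controls the title string before it is rendered; the `redact_title` helper and the sample title are hypothetical, and the regex is deliberately crude (it only catches digit runs and dollar amounts):

```python
import re

# Matches an optional "$", then digits with optional thousands separators
# and an optional decimal part, e.g. "$52,300" or "1200.50".
AMOUNT = re.compile(r"\$?\d[\d,]*(?:\.\d+)?")


def redact_title(title: str) -> str:
    """Replace anything that looks like a number or dollar amount."""
    return AMOUNT.sub("[redacted]", title)


print(redact_title("AGI $52,300 | Tax summary"))
```

This does nothing about names or other free text in the title, so it is a floor, not a fix.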
Based on experience working in a similarly-sized company I have some guesses as to how they might operate. Is there a logical reason I should always assume the least favorable thing instead?
How do you know that they are not parsed out by Meta? The possibilities range from collusion with the tax firms, to somebody at Meta reverse engineering the encoding, to an automatic "AI" reverse engineering ("pattern discovery"; not sure if they do that, but possibly?).
I don’t “know” that which is why I said I “suspect” it rather than “know” it. But it seems like such an obviously foolhardy move with such marginal gain to deliberately seek to collect and analyze that data from tax prep firms that I have a hard time imagining them signing off on it.
My point is that they might not have to sign off on it. The signal for the ads might have been discovered by an algorithm without making it explicit to the management.
I would not be surprised if Google, Meta and co. have big ML systems that try to find signals to increase ad revenue, and they dump all the data they have into them.
"It's an algorithm! Nobody knows how or what went into creating it!" Seems like the new version of "I'm in charge of making the rockets go up; where they come down is a different department."
Meaning, of course, people are in charge of the machine learning / algorithm, people manage the weights, they measure the outcomes, they exercise editorial control. It's not a mystical black box given to us by god or a super advanced alien race; it's people doing people things; if they get the profit they also get the liability / responsibility. Even if they're not sure how it works, they're sure as hell still responsible for it.
I am not arguing against them having responsibility. They do. I am arguing that they might not have realized what the weights are doing. Because, yes, indeed, large neural networks ARE black boxes.
I am not sure how it would be workable for Facebook to be responsible for what information third parties choose to put into their tool. By that logic pretty much any application that allows users to upload text is equally guilty unless each post is manually reviewed.
It's definitely on the tax company for sending extraordinarily sensitive data to google analytics.
However I'd say sending sensitive (or at least private) data to google analytics is more or less its intended purpose. I don't think the use and continued existence of google analytics is compatible with any reasonable privacy protection.
The article states that the information was "shared" with Google and Meta via their respective pixels for ad-tracking purposes, but claims that only Meta parsed out that data and misused it.
>but claims that only Meta parsed out that data and misused it.
Can you quote the specific section for this claim? I skimmed the themarkup.org article and the closest thing I could find was this
>Meta wins financially too. The company says it can use the data it gleans from tools like the pixel to power its algorithms, providing it insight into the habits of users across the internet.
Which has a link to facebook's tos. That's not really indication of what they actually do, just legal ass covering.
Right. Someone at H&R Block wanted to know "what pages or functions do people with 1 dependent use the most".
This query seems innocent enough. It's maybe logical and an easy thing to ask for without realizing the implications. I would bet this type of thing happens with LOTS of sites that collect any sort of personal data.
Obviously for us here, blocking analytic platforms through an adblock tool is a smart thing to do, exactly for this reason. But obviously that doesn't scale to the masses.
Because the whole point of Google Analytics is that Google wants the data. Obviously the platform here is just as much at fault as the customer using it inappropriately.
There is no way that Google has any interest in the effort it would take to try to interpret the custom data that websites insert into custom analytics dimensions in a meaningful way. Their algorithms may track trends for them, but they are not aware of what they actually mean in a semantic way. I doubt that it is useful for them to learn anything from that data that is scoped outside the single account.
People are so confused about this. They underestimate the cost and effort that would be needed to just grovel around in GA logs and infer things, never mind the fact that Google promises not to do it. Then the same commenters overestimate the expected value of successfully doing it, handwaving about how valuable it would be when in reality there is not a market for such data. Finally, they underestimate the risk it would pose to the AdSense business, which prints more money than Jerome Powell, and the risk-averse culture of Google, who would not put that business in jeopardy for anything that pays less than a billion dollars per hour.
Because it would be impossible for Google to do any kind of serious server side analysis of the data, which is why these companies use analytics, if Google don't have the data unencrypted.
I worked on apps where the data collected was completely empty of user-identifying information. Management seemed content with merely logging clicks from within the UI, seeing how (and how many) people used the app.
Sadly I think there is a trend in our industry to gather everything, decide what you actually want/need later.
I think Google (and related) are to blame for enabling this kind of "data hoovering". Giving the customer what they want at the expense of us the users is getting kinda evil.
The terms of service of Google Analytics specifically prohibit storing PII in it. "You will not and will not assist or permit any third party to pass information, hashed or otherwise, to Google that Google could use or recognize as personally identifiable information"
And that level of insight only works for small companies - at a certain point you want to combine analytics from your UI with things like “is this a high value customer” or “did this customer stop using the app after experiencing this frustrating UX”
I think you can say it's "anonymous" data in that it doesn't have name/phone/email but still be relatively fingerprint-able given additional analytics data.
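A toy illustration of the fingerprinting point: count how many records are singled out by a combination of "anonymous" attributes. The records and field names below are entirely made up, and `unique_fraction` is just a helper name for this sketch.

```python
from collections import Counter

# Made-up rows with no name/phone/email, only coarse attributes.
records = [
    {"zip": "30301", "dependents": 1, "income_band": "50-75k"},
    {"zip": "30301", "dependents": 1, "income_band": "50-75k"},
    {"zip": "30301", "dependents": 3, "income_band": "150-200k"},
    {"zip": "98101", "dependents": 0, "income_band": "50-75k"},
]


def unique_fraction(rows, keys):
    """Fraction of rows whose combination of `keys` appears exactly once."""
    combos = Counter(tuple(row[k] for k in keys) for row in rows)
    singled_out = sum(
        1 for row in rows if combos[tuple(row[k] for k in keys)] == 1
    )
    return singled_out / len(rows)


by_zip = unique_fraction(records, ["zip"])
by_all = unique_fraction(records, ["zip", "dependents", "income_band"])
print(by_zip, by_all)
```

Even in this tiny example, adding two more attributes doubles the share of rows that point at exactly one person.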
I'm still not convinced how this is "bad" per se. Yet we're all here discussing this as if it's already decided that this data shouldn't be shared and I'm sitting here thinking "Wtf? Who asked me?". Ok, so the Tax firm I gave my personal info to put it on Google, okay. Will google see it? Sure. Is it on some DB on google? Sure.
Do I care? No.
We're watching the world burn right now with more pressing issues, and this whole thing seems so ridiculously academic and contrived.
Same here, this type of outrage bait doesn’t work on me either. Some income info on me is in Intuit’s database somewhere, it hypothetically maybe being in a database at Meta does not affect me at all, other than I might get ads for more luxury goods. I don’t care, I also click “allow” instead of “Ask App Not To Track” in every iOS pop-up scare message.
I don't understand why you'd be confused after reading the original source[1]. The authors explain at length why they consider it to be Meta's problem, and it's not hard to understand - Meta make misleading claims about their own ability to detect and filter personal information. It also appears the detail sent was a lot less obfuscated than you indicate here.
If you only got as far as the press release[2] then I can understand your view:
> * Tax prep companies shared extraordinarily sensitive personal and financial information with Meta, which used the data for diverse advertising purposes
> TaxAct, H&R Block, and TaxSlayer each revealed, in response to this Congressional inquiry, that they shared taxpayer data via their use of the Meta Pixel and Google’s tools. Although the tax prep companies and Big Tech firms claimed that all shared data was anonymous, the FTC and experts have indicated that the data could easily be used to identify individuals, or to create a dossier on them that could be used for targeted advertising or other purposes.
This paragraph is woolly and does not appear to support the claim in the bullet point. But the full report has much stronger wording on page 2: "Meta also confirmed that it used the data to target ads to taxpayers, including for companies other than the tax prep companies themselves, and to train Meta's own AI algorithms".
The logic of this claim, via page 19, appears to be: Meta says if their sensitive information filtering algorithm detected personal information, the information would not have been used for advertising, and they'd have sent a notification to the tax prep firms. They also confirmed the negative case: if no notification was received by the tax prep firm, then no filtering of their data took place. Meta was asked to provide copies of notifications they had sent to the tax prep firms and they did not do so. So the assumption is that none were sent, therefore no filtering took place, and the data were used as a signal in the advertising algorithm.
I don't find it to be an unequivocal confirmation, but the sources don't support your claim that this article is misleading or your claim that there's no reason to consider it a problem of the tech companies involved.
Not sure what's misleading about it. They shared that data through these analytics links, didn't they? And according to the article, Facebook then used it for targeted advertising.
I don't see where you get that people are blaming the platform. It sounds to me that they're blaming the tax prep companies that shared the sensitive data.
If I take your tax return and store it in my deposit box at the bank, it is misleading to say that I shared it with the bank, or the bank collected it.
The letter from Sen. Warren that caused this press article blames Meta and Google for these events. It says "The Big Tech firms also appeared to act with stunning disregard for taxpayer privacy" even though Google and Meta had no agency in these matters.
<< Google and Meta had no agency in these matters.
I take issue with this statement.
Agency is defined (via Google) as "action or intervention, especially such as to produce a particular effect".
If google/meta creates a product like google analytics that is then used by third parties ( even if it is misused -- for various values of misused ), in itself, it is an action. You can say they were not actively involved ( and I do wonder about that, but I have no definitive proof one way or another ), but it is hard to argue they have no agency.
It is an equivalent of 'look, we are only selling the thing that we also happen to completely control online and we have no control over how it is used'. It is not a great defense ( and it is very easy to portray negatively ).
Yes, if they have no agency then I would expect them to ban customers violating privacy laws. Doubt that happened. More than likely, the entire ad industry, end to end, is actively searching for and executing on opportunities to generate revenue by raping user privacy.
> Taxpayer data was also shared with Google, through its own tracking tools — though the firm told lawmakers that it never used the information to track users on the internet, according to the report.
So, they only store a unique id in their user tracking system, but they put the sensitive information in their ad targeting system, then join the data at impression time.
I guess it is good that they don’t stick PII in tracking cookies that they use to interoperate with third parties, but that is an extremely low bar.
> I don't see where you get that people are blaming the platform.
Other than when Senators referred to it as a shocking breach of privacy by "Big Tech" and the article includes a digression on Facebook's supposed long history of privacy issues, clearly implying this is another such incident?
Well, they do. That's why those tax prep companies should be doubly careful about sharing sensitive data with them. And it was "by tax prep companies and by Big Tech firms", so they didn't put all the blame just on Google and Facebook.
It is Google’s fault for creating a product that can be abused in such a way that would allow for these outsized and egregious invasions of individual privacy
In every other engineering discipline you are required to understand the impacts and uses of what you build.
This is what an “environmental impact study” is - and despite what people might think of how they are used - the core purpose and reason they are key to engineering is because second and third order effects are real, impactful and frankly mitigation costs a lot.
Unfortunately, we never actually developed such a practice in software development outside of cases where hard engineering was leading such as in defense, nuclear or other very highly regulated, lethal, or martial industries.
This argument can be extended to anything, which suggests it might not add any value to the discussion.
- You make Microsoft Excel. Someone uses Excel to leak a million people’s PII.
- You build shovels. Someone murdered someone with your shovel.
- You build an analytics product, explicitly ban collecting PII in your EULA, someone does it anyway.
It’s fun to be outraged and hating on GA is very in vogue at the moment, but this specific incident is just someone (yes, someone, not a nefarious company) using a popular tool incorrectly.
> This argument can be extended to anything, which suggests it might not add any value to the discussion.
Excel has many primary beneficial purposes; it wasn't built for leaking PII.
Shovels likewise are useful for a beneficial purpose; they are not made for beating people up.
Third party spyware ("analytics") has only one purpose: to collect data from unsuspecting customers. It has no valid beneficial purpose, if one accepts that spying is not kosher.
Well, I’ll tell you what the answer is not: ignoring these externalities, and assuming they are someone else’s problem
This is my least favorite part of product development: the part where people who like developing stuff just arbitrarily decide when to stop caring about what the product is used for, and then fight the rest of the world on that design decision because it's ostensibly "simply their product and their choice."
I don’t know anything more antisocial or lacking in responsibility than taking on the responsibility of being a leader, promoting your services or product to other people via advertising and marketing and then shirking all responsibility when people say your product is being used to harm other people.
As long as you get paid and do only the things that ONLY you think are important, that’s the kind of society we want to build, right?
I agree, these shovel makers need to be held accountable for their cavalier attitudes about how their products are used by murderers to bury people, which is concealing evidence and that is a crime. As long as they get their fat shovel checks, it doesn't matter to them. Is this the society we want? Where murderers are helped along the way by these fat cat shovel makers?
Yes but some tools which are useful and important are inherently dangerous to oneself or others. There is only so much you can do to make them safer before you have to make the operator assume responsibility for using them properly.
This is another reason why it should be the government's obligation to provide websites to prepare and file taxes. Tax software should exist only to solve complex scenarios or other specialized requirements, and only via API. And tax software should be approved only after an audit by the government.
Or how about - send me a tax bill at the end of the year and let me contest/adjust it if I want. The IRS knows the exact number the vast majority of people owe, but just chooses to keep it a secret.
Not a US citizen so forgive me - but does the IRS not auto file tax returns for "regular" incomes - those managed fully by your employer?
In my country, it's automatically done for those with "regular" incomes, even end of year tax back is as simple as logging on to Revenue and clicking a button.
You're only on your own if you're self employed, generate non-standard income etc (like shares, which I imagine most people on HN are).
In those cases, the government does _not_ know how much you earn until you're audited. And audits are neither automated nor simple; a dedicated auditor is tasked with identifying all your sources of income and verifying everything. Point being, it's manual.
The tax prep software companies (Intuit, H&R Block, etc.) have successfully lobbied to make such a system illegal and keep the system as complicated as possible. There has been some movement to make this simpler but I don’t have much hope it will actually bring about meaningful change. Free file forms is a joke and hardly better than filling out PDFs (or paper) manually.
Companies DO submit your income to the IRS. However, citizens are required to separately submit their incomes on tax returns. Taxes on (nearly all) stock income are also reported to the IRS. The IRS cross-checks these values between the company-reported and the earner-reported submissions.
The argument is that if the system auto-filled tax returns, people wouldn't report income that the system hadn't auto-filled, so the government would lose money.
All federal electronic filing is done through private companies. Even the IRS's website for "free fillable forms" (which appears as if it were run by the IRS) is actually run by a private company. The only way to not submit taxes through a private company is to submit them on paper.
The IRS has a thing called "free file" for simple tax situations below an income threshold, but Americans in several populous states would still need to separately prepare their state income tax filings.
Americans in most states would need to prepare their state income tax. Americans in several states would also need to prepare their local income tax forms.
Personally, I am filing income tax returns to four jurisdictions this year. One federal, two states, and one municipality.
The most the government does is provide their own tax filing software for free if you meet a certain income threshold, but (1) you still have to do all the work yourself (collect forms, enter data, figure out what you owe) and (2) there is an added penalty if you get something wrong. So most choose to use paid tax preparation services regardless.
It's incorrect to frame it like it's the IRS's fault, when it's really the fault of companies like Intuit or H&R Block and their congressional lobbying efforts, and of course the Congress that keeps taking the money in exchange for continuing this system.
A simpler system can “solve” a problem, but it often disregards complexity instead of handling it.
A federal sales tax is such an example. It simplifies the calculation of tax burden (solves the problem) by implementing a regressive tax code and placing an undue burden on lower income residents.
Tax deductions for single employed people, married couples and married couples with children, without capital gains, are stupid simple and also cover the majority of the population.
The government really should just send a bill/check. Tax websites/software only exist in the first place because of the extraordinary lobbying effort of tax prep firms.
The IRS's "free file" landing page uses Google Analytics and New Relic in a way that sends the URL and page title to both parties, the exact mistake that H&R Block is accused of here.
I can understand analytics; I don’t like it, but I can understand it. However, at least the government site does not use Meta, which would be inexcusable since, afaik, Meta doesn’t have a pure website analytics platform; that isn’t their business.
As much as I hate Meta, Google, etc... they shouldn't be in the crosshairs here.
It's the tax-prep companies like Intuit that are effectively rent-seeking leeches that serve little purpose outside of "lobbying" politicians to keep the tax code intentionally obfuscated and full of loopholes.
If you're going to move the target like that, then why not move it to Congress itself, which enables lobbying in the first place? Several congressmen discuss regulatory capture but they themselves participate in it. Overturn Citizens United.
Absolutely. I'm all for companies being able to participate in political discourse, but there has to be a limit. Regulatory capture, while a natural outcome in a capitalist economy, should be actively fought against with the strongest of forces.
My proposal:
~If you've received a federal contract in the last 2 years of over $500,000 , or are currently in the midst of a contract with a federal (or state) agency or military branch, you cannot lobby.
~If you have an EV of over 1 billion dollars, you cannot lobby.
~If over half your workforce are not US citizens, you cannot lobby.
~If any full-time worker in your organization makes within x% of minimum wage, you cannot lobby.
I'm all for small and medium sized businesses getting their voices heard. I'm not in favor of massive corporations bullying their way to writing laws that hurt the American people.
Actually, it sounds like Intuit was the best of all of the named firms (see the original research here: https://themarkup.org/pixel-hunt/2022/11/22/tax-filing-websi... ) but still not perfect, since they at least shared the username of the active user.
This is a very silly and naive thing to believe. There are 10,000 loopholes and edge cases in the tax code because 10,000 different special interest groups lobbied for them to be in there, not because Intuit put them there.
If you see a tax credit for air conditioner purchases by Florida panhandle-based polar bear breeders in the tax code, you should blame the Tallahassee Association of Polar Bear Breeders, not Intuit.
Why not both? Meta is evil for buying private info, TaxAct et al. are evil for selling it. Additionally, they are evil for profiteering from an inefficient tax system and lobbying to keep it that way.
> In a letter to the heads of the IRS, the Department of Justice, the Federal Trade Commission and the IRS watchdog, seven lawmakers say their findings “reveal a shocking breach of taxpayer privacy by tax prep companies and by Big Tech firms.”
> Their report said highly personal and financial information about sources of taxpayers’ income, tax deductions and exemptions was made accessible to Meta as taxpayers used the tax software to prepare their taxes.
> That data came to Meta through its Pixel code, which the tax firms installed on their websites to gather information on how to improve their own marketing campaigns. In exchange, Meta was able to access the data to write targeted algorithms for its own users.
> In exchange, Meta was able to access the data to write targeted algorithms for its own users.
Here, I saw it, and it wasn't hard. I didn't give Meta - a company I have no relationship with - permission to use my private data, entrusted to a completely different company, to enhance their business. It's as if you hired a cleaning company, and the cleaners rifled through your private papers while inside your home, got some confidential financial info, and used it for personal profit. I'd say this is something that one may classify as "wrong".
I used tax prep software once or twice. Each time it cost about CAD 20 - 30.
Then I found an accountant that'll do it for CAD 50-100 (for a simple filing).
My time spent filing tax went from 30-60 minutes to about 1 minute. The amount of psychological pain from reading endless legalese, figuring out if I had moved more than x km for work, and many other inane questions, dropped from a lot to almost zero.
And the money I got back skyrocketed because this person actually knows all the legalese. Best investment ever!
You know you still have to tell your accountant whether or not you moved x km for work if it’s relevant for your taxes right? They aren’t omniscient, any questions that your tax software makes you answer your accountant needs to ask you too if they are doing their job right.
If your taxes are simple enough that it costs less than $100 for an accountant to handle it then that means you shouldn’t have to do anything more than enter your income and biographical info into your tax software. There is no reason why hiring an accountant would be any faster or more convenient.
And I’m sorry, but the fact that your taxes were so simple that your accountant charged you less than $100 and somehow still magically found extra money for you smells very fishy to me. At that price point there are no secret deductions your accountant will discover for you that turbo tax won’t automatically claim for you and that you need deep knowledge of legalese to exploit. Are you sure your accountant didn’t claim fraudulent deductions or forget to list certain sources of income?
You should never judge an accountant by how much money they get back for you. When you have simple taxes, your tax burden is both trivial to determine and immutable, and the only way to decrease it is fraud.
Yes but the accountant knows my general situation, knows the current state of the legalese, and can translate this into the minimum number of simple, < dozen word English sentences.
As regards your last two paragraphs, I think you're missing the point. I could find these deductions in TurboTax if I wanted to. But for $100, I don't have to think about it or wade through endless questionnaires. It's unfortunate that your mind jumps to tax fraud on my or my accountant's part before you consider this much simpler point.
AI powered analytics is going to know so much about us that we don't have a chance unless we make the cost of analysis 100-1000x more expensive through obfuscation (which costs latency unfortunately).
The internet is all connected; if some companies in certain countries are prevented from analyzing, others will pick up the slack.
It's possible noscript saved me, but I'm not 100% sure. Sometimes the page has a bug where it won't work without the external JS loaded, and you are forced to load it. My bank recently added a login requirement for some external JS from a site called "launch darkly", which I know very little about.
Can we sue these entities to make them stop this shit? Should be illegal to load external JS on sensitive sites.
They should be forced to hard delete the data and all its copies. Not anonymize it or soft delete it. Hard delete it as in zeroing all of it and all data derived from it.
Then report on how the data was shared and force each entity receiving the data to delete it as well.
Correct me if I am wrong, but there is a significant lacuna in this article, namely: which years? I may be affected by this, so I read the article to confirm whether I am affected based on the years, but they didn't even put in that information.
Who did this? Specifically, who are the individuals involved in doing this? It isn't enough to name the corporations, I want the names and personal information of everyone who did this.
Start doing that, and you start solving this problem.