Transparent telemetry for open-source projects (swtch.com)
248 points by trulyrandom on Feb 8, 2023 | 300 comments



I've been a pretty strong advocate of the idea that analytics should always be minimal, 100% anonymous, aggregated, and open to the public - otherwise it’s spying. This is how we do analytics on our websites today[0][1], and how we plan to do it in games we release in the future. Maybe one day I will start a dedicated FOSS service that people can use for exactly this with some trusted reputation/transparency/auditability to it.

I think what Russ has described here is decent and well-reasoned. I also think that Go being a product (it is, whether you like that word or not) makes it more fair to desire analytics of this form. I think it being opt-out is reasonable (after all, if it is not, they will make decisions using data that does not come from the vast majority of users; they might as well not have analytics at all then.)

But I am afraid of this becoming pervasive not just in products (like CLI tools), but also in libraries: imagine every Go/npm package you use wanting to ping the network because the authors want to know 'is this popular? can we deprecate XYZ method?' etc. If transparent telemetry in the form Russ and I have been viewing it becomes more common, it won't be a surprise if more library authors begin to adopt something like this and it becomes a pervasive problem IMHO.

[0] https://hexops.com/privacy

[1] https://machengine.org


I am concerned about run-time telemetry in libraries as well. It might make sense for language ecosystems to offer more data about library usage gathered at build time eventually, as a different system than the one I'm posting about today. I think when you get to that level of detail you probably need to start thinking hard about differential privacy and probably cryptographic solutions like ESA or Prio. I don't think we know enough to design the library solution yet.


Telemetry embedded in libraries is simply abusive, in my opinion. At the very least, the decision about whether or not to include telemetry should be made by the application developers, not the toolmakers.


Right. My hope would be that language tooling offering library developers visibility into compile-time information about library usage would reduce their desire to insert run-time collection instead.


Opt-in should be the default, where the tool asks for consent politely. Information on how the application is being interacted with belongs to the user, not the developer, so it should favour their choice over sneakily enabling it for those who didn't pay attention or don't understand it.

It's on the project to convince the user to turn on telemetry rather than the user having to remember to turn it off. Excuses such as "nobody would turn it on" don't apply.


Yes. I fundamentally don't care how "good" Go's telemetry would be, because I don't want the FOSS ecosystem as a whole to take any more steps down that slippery slope. There will not be a way back from this.


> the authors want to know 'is this popular? can we deprecate XYZ method?'

This is something that was common for internal libraries at some of the places I've worked. I'm honestly a little surprised it isn't a thing we see externally. I for sure do not want to see it, but I'm surprised we don't. It's probably enough to look at the public usage on GitHub, make inferences, and post notice on future major versions of libraries. GitHub honestly should make a tool to do this; they'd have a huge opportunity to inspect the data.


> I've been a pretty strong advocate of the idea that analytics should always be minimal, 100% anonymous, aggregated, and open to the public

And opt-in.

> I also think that Go being a product (it is, whether you like that word or not) makes it more fair to desire analytics of this form.

Not by stealing it.


> 100% anonymous, aggregated, and open to the public

I don't believe the "100% anonymous" is a thing with AI anymore. When AI can identify/fingerprint you by your walk pattern[1], you can't really tell what data can and can't be used to fingerprint you.

[1] https://ieeexplore.ieee.org/document/8275035


Checking out Mach and seeing it's written in Zig. Bit surprised to see it being "used in anger" given that Zig is currently at 0.10.1. Can you share how your experience with the language has been so far? Thanks.


> the vast majority of projects, even large ones that would benefit, stay away from telemetry.

Nomad is one of these projects. We support a dizzying array of platforms (32bit Intel Linux?!). We have no idea how popular our Consul service mesh integration is. Are bug reports a sign of use or just failed experiments? Is anyone running on macOS in production or just ephemeral dev agents?

Surveys about this are just asking humans to do something computers can do better.

Obviously privacy and consent are paramount concerns, but not only are they solvable, in open source they’re fully auditable (and a fork could fairly easily maintain a patch that removes it outright).

I think open source largely rejecting telemetry puts it at a huge disadvantage to proprietary and SaaS software, where it is the norm. I’m very excited to see someone as thoughtful and well-reasoned as Russ Cox trying to move the status quo forward.


On the contrary, I'd argue that the tracing visibility you're looking for isn't inherently a software trait at all. It's a deployment feature, which is something you address at-cost when building a product, but almost never when building FOSS software. It's not that people in FOSS don't see the upsides to it, it's that those upsides are insignificant relative to the cost of sustained market research. It's easier to just... make stuff, and have companies plaster over the gaps when their interests align.

Look at GNOME, which recently pushed for its users to contribute telemetry: https://linuxiac.com/gnome-survey-results/

Nothing wrong with what they've done here, but we already had most of these metrics. Nothing was really learned, and it took Red Hat and a few thousand users to get here. For smaller-scale projects, imagine how much smaller the returns would be.


Everything involves tradeoffs.

The times "we" (previous companies) tried to implement telemetry in open source non-SaaS products (as distinct from "projects"), we either got huge blowback or users/customers simply blocked it at the firewall (and security teams at major enterprises were unwilling to open holes anyway).

The only workable solution I found was integrating this in a value-add way, so that something in the service/experience/etc was better for the user/customer as a result of enabling telemetry, without the dark pattern of making things intentionally awful/worse without it. We simply never got enough data to matter otherwise. But, again, that was products and not projects.


> The only workable solution I found was integrating this in a value-add way, so that something in the service/experience/etc was better for the user/customer as a result of enabling telemetry, without the dark pattern of making things intentionally awful/worse without it.

This sounds like a great concept, but I'm struggling to come up with concrete examples - how did you approach it?


This is an incorrect assertion.

We have to ask for permission on our SaaS products to collect this data as it's not necessary to collect it for the product to function. The EU GDPR mandates this.

Russ Cox is suggesting that there is no permission step and that the data is collected by default.

That is the issue.


From my reading focused on this specific issue of the GDPR and the national laws of member states, this is not the case. Opt-in is specifically required for personal information. The telemetry data outlined in the proposal would not fall under this requirement. You can even retain time-limited IP logs with some special caveats. The GDPR is actually quite reasonable and fair.

Russ Cox is a very intelligent and effective engineer. He has a history of projects where he first analyses the problem space, then arrives at great solutions. He puts a lot of effort into discussing the problems and proposal with the community, especially after the widely criticized go mod decision by the go team (which is now mostly accepted as unfortunate, but in the end, the correct decision, I would think).

My point is: We all suspect Google and telemetry to be bad. But can we be charitable enough to separate the Go project, which is run by individual humans, and telemetry from our superficial clichés, and actually read the proposal?


Google or Russ Cox's reputation is irrelevant. The idea stands alone. I'm merely crediting him with the idea.

I read the proposal. There is no discussion of the legality of this at all. I'd expect anyone with any level of supposed technical competence to consider this in relation to global data protection. I suspect there has been no legal review as mentioned in the thread because I know how slow the lawyers in this space work and the timeline between publishing this and now is too short to have had a conclusive answer.

As for your point about GDPR, I think if you apply your right to withdraw from opt out data collection and what that entails and then ask how this glaring defect is missing from RSC's paper, then you'll see exactly how much privacy consideration really went into this.


Can you articulate how this telemetry collection would violate the GDPR explicitly?


GDPR only covers PI data, so your comment is irrelevant.

https://gdpr.eu/eu-gdpr-personal-data/


Everything is PI when you connect enough dots


Probably related to[0].

To anybody complaining that this should be opt-in: opt-in telemetry doesn't work. The reason for this is that most people don't care, but they don't care either way. They don't disable it when prompted, nor would they enable it manually.

The idea of telemetry is being able to prioritize the work that will be most widely useful. For this you need a good and balanced sample of your users. You don't really get any kind of sensible sample if you only do it opt-in. Additionally, this ship has long sailed, everybody does opt-out.

What I do think however, is that it should very clearly notify the user of this, and give them an easy way to disable it. Like in OctoSQL[1] (disclaimer: which I'm the author of) which prompts you on first run and shows explicitly how to disable it.

All things considered, this is an open source project, so you're free to maintain a fork without telemetry. The Go toolchain also uses the Google-hosted module proxy by default, which really is a bit like telemetry already.

[0]: https://news.ycombinator.com/item?id=34707583

[1]: https://asciinema.org/a/eWQsyXQKi1fmithyTekAD5fWS


The argument for this being opt-in isn't about "it works better", it is about it being ethically correct. There are a ton of things that "don't work" unless you do something unethical: that doesn't mean they are OK, it doesn't mean they should be tolerated, and it doesn't mean the people who do them--and, at the end of the day, it is people who make these decisions: there is a human being who refused to say "no" and whose name we might even be able to find out--shouldn't be judged by their peers for doing these things... and that is all true even if it is (currently) legal for them to do it!!


You're framing this as though the "ethical" choice were obvious, or that there was a person who "knew this was the ethical thing to do, but turned a blind eye".

I disagree, I think it's a very contested topic, with lots of discussion whenever it's raised here, with either side possibly being a vocal minority.


The ethical choice is obvious.

The distinction is between "What I do with my computer is none of your business unless I choose to make it your business" versus "What I do with my computer is your business unless I choose to not make it your business".

It's insane that we are still having to justify privacy as a default, or that people continue to rationalize away the concerns.

Yeah, maybe if it's opt-in they won't have much telemetry data. Perhaps, in fact, it would not be much better than having no data at all. That will make some things harder. Major bummer. If it was easy to do the right thing, more companies would do it.


> It's insane that we are still having to justify privacy as a default

Half of the HN is in love with Chrome and nearly all are on Gmail.

The relentless drive towards the erosion of privacy powered by free carrots worked.

There is now a whole generation of people that sees all this as a norm.


Any program is opt-in, so you just don't have to use it. Major bummer. Assuming data collection is properly disclosed, ofc, but I don't see anyone here arguing against that.

The "there shouldn't be any" argument just seems so entitled with there being such demand / reasons to do so. I applaud everyone trying to find ways to satisfy both sides, as done by the original article.


You can't satisfy both sides by completely ignoring one side.

"What I do on my machine is none of your business, unless I decide it's your business" is a strong argument, presented multiple times in multiple places, and I haven't seen any rebuttal by any Go representative.

You can ignore the community and do whatever is good for business, but it's hypocritical to pretend you're a "community-driven" project if you do that.


Spying on people without their consent is not ethical.


Is it not consent if you tell the user that it's on, how to turn it off, and the user confirms the prompt having decided to not disable telemetry?

Because that's how OP framed it. There is an ethical way to track how your app is used, with user consent.


Putting text on the screen which would inform the user if the user was to read and comprehend it is not consent, no. (I mean, try to apply that logic to any other situation. You put a post-it note somewhere noticeable which says that you will silently mix some medication into your coworker's lunch unless they write some text on another post-it note placed on the kitchen fridge. When you then notice the absence of that second post-it note, and mix in some medication into their food, is that "consent"?)

Furthermore, looking at the asciinema from OctoSQL, users apparently have to edit their profile files on every single machine they intend to use Go on, then remember to verify that the profile applies and has no typos, make sure to never ever accidentally run the program in a context where environment variables aren't respected (would you remember to always use `sudo -E` if you need to use the tool as another user for example?). The danger seems extremely high that the user would, at some point, accidentally run the command without the env var set, even if they were technically proficient and did their best to opt out.

This is not how consent works.
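The failure mode described above is easy to demonstrate: an environment-variable opt-out silently disappears in any cleaned environment, which is what plain `sudo` gives you by default. The variable name here is hypothetical:

```shell
# Hypothetical opt-out variable; the name is illustrative.
export TOOL_NO_TELEMETRY=1

# Child processes of this shell see the opt-out:
sh -c 'echo ${TOOL_NO_TELEMETRY:-unset}'          # prints: 1

# A cleaned environment (what plain `sudo` gives you) drops it,
# silently re-enabling collection:
env -i sh -c 'echo ${TOOL_NO_TELEMETRY:-unset}'   # prints: unset
```

So even a diligent user who exported the variable everywhere is one forgotten `-E` away from being opted back in.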


No because consent is required for it to be on in the first place.


How would you use compiler telemetry to spy on people?


Working hours, work location, home location, favourite cafes, sick days, vacations, hotels, wealth level. And that's just from IP addresses and timestamps alone, and without cross-referencing with all the data that Google vacuums over other channels.


That ship has already sailed. The Go tool already by default makes network requests to the Go proxy, which potentially allows everything that you're talking about there. What's significantly different about this telemetry proposal?


A couple things IMO.

First, making network requests when downloading packages is necessary for the tool to function and unavoidable. People who care about this will be using a VPN of some kind. It's just how the Internet works. But telemetry is something the tool author is choosing to add, not something that's necessary due to the architecture of our computing infrastructure.

Second, the Go telemetry would apparently create a unique, persistent user ID. Normal Internet use doesn't, there's just the IP address which is different from location to location, shared by a bunch of people behind the NAT, and can be masked using common tools.

And yeah, I know this is "anonymised"... but if you have one user ID which uses Go sometimes with an IP address from a particular apartment complex and sometimes from a particular office space, finding out which individual that user ID belongs to is trivial.


> First, making network requests when downloading packages is necessary for the tool to function and unavoidable.

It's technically not unavoidable. The Go authors could have made use of the proxy opt-in rather than opt-out, making the tool less usable as a result. A similar argument applies here, I think.

> Second, the Go telemetry would apparently create a unique, persistent user ID

Where did you see this? I scanned through the "Telemetry Design" article reasonably carefully and couldn't find any mention of this concept, and the type definition for the posted JSON (the `Report` type) doesn't seem to include any such user ID.

In the end, ISTM that you're not complaining about something that actually affects your privacy in any way, but just the _idea_ of telemetry. Is that really something worth taking such a hardline stance on?


> The Go tool already by default makes network requests to the Go proxy

Frankly, that crap should be expunged from the Go toolchain as well. :(


[flagged]


I agree that opt-out is a Bad Thing, but I disagree with this stance. And I think lots of people in the pro-telemetry camp see that there's an ethical issue to be discussed, but they reach a different conclusion. They shouldn't be dismissed so glibly.


Reaching a different conclusion is one thing, but not seeing a dilemma is another. One can always argue that invading a person's autonomy might be necessary given the benefits but seeing no issue is just turning a blind eye.


In the Golang announcements, it's clear that they completely see and understand the dilemma, and have provided a lengthy explanation of why they decided for opt-out anyway.

I respect that. I don't agree with the decision, but it was made with understanding and thought.


I made my original comment misunderstanding what the parent comment meant as "not knowing an ethical problem exists". I also am not talking about this specific decision, but criticizing ethical decision making in the tech industry in general.

In ethics, there are no right or wrong answers (mostly), just right and wrong methodologies. If you go the pragmatic way, you'd argue that the benefits of telemetry are greater than the downsides and implement it. If you go Kant's way, you would already have a maxim (either "never invade privacy" or "prioritize technical benefits regardless of the users" in this case) and act according to that maxim regardless of the situation. If you go the intent way, all that matters is whether your intent for the action is good or bad; in contrast, if you go the outcome way, all that matters is the outcome regardless of the intent or the methods.

These are all "valid" ways to discuss an ethical dilemma. However, one must always acknowledge the dilemma. This industry, especially big tech, seems to ignore this quite often, mostly because it's very easy to see people as "just numbers" when you don't see them directly. Don't even get me started on lawmakers who are also ignoring this whole issue. Many standard practices in this industry would be straight up illegal in lots of other areas, especially where there is face-to-face contact.


This is a very extreme position. It's hard to take you seriously when your comment doesn't have any nuance.


Finding the collection of a person's data without consent unethical is not an "extreme position". Since when is "consent", or more correctly the "autonomy of the individual", called "extreme"? If you did the same thing in my field (medicine), you would lose your license.


I agree, that's not what I think was extreme about your position. I think you've invented straw men in this comment and your previous comment.


Reading your comment again, I can see it now. I misinterpreted "knew this was the ethical thing to do, but turned a blind eye" as "knew there was an ethical problem, but turned a blind eye".

Turns out, my straw man can't read.


Props to you for saying so publicly! I'm not sure if you're unusually open or if I just found the right words to persuade you, but this is a first for me :)


What is the argument for opt-out telemetry being unethical?


Because it most likely means that people are sending data without their consent. Perhaps I am naive or just very old, but I wouldn't expect a compiler to "phone home" with information about what I do with it. Certainly not without me expressing consent first.

So if you want that information, find a way to ask the user first. If you can give a good and understandable explanation on how the information is useful, the users might give their consent happily.


I don't think it's all telemetry. I suppose telemetry could be designed in a way that preserves the user's privacy to an extent that is compatible with their naive assumptions. I suppose that design also depends on what you're building.

If you're building a website, I think it's fairly reasonable for you to store my IP. That's inside my expected privacy loss when dealing with a remote party. I have to connect to your computer, much like I have to physically walk into a store. I don't mind you remembering that I was there. Running a compiler, on the other hand, feels more "private" to me somehow. My expectation when using a compiler is that it won't send anything to anyone, because why would it?

In general I think our industry is starved for relevant and foundational ethics research, outside of the FSF at least.


Because that goes against informed consent.

Opt-out is generally rejected by European privacy laws.


> Opt-out is generally rejected by European privacy laws.

...where personal data is involved.

It strikes me that this proposal goes to considerable lengths to avoid collecting anything that could be considered personal data.


IANAL but European law is nuanced over whether IP addresses are PII. If I'm not mistaken it's been ruled they are for ISPs, rationale being they have enough other data points that once correlated with IP addresses allow to identify individuals. Whether the same applies to Google (I suppose) is definitely not clear to me.


The proposal explicitly says they don't collect IP addresses or _any_ unique identifiers.


As far as I'm aware/recall, European privacy laws consider any connection back to a telemetry server to count as "collecting" IP addresses, since the telemetry server learns it (even if they pinky swear not to write it down.)


You don't recall correctly.

Storing IP addresses in logs means that you are now responsible for them, yes. Drop them out of your logs, and you're perfectly fine.


I think privacy laws only apply to things that “process” PII. Accepting a network connection is not, in and of itself, considered to process PII.


Can you send telemetry data through Tor, though? :thinking:


There are court cases that have established that the very fact that a connection is being established constitutes a potential collection of IP addresses and needs to be declared under the GDPR. (This was specifically about sites linking to Google Fonts on their websites: that alone was enough to warrant a GDPR declaration that IPs are being collected, or the sites needed to remove their font CDNs and serve the fonts locally.) Under the same rule, companies will need to declare this usage of the Go compiler in their employee GDPR declarations.


I assumed you need consent to receive PII, full stop. Again IANAL, but I assumed saying you don't do anything at all with the PII you receive doesn't exempt you from anything under GDPR. I may be wrong, though I hope not to be.


I agree it does do that, though for the context of others reading the thread: personal data is a very broad topic:

https://gdpr.eu/eu-gdpr-personal-data/

You have to be very careful to do it properly.


> The idea of telemetry is being able to prioritize the work that will be most widely useful.

It does sort of hinge on the highly suspect assumption that usefulness is correlated with use. An obvious counter-example to this is something like a fire extinguisher, which will in the ideal case just sit on a wall until its use-by date passes and then be discarded having never been used; or on the flip side, an incredibly byzantine workflow that could be reduced to something much simpler will appear important and useful.

Even without these edge cases, interpreting statistics is really hard. Like people with PhDs who have studied these things for years still get them wrong all the time.

What ends up happening more often than not is it's used as a tool to quiet the critics when pushing through unpopular changes.


Most software features are not like fire extinguishers.

More than that, the interesting stats may not even be around user-visible features, but around internal mechanisms, like some cache hit rate, or how often some branch in the compiler is invoked.

As long as stats are clearly inspectable, reasonably anonymized, and are opt-out, I'd be fine with sending them.


There are essential, fire-extinguisher-like features. The canonical example is the joke about backup software: if it were developed according to today's standard of telemetry-driven engagement analytics, the restore from backup functionality would be removed because it's used so infrequently.

This actually happens sometimes: when developing the demo ".kkrieger", a first-person 3D shooter in 96 KiB, demogroup theprodukkt tried to shrink it down to get it under the 96 KiB wire. One of the tricks they used was using a profiler to identify code sections that were never reached and could be removed. One of the sections they removed was the handler for the up arrow key in the main menu, simply because the test player never pressed up in the menu.

If you think that Google or another large software organization won't misuse telemetry by cutting or neglecting important but infrequently used functionality to hit some KPI... have you ever worked in a large software organization?

All stats can be deanonymized. The more data you make available, the more you identify yourself. I do not need software I use stealthily tying up bandwidth by "phoning home" with data about me. It is simultaneously betrayal and resource theft. If I wanted to contribute to the improvement of the software, I'd file a bug report.


I think your view of the ways usage stats are used is a bit simplistic. Not everyone removes "underused" features without giving some consideration, even in big corporations.

But since you clearly don't like telemetry, you should have a way to reliably switch it off. Here we are on the same page: there must be a well-documented and easy way to switch any telemetry off.


If telemetry is on by default, the vendor obviously wants you to have telemetry on. They are incentivized to make switching it off as difficult as possible, and even pull tricks like turning it back on after a delay of 7 or 30 days or so.

Telemetry should be opt-in, if it's provided at all.


The features that are like fire extinguishers are the ones most likely to be unjustly removed with the rationale of looking at telemetry.

See for example Mozilla's bizarre decision to remove the ability to override the character encoding of a webpage in favour of some half-baked detector.


We can't solve this problem by carving out our eyes. We must adapt to having data about the world.


If folks were using just their eyes, they'd be calling users in to watch them interact with the software, calling users up to perform user surveys, doing all sorts of live-user testing.

But, that costs money: folks don't generally like to do surveys for free, don't like to come to your site (or have you come to theirs) for free, don't like to participate in tests for free... and the caliber of person who can design and orchestrate all that is _notably_ expensive.

So, companies have discovered that it's far, far cheaper to shove remote eyes into their products. No need to schedule user observation sessions, pay for transportation and participation, have to bother with hiring people who are capable of coming up with a good set of questions and scenarios to walk through.


Except that those remote eyes don't look at what a person would look at. They know if a feature was used but they cannot see the frustration on the face of a user when some feature is missing or does not work as expected or is hard to find.


Nobody wants you to carve out your eyes. They want you to stop shoving eyes into everything you touch.

Meanwhile, I'll decide what I must or must not adapt to, thank you.


Read more carefully


Write more clearly


Wouldn’t have mattered if I did


> Most software features are not like fire extinguishers.

Amen

Sometimes a pretty niche feature used by one loud individual gets more attention than a feature used by thousands or millions of silent users.


All this boils down to "an unskilled engineer will misinterpret data even if they have it". I'll assume the Go team knows what they're doing, based on their track record so far.

There's a lot of very simple questions you can answer very reliably, too, like "what proportion of the users are still using a certain compatibility flag".
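As a sketch of how that kind of question maps onto the counter-style collection the proposal describes: each use of a flag bumps a local, named counter, and only aggregate counts would ever leave the machine. The API below is illustrative, not the actual golang.org/x/telemetry interface:

```go
package main

import "fmt"

// counters sketches local, aggregate-only telemetry: a counter name
// maps to a count, with no user IDs and no free-form payloads.
var counters = map[string]int64{}

// inc bumps the named counter; in a real design this would be a
// memory-mapped file updated by the toolchain.
func inc(name string) { counters[name]++ }

func main() {
	// Two builds that still use a compatibility flag:
	inc("build/flag:-mod=vendor")
	inc("build/flag:-mod=vendor")
	fmt.Println(counters["build/flag:-mod=vendor"]) // 2
}
```

Answering "what proportion of users still set this flag" then only needs these counts summed across reports, never any per-user data.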


My point is that scientists, whose job is to interpret data and construct experiments, get this wrong on a regular basis, despite years of training in constructing experiments and interpreting data, despite peer review, despite staking their career and reputation on not making these kinds of mistakes. They still happen! A lot!

Interpreting data is very hard.


> I'll assume the Go team knows what they're doing, based on their track record so far.

Funny, I’d assume the exact opposite. After all much of the understanding of privacy and statistics at scale was developed after 1980.


Normally Hacker News is ruthless with stupid comments like this, but some sort of unexamined feelings of inadequacy make Go devs a fair target. I hate this attitude.


Go is actually well known for dubious choices they reverted later, like not using libc, their weird major version scheme, or the absence of generics. There's a good record of charging ahead against the popular wisdom and being proven wrong.


> What I do think however, is that it should very clearly notify the user of this, and give them an easy way to disable it.

You make a good point.

For instance, a popular Mac-based package manager (unexpectedly for many) defaults to telemetry from your CLI:

`brew analytics off` is not hard to type after installing Homebrew, but the installation text doesn't mention that; instead it points to a web page where you have to read about how wonderful the analytics are before eventually finding the incantation:

https://docs.brew.sh/Analytics

I wonder how many people care enough to click that link, read all the "analytics are actually good for you" copy, and then change their mind to leave it on. I'm guessing almost zero?

But perhaps most users won't cut and paste the link, whereas if it just suggested `brew analytics off`, many users would type it.


> opt-in telemetry doesn't work

That's too bad. Guess you don't get any telemetry data if you want to develop ethical open software.

The answer isn't to bend your ethics.

If I take a dollar from everyone but it's opt-out, that's just theft with extra steps.

If I make it opt in, nobody is going to give me the dollar, but that doesn't make opt out morally justifiable.


Why is it unethical? People in everyday life constantly assume things about one another and take actions that affect one another without asking first, and this is a practical necessity. So if the reason is that "you may not do anything without me telling you it's OK", I don't think that's a defensible position.


> they don't care either way

This is not true. You know it and you are being coy about it.

If your "easy way to disable it" was a simple question next to that unexpected notification displayed once in response to a completely unrelated action - "Would you like to keep telemetry on?" - you bet you'd have massive opt out rates. Nobody wants telemetry.

What you have is a gray pattern. The same pattern that caused the EU to clarify its cookie law to require Yes and No choice to be equally accessible. And what you have is "Accept all" and "See details". Except that yours is worse by being a one-time notification.


> this is an open source project, so you're free to maintain a fork without telemetry.

That option is a joke. The real alternative is rust - or any non-corporate platform that isn't gonna pull these kind of stunts.


But that's wrong. There is no position for this in a civilised society:

"If we ask everyone is going to say no, so we will steal it unless someone tells us not to"


I think the comparison of telemetry and stealing is pretty harsh.

Is opt-out telemetry unethical? It depends. If you use it in a privacy-preserving way, no; if you spy on your users, or sell the data for money or advertising, it obviously is.

The hard truth is, nobody reads the manual. Opt-in telemetry often captures only a minority, and you then work with niche data from that minority, which influences your development in certain ways.


It really all boils down to meaningful consent.

> if you spy on your Users

In my opinion, any data collection about me or my machines that occurs without my active informed consent is "spying". This is my fundamental problem with opt-out mechanisms. They do not indicate or imply that active consent was obtained.


A splash screen at install time saying that logging is on and you can disable it in the settings.

Would that be enough for you?


Unless a Windows user is installing the software, that screen would be displayed in approximately zero of the cases where a package manager was used to install the software. Similarly, exactly zero widely-used Docker images that contain the software would display this splash screen, as the software would already have been installed.

In short, unless you're a Windows user there are so __very__ many ways to install software that aren't "Go to the project home page, download a generic install binary, run that binary with world-write permissions.". Aside from very small-time projects, I can't think of the last time I used an officially-maintained install script that I got from the project's servers to install something.


It would be better than nothing, but not really adequate. There are numerous circumstances where such a screen is impossible or impractical, and if every program did this, it would be as good as not doing it because people will react to it like they react to other common warning dialogs -- not really seeing it at all.


But is that not the decision of the person who owns the data?


The world is full of people making decisions for one another. Did you consent to unix files not flushing on every write() call? It's not a meaningful complaint.


That’s not their argument. They say that if you ask everyone whether it is ok, most just ignore your question.


That's how they presented their argument. It can be presented both ways depending on how you want to promote it.


Not responding is not the same as responding no


It is when it comes to giving informed consent.


> To anybody complaining that this should be opt-in: opt-in telemetry doesn't work. The reason for this is that most people don't care, but they don't care either way.

Why, in OSS, do you care about the users that do not care? If the users truly care they should buy support, or at least enable telemetry.

If someone complains about feature getting removed - tell them to enable telemetry next time. Maybe people who want so much privacy shouldn't be expecting free support for features they want.

Maybe you're worried about people like your grandma installing Linux. Then this should apply on the distribution level - there should be an opt-in setting for telemetry, that enables it in various programs being distributed (at the discretion of the package maintainers), so that user doesn't have to opt-in individually.

Being opt-in will also make it compliant with EU data privacy laws.


> opt-in telemetry doesn't work.

Then don't do telemetry at all.

Google doesn't have a right to data.


Yeah, but they do have the right to modify projects that they sponsor as they see fit.


At which point they lose the "right" to claim it's a community project...


Any evidence that telemetry actually works? (i.e makes the program better)


Yes, the simplest example is crashes being reported.

Developers can see that a specific crash is being hit by 1% of their userbase and then check the logs to see what went wrong and where the crash happened. The fix is made and the program is indeed better.


You can let users report crashes. You can even prepare the data for them. You can even provide a wizard that automatically opens on crashes to help upload that data. But you need to obtain informed consent. Sending data behind the user's back without ASKING FIRST is not ok. Stop doing it.


If it collects actionable data, yes, of course it works.

Crashes, common failures, UI/UX friction points, average usage patterns - all can be used to prioritize work to take care of things that have the biggest impact.


I asked for actual concrete evidence, not "can be used".

Is there an example of a program that was crap, implemented telemetry and then got better afterwards? (and of course controlled for factors that might have improved the program anyway)

I mean since telemetry advocates are so into how useful data is, surely they must have data on whether telemetry itself works?


> opt-in telemetry doesn't work.

Opt-out telemetry won't work when people send false data to the servers.


> Additionally, this ship has long sailed, everybody does opt-out.

This is a meaningless point.


Why does the dev team get to optimize a use case that the user doesn't want optimized?


So it sounds like we can't have telemetry?


There's a reason it can't be opt-in: Google intends collect information about politically sensitive software and give that information to governments who wish to punish developers.


(This comment was originally posted to a different thread that we merged here; that's why it now links to the page it's on.)



Related, with further discussion:

https://news.ycombinator.com/item?id=34709078


To anybody complaining that this should be opt-in: opt-in telemetry doesn't work. The reason for this is that most people don't care, but they don't care either way. They don't enable it when prompted, nor would they disable it when it's on by default.


Nope, nope, and more nope. You're not moving the Overton Window any more on me.

In fact it seems there's a clear correlation between the quality of software and how much spyware there is embedded in it. It's often merely another way to justify unpopular changes with "but the data says so".

IMHO if you want to collect any information, it should never be anything but opt-in, a conscious decision.


> IMHO if you want to collect any information, it should never be anything but opt-in, a conscious decision.

Serious (general) question: How do you do that given a non-technical user population? Debian’s opt-in popcon kind of manages to get a little bit of data from a fairly technical one, but nowhere near enough to estimate a low usage frequency, and it’s the only opt-in program I’m aware of that gets anything usable at all. Given that I’m unwilling to implement an opt-out system, I don’t really see a workable approach here at all.


Ask for consent during setup or on first run. Syncthing does this and they get plenty of usable data. It's even public: https://data.syncthing.net/


What I hear you saying here is that people don't do what you want if you give them the choice, so you lean towards not giving them the choice rather than respecting their wishes.

Is my interpretation correct?


>> I’m unwilling to implement an opt-out system

> [Y]ou lean towards not giving them the choice rather than respecting their wishes.

> Is my interpretation correct?

I don’t think it is, no :) Rather, I’m not sure how to sell, to put it crassly, users on a choice when properly investigating or even being confronted with that choice would delay them seeing the dancing bunnies[1], but that would also, if I have any say about it, improve the bunnies in the future.

Does that mean there’s a shade of “I know better” in my problem statement? Of course it does, if I didn’t know better than the average user I’d have no business designing such choices. I don’t think there’s anything wrong about that, better than the average at an activity few practice is not a terribly high bar. Not giving the users a choice or manipulating them into making the one I think is right would absolutely be wrong, though.

Basically, how do I make the user think, how do I give them the appropriate data to do so, and how do I deal with the obvious contradiction of that goal with principles of good design[2]? The potential benefits to the software and (thereby) the users are too much to give up without even asking those questions.

(See nearby comment for extended discussion.)

[1] https://blog.codinghorror.com/the-dancing-bunnies-problem/

[2] https://sensible.com/dont-make-me-think/


Thank you for the thoughtful response. We disagree on much, but I respect your opinion nonetheless.

> Not giving the users a choice or manipulating them into making the one I think is right would absolutely be wrong, though.

I'll pull out just this point, though, to perhaps illustrate how different our worldviews are. I consider opt-out to be a manipulative approach.


> I consider opt-out to be a manipulative approach.

So do I, which is why I wrote I’m unwilling to implement it :) The original (and, to be clear, purely theoretical) point was, opt-out is too manipulative while opt-in is likely useless.

Ah shoot. Did you take that to mean that I’m unwilling to implement an off switch at all? That wasn’t it, sorry for the confusion.


Perhaps we aren't so far apart after all.

The struggle is real. As a developer, more data is obviously desirable and can make development much easier. I just can't think of a way to do telemetry that, if I were a user, I would accept. And I don't want to produce software that I wouldn't personally use.

I just don't know how to have my cake and eat it too.


As a developer your entire purpose is to make decisions for users. "Where should this service live, how should security work, how should I increment their billed service usage, when should I shut down their vm..."

I don't think the issue here is making decisions for users and not giving them a choice. 99.999% of software does not have a flag to change it. The issue seems to be more about the precise nature of this specific feature.


> The issue seems to be more about the precise nature of this specific feature.

Of course. Not really just this specific feature, but any and all features that can violate users privacy or security. In the end, I don't think these are decisions that developers should be making for users, because not all users have the same needs and getting this wrong can do harm.

That's why, for these sorts of things, meaningful user consent is critically important.


> people don't do what you want if you give them the choice, so you lean towards not giving them the choice rather than respecting their wishes

This nicely summarizes a very popular approach to telemetry – and to a variety of user-hostile behaviors. Web sites (for example) seem to have mastered the "fight against user preferences" approach, trying to play video when autoplay is blocked, using javascript modals since pop-up windows are blocked, fighting ad blockers, ignoring "do not track", etc..

If users are given any choice, usually it's a difficult opt-out process, which is more effective precisely because it makes it harder for users to make the choice that you don't want them to make, even if it isn't their actual preference. For an extreme example, see Facebook's (anti) privacy settings. Commonly used dark patterns further amplify user manipulation.


First, we are talking about development tools, so not a non-technical population. Second, if opt-in is considered difficult for that population, what does that say about opt-out? Opt-out is always, without exception, more difficult than opt-in.


Seems like you also[1] didn’t read the above the way I intended. I meant that I find explicit opt-out (as opposed to explicit opt-in) manipulative so I don’t want to implement it, not that I oppose having the ability to opt out at all.

The difficulty, though, lies not (entirely) with the default position of the toggle, the difficulty lies with making the user think about the question which is not relevant to their immediate task and which in any case they may not have the theoretical tools or time to evaluate properly. The default position of the toggle (if “off”, as I believe it should be) matters only because an opt-in process means you either confront the user with irrelevant questions on first launch or get essentially no data.

(I called this out as a “general” question because I meant for it to apply not only to the Go toolchain, but to general-use software like Firefox or niche but non-programmer-oriented software like Audacity.)

The systemwide daemon proposed elsethread[2] would solve this nicely as well, but I have to admit that I’ve dismissed it from my thought process more than once before, because I didn’t think we were going to get one with any reasonable usage on any platform. Now that I’ve seen it put in writing, maybe it does deserve to be considered.

[1] https://news.ycombinator.com/item?id=34716342

[2] https://news.ycombinator.com/item?id=34709836


How many people are installing and using open source software but couldn't understand a pop-up explaining what data is collected and asking if they'd like to submit it? Is the non-technical nature of the user the problem or is it just that when you have an opt-in option most people make the choice to opt-out? That's the thing about respecting users by giving them choice, they get to say no. If they mostly say no, and you don't get enough data, that's the will of your users and therefore not really a problem.


> How many people are installing and using open source software but couldn't understand a pop-up explaining what data is collected and asking if they'd like to submit it?

I’ve taught probability theory using randomized response[1] as an exercise problem, and while people can understand it given time and motivation, it’s not immediately obvious. So I’m not exactly hopeful that a prospective Audacity, Blender, or even Free Pascal user (to take an arbitrary set of examples) would get what I mean if I say “I’m collecting no more than 10 bits of information about you using RAPPOR”[2], and I’m not willing to engage in comforting bullshit such as “all collected data is anonymous”, as I’ve been all too close to situations where the difference between the two might be one between freedom and prison.

> Is the non-technical nature of the user the problem or is it just that when you have an opt-in option most people make the choice to opt-out?

Both, because confirmation dialogs, especially privacy-related ones, have been thoroughly poisoned in users’ minds. But confirmation of obscure actions, however beneficial their consequences, is problematic in general—if I go on the street and ask people if they’d like caffeine in their tea or ascorbic acid in their apples, I expect (but have not checked) that the majority will say no, nevermind that both are normally there and intrinsic to the experience.

(The possibility of meaningful consent from a non-specialist is the subject of much discussion and few good answers in med school, or so I’ve heard.)

Whether the ultimate answer is to grant or deny permission, I’m not sure I can present the question in a way that will actually have it made on the basis of merit and not on “scary permission dialog, better say no” or “yes, yes, just let me through to my dancing bunnies[3]” or “yes, if I say no the installer will just tell me to GTFO”.

(In that respect the “Send crash report to vendor” button is unexpectedly good, because you’re not actually interposing yourself between the user and any prospective bunnies. But personally I don’t like to spend time and effort in order to send “feedback” into an unmarked hole where I’ve no idea if anybody will ever look at it. From that point of view, it is background data collection that’s unexpectedly good.)

And even if, for the purposes of this question, it would be best if people took the time to learn the necessary maths, computing, and operational security to make an informed choice, in reality I’m not sure that’s the best thing they can spend their life on.

So it may be the answer is that you simply can’t do telemetry well for the social reason that users won’t ever end up making an informed choice, or that the well has been poisoned so thoroughly that the rational choice is to reject everything. It’s just that I know that it’s basically possible in a technical sense, so I don’t want to give up that easily.

[1] https://en.wikipedia.org/wiki/Randomized_response

[2] https://blog.cryptographyengineering.com/2016/06/15/what-is-...

[3] https://blog.codinghorror.com/the-dancing-bunnies-problem/


Well, it's not a humble opinion but a very strong one, which is fine, since you want a certain thing a certain way and nothing else will do.


So how would you design a well-working system to make data-driven decisions rather than guesses? Most collection methods are notoriously bad, but partial collection is also bad since we now have to somehow put a weighing factor on presumed absent data, which turns choices into guesses again.

I think this is a really hard problem, and simply trying to guess in the dark as to what people want isn't the smartest way to go about finding the path forward / priorities / improvements / defects.

It also isn't something that we had in the past, because when we used to buy the IDE, buy the compiler, and then build software, sell that software, and let everyone know what cool tools we used, you'd have sales figures that would inform the creators how the tools were used. Now, the tools are available to everyone, anonymously, and everyone has an opinion on how well it works for them, but doesn't have the time to write a well-written report every time a release happens.


The whole idea of "data-driven decisions" is the problem.

It's an excuse to not respect user's choices and absolve oneself of the blame, or regress to a lowest-common-denominator, "because the data says so".

"Not everything that counts can be counted, and not everything that can be counted, counts."


Telemetry or no, a certain amount of guessing is inescapable.


> When you hear the word telemetry, if you’re like me, you may have a visceral negative reaction to a mental image of intrusive, detailed traces of your every keystroke and mouse click headed back to the developers of the software you’re using.

But that's not my only objection to telemetry. Equally important to me is that so many bad decisions are justified based on telemetry. It's very easy to misunderstand the data, because telemetry leaves out so much, but developers often treat it as if it's giving a complete picture.

As an example, I have seen developers drop really important functionality on the basis that it is rarely used. While that was true, it was also true that when those rare times happen, that functionality was absolutely critical to have.


Or they use the data as an accelerant: move rarely used features to places where they're even less discoverable, making them even less used, and then remove them altogether. The justification then becomes a self-fulfilling prophecy.


Very much against this. Sure, each one sounds benign enough, and can give reasons why. But I have 3,436 items in /usr/bin. What if -every- one of these started doing their own telemetry, their own envvars, etc?

If we have to deal with telemetry, then I'd instead hope that there can exist a single telemetry systemwide interface. Not sure how that would be designed or implemented, but would be better than everyone doing their own bespoke thing. Plus easier for me to disable them all in one go.


> "What if -every- one of these started doing their own telemetry, their own envvars, etc?"

What bad thing are you suggesting would happen if they did? Your computer and internet connection can't handle four thousand strings, or four thousand HTTP POSTs, or four MB more disk space of telemetry libraries? I bet it can. This isn't a technical problem, it's a control and consent problem.


For one thing a classic way of downplaying metrics, "we're only logging X bits of information", turns into 3000*X. And here X is huge already.

If I get access to detailed metrics from go, gcc, make, tar, gzip, bash, python... of course I can tell which programs you have been running (and frankly, I'm disgusted)


I wouldn't be so sure it's not also a technical problem: the limit of execve can be as low as 128 KB, which for 4,000 strings gives a maximum of 32 characters per NAME=VALUE environment variable.


Free disk space can be as low as zero, but we don't blame the tool makers for adding an extra 100Kb or 20MB, we blame the computer owner for not having enough disk space to install the thing they chose.

Wrapper scripts for every utility could do

    UTIL_TELEMETRY_OPT_OUT=1 util ...
so they don't need to be set all at once.


> we don't blame the tool makers for adding an extra 100Kb or 20MB

Honestly, I do. Code bloat is a real thing.


Honestly, I do too, but the world doesn't. If you said "I have 3,500 binaries on my system, imagine if they ALL added 1MB" the reply would be "3.5GB is about twenty cents of NVME storage space" not "oh my, you're right that would be intolerable".


maybe the problem is why do you need 3436 binaries there?


I dunno. It sure makes sense to me to collect telemetry from free software installations, but I feel that having every platform or even every piece of software do it on its own, opt-out, will inevitably lead to people being overwhelmed and angry.

I would, personally, prefer a single non-profit service that would list publicly what is being collected and publish the results as open data for anyone to use. Applications (at least on Linux) would not submit their reports directly, but would use a local relay service that could be turned off completely or that could filter what reports to send to the server and what to /dev/null.

Distributions and other software stores would then make it mandatory for software to use this relay and either patch out any other telemetry from their packages or straight out forbid those that would not comply.
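To make the relay idea concrete, here is a minimal Go sketch of its policy half. The `Report` and `Policy` shapes are entirely hypothetical; a real relay would also handle transport, batching, and IP stripping:

```go
package main

import "fmt"

// Report is a telemetry record an application hands to the local relay.
type Report struct {
	Program string
	Payload string
}

// Policy is a hypothetical per-machine filter: a zero value means
// "send nothing", otherwise only whitelisted programs get through.
type Policy struct {
	Enabled bool
	Allow   map[string]bool
}

// Route returns the subset of reports that may leave the machine; the
// rest are discarded locally and never reach the network at all.
func (p Policy) Route(reports []Report) []Report {
	if !p.Enabled {
		return nil
	}
	var out []Report
	for _, r := range reports {
		if p.Allow[r.Program] {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	policy := Policy{Enabled: true, Allow: map[string]bool{"go": true}}
	reports := []Report{{"go", "build counts"}, {"someapp", "usage"}}
	for _, r := range policy.Route(reports) {
		fmt.Println("forwarding:", r.Program) // prints: forwarding: go
	}
}
```

The key property is that the filtering happens on the user's machine, so turning the relay off (or narrowing the allowlist) is a single switch covering every program at once.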


I think the issue of telemetry is fundamentally a human issue of incentives and trust. The system you describe is wise because it recognizes this and attempts to address it.

The difficulty with telemetry is that even if we design the perfect, privacy-preserving system to begin with, once the pattern of having a network port open is established, there's nothing to prevent us (humans) from changing our policies about what we're allowed to push/pull over that port.

In real-world analogues for these kinds of thorny policy problems, we have centralized arbiters to solve these problems. That might be a fruitful course of research for people interested in this problem to explore.

Unfortunately, even though this problem has software as its medium, it is a problem that cannot be solved by clever software alone, despite any appearances to the contrary.


> The difficulty with telemetry is that even if we design the perfect, privacy-preserving system

The other difficulty is what you mentioned: trust. Even if a piece of software really does telemetry in a perfect, privacy-preserving way -- as a user, I have to take the developer's word for that in the end. That's a hard hurdle to pass, because that trust has been violated so much in the past that nobody gets the benefit of the doubt anymore.

> Unfortunately, even though this problem has software as its medium, it is a problem that cannot be solved by clever software alone

I agree entirely. At the heart of it, this is not a technological problem. It's a human one.


I am all for transparency and limited intrusiveness of telemetry.

But in practical terms, the problem with this approach -- if I'm understanding it correctly -- is that it has no way to detect and reject outliers, and therefore the data can't be validated in any way. It only makes sense if all your clients are 100% trustworthy.

Let's say you want to know whether to keep supporting ARMv5, and your data says 10% of users are using it. There's no way to tell whether that's accurate, or if you have 0.01% of die-hard users who modified their telemetry code to report 1000x as frequently as they're supposed to. Even if you suspect this is happening (and you might not), there's no way to identify the culprit and filter out their data without tracking personal identifiers such as IP addresses.

So even if most of the time the telemetry data is valid, over time it will trend toward uselessness, because it can be endlessly second-guessed unless it confirms a decision you wanted to make anyway.
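The skew is easy to see with toy numbers (all hypothetical): a tiny dishonest cohort multiplied by a large report factor dominates the aggregate.

```go
package main

import "fmt"

// apparentShare returns the fraction of reports attributed to a platform
// when `dishonest` clients each upload `factor` reports while `honest`
// clients (none of whom use the platform) upload one each.
func apparentShare(honest, dishonest, factor int) float64 {
	bogus := dishonest * factor
	return float64(bogus) / float64(honest+bogus)
}

func main() {
	// 1,000,000 honest users and just 100 die-hards (0.01%) inflating
	// their reports 1000x make the platform look ~9% popular.
	fmt.Printf("apparent share: %.1f%%\n",
		100*apparentShare(1_000_000, 100, 1000)) // prints: apparent share: 9.1%
}
```

Without per-client identifiers there is no way to deduplicate those 100,000 bogus reports down to their 100 sources, which is exactly the tension the parent describes.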


On-by-default makes me question whether rsc's judgement has been compromised, which leads me to question continuing to use the language. A strange miss for him.


Off-by-default at scale likely means that there is no telemetry at all. I would not cancel a guy or a programming language just for suggesting that. He has given it a lot of thought, if you read the blog posts.


If I take a dollar from everyone but it's opt-out, that's still theft.

If I make it opt in, nobody is going to give me the dollar, but that doesn't make opt out morally justifiable.


You are comparing apples to oranges. Telemetry is a curse word these days, but you should still read his posts.


I would expect nothing less of him than to give a topic a great deal of thought and devise a principled and rational solution.

I am, however, reminded of a quote from Peter Drucker: “There is nothing quite so useless as doing with great efficiency something that should not be done at all.”

I’m not picking nits regarding the overall proposal. I’m questioning the judgement that concluded/rationalized “on by default” is the right thing to do.

Also not “cancelling” at the moment, either, but definitely reassessing my future language choices and taking a more critical appraisal of Go’s direction/choices. This isn’t really an isolated incident, and trust has accumulated some dents.


> Although the report would not include any identifiers, the TCP connection uploading the report would expose the system’s public IP address to the server if a proxy is not being used. This IP address would not be associated with the uploaded reports in any way.

Any fully transparent data collection is going to have to include IP addresses and timestamps. Even if the IP isn't being used for debugging, the software still phones home and the IP is still being collected and logged when it otherwise wouldn't be. Either when uploading the report or when downloading the “collection configuration”.

Honestly, assuming full transparency, I'm not opposed to the concept. I question how much telemetry is actually necessary, but I'm certain there will be times when it's nice to have. It'd also be interesting to see how it would go when for once people can see exactly what is collected, when, and from where.

I'm not sure that Google is the best place to showcase such a concept though. I'm sure there are a lot of people who have no problem with handing more data over to Google, but Google has abused the public's good will for the sake of data collection many times, and it's sure to put off some of the people who aren't already completely disgusted by the idea of their favorite open source projects collecting telemetry.


> Any fully transparent data collection is going to have to include IP addresses and timestamps. Even if the IP isn't being used for debugging, the software still phones home and the IP is still being collected and logged when it otherwise wouldn't be. Either when uploading the report or when downloading the “collection configuration”.

How do you verifiably not collect users’ IP addresses when receiving data from them? The verifiable part is the problem, of course you can (and should) just not log the addresses, but then the users can only trust you (and hope you or your uplink haven’t received any legal orders to the contrary). The only approach I can think of would be a Tor hidden service, but while it would technically work, as far as not exposing your users to scrutiny it actually sounds worse.


The only option is to have a proxy sit in the middle between the uploader and the server. You mentioned Tor but it doesn't have to be Tor, just some proxy most users would trust not to collude with the server and that doesn't itself derive benefit from seeing the IP addresses. If there were a different entity that could be relied upon to run servers doing this and were highly trusted by users, I'd be interested to use it. Failing that, the usual answer for an enterprise or company is to run their own HTTP proxy. The design explicitly supports that.


> their favorite open source projects collecting telemetry.

Their favorite Google open source project. This is especially important for projects which can't realistically exist without their main sponsor/benefactor. It also helps people pay whatever cost of conscience, small or large, comes with willingly taking part in or consuming something whose makers they do not approve of.


This is not okay. The only ethical way to do telemetry is opt-in. If not enough people are opting in, you need to incentivize them to -- most simply by just paying them for their data. After all, telemetry is "valuable", isn't it? But if you can't figure out how to convince people to opt-in, then tough luck, sucks to be you.

Opt-in or GTFO, Google. I'll be patching this out of the Alpine package for Go the day it ships.


You and I may not agree on a lot, but I sure agree with you on this one.


This week one of my tasks is to figure out how to neutralize some telemetry in one of our apps. We had no idea it was there, we do not want to be sending data. Last week, the parent company decided they didn't want to maintain the telemetry server any longer, and got rid of it.

Now the tool has generated thousands of log messages saying that it can't phone home.

And so it must be silenced, since it is cluttering up the logs, generating false alerts, etc.

Please, no more.


The existence of telemetry is the main reason why I avoid new software these days. Really, opt-in, opt-out, it doesn't matter. I can't trust that any of those mechanisms actually work, that if I opt out, an update won't re-enable it, or that the data collected is actually limited and anonymized.


If there is any virtue to collecting telemetry, make it opt-in. Any developer convinced of this being useful will gladly enable it. But making it opt-out is just nefarious, because most users will not be aware of it.


This is naive; no one ever turns telemetry on if it's turned off by default. That's the reason why it's on by default.


> no one ever turn telemetry on if it's turned off by default

If nobody would voluntarily do it, why do you think it's okay to do it at all? By your very admission, nobody wants this. Because if they did, they'd turn it on!


Still, opt-out is just inacceptable. At least with a mechanism which can easily fail, like setting an environment variable. This basically forces you to wrap the go tool in a script which ensures the environment variable to be set.

As this seems to cache the results, another option is to fiddle with the cache to report bogus information.
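For completeness, a minimal sketch of such a wrapper, assuming the GOTELEMETRY variable name from the proposal and that the real toolchain lives at /usr/local/go (both assumptions, not confirmed specifics):

```shell
#!/bin/sh
# Hypothetical wrapper, placed earlier in PATH than the real go binary.
# GOTELEMETRY is the variable name from the proposal; the path below is
# an assumption about where the real toolchain is installed.
GOTELEMETRY=off
export GOTELEMETRY
# Show that the opt-out is in force before delegating; a real wrapper
# would end with the (commented) exec line instead of the echo.
echo "GOTELEMETRY=$GOTELEMETRY"
# exec /usr/local/go/bin/go "$@"
```

Because the wrapper uses exec, the real go process inherits the variable with no extra process left behind.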


You can set an env variable for the go toolchain with a command such as `go env -w GOTELEMETRY=off`, which will be written to disk and used by the go CLI.


Where will it be written to and how is it guaranteed to be picked up by any further invocations of the tools?


What if a new version of go uses TELEMETRY_ENABLED? Do you read all the changelogs, always?


Yeah, given go's history with breaking changes (and the habit of OKR-chasing managers and Senior Staff to make Number Go Up to look good on the promo packet), I definitely would not trust any opt-out mechanism to not receive operationally-significant changes in the next five years.


As far as I know, go has never changed any flags since release (2012); is that good enough for you?


"This is naïve, no one would ever allow me into their homes if I asked first, and how else would I find out what diseases they have?"


Setting aside the question of whether on-by-default telemetry is unethical in general (I personally think it is), my point in this comment is that in the context of open source it is impossible for it to be ethical, because:

The whole point of open source is securing the rights and freedoms of the users, and in case of a conflict with the convenience of the developers, the user rights take priority EVERY TIME. If you're not OK with this, you should not write open source software. If nobody would opt in to your telemetry scheme were opting in the default choice, too bad, you're just gonna have to live with it and respect user choice, no matter how inconvenient or how much better the alternative would be for everyone. If you fail to grasp this very basic thing you will be better served working on proprietary products instead. OSS is not a product you own; it's a shared resource you are in charge of stewarding, and the ethical burden is much higher because of that. I checked: Go uses a permissive license, so Google is more than welcome to run a proprietary fork with telemetry built in. Keep that out of open source.


Imagine if GNU started adding telemetry to their compiler toolchain...

If that sounds fucking stupid, which it does, then so does this.


This is perhaps unintentionally amusing:

> To be clear, I am only suggesting that the instrumentation be added to the Go command-line tools written and distributed by the Go team, such as the go command, the Go compiler, gopls, and govulncheck. I am not suggesting that instrumentation be added by the Go compiler to all Go programs in the world: that’s clearly inappropriate.

Well that dispels any lingering thoughts I might have had about ever using golang for anything (not many to be sure). Someone feels the need to assure everyone that they won't be stuffing telemetry code into every binary their compiler produces? Google just wants all the data about everyone everywhere all the time...

https://www.komando.com/security-privacy/ways-google-invades...


If they add "telemetry" my response would not be to set an environment variable, but to uninstall golang. I used it a few years ago, both personally and in a work setting, but I'll do so no more in the future. Just my opinion.


This is well done. It only exposes counters, and rather than pushing data up, the telemetry server must know the names of what it can ask for. No wildcards.


I hope this proposal is defeated and they don't implement this. I don't buy the premise that the benefit is worth the price. I think CLI tools like the ones in the Go Toolchain and their usage patterns are fairly well understood by this point. I'm sick and tired of every piece of software I interact with phoning home.

That said, as long as they give me reasonable means to configure the software the way I want, it's probably not a deal-breaker for me. In other words, I will just set the $ENV_VAR_WHATEVER to turn this off, and that's that.


Honestly, this may be unpopular with hacker news, but just add your own telemetry. If people don't like it they can turn it off, and telemetry is essential for a good product.

Do let people turn it off though please.


> If people don't like it they can turn it off

If I, perchance, encounter software I use phoning home without my explicit permission it's done on my systems. Period.


That is fine, but in this case telemetry trades you (and other more hardline users) as a user for all the extra users you gain from instant crash reports, quick feedback, and generally better productivity.

I would never personally make that trade-off, and would always put (disableable) telemetry in my projects.


Well according to our telemetry 0% of users turn it off so it seems pretty popular.

But more realistically what you gain in privacy you give up in having your voice heard by the devs. The decisions about the future of the product/project will be driven by the data, specifically the data from the kind of people who leave telemetry on.


See, I _am_ a dev. I run telemetry on my infrastructure, I analyse it and fix what's broken, and if necessary, try and get upstream fixed. Also, I'm not opposed to telemetry in general, but if a switch like this is turned on by default, trust is broken for good.

Software which does any type of computing without its users' informed consent is classifiable as malware, mind you.


If 0% of your users disable it, that kind of screams there's something wrong with your opt-out mechanism. Is it broken? Hidden? Difficult to do?

I mean, with any group of people, there will always be a percentage that will disable it. If the telemetry is popular, that percentage might be very small, but it would be non-zero.


I think you missed the joke.


Oh! That's what the whooshing sound over my head was!


Oh shit, I missed it as well.


Well, those that turned it off are not phoning home, so the rest will be 100% ;)


> telemetry is essential for a good product

Up until now, you've had to make these design decisions on your own, relying only on perplexing intangibilities like 'taste' and 'intuition'.


Those design decisions were never made in a vacuum. They relied on telemetry (or, less scarily, user testing, user feedback, user research, etc.) to figure out what works best. As a reminder, our intuition doesn't come from nowhere, but rather from centuries of survival and expectations. If you do not know what these expectations are, and if you do not know how your users interact with your product based on these expectations, you cannot make a good product. Certainly, you can make a product that appeals only to you, but how many of yous exist?

The design of the teapot is a great example. It didn't magically appear with handles and a spout and a place to hold leaves; it was refined over years of use. As shocking as that is, tea wasn't even discovered on purpose, let alone having a specific vessel for it right out of the gate.

So yes, telemetry is essential. Taste is personal.


I think it's somewhat silly to fly blind on the assumption that your taste is better than any real world observations you can make.

Especially if you haven't had the chance to develop an intuition yet and are new to the field. Without data to correct you, how do you get better?


> telemetry is essential for a good product

No, it isn't, and the idea that it is is toxic.


Essential is probably not right, this is true.

But I'm confident I'll make a better product with telemetry vs without.


I think maybe even making telemetry mandatory with an open license, and customizable with a support license might be a sustainable way to run an open source company.

For many open source projects (anything with an attached business model), either telemetry or tight communication around usage patterns will be necessary to inform development. The latter of those two options consumes business resources.


Making it mandatory with an open license might shoot you in the foot when people fork your code.

For me, the harshest you can go is to ship telemetry with the user opted in by default and no prompt for the setting on start. Ideally, you ask on first load, and that's what I'd probably aim for.

You can't ever try to force people to do anything in open source. They'll run right by you and make it do what they want it to do with or without you.

I'd even imagine paying customers are broadly more happy with telemetry than open source ones. And their needs more important anyways.


> making telemetry mandatory with an open license

If it's mandatory to run the code that does telemetrics, it's not a very open license.


Just because Linux is open source doesn't mean you can't have both Fedora and Red Hat (an enterprise version built on the same codebase)

I don't think any closed source goes into Red Hat, it's just the patch delivery pipelines, package repositories, etc that require a license. And support of course.

Same with any distributed system whose core contributors could gain insight from telemetry. All the components are open source, but they can package it all up and make it available under different terms.

If there's a community version and an enterprise version, you can then make telemetry required in the community version. If people don't like it, they can pull the package apart and put it back together however they want, or they can pay for an enterprise license.


You can make submitting telemetrics a condition of some other agreement, such as copyright license on the Red Hat name, or a B2B support contract. That, however, is pretty far removed from what's discussed here; if the software license itself makes telemetrics "mandatory", then it's no longer an open license.


I don't think I made myself clear.

The company ships one product which is a community edition of a bundle of open source software. That version has telemetry enabled and can't be disabled. Users who want to patch the code manually can of course disable it, just like they can disable the Ad lens in Ubuntu if they want to build it themselves, and those users will be off the beaten path and likely to run into issues that aren't easy to find paved solutions for.

They also offer an enterprise edition with a support license, on which telemetry is enabled by default, but can also be disabled.


> bundle of open source software. That version has telemetry enabled and can't be disabled.

I'd recommend saying "and cannot be configured off without code changes", or something. Of course it can be disabled, if it's open source.

Go compiler without telemetrics opt-out would fragment the community even harder than Go compiler with telemetrics default on. While your scenario is, of course, possible, it's not very relevant to the Go compiler.

As for your original "might be a sustainable way to run an open source company", if the primary feature one gets by paying is "easy to turn off telemetrics", that just doesn't sound like enough value.


Well I mean. In a lot of the popular Linux distributions you kind of get what the distro comes with.

Can you configure openSUSE to use apt instead of zypper? I mean, sure, probably, it's open source.

Is it going to be straightforward? I don't know. I suspect it's going to be harder than it's worth, and you might need to rebuild the operating system image to change the directory layout to work with apt's assumptions or something.

So in practice, even though these things are all open source, people who bundle complex software lay down the paths that 99.9% of people will take.

I wasn't proposing "disabling telemetry" as the primary business proposition.

Instead I was suggesting that open-source companies spend a lot of time dealing with issues from people who use the open-source software in unanticipated ways. If "the beaten path" for using the software without support includes telemetry, they get value from that.

If people want to use the software without telemetry, then they're paying for a support license, so the issues they run into which the supporting company has no telemetric insight into are at least better aligned with their support resource allocation.


Distros are roughly defined by their package managers. Replacing one is a huge amount of effort, compared to adding `false &&` to a single if.


My take is that distros are collections of opinions, some with exposed customization allowed.

The package manager is part of it, sure.

The filesystem is another big part of it (though it's possible most are following XDG now https://specifications.freedesktop.org/basedir-spec/basedir-... )

The init system used to be a big point of delineation, but I think systemd is the standard now (for better or worse)

The filesystem and networking stack still have some variability.

There's still default applications, kernel modules, a gui app installer, the desktop, included drivers, and many more things that go into a distro. If you go with a distro that uses KDE and you switch to GNOME for example, you might lose a lot of GUI support for customization, might have to build addon packages yourself, etc.

It's all open source at the end of the day, but that optionality leads to a less streamlined user experience and lack of guardrails as soon as you step off the beaten path, than you would get with something like OSX


To be more explicit: If your license does not let users patch out your telemetry code it is not an open source license at all.


From my response to your sibling:

I don't think I made myself clear.

The company ships one product which is a community edition of a bundle of open source software. That version has telemetry enabled and can't be disabled. Users who want to patch the code manually can of course disable it, just like they can disable the Ad lens in Ubuntu if they want to build it themselves, and those users will be off the beaten path and likely to run into issues that aren't easy to find paved solutions for.

They also offer an enterprise edition with a support license, on which telemetry is enabled by default, but can also be disabled.


If you do, and if people find out about it, they'll send you false data.


This is just part 1, but all articles in the series have been published: https://research.swtch.com/telemetry


This kind of push is only going to make people want to disable telemetry even more. Privacy is sacrosanct and should be accepted as the norm, not something we need to opt-in.

Go already has some form of telemetry built-in (by way of a google proxy, I suppose) and adding an official one that is opt-out is just going to make me refuse to ever work with it.

Telemetry should always be opt-in, and only opt-in. We have so many issues with telemetry, privacy, and such because the big players and corporations insist opt-out is better (maybe you get more data, but you violate end users' trust as well). Is that really worth it?

There is blow-back and distrust in the industry as a result and it's only going to get worse the more you try to push for opt-out telemetry (or just assuming telemetry should be the default).


Have you read the articles? How is this in any way violating privacy?


Because it's impossible to get telemetry from any source without violating some aspect of the users' privacy.


So you see this as just the same, from a privacy perspective, as the way that the Go tool already dials out to the Go proxy by default? That is, if you're OK with that (I'd assume not, but it is at least existing functionality), you'd see the telemetry proposal as similar?

In other words, I think you object to the Go tool already, so this is really no different?


Telemetry in open source has existed for a long time. Debian has the popcon package that can be installed and reports weekly usage of the software packages. The telemetry data are published in the open. The Debian popcon FAQ could be used as a guideline for other telemetry needs. https://popcon.debian.org/


It does sound quite similar. But in addition to the crucial difference of opt-in vs. opt-out there's also an interesting contrast in how it's framed.

Debian talks about what you, the user, can do: help out, participate and vote. If you choose to do so.

The Go team talks about what the developers and their software will do to the user's machine, but the user is completely passive in their description. This is also reflected in the term "telemetry" itself: the software is not a tool in the user's hands but rather a remote-controlled probe in the user's habitat that pokes at the user to elicit interesting responses.


popcon should not be used as an example of how to do telemetry, as it is far worse for privacy than the Go proposal:

1. Sends names of private packages to the server, and publishes them.

2. Sends a unique identifier (a UUID stored in /etc/popularity-contest.conf) to the server, which is stored.

3. Doesn't use sampling, so if you use popcon you will be submitting a report once a week (Go's telemetry would average just one report a year).

4. Submits over plaintext protocols by default.

popcon may be opt-in (in the sense that the prompt during installation has "No" selected by default) but the prompt doesn't disclose the large privacy risks.

People are not appreciating the thought that has gone into the Go proposal to minimize the collection of private data, either intentionally or by accident, such as the client-enforced requirement that the names of counters be published in a tamper-proof log so anyone can verify that, for example, no private package names are being disclosed. Everyone is focusing on opt-in vs opt-out, but to me these other details are far more important.


> Debian has the popcon package that can be installed and reports weekly usage of the software packages.

Right. That's fully opt-in, to the point the package isn't even installed by default, which is the only moral way to do this.


I don’t really see why a classic community-driven open source project would care about what non-contributing users are doing with the software. In that case, helpful users come with built-in telemetry (pull requests).

But I guess this could be helpful for corporatized read-only repo projects, or other groups that aren’t sure if they are building a community or a customer base.


Because pull requests aren't the only reason you might do a community-driven open source project. Perhaps you're just altruistic, or want to popularize some technology, etc.


My new TV wouldn't work unless I agreed to it recording me and uploading those recordings to its servers, where they may be temporarily stored while the audio is transcribed to text for more permanent storage.

My TV is forcing me into an employment agreement of sorts, where I generate data to train their models or otherwise 'improve service'.

Data is so valuable that companies are risking a huge PR backlash. Data collection is the business model, and I assume the same ethos will make its way into open source.


I wish there was a standard way of disabling telemetry across software dependencies.

While I leave it turned on for personal projects, several projects at work require disabling it.

I have spent hours auditing through transitive dependencies to turn it off. It should not be this painful.


No reason they couldn't all aim to respect a `TELEMETRY=false` env var, akin to the web's 'do not track' request.
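As a sketch of what honouring that could look like in a tool's startup path: one such convention, DO_NOT_TRACK, already exists for console apps, while the generic `TELEMETRY=false` name here is the suggestion above, not an established standard.

```shell
# Check a generic opt-out before the tool-specific one. DO_NOT_TRACK is
# an existing community convention for CLI tools; TELEMETRY=false is the
# hypothetical generic variable suggested in this thread.
telemetry_enabled() {
  [ "${DO_NOT_TRACK:-0}" != "1" ] && [ "${TELEMETRY:-true}" != "false" ]
}

if telemetry_enabled; then
  echo "telemetry on"
else
  echo "telemetry off"
fi
```

Either variable being set to its opt-out value wins, so a user only has to learn one knob.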


Does anyone really respect DNT though?


A firewall is probably your best bet. Don't allow network traffic originating from anything other than a short whitelist.


There's a lot of strong reactions here, which I don't think are generally unfounded. Telemetry has certainly been misused and will continue to be, but it can also be an invaluable tool for product development.

For example, we had a CLI with many commands and flags, some of which were costly to maintain. By adding analytics, we were able to see that literally no one was using certain commands, and we could safely remove them without messing up workflows.

On each CLI invocation, we collected:

  - hash of user ID
  - which command is run
  - which flags were included
  - operating system (not version information, just mac/pc/linux)
This data wasn't used for marketing, had no identifiable information, and was disableable (but opt-out). You could also log exactly what was sent to the server, so you could see for yourself.
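A sketch of assembling that kind of record; the field names and values are illustrative, not any real tool's schema, and the hash keeps the raw username out of the payload:

```shell
# Illustrative telemetry record: hashed user ID, subcommand, flags, coarse OS.
user_hash=$(id -un | sha256sum | cut -c1-16)  # 16 hex chars, never the raw name
cmd="build"            # which subcommand was run
flags="--verbose"      # which flags were included
os=$(uname -s)         # mac/pc/linux level only, no version info
record=$(printf '{"user":"%s","cmd":"%s","flags":"%s","os":"%s"}' \
  "$user_hash" "$cmd" "$flags" "$os")
# Logging the exact payload locally is what lets users verify what is sent:
echo "$record"
```

Note that an unsalted hash over a small identifier space is only weakly anonymous; a real deployment would want a salt, or to drop the user field entirely.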

We could have collected some of this via occasional surveys, but the data would have been less useful and less accurate.

I didn't look into the details of what Go is proposing to collect, but treating all telemetry of any kind as a boogeyman isn't productive; you just have to do it the right way.


> Telemetry has certainly been misused and will continue to be, but it can also be an invaluable tool for product development.

This perfectly exemplifies the whole problem. Google views Go as a product which Google is responsible for.

Go is a programming language. Go is critical infrastructure. Go is not a product. Viewing it as such is a fundamental misunderstanding of what Go is to its users.

This will practically dictate my choice of programming language going forward. My question will no longer be, "is Go the right choice for this?", but rather, "is a Google product the right choice for this?". The answer is often yes to the former question, and no to the latter.


I guess that raises the question: if Go were an independent project (totally unattached from Google) and had the exact same telemetry plans, how would you feel?


Better, because most independent projects aren't the world's biggest advertising company whose business model entirely revolves around invading people's privacy as much as possible.

But that doesn't mean it would be good. The fundamental misunderstanding of Go's place as infrastructure rather than a product remains. And even though I acknowledge that spying on your users provides useful insight into their behavior, that doesn't mean spying on users is good.


The information and rate of upload as described seem reasonable.

Is the fear from most people that it will be a foot in the door, and a way for Google to collect more over time?

Note: I think Go is a regressive technology. It would have been great in the 1970s, not today. But that's a different topic. My point is that I tend to be biased very negatively against Go. But here I don't see something wrong.


Interested as to why you believe go is regressive, could you expand on that?


You can find plenty of well-articulated rants online. The gist is that Go ignores all research in programming languages. It is very hard to use properly. It is really hard to produce safe APIs. You are encouraged to pop threads (preemptive user threads) all over the place. And may the god of lost state save you from the hell of data races. Also, what is this slice/vector type thing? Don't forget to check `err`; nope, not this one, the other one.


> The gist is Go ignores all research in programming languages.

This is an assumption, and it's IMHO false. Go's creators watched all the research, probably closely, lived with its results, and were not happy with what they got. Their experience reminded them that fewer features in a PL make it arguably less expressive, sure, but at the same time faster to learn and master, and easier to read and debug. Writing tools is simpler if a parser for your language can be hacked together from scratch over a weekend, etc.

At the end of the day it's a matter of personal preferences, but it has, IMO, nothing to do with "ignoring all research in programming languages", quite the opposite.


You start with a dizzyingly large number of states (RAM, storage, network, threads...). The goal of programming languages is to give you a way to reduce that vast number of states as much as possible, without removing the ones you need to do the work. Each layer of abstraction reduces what is possible without impeding the work that ultimately needs to be performed.

In my humble opinion, you should learn a bit more about easy vs. simple, and complicated vs. complex.


Opaque telemetry can also be a barrier to adoption: my users’ IP addresses may legally be PII that I cannot disclose.


How transparent are Scarf's product adoption metrics for OSS projects? https://about.scarf.sh/

I follow them on Twitter but haven't looked much into it other than reading their documentation, which makes me think that most of their telemetry is done at the point of the package distribution system: https://about.scarf.sh/package-sdks


Is this even up for debate, or is this post more of a FYI?


No, it's not up for debate at all. Much like when Microsoft did this with .NET Core, the GitHub thread is clearly a misguided post by RSC expecting the community to conform or support it. They didn't, so now it's a damage-control exercise. It will happen.

Any corporate controlled project on this scale is prone to this failure mode.


From a legal point of view, how will companies react to this, be it default-on or default-off?

Some companies use Go for internal work, and I'm sure all of us know of cases involving a number of NDA-ed projects from third parties or outsourced companies that collaborate on them.

So, who is going to sue whom here when one party has disabled (or will disable) the telemetry and the other has it on by default, for whatever reason?


While this article points out all the right explanations of why telemetry is needed and how it can be made a little more transparent by the Go toolchain acting as an intermediary and publishing the telemetry data publicly, it fails to point out the disadvantages/risks of such a system. At the core, the issue is about trust and the user not having any incentive.


I haven't worked with golang in some time. How do golang devs generally obtain the compiler?

If you're getting it from distro repos, it should be straightforward to convince the distro package maintainer to disable the telemetry / patch it out.

Or is it a nvm/pyenv/rustup situation where you prefer to use bespoke toolchain managers to download upstream's compilers?


I mainly get it straight from golang.org, but this will be able to be disabled via environment variable just like the modules proxy stuff was. https://research.swtch.com/telemetry-design#opt-out


Running the command in that link just shows me a message of "go: unknown go command variable GOTELEMETRY".


Because this proposal has not yet been implemented.


So is there a way to disable this ahead of time? Or do I have to install the version with telemetry first?


Yes, you can set an environment variable at any point in time.


You can also

   echo GOTELEMETRY=off >> $(go env GOENV)


There are tons of cases where the person installing Go won't otherwise know that telemetry is enabled. For example, let's say you're at a bootcamp and the Go installation instructions from your teacher don't mention telemetry -- how would the person know to disable it? My concerns are around nation-state actors, domestic abuse, journalist privacy, and lawyer confidentiality, and I fully believe that this sort of telemetry can and will be abused in some way, somehow, eventually, and probably in some obscure fashion.

Would be nice if this system threw something to stderr at runtime every. single. time. unless the message was explicitly disabled. Something like:

  Go telemetrics are enabled! We are collecting {json:object} from your machine for X purposes. If you would like to opt out, run "echo GOTELEMETRY=off >> $(go env GOENV)". To disable this message, "echo GOTELEMETRYWARNING=off >> $(go env GOENV)"
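A sketch of that behaviour, using the GOTELEMETRY name from the proposal plus the hypothetical GOTELEMETRYWARNING switch suggested above (neither the message nor the second variable exists in the actual proposal):

```shell
# Print the notice to stderr (not stdout, so piped output stays clean)
# unless telemetry, or the warning itself, has been switched off.
# GOTELEMETRYWARNING is hypothetical, from the comment above.
if [ "${GOTELEMETRY:-on}" != "off" ] && [ "${GOTELEMETRYWARNING:-on}" != "off" ]; then
  echo "Go telemetry is enabled; run 'go env -w GOTELEMETRY=off' to opt out." >&2
fi
echo "...normal tool output..."
```

Treating "unset" as "on" mirrors the opt-out design: only an explicit off silences collection, and only an explicit off silences the notice.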


If you run your build inside a Docker container, wouldn't it be enabled by default as well? Docker containers do not inherit the environment from the host, as far as I know.


From the description of the implementation, it would only send telemetry data after seven days of being on. So, yes, as long as the Docker container is up for that long. I didn't see anything that mentions whether it would communicate with any telemetry server on initialization.
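The non-inheritance itself is easy to check, assuming GOTELEMETRY ends up being the variable name; a clean environment (which is roughly what a container starts with) drops the host's setting:

```shell
export GOTELEMETRY=off            # opted out on the host...
# ...but a process started with a clean environment does not see it:
env -i sh -c 'echo "clean env sees: ${GOTELEMETRY:-<unset>}"'
# So in an image you would need to set it explicitly, e.g. in the
# Dockerfile:   ENV GOTELEMETRY=off
# or per run:   docker run -e GOTELEMETRY=off ...
```

The same applies to CI runners and any other environment built from scratch rather than inherited from your shell.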


It depends on which compiler you want to use, but precompiled binaries include the gc compiler by default. If you want to compile from source yourself, you can use gc or gccgo assuming you already have the go toolchain, otherwise you would need to bootstrap from an existing binary.


This is a good plan, very simple and clear, and I like the list of system properties at the end. The solution is pretty tailored for the Go toolchain, which is a good strategy that has worked for them in the past.

A more general-purpose metrics tool I'm watching closely is Divvi Up https://divviup.org/, a research project by ISRG, the same org that runs LetsEncrypt. The basic idea is to divide up each metric into two parts and publish each part to separate collection servers (one run by you and the other by divviup). Then the servers separately aggregate their half and combine the results, the idea being that each half is useless on its own but when combined it's still useful.

I wouldn't suggest it for this application, but for the majority of typical apps it would be a vast improvement to privacy compared to the status quo.
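A toy illustration of the split-into-two idea, using plain additive secret sharing; Divvi Up's actual protocol adds verification on top, so this shows only the intuition, not its real scheme:

```shell
# Split a counter into two random-looking shares mod 2^32. Either share
# alone is uniformly random; only the recombined sum reveals the value.
metric=1234
mod=4294967296  # 2^32
share1=$(awk 'BEGIN { srand(); printf "%d", int(rand() * 4294967296) }')
share2=$(( ((metric - share1) % mod + mod) % mod ))
# Server A stores share1, server B stores share2; recombining:
echo $(( (share1 + share2) % mod ))
```

Because the shares are additive, each server can sum its shares across many users and the two aggregate sums still recombine into the true total, without either server ever seeing an individual value.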


  The Go team at Google would run a collection server. Each week, with 10% probability (averaging ~5 times per year) the user’s Go installation would download a “collection configuration” to find out which counter values are of interest to the server and at what sample rate.
If there's interest in using config files to determine how telemetry is done, why can't something similar be done for turning telemetry off? I don't want to deal with environment variables (for a gazillion reasons) and would prefer to just use a config file. Especially when it comes to sending arbitrary information from my system to another arbitrary host.
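As an aside, the quoted figures check out as back-of-the-envelope arithmetic: a 10% chance in each of 52 weeks gives about five downloads per year in expectation.

```shell
# Expected config downloads per year at a 10% weekly sampling probability.
awk 'BEGIN { printf "%.1f\n", 52 * 0.10 }'
```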

It's so strange to me that the configuration of telemetry has been escalated to uses-configs status while opting out hasn't. It really feels like opting out is an afterthought.


As designed, the system allows an opt-out _either_ by setting GOTELEMETRY=off in your environment or by running 'go env -w GOTELEMETRY=off' which writes a config file. (Specifically the one reported by 'go env GOENV'.) If you prefer to edit the config file directly, you are of course welcome to do that.


That's fair, but it doesn't take away from the deeply, deeply exhausting need to do this sort of niche configuration for everything out there, let alone even recognizing that it's needed. Never mind the undue need to read changelogs and blog posts to make sure things haven't changed since I last used them.

Have you considered using a more-generic opt-out environment variable that's not go specific? For example "USER_PREF_NO_TELEMETRY=true" would have the same effect, or would ensure that GOTELEMETRY=off is set. I have no idea if anything like that exists right now in other projects, but if it's not, then go is large enough and embedded across enough systems that it could be a good place to start.


Sure, and have to remember to do that every single time you mint a new Docker image for use in CI, every single time you spin up a new Cloud workstation, every single time you get a new physical workstation, every single time you have to run a job in a brand new environment, every single time you... etc, etc, etc.

Things that work inside the Googleplex often don't work for the rest of us. Not every company is a multi-billion-dollar behemoth that can afford to spend hundreds of millions of dollars annually to work exclusively on internal developer tooling.


This will collect zero telemetry from CI builds, so some data will need to be taken with a big grain of salt. I don't have data to prove it, but I would bet most cross-compiles happen in CI and not on the dev laptop.


This is really slimy, Google swung and missed and let Go of the bat here.


We need solutions in this space for open source projects, I've been monitoring https://divviup.org as an option too!


Sometimes the post sorting algorithm produces interesting results

  53. Transparent telemetry for open-source projects (swtch.com)
      224 points by trulyrandom 1 day ago | flag | hide | 265 comments
  54. Windows 11: a spyware machine out of users' control (techspot.com)
      419 points by jlpcsl 19 hours ago | flag | hide | 292 comments


I'm usually against telemetry but not only is the approach here somewhat reasonable, I think I actually trust Google more than, say, homebrew to not do something egregious with the data.

Google is at least as broadly compliant as one can be with various standards (of questionable value, natch) but is also on the hook socially and perhaps legally if they fuck this up.


Oh, hell no.


The only way I'd consider this is if the telemetry my app generated is available to me or can be rerouted to another target. If so, please add this ASAP. Otherwise, I'll stick with my own observability stack.


how long until ads?


> That's why opting out is just an environment variable (GOTELEMETRY=off) or a single command (go env -w GOTELEMETRY=off)


I have set DOTNET_CLI_TELEMETRY_OPTOUT=1 as an environment variable in my .profile file. What should I do for golang?


> To opt out, users would set GOTELEMETRY=off in their environment or run a simple command like go env -w GOTELEMETRY=off; The first telemetry report is not sent until at least one week after installation, giving ample time to opt out. Opting out stops all collection and reporting: no “opt out” event is sent. It is simply impossible to see systems that install Go and then opt out in the next seven days.

Source: https://research.swtch.com/telemetry-intro
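In .profile terms, the Go equivalent of the .NET opt-out would look something like this. (GOTELEMETRY=off is the name from the proposal, not a shipped flag; DOTNET_CLI_TELEMETRY_OPTOUT is .NET's documented variable.)

```shell
# ~/.profile — opt out of both toolchains' telemetry.
export DOTNET_CLI_TELEMETRY_OPTOUT=1   # .NET CLI's documented opt-out
export GOTELEMETRY=off                 # name per the Go proposal
echo "$DOTNET_CLI_TELEMETRY_OPTOUT $GOTELEMETRY"
# → 1 off
```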


"The system is on by default, but opting out is easy, effective, and persistent."


This is a really well-considered approach to telemetry. I wish all telemetry were done like this.


The only moral response is to send false data to the servers.


There's a lot of confusion in these comments about opt-out vs opt-in. The debate isn't settled, but a lot of the issues raised here have been addressed. Reposting Russ' comment:

>Longer answer about opt-out generally, copied from mail I sent to golang-dev.

> I wrote a little about this at https://research.swtch.com/telemetry-design#opt-out. Just to quote the beginning:

“An explicit goal of this design is to build a system that is reasonable to have enabled by default, for two reasons. First, the vast majority of users do not change any default settings. In systems that have collection off by default, opt-in rates tend to be very low, skewing the results toward power users who understand the system well. Second, the existence of an opt-in checkbox is in my opinion too often used as justification for collecting far more data than is necessary. Aiming for an opt-out system with as few reasons as possible to opt out led to this minimal design instead. Also, because the design collects a fixed number of samples, more systems being opted in means collecting less from any given system, reducing the privacy impact to each individual system.”

> To elaborate, one of the core things I believe about designing a system like Go is that it needs to ship with the right defaults, rather than require users to reconfigure the defaults to get best practices for using that system. For example, Go ships with use of the Go module mirror (proxy.golang.org) enabled by default, so that users get more reliable builds out of the box. Similarly, Go ships with the use of the checksum database also enabled by default, so that users get verified module downloads out of the box. We know that most users don't want to and probably won't spend time reconfiguring the system: they trust us to set it up right instead. Of course, that implies a responsibility to actually look out for users' best interests, and we take that very seriously. There are important privacy concerns about the module mirror and about the checksum database, despite their clear benefits, so we designed those systems to address as many of those concerns as possible. Among the decisions we made to improve privacy there: (1) GOPROXY can proxy both the module mirror and the checksum database, (2) we published a very clear privacy policy (proxy.golang.org/privacy), (3) we introduced the concept of a tiled transparency log to keep log fetches from exposing a potential tracking signal.

> Moving back to telemetry, enabling telemetry does not confer the same kind of direct benefits to users as the module mirror and the checksum database do. Instead the direct benefits it confers fall on other users: (1) allowing your Go installation to participate in the system means other installations participate just a little bit less, thanks to sampling, and (2) allowing your system to send usage information strengthens the signal from others with similar usage. There is still an important indirect benefit: one system opted out won't have much of an impact, but 99% of systems opted out has a huge impact, and that leads to mistakes like the ones I mentioned in the first blog post, which do make Go worse for you.

> Like with the module mirror and checksum database, there are good privacy concerns to telemetry despite the clear benefits, so the design of transparent telemetry aims to address as many of those as possible. The bullet list in the GitHub discussion (also at the end of the blog post) enumerates the most important ones.

> Most people leave defaults alone or make intuitive guesses about what they want. That's totally reasonable: no one wants to spend half an hour learning the details of each specific setting. But my goal for the system is that if I did spend half an hour explaining how the system worked, then the vast majority of users would agree with the default and see no reason to opt out. Of course, some people will always opt out on general principle, and perhaps there are others who would opt in to some systems but not this one. For those people, my goal is simply to make the opt-out as easy and effective as possible. That's why opting out is just an environment variable (GOTELEMETRY=off) or a single command (go env -w GOTELEMETRY=off), and there's a quiet period of at least a week after installation to give plenty of opportunity to opt out before there's any chance of data being sent.

> I expect that this will not change your mind, and that you and a few others will still believe the telemetry should be opt-in. I accept that: I don't expect to convince everyone about this point. But I hope this helps explain how I am thinking about the decision.


> The debate isn't settled, but a lot of the issues raised here have been addressed.

That's not addressing the issue as much as dismissing it.


@rsc, if you ever see this, your proposal here means that I will never use any software written in Go ever again, if at all possible.

What others have said in this thread about telemetry becoming an "accelerant" will happen. Abuse will happen. Data will be put up for sale. IP's will be logged because users can't verify that they're not.

The only thing users can verify is what is sent and to whom. And only if they run packet inspection. Most users don't.

(Edit: I just realized that users may not even be able to tell who data is sent to because of proxies or the original collector selling the data.)

I have no reason to believe your personal motives are anything but pure; however, this capability will not just be in your hands. It will be in the hands of anyone with less-than-pure motives.

I applaud your efforts to make telemetry more transparent, but they are destined to fail.

When it comes to figuring out how users use software, the only thing to do is legwork. Ask your users. Watch them if they'll let you do user studies. Pay non-users to use the software for a user study and put them through all situations, including rare ones.

This is the same thing we programmers tell the police to do when the police whine about end-to-end encryption: do old-fashioned legwork. Why should we, as programmers, demand that of police when we give ourselves tools to violate the privacy of users in the exact same way that police want?

Yes, that's right, the exact same way. Telemetry is a backdoor on a private conversation between a user and a machine.

Just do the work. I'm pretty sure Google has the money to do so.

You may respond that this is for Open Source developers to get data on their users. Well, if those developers are hobbyists, they don't have time to crunch data, and they're probably scratching an itch. If they are not hobbyists, they are paid and should do the legwork.

There is no excuse for telemetry. Just do the work.


> @rsc, if you ever see this, your proposal here means that I will never use any software written in Go ever again, if at all possible.

Have you actually read the articles?

The "data put up for sale" is to be made available publicly.

IP logging can already be done (the Go proxy is enabled by default).

All the source code is open.

What's your actual problem with this, beyond a knee-jerk reaction to the idea?


> The "data put up for sale" is to be made available publicly.

How can users verify this?

> IP logging can already be done (the Go proxy is enabled by default).

Sure, but more data will be attached to it. Also, in his proposal, he said that IP addresses will not be logged. I seriously doubt that.

> What's your actual problem with this, beyond a knee-jerk reaction to the idea?

Putting telemetry in a programming language. Working with a programming language is the number one thing I do on a computer. This means that, except for the fact that I don't work in Go, most of my private conversation with a machine could be backdoored.


> Sure, but more data will be attached to it. Also, in his proposal, he said that IP addresses will not be logged. I seriously doubt that.

I think it's worth quoting what Russ said in the article, which sounds very reasonable to me:

> The server would necessarily observe the source IP address in the TCP session uploading the report, but the server would not record that address with the data, a fact that can be confirmed by inspecting the reporting server source code (the server would be open source like the rest of Go) or by reference to a stated privacy policy like the one for the Go module mirror, depending on whether you lean more toward trusting software engineers or lawyers. A company could also run their own HTTP proxy to shield individual system’s IP addresses and arrange for employee systems to set GOTELEMETRY to the address of that proxy. It may also make sense to allow Go module proxies to proxy uploads, so that the existing GOPROXY setting also works for redirecting the upload and shielding the system’s IP address.

> This means that, except for the fact that I don't work in Go, most of my private conversation with a machine could be backdoored.

I don't get this. Given the design that the article is describing, how could most of your private conversation with a machine be backdoored? Specifically given that the Go tool is open source and used by millions already. Are you worried about sneaky code hidden inside that source code? If so, you should be worried already, because there's no reason that they couldn't already be doing that if they were so inclined.


> The server would necessarily observe the source IP address in the TCP session uploading the report, but the server would not record that address with the data

Users can't confirm this. In fact, this makes the next part a falsehood:

> a fact that can be confirmed by inspecting the reporting server source code (the server would be open source like the rest of Go) or by reference to a stated privacy policy like the one for the Go module mirror, depending on whether you lean more toward trusting software engineers or lawyers.

Sure, the source code of the server might be available, but you can't confirm that the server wasn't built with modified source code.

Second, as we've seen before, privacy policies are empty; companies violate them all the time.

IOW, I don't trust software engineers, and I don't trust lawyers, and I would bet my life savings that there will be instances of companies lying in the ways I mentioned above.

> I don't get this. Given the design that the article is describing, how could most of your private conversation with a machine be backdoored?

Counts are enough. He says that counts are the only thing that will be uploaded, but he forgot that timing will also come into play.

(A week's delay is only an offset to subtract, by the way.)

Here's how it works: the tool reports counts, maybe in batches per hour. The server logs the counts and the hour those counts came from.

Yes, there's already another piece of data they captured, even though the tool ostensibly only sent counts.

Then those counts plus their timings can be used to infer things. For an example outside of Go (this is one I saw somewhere else), imagine a person texting more and more as the weekend approaches, until they are texting frantically. Then they suddenly stop in the evening of Saturday.

You only get the report of counts and the hours they happened in. Can you give some plausible explanations?

I can. They were texting someone they were planning on meeting that weekend, and then meet them. Can you give a few guesses as to why they're meeting them?

I'll let you fill in the blank.

Sure, there might be other reasons, but I would bet there are not many. Enumerate them, and you already know more. Find the similarities between all possibilities, and you know even more.

People forget about side channels all the time. In this case, the side channel was timing, but it doesn't matter what the side channel is; data can be extracted from it. And companies will.

> Specifically given that the Go tool is open source and used by millions already. Are you worried about sneaky code hidden inside that source code?

Yes. Just because there are eyeballs on that code doesn't mean they won't put sneaky stuff in. For example, the counts could be packed in a different order to tell the server more information. Or the tool could time its uploads. Or it could batch some counts and not batch others.

I'm not smart enough to catch all of the tricks they might pull. Are you?

> If so, you should be worried already, because there's no reason that they couldn't already be doing that if they were so inclined.

It's Google. Of course, they are so inclined!


What world do you live in?


A custom-built Gentoo that uses the Awesome Window Manager for a minimal install, builds Firefox from source, and uses OpenSnitch to sniff everything.

My machine is locked down hard.

Oh, and I checked what depends on Go on my machine. The one kicker was libcap, which won't depend on Go if I tell it not to build captree. So I did that.

I uninstalled Docker.

That leaves:

* `arduino-builder` (for my custom keyboard).

* Hugo (for my websites).

* An unnamed program.

* Gitea.

Besides `arduino-builder`, I have a plan to get rid of all three of those. For two, Hugo and Gitea, I had already planned to. The unnamed program is harder, but someone has already done one. Unfortunately, it's in Go, so I'm going to have to do something else myself.

`arduino-builder`, though, that's tough.


[flagged]


It actually is, because I can utilize my machine better by having fewer processes running, and lighter ones at that. I can run ZFS easily. I can have a minimal kernel, reducing my attack surface.

I can customize installed packages, such as what I did above.

Also, it taught me system administration.

Totally worth the effort.

As for getting rid of Go, I'm surprised that I had so few Go programs, and like I said, I was already planning on replacing two with my own stuff.


All browsers have telemetry in them, but he probably used netcat to post that message.


Oh Google - never stop being you.

Not only is it going to be opt-out (because of course it would be coming from Google), I really like the whole "wait a week before sending telemetry" part that just coincidentally has the benefit of sneaking right past people that actively look for suspicious network activity when they've freshly installed something.

Am I being uncharitable?


Google is institutionally incapable of producing software that doesn't track its users over the internet.

One example is the stock calculator app on Android, which according to their privacy statement may track your app interactions, device id, and email. Like what if users actually subtract more than they add or something.

https://play.google.com/store/apps/datasafety?id=com.google....

Or the wallpapers they include on your phone - you guessed it!

https://play.google.com/store/apps/datasafety?id=com.google....

If that's the kind of environment you work in, I'm not surprised this proposal seems modest by comparison.


As the saying goes, companies ship their org chart, and Google's is what, 90% ads?


Very popular programming languages and IDEs have telemetry on by default: VSCode, C#, Java, etc.

People act like they discover telemetry in 2023.

I don't think it's a big deal. Ultimately it's to improve Go, and the proposal makes it very easy to disable (a single env variable).


What the hell? "People act like they discover telemetry in 2023"? Are we just going to ignore the fact that the issue of companies spying on their users has been a hotly discussed controversial topic since the practice began? Do you think objection to telemetry first appeared in 2023?


Totally agree. Telemetry has been around, has matured, and benefits users. I’m not sure the benefits for Go would be as significant as for other software but, really, why not?


> Telemetry has been around and matured and benefits users.

Does it? Telemetry mostly seems used to justify removing features I need on grounds that they’re little used.

As another user noted, if telemetry is your yardstick, the average backup software would have removed the “restore” feature because it’s barely ever used.


Microsoft deprecated the disk-image backup in Windows 7 because it was infrequently used... by random grandparents.

It was basically a "free" wrapper on top of the Volume Shadow Service (VSS) built into the operating system, but only IT professionals ever used it, so... it had to go.


There is a long list of use cases, which go far beyond "removing features": https://research.swtch.com/telemetry-uses

"Is it safe to remove support for X?" is one use case. Right now the strategy more or less amounts to "remove and see if anyone complains, possibly too late to change".


> Is it safe to remove support for X?

What the hell, it's a freaking compiler. What do you mean, "too late to change"? Failing the compilation is suddenly a showstopper bug?

If go wants to deprecate features, just follow the same procedure done by literally all other non-spyware compilers.


I don’t like this example in particular because observing too much “restore” activity is an excellent piece of information.


I fail to see what that’s got to do with it. “Too much” restore activity doesn’t tell you anything actionable about your software; if anything, it’s creepy as hell.


> VSCode, C#

These are both Microsoft products. Microsoft's position is well known.

> Java etc ...

Really? Which Java distribution?

It's definitely opt-in in Jetbrains IDEs.


I think they should all be opt-in as well. However, as a developer and pretend sysadmin, I am generally a nice guy about not turning off telemetry on software products with a user-facing UI that I use frequently.


Since you asked, yes, you are being uncharitable. It's rather hard to imagine that the people who are detail-oriented enough to look for suspicious network activity after installing something wouldn't notice the disclosure on the download page (edit: or the release notes). On the other hand, the explanation given by Russ for delaying a week (so people have ample time to opt out) makes sense.

Do you actually think Russ' explanation is just a pretext so they can evade detection by people who monitor for suspicious network activity (yet don't notice the disclosure on the download page)?


I am jaded and probably being a little uncharitable. However, I don't know Russ personally so I have no reason to place a high level of confidence that a Google employee isn't going to make decisions that align more with Google's interests vs privacy interests.

Regardless, there are plenty of ways to upgrade the Go toolchain (snaps, distro packages, fetching the latest via curl, etc.) that won't make the changes immediately visible. Given that, I think you are painting an overly optimistic picture of a world where everyone who cares about this is immediately aware that opt-out telemetry has been added, vs a lot of installs being silently swept up into this through sheer ignorance.

Also, this is going to require me to go and set environment variables in about a dozen environments to disable the collection and while I can pretty easily manage that task via ansible I'm not happy about having to jump through hoops to turn off telemetry for a freaking compiler tool chain.


> I am jaded and probably being a little uncharitable. However, I don't know Russ personally so I have no reason to place a high level of confidence that a Google employee isn't going to make decisions that align more with Google's interests vs privacy interests.

If the nature of this data were different, I would be suspicious too. But it's really hard for me to see how a set of counters (whose names have various protections to ensure they can't contain private information) being sent approximately once a year is going to help with Google's advertising interests (which is what I assume you meant by "Google's interests"; I think they also have an interest in making Go better and the telemetry proposal aligns with that). This is literally the first time I've been OK with telemetry.

> Regardless, there are plenty of ways to upgrade the Go tool chain (snaps, distro packages, fetching latest via curl, etc) that won't result in the changes being immediately visible. Given that, I think you are painting an overly optimistic picture of a world though where everyone that cares about this is going to be immediately aware that opt-out telemetry has been added vs a lot of installs being silently swept up into this by sheer ignorance.

I agree there will be people who won't notice the disclosure (which will also be in the release notes), but again I tend to think that the people sniffing network traffic after installing a program would also scrutinize release notes instead of just blindly installing upgrades, which is why I find it pretty improbable that Russ' explanation was a pretext.

> Also, this is going to require me to go and set environment variables in about a dozen environments to disable the collection and while I can pretty easily manage that task via ansible I'm not happy about having to jump through hoops to turn off telemetry for a freaking compiler tool chain.

I think the best suggestion I've seen is that there should be a single environment variable (e.g. $TELEMETRY) that all programs should respect, to avoid the need to do work for every application.


> I think the best suggestion I've seen is that there should be a single environment variable (e.g. $TELEMETRY) that all programs should respect, to avoid the need to do work for every application.

This is a nonstarter, as DNT demonstrated in spades.


> there should be a single environment variable (e.g. $TELEMETRY) that all programs should respect, to avoid the need to do work for every application.

There was a proposal for that some years ago, but it didn't really go anywhere, partially because of the author's rather unpleasant attitude towards the projects he wanted to adopt it, and an overly broad definition of "tracking" (which includes e.g. update checks).

Some discussions:

https://news.ycombinator.com/item?id=27746587

https://lobste.rs/s/htbkqd/console_do_not_track
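For reference, that proposal's convention was a single DO_NOT_TRACK environment variable. A hypothetical sketch of how a CLI might honor both the generic variable and a tool-specific one (function name and the combination logic are illustrative, not from any shipped tool):

```shell
# Hypothetical sketch: a CLI honoring both the generic DO_NOT_TRACK
# variable (the consoledonottrack.com convention) and a tool-specific one.
telemetry_enabled() {
  [ -n "${DO_NOT_TRACK:-}" ] && return 1      # generic opt-out wins
  [ "${GOTELEMETRY:-}" = "off" ] && return 1  # tool-specific opt-out
  return 0
}

DO_NOT_TRACK=1
if telemetry_enabled; then echo "telemetry on"; else echo "telemetry off"; fi
# → telemetry off
```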


And there it is. The real intentions of Google and the Go Programming Language.

Google really can’t help themselves; they’ll stick telemetry in anything.


They could allow public access to that data. That can help more people than just the Go team, and it would add transparency.


That's literally the plan if you read it.


Was hoping for a big highly designed webpage with "enter your github URL here". but alas

(it did say "transparent", like a service people opt into that could relate installations to github URLs)


I see a very frustrating pattern emerging in which $COMPANY asks its users if it can do something, the users say "no", and $COMPANY storms off under the guise that "the discussion is unproductive".

I am left with the impression that the decision has already been made, and that we are witnessing a PR strategy to make Google appear reasonable. I think that Mr. Cox, with all the respect I hold for him, is playing the part of the "useful idiot" here.



