Transparent telemetry for open-source projects (swtch.com)
248 points by trulyrandom on Feb 8, 2023 | 300 comments



I've been a pretty strong advocate of the idea that analytics should always be minimal, 100% anonymous, aggregated, and open to the public - otherwise it’s spying. This is how we do analytics on our websites today[0][1], and how we plan to do it in games we release in the future. Maybe one day I will start a dedicated FOSS service that people can use for exactly this with some trusted reputation/transparency/auditability to it.

I think what Russ has described here is decent and well-reasoned. I also think that Go being a product (it is, whether you like that word or not) makes it more fair to desire analytics of this form. I think it being opt-out is reasonable (after all, if it is not, they will make decisions using data that does not come from the vast majority of users; they might as well not have analytics at all then.)

But I am afraid of this becoming pervasive not just in products (like CLI tools), but also in libraries: imagine every Go/npm package you use wanting to ping the network because the authors want to know 'is this popular? can we deprecate XYZ method?' etc. If transparent telemetry in the form Russ and I have been viewing it becomes more common, it won't be a surprise if more library authors begin to adopt something like this and it becomes a pervasive problem IMHO.

[0] https://hexops.com/privacy

[1] https://machengine.org


I am concerned about run-time telemetry in libraries as well. It might make sense for language ecosystems to offer more data about library usage gathered at build time eventually, as a different system than the one I'm posting about today. I think when you get to that level of detail you probably need to start thinking hard about differential privacy and probably cryptographic solutions like ESA or Prio. I don't think we know enough to design the library solution yet.


Telemetry embedded in libraries is simply abusive, in my opinion. At the very least, the decision about whether or not to include telemetry should be made by the application developers, not the toolmakers.


Right. My hope would be that language tooling offering library developers visibility into compile-time information about library usage would reduce their desire to insert run-time collection instead.


Opt-in should be the default, where the tool asks for consent politely. Information on how the application is being interacted with belongs to the user, not the developer, so it should favour their choice over sneakily enabling it for those who didn't pay attention or don't understand it.

It's on the project to convince the user to turn on telemetry rather than the user having to remember to turn it off. Excuses such as "nobody would turn it on" don't apply.


Yes. I fundamentally don't care how "good" Go's telemetry would be, because I don't want the FOSS ecosystem as a whole to take any more steps down that slippery slope. There will not be a way back from this.


> the authors want to know 'is this popular? can we deprecate XYZ method?'

This is something that was common for internal libraries at some of the places I've worked. I'm honestly a little surprised it isn't a thing we see externally. I for sure do not want to see it, but I'm surprised we don't. It's probably enough to look at the public usage on GitHub, make inferences, and post notice on future major versions of libraries. GitHub honestly should make a tool to do this; they'd have a huge opportunity to inspect the data.


> I've been a pretty strong advocate of the idea that analytics should always be minimal, 100% anonymous, aggregated, and open to the public

And opt-in.

> I also think that Go being a product (it is, whether you like that word or not) makes it more fair to desire analytics of this form.

Not by stealing it.


> 100% anonymous, aggregated, and open to the public

I don't believe the "100% anonymous" is a thing with AI anymore. When AI can identify/fingerprint you by your walk pattern[1], you can't really tell what data can and can't be used to fingerprint you.

[1] https://ieeexplore.ieee.org/document/8275035


Checking out Mach and seeing it's written in Zig. Bit surprised to see it being "used in anger" given that Zig is currently at 0.10.1. Can you share how your experience with the language has been so far? Thanks.


> the vast majority of projects, even large ones that would benefit, stay away from telemetry.

Nomad is one of these projects. We support a dizzying array of platforms (32bit Intel Linux?!). We have no idea how popular our Consul service mesh integration is. Are bug reports a sign of use or just failed experiments? Is anyone running on macOS in production or just ephemeral dev agents?

Surveys about this are just asking humans to do something computers can do better.

Obviously privacy and consent are paramount concerns, but not only are they solvable, in open source they’re fully auditable (and a fork could fairly easily maintain a patch that removes it outright).

I think open source largely rejecting telemetry puts it at a huge disadvantage to proprietary and SaaS software, where it is the norm. I’m very excited to see someone as thoughtful and well-reasoned as Russ Cox trying to move the status quo forward.


On the contrary, I'd argue that the tracing visibility you're looking for isn't inherently a software trait at all. It's a deployment feature, which is something you address at-cost when building a product, but almost never when building FOSS software. It's not that people in FOSS don't see the upsides to it, it's that those upsides are insignificant relative to the cost of sustained market research. It's easier to just... make stuff, and have companies plaster over the gaps when their interests align.

Look at GNOME, which recently pushed for its users to contribute telemetry: https://linuxiac.com/gnome-survey-results/

Nothing wrong with what they've done here, but we already had most of these metrics. Nothing was really learned, and it took Red Hat and a few thousand users to get here. For smaller-scale projects, imagine how much smaller the returns would be.


Everything involves tradeoffs.

The times "we" (previous companies) tried to implement telemetry in open source non-SaaS products (as distinct from "projects"), we either got huge blowback or users/customers simply blocked it at the firewall (and security teams at major enterprises were unwilling to open holes anyway).

The only workable solution I found was integrating this in a value-add way, so that something in the service/experience/etc was better for the user/customer as a result of enabling telemetry, without the dark pattern of making things intentionally awful/worse without it. We simply never got enough data to matter otherwise. But, again, that was products and not projects.


> The only workable solution I found was integrating this in a value-add way, so that something in the service/experience/etc was better for the user/customer as a result of enabling telemetry, without the dark pattern of making things intentionally awful/worse without it.

This sounds like a great concept, but I'm struggling to come up with concrete examples - how did you approach it?


This is an incorrect assertion.

We have to ask for permission on our SaaS products to collect this data as it's not necessary to collect it for the product to function. The EU GDPR mandates this.

Russ Cox is suggesting that there is no permission step and that the data is collected by default.

That is the issue.


From my reading focused on this specific issue of the GDPR and the national laws of member states, this is not the case. Opt-in is specifically required for personal information. The telemetry data outlined in the proposal would not fall under this requirement. You can even retain time-limited IP logs with some special caveats. The GDPR is actually quite reasonable and fair.

Russ Cox is a very intelligent and effective engineer. He has a history of projects where he first analyses the problem space, then arrives at great solutions. He puts a lot of effort into discussing the problems and proposal with the community, especially after the widely criticized go mod decision by the go team (which is now mostly accepted as unfortunate, but in the end, the correct decision, I would think).

My point is: We all suspect Google and telemetry to be bad. But can we be charitable enough to separate the Go project, which is run by individual humans, and telemetry from our superficial clichés, and actually read the proposal?


Google or Russ Cox's reputation is irrelevant. The idea stands alone. I'm merely crediting him with the idea.

I read the proposal. There is no discussion of the legality of this at all. I'd expect anyone with any level of supposed technical competence to consider this in relation to global data protection. I suspect there has been no legal review as mentioned in the thread because I know how slow the lawyers in this space work and the timeline between publishing this and now is too short to have had a conclusive answer.

As for your point about GDPR, I think if you apply your right to withdraw from opt out data collection and what that entails and then ask how this glaring defect is missing from RSC's paper, then you'll see exactly how much privacy consideration really went into this.


Can you articulate how this telemetry collection would violate the GDPR explicitly?


GDPR only covers PI data, so your comment is irrelevant.

https://gdpr.eu/eu-gdpr-personal-data/


Everything is PI when you connect enough dots


Probably related to[0].

To anybody complaining that this should be opt-in: opt-in telemetry doesn't work. The reason for this is that most people don't care, but they don't care either way. They don't disable it when prompted, nor would they enable it manually.

The idea of telemetry is being able to prioritize the work that will be most widely useful. For this you need a good and balanced sample of your users. You don't really get any kind of sensible sample if you only do it opt-in. Additionally, this ship has long sailed, everybody does opt-out.

What I do think however, is that it should very clearly notify the user of this, and give them an easy way to disable it. Like in OctoSQL[1] (disclaimer: which I'm the author of) which prompts you on first run and shows explicitly how to disable it.

All things considered, this is an open source project, so you're free to maintain a fork without telemetry. The Go toolchain also uses the Google-hosted module proxy by default, which really is a bit like telemetry already.

[0]: https://news.ycombinator.com/item?id=34707583

[1]: https://asciinema.org/a/eWQsyXQKi1fmithyTekAD5fWS


The argument for this being opt-in isn't about "it works better", it is about it being ethically correct. There are a ton of things that "don't work" unless you do something unethical: that doesn't mean they are OK, it doesn't mean they should be tolerated, and it doesn't mean the people who do them--and, at the end of the day, it is people who make these decisions: there is a human being who refused to say "no" and whose name we might even be able to find out--shouldn't be judged by their peers for doing these things... and that is all true even if it is (currently) legal for them to do it!!


You're framing this as though the "ethical" choice were obvious, or that there was a person who "knew this was the ethical thing to do, but turned a blind eye".

I disagree, I think it's a very contested topic, with lots of discussion whenever it's raised here, with either side possibly being a vocal minority.


The ethical choice is obvious.

The distinction is between "What I do with my computer is none of your business unless I choose to make it your business" versus "What I do with my computer is your business unless I choose to not make it your business".

It's insane that we are still having to justify privacy as a default, or that people continue to rationalize away the concerns.

Yeah, maybe if it's opt-in they won't have much telemetry data. Perhaps, in fact, it would not be much better than having no data at all. That will make some things harder. Major bummer. If it was easy to do the right thing, more companies would do it.


> It's insane that we are still having to justify privacy as a default

Half of the HN is in love with Chrome and nearly all are on Gmail.

The relentless drive towards the erosion of privacy powered by free carrots worked.

There is now a whole generation of people that sees all this as a norm.


Any program is opt-in, so you just don't have to use it. Major bummer. Assuming data collection is properly disclosed, ofc, but I don't see anyone here arguing against that.

The "there shouldn't be any" argument just seems so entitled with there being such demand / reasons to do so. I applaud everyone trying to find ways to satisfy both sides, as done by the original article.


You can't satisfy both sides by completely ignoring one side.

"What I do on my machine is none of your business, unless I decide it's your business" is a strong argument, presented multiple times in multiple places, and I haven't seen any rebuttal by any Go representative.

You can ignore the community and do whatever is good for business, but it's hypocritical to pretend you're a "community-driven" project if you do that.


Spying on people without their consent is not ethical.


Is it not consent if you tell the user that it's on, how to turn it off, and the user confirms the prompt having decided to not disable telemetry?

Because that's how OP framed it. There is an ethical way to track how your app is used, with user consent.


Putting text on the screen which would inform the user if the user was to read and comprehend it is not consent, no. (I mean, try to apply that logic to any other situation. You put a post-it note somewhere noticeable which says that you will silently mix some medication into your coworker's lunch unless they write some text on another post-it note placed on the kitchen fridge. When you then notice the absence of that second post-it note, and mix in some medication into their food, is that "consent"?)

Furthermore, looking at the asciinema from OctoSQL, users apparently have to edit their profile files on every single machine they intend to use Go on, then remember to verify that the profile applies and has no typos, make sure to never ever accidentally run the program in a context where environment variables aren't respected (would you remember to always use `sudo -E` if you need to use the tool as another user for example?). The danger seems extremely high that the user would, at some point, accidentally run the command without the env var set, even if they were technically proficient and did their best to opt out.

This is not how consent works.
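The failure mode described above is easy to demonstrate: an environment-variable opt-out silently disappears in any cleaned environment, which is what plain `sudo` gives you by default. The variable name here is hypothetical:

```shell
# Hypothetical opt-out variable; the name is illustrative.
export TOOL_NO_TELEMETRY=1

# Child processes of this shell see the opt-out:
sh -c 'echo ${TOOL_NO_TELEMETRY:-unset}'          # prints: 1

# A cleaned environment (what plain `sudo` gives you) drops it,
# silently re-enabling collection:
env -i sh -c 'echo ${TOOL_NO_TELEMETRY:-unset}'   # prints: unset
```

So even a diligent user who exported the variable everywhere is one forgotten `-E` away from being opted back in.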


No because consent is required for it to be on in the first place.


How would you use compiler telemetry to spy on people?


Working hours, work location, home location, favourite cafes, sick days, vacations, hotels, wealth level. And that's just from IP addresses and timestamps alone, and without cross-referencing with all the data that Google vacuums over other channels.


That ship has already sailed. The Go tool already by default makes network requests to the Go proxy, which potentially allows everything that you're talking about there. What's significantly different about this telemetry proposal?


A couple things IMO.

First, making network requests when downloading packages is necessary for the tool to function and unavoidable. People who care about this will be using a VPN of some kind. It's just how the Internet works. But telemetry is something the tool author is choosing to add, not something that's necessary due to the architecture of our computing infrastructure.

Second, the Go telemetry would apparently create a unique, persistent user ID. Normal Internet use doesn't, there's just the IP address which is different from location to location, shared by a bunch of people behind the NAT, and can be masked using common tools.

And yeah, I know this is "anonymised"... but if you have one user ID which uses Go sometimes with an IP address from a particular apartment complex and sometimes from a particular office space, finding out which individual that user ID belongs to is trivial.


> First, making network requests when downloading packages is necessary for the tool to function and unavoidable.

It's technically not unavoidable. The Go authors could have made use of the proxy opt-in rather than opt-out, making the tool less usable as a result. A similar argument applies here, I think.

> Second, the Go telemetry would apparently create a unique, persistent user ID

Where did you see this? I scanned through the "Telemetry Design" article reasonably carefully and couldn't find any mention of this concept, and the type definition for the posted JSON (the `Report` type) doesn't seem to include any such user ID.

In the end, ISTM that you're not complaining about something that actually affects your privacy in any way, but just the _idea_ of telemetry. Is that really something worth taking such a hardline stance on?


> The Go tool already by default makes network requests to the Go proxy

Frankly, that crap should be expunged from the Go toolchain as well. :(


[flagged]


I agree that opt-out is a Bad Thing, but I disagree with this stance. And I think lots of people in the pro-telemetry camp see that there's an ethical issue to be discussed, but they reach a different conclusion. They shouldn't be dismissed so glibly.


Reaching a different conclusion is one thing, but not seeing a dilemma is another. One can always argue that invading a person's autonomy might be necessary given the benefits but seeing no issue is just turning a blind eye.


In the Golang announcements, it's clear that they completely see and understand the dilemma, and have provided a lengthy explanation of why they decided for opt-out anyway.

I respect that. I don't agree with the decision, but it was made with understanding and thought.


I made my original comment misunderstanding what the parent comment meant as "not knowing an ethical problem exists". I also am not talking about this specific decision, but criticizing ethical decision making in the tech industry in general.

In ethics, there are no right or wrong answers (mostly), just right and wrong methodologies. If you go the pragmatic way, you'd argue that the benefits of telemetry are greater than the downsides and implement it. If you go Kant's way, you would already have a maxim (either "never invade privacy" or "prioritize technical benefits regardless of the users" in this case) and act according to that maxim regardless of the situation. If you go the intent way, all that matters is whether your intent for the action is good or bad; in contrast, if you go the outcome way, all that matters is the outcome regardless of the intent or the methods.

These are all "valid" ways to discuss an ethical dilemma. However, one must always acknowledge the dilemma. This industry, especially big tech, seems to ignore this quite often, mostly because it's very easy to see people as "just numbers" when you don't see them directly. Don't even get me started on lawmakers who are also ignoring this whole issue. Many standard practices in this industry would be straight up illegal in lots of other areas, especially where there is face-to-face contact.


This is a very extreme position. It's hard to take you seriously when your comment doesn't have any nuance.


Finding the collection of a person's data without consent unethical is not an "extreme position". Since when is "consent", or more correctly the "autonomy of the individual", called "extreme"? If you did the same thing in my field (medicine), you would lose your license.


I agree, that's not what I think was extreme about your position. I think you've invented straw men in this comment and your previous comment.


Reading your comment again, I can see it now. I misinterpreted "knew this was the ethical thing to do, but turned a blind eye" as "knew there was an ethical problem, but turned a blind eye".

Turns out, my straw man can't read.


Props to you for saying so publicly! I'm not sure if you're unusually open or if I just found the right words to persuade you, but this is a first for me :)


What is the argument for opt-out telemetry being unethical?


Because it most likely means that people are sending data without their consent. Perhaps I am naive or just very old, but I wouldn't expect a compiler to "phone home" with information about what I do with it. Certainly not without me expressing consent first.

So if you want that information, find a way to ask the user first. If you can give a good and understandable explanation on how the information is useful, the users might give their consent happily.


I don't think it's all telemetry. I suppose telemetry could be designed in a way that preserves the user's privacy to an extent that is compatible with their naive assumptions. I suppose that design also depends on what you're building.

If you're building a website, I think it's fairly reasonable for you to store my IP. That's inside my expected privacy loss when dealing with a remote party. I have to connect to your computer, much like I have to physically walk into a store. I don't mind you remembering that I was there. Running a compiler, on the other hand, feels more "private" to me somehow. My expectation when using a compiler is that it won't send anything to anyone, because why would it?

In general I think our industry is starved for relevant and foundational ethics research, outside of the FSF at least.


Because that goes against informed consent.

Opt-out is generally rejected by European privacy laws.


> Opt-out is generally rejected by European privacy laws.

...where personal data is involved.

It strikes me that this proposal goes to considerable lengths to avoid collecting anything that could be considered personal data.


IANAL but European law is nuanced over whether IP addresses are PII. If I'm not mistaken it's been ruled they are for ISPs, rationale being they have enough other data points that once correlated with IP addresses allow to identify individuals. Whether the same applies to Google (I suppose) is definitely not clear to me.


The proposal explicitly says they don't collect IP addresses or _any_ unique identifiers.


As far as I'm aware/recall, European privacy laws consider any connection back to a telemetry server to count as "collecting" IP addresses, since the telemetry server learns it (even if they pinky swear not to write it down.)


You don't recall correctly.

Storing IP addresses in logs means that you are now responsible for them, yes. Drop them out of your logs, and you're perfectly fine.


I think privacy laws only apply to things that “process” PII. Accepting a network connection is not, in and of itself, considered to process PII.


Can you send telemetry data through Tor, though? :thinking:


There are court cases that have established that the very fact that a connection is being established constitutes a potential collection of IP addresses and needs to be declared under the GDPR. (This was specifically about sites linking to Google Fonts on their websites: that alone was enough to warrant a GDPR declaration that IPs are being collected, or the sites needed to remove their font CDNs and serve the fonts locally.) Under the same rule, companies will need to declare this usage of the Go compiler in their employee GDPR declarations.


I assumed you need consent to receive PII, full stop. Again IANAL, but I assumed saying you don't do anything at all with the PII you receive doesn't exempt you from anything under GDPR. I may be wrong, though I hope not to be.


I agree it does do that, though for the context of others reading the thread: personal data is a very broad topic:

https://gdpr.eu/eu-gdpr-personal-data/

You have to be very careful to do it properly.


> The idea of telemetry is being able to prioritize the work that will be most widely useful.

It does sort of hinge on the highly suspect assumption that usefulness is correlated with use. An obvious counter-example to this is something like a fire extinguisher, which will in the ideal case just sit on a wall until its use-by date passes and then be discarded having never been used; or on the flip side, an incredibly byzantine workflow that could be reduced to something much simpler will appear important and useful.

Even without these edge cases, interpreting statistics is really hard. Like people with PhDs who have studied these things for years still get them wrong all the time.

What ends up happening more often than not is it's used as a tool to quiet the critics when pushing through unpopular changes.


Most software features are not like fire extinguishers.

More than that, the interesting stats may not even be around user-visible features, but around internal mechanisms, like some cache hit rate, or how often some branch in the compiler is invoked.

As long as stats are clearly inspectable, reasonably anonymized, and are opt-out, I'd be fine with sending them.


There are essential, fire-extinguisher-like features. The canonical example is the joke about backup software: if it were developed according to today's standard of telemetry-driven engagement analytics, the restore from backup functionality would be removed because it's used so infrequently.

This actually happens sometimes: when developing the demo ".kkrieger", a first-person 3D shooter in 96 KiB, demogroup theprodukkt tried to shrink it down to get it under the 96 KiB wire. One of the tricks they used was using a profiler to identify code sections that were never reached and could be removed. One of the sections they removed was the handler for the up arrow key in the main menu, simply because the test player never pressed up in the menu.

If you think that Google or another large software organization won't misuse telemetry by cutting or neglecting important but infrequently used functionality to hit some KPI... have you ever worked in a large software organization?

All stats can be deanonymized. The more data you make available, the more you identify yourself. I do not need software I use stealthily tying up bandwidth by "phoning home" with data about me. It is simultaneously betrayal and resource theft. If I wanted to contribute to the improvement of the software, I'd file a bug report.


I think your view of the ways usage stats are used is a bit simplistic. Not everyone removes "underused" features without giving some consideration, even in big corporations.

But since you clearly don't like telemetry, you should have a way to reliably switch it off. Here we are on the same page: there must be a well-documented and easy way to switch any telemetry off.


If telemetry is on by default, the vendor obviously wants you to have telemetry on. They are incentivized to make switching it off as difficult as possible, and even pull tricks like turning it back on after a delay of 7 or 30 days or so.

Telemetry should be opt-in, if it's provided at all.


The features that are like fire extinguishers are the ones most likely to be unjustly removed with the rationale of looking at telemetry.

See for example Mozilla's bizarre decision to remove the ability to override the character encoding of a webpage in favour of some half-baked detector.


We can't solve this problem by carving out our eyes. We must adapt to having data about the world.


If folks were using just their eyes, they'd be calling users in to watch them interact with the software, calling users up to perform user surveys, doing all sorts of live-user testing.

But, that costs money: folks don't generally like to do surveys for free, don't like to come to your site (or have you come to theirs) for free, don't like to participate in tests for free... and the caliber of person who can design and orchestrate all that is _notably_ expensive.

So, companies have discovered that it's far, far cheaper to shove remote eyes into their products. No need to schedule user observation sessions, pay for transportation and participation, have to bother with hiring people who are capable of coming up with a good set of questions and scenarios to walk through.


Except that those remote eyes don't look at what a person would look at. They know if a feature was used but they cannot see the frustration on the face of a user when some feature is missing or does not work as expected or is hard to find.


Nobody wants you to carve out your eyes. They want you to stop shoving eyes into everything you touch.

Meanwhile, I'll decide what I must or must not adapt to, thank you.


Read more carefully


Write more clearly


Wouldn’t have mattered if I did


> Most software features are not like fire extinguishers.

Amen

Sometimes a pretty niche feature used by one loud individual gets more attention than a feature used by thousands or millions of silent users.


All this boils down to "an unskilled engineer will misinterpret data even if they have it". I'll assume the Go team knows what they're doing, based on their track record so far.

There's a lot of very simple questions you can answer very reliably, too, like "what proportion of the users are still using a certain compatibility flag".
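As a sketch of how that kind of question maps onto the counter-style collection the proposal describes: each use of a flag bumps a local, named counter, and only aggregate counts would ever leave the machine. The API below is illustrative, not the actual golang.org/x/telemetry interface:

```go
package main

import "fmt"

// counters sketches local, aggregate-only telemetry: a counter name
// maps to a count, with no user IDs and no free-form payloads.
var counters = map[string]int64{}

// inc bumps the named counter; in a real design this would be a
// memory-mapped file updated by the toolchain.
func inc(name string) { counters[name]++ }

func main() {
	// Two builds that still use a compatibility flag:
	inc("build/flag:-mod=vendor")
	inc("build/flag:-mod=vendor")
	fmt.Println(counters["build/flag:-mod=vendor"]) // 2
}
```

Answering "what proportion of users still set this flag" then only needs these counts summed across reports, never any per-user data.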


My point is that scientists, whose job is to interpret data and construct experiments, get this wrong on a regular basis, despite years of training in constructing experiments and interpreting data, despite peer review, despite staking their career and reputation on not making these kinds of mistakes. They still happen! A lot!

Interpreting data is very hard.


> I'll assume the Go team knows what they're doing, based on their track record so far.

Funny, I’d assume the exact opposite. After all much of the understanding of privacy and statistics at scale was developed after 1980.


Normally Hacker News is ruthless with stupid comments like this, but some sort of unexamined feelings of inadequacy make Go devs a fair target. I hate this attitude.


Go is actually well known for dubious choices they reverted later, like not using libc, their weird major version scheme, or the absence of generics. There's a good record of charging ahead against the popular wisdom and being proven wrong.


> What I do think however, is that it should very clearly notify the user of this, and give them an easy way to disable it.

You make a good point.

For instance, a popular Mac-based package manager (unexpectedly for many) defaults to telemetry from your CLI:

`brew analytics off` is not hard to type after installing Homebrew, but the installation text doesn't mention that; instead it points to a web page where you have to read about how wonderful the analytics are before eventually finding the incantation:

https://docs.brew.sh/Analytics

I wonder how many people care enough to click that link, read all the "analytics are actually good for you" copy, and then change their mind to leave it on. I'm guessing almost zero?

But perhaps most users won't cut and paste the link, whereas if it just suggested `brew analytics off`, many users would type it.


> opt-in telemetry doesn't work

That's too bad. Guess you don't get any telemetry data if you want to develop ethical open software.

The answer isn't to bend your ethics.

If I take a dollar from everyone but it's opt-out, that's just theft with extra steps.

If I make it opt in, nobody is going to give me the dollar, but that doesn't make opt out morally justifiable.


Why is it unethical? People in everyday life constantly assume things about one another and take actions that affect one another without asking first, and this is a practical necessity. So if the reason is that "you may not do anything without me telling you it's OK", I don't think that's a defensible position.


> they don't care either way

This is not true. You know it and you are being coy about it.

If your "easy way to disable it" was a simple question next to that unexpected notification displayed once in response to a completely unrelated action - "Would you like to keep telemetry on?" - you bet you'd have massive opt out rates. Nobody wants telemetry.

What you have is a gray pattern. The same pattern that caused the EU to clarify its cookie law to require Yes and No choice to be equally accessible. And what you have is "Accept all" and "See details". Except that yours is worse by being a one-time notification.


> this is an open source project, so you're free to maintain a fork without telemetry.

That option is a joke. The real alternative is rust - or any non-corporate platform that isn't gonna pull these kind of stunts.


But that's wrong. There is no position for this in a civilised society:

"If we ask everyone is going to say no, so we will steal it unless someone tells us not to"


I think the comparison of telemetry and stealing is pretty harsh.

Is opt-out telemetry unethical? It depends. If you use it in a privacy-preserving way, no; if you spy on your users, or sell the data for money or advertising, it obviously is.

The hard truth is, nobody reads the manual. Opt-in telemetry often captures only a minority, and you then work with niche data from that minority, which influences your development in certain ways.


It really all boils down to meaningful consent.

> if you spy on your Users

In my opinion, any data collection about me or my machines that occurs without my active informed consent is "spying". This is my fundamental problem with opt-out mechanisms. They do not indicate or imply that active consent was obtained.


A splash screen at install time saying that logging is on and you can disable it in the settings.

Would that be enough for you?


Unless a Windows user is installing the software, that screen would be displayed in approximately zero of the cases where a package manager was used to install the software. Similarly, exactly zero widely-used Docker images that contain the software would display this splash screen, as the software would already have been installed.

In short, unless you're a Windows user there are so __very__ many ways to install software that aren't "Go to the project home page, download a generic install binary, run that binary with world-write permissions.". Aside from very small-time projects, I can't think of the last time I used an officially-maintained install script that I got from the project's servers to install something.


It would be better than nothing, but not really adequate. There are numerous circumstances where such a screen is impossible or impractical, and if every program did this, it would be as good as not doing it because people will react to it like they react to other common warning dialogs -- not really seeing it at all.


But is that not the decision of the person who owns the data?


The world is full of people making decisions for one another. Did you consent to unix files not flushing on every write() call? It's not a meaningful complaint.


That’s not their argument. They say that if you ask everyone whether it is ok, most just ignore your question.


That's how they presented their argument. It can be presented both ways depending on how you want to promote it.


Not responding is not the same as responding no


It is when it comes to giving informed consent.


> To anybody complaining that this should be opt-in: opt-in telemetry doesn't work. The reason for this is that most people don't care, but they don't care either way.

Why, in OSS, do you care about the users that do not care? If the users truly care they should buy support, or at least enable telemetry.

If someone complains about feature getting removed - tell them to enable telemetry next time. Maybe people who want so much privacy shouldn't be expecting free support for features they want.

Maybe you're worried about people like your grandma installing Linux. Then this should apply on the distribution level - there should be an opt-in setting for telemetry, that enables it in various programs being distributed (at the discretion of the package maintainers), so that user doesn't have to opt-in individually.

Being opt-in will also make it compliant with EU data privacy laws.


> opt-in telemetry doesn't work.

Then don't do telemetry at all.

Google doesn't have a right to data.


Yeah, but they do have the right to modify projects that they sponsor as they see fit.


At which point they lose the "right" to claim it's a community project...


Any evidence that telemetry actually works? (i.e makes the program better)


Yes, the simplest example is crashes being reported.

Developers can see that a specific crash is being hit by 1% of their userbase and then check the logs to see what went wrong and where the crash happened. The fix is made and the program is indeed better.


You can let users report crashes. You can even prepare the data for them. You can even provide a wizard that automatically opens on crashes to help upload that data. But you need to obtain informed consent. Sending data behind the user's back without ASKING FIRST is not ok. Stop doing it.


If it collects actionable data, yes, of course it works.

Crashes, common failures, UI/UX friction points, average usage patterns - all can be used to prioritize work to take care of things that have the biggest impact.


I asked for actual concrete evidence, not "can be used".

Is there an example of a program that was crap, implemented telemetry and then got better afterwards? (and of course controlled for factors that might have improved the program anyway)

I mean since telemetry advocates are so into how useful data is, surely they must have data on whether telemetry itself works?


> opt-in telemetry doesn't work.

Opt-out telemetry won't work when people send false data to the servers.


> Additionally, this ship has long sailed, everybody does opt-out.

This is a meaningless point.


Why does the dev team get to optimize a use case that the user doesn't want optimized?


So it sounds like we can't have telemetry?


There's a reason it can't be opt-in: Google intends collect information about politically sensitive software and give that information to governments who wish to punish developers.


(This comment was originally posted to a different thread that we merged here; that's why it now links to the page it's on.)



Related, with further discussion:

https://news.ycombinator.com/item?id=34709078


To anybody complaining that this should be opt-in: opt-in telemetry doesn't work. The reason for this is that most people don't care, but they don't care either way. They don't enable it when prompted, nor would they disable it when it's on by default.


Nope, nope, and more nope. You're not moving the Overton Window any more on me.

In fact it seems there's a clear correlation between the quality of software and how much spyware there is embedded in it. It's often merely another way to justify unpopular changes with "but the data says so".

IMHO if you want to collect any information, it should never be anything but opt-in, a conscious decision.


> IMHO if you want to collect any information, it should never be anything but opt-in, a conscious decision.

Serious (general) question: How do you do that given a non-technical user population? Debian’s opt-in popcon kind of manages to get a little bit of data from a fairly technical one, but nowhere near enough to estimate a low usage frequency, and it’s the only opt-in program I’m aware of that gets anything usable at all. Given that I’m unwilling to implement an opt-out system, I don’t really see a workable approach here at all.


Ask for consent during setup or on first run. Syncthing does this and they get plenty of usable data. It's even public: https://data.syncthing.net/


What I hear you saying here is that people don't do what you want if you give them the choice, so you lean towards not giving them the choice rather than respecting their wishes.

Is my interpretation correct?


>> I’m unwilling to implement an opt-out system

> [Y]ou lean towards not giving them the choice rather than respecting their wishes.

> Is my interpretation correct?

I don’t think it is, no :) Rather, I’m not sure how to sell, to put it crassly, users on a choice when properly investigating or even being confronted with that choice would delay them seeing the dancing bunnies[1], but that would also, if I have any say about it, improve the bunnies in the future.

Does that mean there’s a shade of “I know better” in my problem statement? Of course it does, if I didn’t know better than the average user I’d have no business designing such choices. I don’t think there’s anything wrong about that, better than the average at an activity few practice is not a terribly high bar. Not giving the users a choice or manipulating them into making the one I think is right would absolutely be wrong, though.

Basically, how do I make the user think, how do I give them the appropriate data to do so, and how do I deal with the obvious contradiction of that goal with principles of good design[2]? The potential benefits to the software and (thereby) the users are too much to give up without even asking those questions.

(See nearby comment for extended discussion.)

[1] https://blog.codinghorror.com/the-dancing-bunnies-problem/

[2] https://sensible.com/dont-make-me-think/


Thank you for the thoughtful response. We disagree on much, but I respect your opinion nonetheless.

> Not giving the users a choice or manipulating them into making the one I think is right would absolutely be wrong, though.

I'll pull out just this point, though, to perhaps illustrate how different our worldviews are. I consider opt-out to be a manipulative approach.


> I consider opt-out to be a manipulative approach.

So do I, which is why I wrote I’m unwilling to implement it :) The original (and, to be clear, purely theoretical) point was, opt-out is too manipulative while opt-in is likely useless.

Ah shoot. Did you take that to mean that I’m unwilling to implement an off switch at all? That wasn’t it, sorry for the confusion.


Perhaps we aren't so far apart after all.

The struggle is real. As a developer, more data is obviously desirable and can make development much easier. I just can't think of a way to do telemetry that, if I were a user, I would accept. And I don't want to produce software that I wouldn't personally use.

I just don't know how to have my cake and eat it too.


As a developer your entire purpose is to make decisions for users. "Where should this service live, how should security work, how should I increment their billed service usage, when should I shut down their vm..."

I don't think the issue here is making decisions for users and not giving them a choice. 99.999% of software does not have a flag to change it. The issue seems to be more about the precise nature of this specific feature.


> The issue seems to be more about the precise nature of this specific feature.

Of course. Not really just this specific feature, but any and all features that can violate users privacy or security. In the end, I don't think these are decisions that developers should be making for users, because not all users have the same needs and getting this wrong can do harm.

That's why, for these sorts of things, meaningful user consent is critically important.


> people don't do what you want if you give them the choice, so you lean towards not giving them the choice rather than respecting their wishes

This nicely summarizes a very popular approach to telemetry – and to a variety of user-hostile behaviors. Web sites (for example) seem to have mastered the "fight against user preferences" approach, trying to play video when autoplay is blocked, using javascript modals since pop-up windows are blocked, fighting ad blockers, ignoring "do not track", etc..

If users are given any choice, usually it's a difficult opt-out process, which is more effective precisely because it makes it harder for users to make the choice that you don't want them to make, even if it isn't their actual preference. For an extreme example, see Facebook's (anti) privacy settings. Commonly used dark patterns further amplify user manipulation.


First, we are talking about development tools, so not a non-technical population. Second, if opt-in is considered difficult for that population, what does that say about opt-out? Opt-out is always, without exception, more difficult than opt-in.


Seems like you also[1] didn’t read the above the way I intended. I meant that I find explicit opt-out (as opposed to explicit opt-in) manipulative so I don’t want to implement it, not that I oppose having the ability to opt out at all.

The difficulty, though, lies not (entirely) with the default position of the toggle, the difficulty lies with making the user think about the question which is not relevant to their immediate task and which in any case they may not have the theoretical tools or time to evaluate properly. The default position of the toggle (if “off”, as I believe it should be) matters only because an opt-in process means you either confront the user with irrelevant questions on first launch or get essentially no data.

(I called this out as a “general” question because I meant for it to apply not only to the Go toolchain, but to general-use software like Firefox or niche but non-programmer-oriented software like Audacity.)

The systemwide daemon proposed elsethread[2] would solve this nicely as well, but I have to admit that I’ve dismissed it from my thought process more than once before, because I didn’t think we were going to get one with any reasonable usage on any platform. Now that I’ve seen it put in writing, maybe it does deserve to be considered.

[1] https://news.ycombinator.com/item?id=34716342

[2] https://news.ycombinator.com/item?id=34709836


How many people are installing and using open source software but couldn't understand a pop-up explaining what data is collected and asking if they'd like to submit it? Is the non-technical nature of the user the problem or is it just that when you have an opt-in option most people make the choice to opt-out? That's the thing about respecting users by giving them choice, they get to say no. If they mostly say no, and you don't get enough data, that's the will of your users and therefore not really a problem.


> How many people are installing and using open source software but couldn't understand a pop-up explaining what data is collected and asking if they'd like to submit it?

I’ve taught probability theory using randomized response[1] as an exercise problem, and while people can understand it given time and motivation, it’s not immediately obvious. So I’m not exactly hopeful that a prospective Audacity, Blender, or even Free Pascal user (to take an arbitrary set of examples) would get what I mean if I say “I’m collecting no more than 10 bits of information about you using RAPPOR”[2], and I’m not willing to engage in comforting bullshit such as “all collected data is anonymous”, as I’ve been all too close to situations where the difference between the two might be one between freedom and prison.

> Is the non-technical nature of the user the problem or is it just that when you have an opt-in option most people make the choice to opt-out?

Both, because confirmation dialogs, especially privacy-related ones, have been thoroughly poisoned in users’ minds. But confirmation of obscure actions, however beneficial their consequences, is problematic in general—if I go on the street and ask people if they’d like caffeine in their tea or ascorbic acid in their apples, I expect (but have not checked) that the majority will say no, nevermind that both are normally there and intrinsic to the experience.

(The possibility of meaningful consent from a non-specialist is the subject of much discussion and few good answers in med school, or so I’ve heard.)

Whether the ultimate answer is to grant or deny permission, I’m not sure I can present the question in a way that will actually have it made on the basis of merit and not on “scary permission dialog, better say no” or “yes, yes, just let me through to my dancing bunnies[3]” or “yes, if I say no the installer will just tell me to GTFO”.

(In that respect the “Send crash report to vendor” button is unexpectedly good, because you’re not actually interposing yourself between the user and any prospective bunnies. But personally I don’t like to spend time and effort in order to send “feedback” into an unmarked hole where I’ve no idea if anybody will ever look at it. From that point of view, it is background data collection that’s unexpectedly good.)

And even if, for the purposes of this question, it would be best if people took the time to learn the necessary maths, computing, and operational security to make an informed choice, in reality I’m not sure that’s the best thing they can spend their life on.

So it may be the answer is that you simply can’t do telemetry well for the social reason that users won’t ever end up making an informed choice, or that the well has been poisoned so thoroughly that the rational choice is to reject everything. It’s just that I know that it’s basically possible in a technical sense, so I don’t want to give up that easily.

[1] https://en.wikipedia.org/wiki/Randomized_response

[2] https://blog.cryptographyengineering.com/2016/06/15/what-is-...

[3] https://blog.codinghorror.com/the-dancing-bunnies-problem/


Well, it's not a humble opinion but a very strong one, which is fine, since you want a certain thing a certain way and nothing else will do.


So how would you design a well-working system to make data-driven decisions rather than guesses? Most collection methods are notoriously bad, but partial collection is also bad since we now have to somehow put a weighing factor on presumed absent data, which turns choices into guesses again.

I think this is a really hard problem, and simply trying to guess in the dark as to what people want isn't the smartest way to go about finding the path forward / priorities / improvements / defects.

It also isn't something that we had in the past, because when we used to buy the IDE, buy the compiler, and then build software, sell that software, and let everyone know what cool tools we used, you'd have sales figures that would inform the creators how the tools were used. Now, the tools are available to everyone, anonymously, and everyone has an opinion on how well it works for them, but doesn't have the time to write a well-written report every time a release happens.


The whole idea of "data-driven decisions" is the problem.

It's an excuse to not respect user's choices and absolve oneself of the blame, or regress to a lowest-common-denominator, "because the data says so".

"Not everything that counts can be counted, and not everything that can be counted, counts."


Telemetry or no, a certain amount of guessing is inescapable.


> When you hear the word telemetry, if you’re like me, you may have a visceral negative reaction to a mental image of intrusive, detailed traces of your every keystroke and mouse click headed back to the developers of the software you’re using.

But that's not my only objection to telemetry. Equally important to me is that so many bad decisions are justified based on telemetry. It's very easy to misunderstand the data, because telemetry leaves out so much, but developers often treat it as if it's giving a complete picture.

As an example, I have seen developers drop really important functionality on the basis that it is rarely used. While that was true, it was also true that when those rare times happen, that functionality was absolutely critical to have.


Or they use the data as an accelerant: move rarely used features to places where they're even less discoverable, making them even less used, and then remove them altogether. The justification then becomes a self-fulfilling prophecy.


Very much against this. Sure, each one sounds benign enough, and can give reasons why. But I have 3,436 items in /usr/bin. What if -every- one of these started doing their own telemetry, their own envvars, etc?

If we have to deal with telemetry, then I'd instead hope that there can exist a single telemetry systemwide interface. Not sure how that would be designed or implemented, but would be better than everyone doing their own bespoke thing. Plus easier for me to disable them all in one go.


> "What if -every- one of these started doing their own telemetry, their own envvars, etc?"

What bad thing are you suggesting would happen if they did? Your computer and internet connection can't handle four thousand strings, or four thousand HTTP POSTs, or four MB more disk space of telemetry libraries? I bet it can. This isn't a technical problem, it's a control and consent problem.


For one thing a classic way of downplaying metrics, "we're only logging X bits of information", turns into 3000*X. And here X is huge already.

If I get access to detailed metrics from go, gcc, make, tar, gzip, bash, python... of course I can tell which programs you have been running (and frankly, I'm disgusted)


I wouldn't be so sure it's not also a technical problem: the limit of execve can be as low as 128 KB, which for 4,000 strings gives a maximum of 32 characters per NAME=VALUE environment variable.


Free disk space can be as low as zero, but we don't blame the tool makers for adding an extra 100Kb or 20MB, we blame the computer owner for not having enough disk space to install the thing they chose.

Wrapper scripts for every utility could do

    UTIL_TELEMETRY_OPT_OUT=1 util ...
so they don't need to be set all at once.


> we don't blame the tool makers for adding an extra 100Kb or 20MB

Honestly, I do. Code bloat is a real thing.


Honestly, I do too, but the world doesn't. If you said "I have 3,500 binaries on my system, imagine if they ALL added 1MB" the reply would be "3.5GB is about twenty cents of NVME storage space" not "oh my, you're right that would be intolerable".


maybe the problem is why do you need 3436 binaries there?


I dunno. It sure makes sense to me to collect telemetry from free software installations, but I feel that having every platform or even every piece of software do it on its own, opt-out, will inevitably lead to people being overwhelmed and angry.

I would, personally, prefer a single non-profit service that would list publicly what is being collected and publish the results as open data for anyone to use. Applications (at least on Linux) would not submit their reports directly, but would use a local relay service that could be turned off completely or that could filter what reports to send to the server and what to /dev/null.

Distributions and other software stores would then make it mandatory for software to use this relay and either patch out any other telemetry from their packages or straight out forbid those that would not comply.
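To make the relay idea concrete, here is a minimal Go sketch of its policy half. The `Report` and `Policy` shapes are entirely hypothetical; a real relay would also handle transport, batching, and IP stripping:

```go
package main

import "fmt"

// Report is a telemetry record an application hands to the local relay.
type Report struct {
	Program string
	Payload string
}

// Policy is a hypothetical per-machine filter: a zero value means
// "send nothing", otherwise only whitelisted programs get through.
type Policy struct {
	Enabled bool
	Allow   map[string]bool
}

// Route returns the subset of reports that may leave the machine; the
// rest are discarded locally and never reach the network at all.
func (p Policy) Route(reports []Report) []Report {
	if !p.Enabled {
		return nil
	}
	var out []Report
	for _, r := range reports {
		if p.Allow[r.Program] {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	policy := Policy{Enabled: true, Allow: map[string]bool{"go": true}}
	reports := []Report{{"go", "build counts"}, {"someapp", "usage"}}
	for _, r := range policy.Route(reports) {
		fmt.Println("forwarding:", r.Program) // prints: forwarding: go
	}
}
```

The key property is that the filtering happens on the user's machine, so turning the relay off (or narrowing the allowlist) is a single switch covering every program at once.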


I think the issue of telemetry is fundamentally a human issue of incentives and trust. The system you describe is wise because it recognizes this and attempts to address it.

The difficulty with telemetry is that even if we design the perfect, privacy-preserving system to begin with, once the pattern of having a network port open is established, there's nothing to prevent us (humans) from changing our policies about what we're allowed to push/pull over that port.

In real-world analogues for these kinds of thorny policy problems, we have centralized arbiters to solve these problems. That might be a fruitful course of research for people interested in this problem to explore.

Unfortunately, even though this problem has software as its medium, it is a problem that cannot be solved by clever software alone, despite any appearances to the contrary.


> The difficulty with telemetry is that even if we design the perfect, privacy-preserving system

The other difficulty is what you mentioned: trust. Even if a piece of software really does telemetry in a perfect, privacy-preserving way -- as a user, I have to take the developer's word for that in the end. That's a hard hurdle to pass, because that trust has been violated so much in the past that nobody gets the benefit of the doubt anymore.

> Unfortunately, even though this problem has software as its medium, it is a problem that cannot be solved by clever software alone

I agree entirely. At the heart of it, this is not a technological problem. It's a human one.


I am all for transparency and limited intrusiveness of telemetry.

But in practical terms, the problem with this approach -- if I'm understanding it correctly -- is that it has no way to detect and reject outliers, and therefore the data can't be validated in any way. It only makes sense if all your clients are 100% trustworthy.

Let's say you want to know whether to keep supporting ARMv5, and your data says 10% of users are using it. There's no way to tell whether that's accurate, or if you have 0.01% of die-hard users who modified their telemetry code to report 1000x as frequently as they're supposed to. Even if you suspect this is happening (and you might not), there's no way to identify the culprit and filter out their data without tracking personal identifiers such as IP addresses.

So even if most of the time the telemetry data is valid, over time it will trend toward uselessness, because it can be endlessly second-guessed unless it confirms a decision you wanted to make anyway.
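The skew is easy to see with toy numbers (all hypothetical): a tiny dishonest cohort multiplied by a large report factor dominates the aggregate.

```go
package main

import "fmt"

// apparentShare returns the fraction of reports attributed to a platform
// when `dishonest` clients each upload `factor` reports while `honest`
// clients (none of whom use the platform) upload one each.
func apparentShare(honest, dishonest, factor int) float64 {
	bogus := dishonest * factor
	return float64(bogus) / float64(honest+bogus)
}

func main() {
	// 1,000,000 honest users and just 100 die-hards (0.01%) inflating
	// their reports 1000x make the platform look ~9% popular.
	fmt.Printf("apparent share: %.1f%%\n",
		100*apparentShare(1_000_000, 100, 1000)) // prints: apparent share: 9.1%
}
```

Without per-client identifiers there is no way to deduplicate those 100,000 bogus reports down to their 100 sources, which is exactly the tension the parent describes.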


On-by-default makes me question whether rsc's judgement has been compromised, which leads me to question continuing to use the language. A strange miss for him.


Off-by-default at scale likely means that there is no telemetry at all. I would not cancel a guy or a programming language just for suggesting that. He has given it a lot of thought, if you read the blog posts.


If I take a dollar from everyone but it's opt-out, that's still theft.

If I make it opt in, nobody is going to give me the dollar, but that doesn't make opt out morally justifiable.


You are comparing apples to oranges. Telemetry is a curse word these days, but you should still read his posts.


I would expect nothing less of him than to give a topic a great deal of thought and devise a principled and rational solution.

I am, however, reminded of a quote from Peter Drucker: “There is nothing quite so useless as doing with great efficiency something that should not be done at all.”

I’m not picking nits regarding the overall proposal. I’m questioning the judgement that concluded/rationalized “on by default” is the right thing to do.

Also not “cancelling” at the moment, either, but definitely reassessing my future language choices and taking a more critical appraisal of Go’s direction/choices. This isn’t really an isolated incident, and trust has accumulated some dents.


> Although the report would not include any identifiers, the TCP connection uploading the report would expose the system’s public IP address to the server if a proxy is not being used. This IP address would not be associated with the uploaded reports in any way.

Any fully transparent data collection is going to have to include IP addresses and timestamps. Even if the IP isn't being used for debugging, the software still phones home and the IP is still being collected and logged when it otherwise wouldn't be. Either when uploading the report or when downloading the “collection configuration”.

Honestly, assuming full transparency, I'm not opposed to the concept. I question how much telemetry is actually necessary, but I'm certain there will be times when it's nice to have. It'd also be interesting to see how it would go when for once people can see exactly what is collected, when, and from where.

I'm not sure that Google is the best place to showcase such a concept though. I'm sure there are a lot of people who have no problem with handing more data over to Google, but Google has abused the public's good will for the sake of data collection many times, and it's sure to put off some of the people who aren't already completely disgusted by the idea of their favorite open source projects collecting telemetry.


> Any fully transparent data collection is going to have to include IP addresses and timestamps. Even if the IP isn't being used for debugging, the software still phones home and the IP is still being collected and logged when it otherwise wouldn't be. Either when uploading the report or when downloading the “collection configuration”.

How do you verifiably not collect users’ IP addresses when receiving data from them? The verifiable part is the problem, of course you can (and should) just not log the addresses, but then the users can only trust you (and hope you or your uplink haven’t received any legal orders to the contrary). The only approach I can think of would be a Tor hidden service, but while it would technically work, as far as not exposing your users to scrutiny it actually sounds worse.


The only option is to have a proxy sit in the middle between the uploader and the server. You mentioned Tor but it doesn't have to be Tor, just some proxy most users would trust not to collude with the server and that doesn't itself derive benefit from seeing the IP addresses. If there were a different entity that could be relied upon to run servers doing this and were highly trusted by users, I'd be interested to use it. Failing that, the usual answer for an enterprise or company is to run their own HTTP proxy. The design explicitly supports that.


> their favorite open source projects collecting telemetry.

Their favorite Google open source project. This is especially important for projects which can't realistically exist without their main sponsor/benefactor. It also helps people pay whatever cost of conscience, small or large, comes with willingly taking part in or consuming something whose makers they do not approve of.


This is not okay. The only ethical way to do telemetry is opt-in. If not enough people are opting in, you need to incentivize them to -- most simply by just paying them for their data. After all, telemetry is "valuable", isn't it? But if you can't figure out how to convince people to opt-in, then tough luck, sucks to be you.

Opt-in or GTFO, Google. I'll be patching this out of the Alpine package for Go the day it ships.


You and I may not agree on a lot, but I sure agree with you on this one.


This week one of my tasks is to figure out how to neutralize some telemetry in one of our apps. We had no idea it was there, we do not want to be sending data. Last week, the parent company decided they didn't want to maintain the telemetry server any longer, and got rid of it.

Now the tool has generated thousands of log messages saying that it can't phone home.

And so it must be silenced, since it is cluttering up the logs, generating false alerts, etc.

Please, no more.


The existence of telemetry is the main reason why I avoid new software these days. Really, opt-in, opt-out, it doesn't matter. I can't trust that any of those mechanisms actually work, that if I opt out, an update won't re-enable it, or that the data collected is actually limited and anonymized.


If there is any virtue to collecting telemetry, make it opt-in. Any developer convinced of this being useful will gladly enable it. But making it opt-out is just nefarious, because most users will not be aware of it.


This is naive; no one ever turns telemetry on if it's turned off by default. That's the reason why it's on by default.


> no one ever turn telemetry on if it's turned off by default

If nobody would voluntarily do it, why do you think it's okay to do it at all? By your very admission, nobody wants this. Because if they did, they'd turn it on!


Still, opt-out is just inacceptable. At least with a mechanism which can easily fail, like setting an environment variable. This basically forces you to wrap the go tool in a script which ensures the environment variable to be set.

As this seems to cache the results, another option is to fiddle with the cache to report bogus information.
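For completeness, a minimal sketch of such a wrapper, assuming the GOTELEMETRY variable name from the proposal and that the real toolchain lives at /usr/local/go (both assumptions, not confirmed specifics):

```shell
#!/bin/sh
# Hypothetical wrapper, placed earlier in PATH than the real go binary.
# GOTELEMETRY is the variable name from the proposal; the path below is
# an assumption about where the real toolchain is installed.
GOTELEMETRY=off
export GOTELEMETRY
# Show that the opt-out is in force before delegating; a real wrapper
# would end with the (commented) exec line instead of the echo.
echo "GOTELEMETRY=$GOTELEMETRY"
# exec /usr/local/go/bin/go "$@"
```

Because the wrapper uses exec, the real go process inherits the variable with no extra process left behind.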


You can set an env variable for the go toolchain with a command such as `go env -w GOTELEMETRY=off`, which will be written to disk and used by the go CLI.


Where will it be written to and how is it guaranteed to be picked up by any further invocations of the tools?


What if a new version of go uses TELEMETRY_ENABLED? Do you read all the changelogs, always?


Yeah, given go's history with breaking changes (and the habit of OKR-chasing managers and Senior Staff to make Number Go Up to look good on the promo packet), I definitely would not trust any opt-out mechanism to not receive operationally-significant changes in the next five years.


As far as I know, go has never changed any flags since release (2012); is that good enough for you?


"This is naïve, no one would ever allow me into their homes if I asked first, and how else would I find out what diseases they have?"


Setting aside the question of whether on-by-default telemetry is unethical in general (I personally think it is), my point in this comment is that in the context of open source it is impossible for it to be ethical, because:

The whole point of open source is securing the rights and freedoms of the users, and in case of a conflict with the convenience of the developers, the user rights take priority EVERY TIME. If you're not OK with this, you should not write open source software. If nobody would opt in to your telemetry scheme were opting in the default choice, too bad, you're just gonna have to live with it and respect user choice, no matter how inconvenient or how much better the alternative would be for everyone. If you fail to grasp this very basic thing you will be better served working on proprietary products instead. OSS is not a product you own; it's a shared resource you are in charge of stewarding, and the ethical burden is much higher because of that. I checked: Go uses a permissive license, so Google is more than welcome to run a proprietary fork with telemetry built in. Keep that out of open source.


Imagine if GNU started adding telemetry to their compiler toolchain...

If that sounds fucking stupid, which it does, then so does this.


This is perhaps unintentionally amusing:

> To be clear, I am only suggesting that the instrumentation be added to the Go command-line tools written and distributed by the Go team, such as the go command, the Go compiler, gopls, and govulncheck. I am not suggesting that instrumentation be added by the Go compiler to all Go programs in the world: that’s clearly inappropriate.

Well that dispels any lingering thoughts I might have had about ever using golang for anything (not many to be sure). Someone feels the need to assure everyone that they won't be stuffing telemetry code into every binary their compiler produces? Google just wants all the data about everyone everywhere all the time...

https://www.komando.com/security-privacy/ways-google-invades...


If they add "telemetry" my response would not be to set an environment variable, but to uninstall golang. I used it a few years ago, both personally and in a work setting, but I'll do so no more in the future. Just my opinion.


This is well done. It only exposes counters, and rather than pushing data up, the telemetry server must know the names of what it can ask for. No wildcards.


I hope this proposal is defeated and they don't implement this. I don't buy the premise that the benefit is worth the price. I think CLI tools like the ones in the Go Toolchain and their usage patterns are fairly well understood by this point. I'm sick and tired of every piece of software I interact with phoning home.

That said, as long as they give me reasonable means to configure the software the way I want, it's probably not a deal-breaker for me. In other words, I will just set the $ENV_VAR_WHATEVER to turn this off, and that's that.


Honestly, this may be unpopular with hacker news, but just add your own telemetry. If people don't like it they can turn it off, and telemetry is essential for a good product.

Do let people turn it off though please.


> If people don't like it they can turn it off

If I, perchance, encounter software I use phoning home without my explicit permission it's done on my systems. Period.


That is fine, but in this case telemetry trades you (and other more hardline users) as a user for all the extra users you gain from instant crash reports, quick feedback, and generally better productivity.

I would never personally make that trade-off, and would always put (disableable) telemetry in my projects.


Well according to our telemetry 0% of users turn it off so it seems pretty popular.

But more realistically what you gain in privacy you give up in having your voice heard by the devs. The decisions about the future of the product/project will be driven by the data, specifically the data from the kind of people who leave telemetry on.


See, I _am_ a dev. I run telemetry on my infrastructure, I analyse it and fix what's broken, and if necessary, try and get upstream fixed. Also, I'm not opposed to telemetry in general, but if a switch like this is turned on by default, trust is broken for good.

Software which does any type of computing without its users' informed consent is classifiable as malware, mind you.


If 0% of your users disable it, that kind of screams there's something wrong with your opt-out mechanism. Is it broken? Hidden? Difficult to do?

I mean, with any group of people, there will always be a percentage that will disable it. If the telemetry is popular, that percentage might be very small, but it would be non-zero.


I think you missed the joke.


Oh! That's what the whooshing sound over my head was!


Oh shit, I missed it as well.


Well, those that turned it off are not phoning home, so the rest will be 100% ;)


> telemetry is essential for a good product

Up until now, you've had to make these design decisions on your own, relying only on perplexing intangibilities like 'taste' and 'intuition'.


Those design decisions were never made in a vacuum. They relied on telemetry (or, less scarily, user testing, user feedback, user research, etc.) to figure out what works best. As a reminder, our intuition doesn't come from nowhere, but rather from centuries of survival and expectations. If you do not know what these expectations are, and if you do not know how your users interact with your product based on these expectations, you cannot make a good product. Certainly, you can make a product that appeals only to you, but how many of yous exist?

The design of the teapot is a great example. It didn't magically appear with handles and a spout and a place to hold leaves; it was refined over years of use. As shocking as that is, tea wasn't even discovered on purpose, let alone having a specific vessel for it right out of the gate.

So yes, telemetry is essential. Taste is personal.


I think it's somewhat silly to fly blind on the assumption that your taste is better than any real world observations you can make.

Especially if you haven't had the chance to develop an intuition yet and are new to the field. Without data to correct you, how do you get better?


> telemetry is essential for a good product

No, it isn't, and the idea that it is is toxic.


Essential is probably not right, this is true.

But I'm confident I'll make a better product with telemetry vs without.


I think maybe even making telemetry mandatory with an open license, and customizable with a support license might be a sustainable way to run an open source company.

For many open source projects (anything with an attached business model), either telemetry or tight communication around usage patterns will be necessary to inform development. The latter of those two options consumes business resources.


Making it mandatory with an open license might shoot you in the foot when people fork your code.

For me, the harshest you can go is to ship telemetry with the user opted in by default and no prompt for the setting on start. Ideally, you ask on first load, and that's what I'd probably aim for.

You can't ever try to force people to do anything in open source. They'll run right by you and make it do what they want it to do with or without you.

I'd even imagine paying customers are broadly more happy with telemetry than open source ones. And their needs more important anyways.


> making telemetry mandatory with an open license

If it's mandatory to run the code that does telemetrics, it's not a very open license.


Just because Linux is open source doesn't mean you can't have both Fedora and Red Hat (an enterprise version built on the same codebase)

I don't think any closed source goes into Red Hat, it's just the patch delivery pipelines, package repositories, etc that require a license. And support of course.

Same with any distributed system whose core contributors could gain insight from telemetry. All the components are open source, but they can package it all up and make it available under different terms.

If there's a community version and an enterprise version, you can then make telemetry required in the community version. If people don't like it, they can pull the package apart and put it back together however they want, or they can pay for an enterprise license.


You can make submitting telemetrics a condition of some other agreement, such as copyright license on the Red Hat name, or a B2B support contract. That, however, is pretty far removed from what's discussed here; if the software license itself makes telemetrics "mandatory", then it's no longer an open license.


I don't think I made myself clear.

The company ships one product which is a community edition of a bundle of open source software. That version has telemetry enabled and can't be disabled. Users who want to patch the code manually can of course disable it, just like they can disable the Ad lens in Ubuntu if they want to build it themselves, and those users will be off the beaten path and likely to run into issues that aren't easy to find paved solutions for.

They also offer an enterprise edition with a support license, on which telemetry is enabled by default, but can also be disabled.


> bundle of open source software. That version has telemetry enabled and can't be disabled.

I'd recommend saying "and cannot be configured off without code changes", or something. Of course it can be disabled, if it's open source.

Go compiler without telemetrics opt-out would fragment the community even harder than Go compiler with telemetrics default on. While your scenario is, of course, possible, it's not very relevant to the Go compiler.

As for your original "might be a sustainable way to run an open source company", if the primary feature one gets by paying is "easy to turn off telemetrics", that just doesn't sound like enough value.


Well I mean. In a lot of the popular Linux distributions you kind of get what the distro comes with.

Can you configure openSUSE to use apt instead of zypper? I mean, sure, probably, it's open source.

Is it going to be straightforward? I don't know. I suspect it's going to be harder than it's worth, and you might need to rebuild the operating system image to change the directory layout to work with apt's assumptions or something.

So in practice, even though these things are all open source, people who bundle complex software lay down the paths that 99.9% of people will take.

I wasn't proposing "disabling telemetry" as the primary business proposition.

Instead I was suggesting that open-source companies spend a lot of time dealing with issues from people who use the open-source software in unanticipated ways. If "the beaten path" for using the software without support includes telemetry, they get value from that.

If people want to use the software without telemetry, then they're paying for a support license, so the issues they run into which the supporting company has no telemetric insight into are at least better aligned with their support resource allocation.


Distros are roughly defined by their package managers. Replacing one is a huge amount of effort, compared to adding `false &&` to a single if.


My take is that distros are collections of opinions, some with exposed customization allowed.

The package manager is part of it, sure.

The filesystem is another big part of it (though it's possible most are following XDG now https://specifications.freedesktop.org/basedir-spec/basedir-... )

The init system used to be a big point of delineation, but I think systemd is the standard now (for better or worse)

The filesystem and networking stack still have some variability.

There's still default applications, kernel modules, a gui app installer, the desktop, included drivers, and many more things that go into a distro. If you go with a distro that uses KDE and you switch to GNOME for example, you might lose a lot of GUI support for customization, might have to build addon packages yourself, etc.

It's all open source at the end of the day, but that optionality leads to a less streamlined user experience and lack of guardrails as soon as you step off the beaten path, than you would get with something like OSX


To be more explicit: If your license does not let users patch out your telemetry code it is not an open source license at all.


From my response to your sibling:

I don't think I made myself clear.

The company ships one product which is a community edition of a bundle of open source software. That version has telemetry enabled and can't be disabled. Users who want to patch the code manually can of course disable it, just like they can disable the Ad lens in Ubuntu if they want to build it themselves, and those users will be off the beaten path and likely to run into issues that aren't easy to find paved solutions for.

They also offer an enterprise edition with a support license, on which telemetry is enabled by default, but can also be disabled.


If you do, and if people find out about it, they'll send you false data.


This is just part 1, but all articles in the series have been published: https://research.swtch.com/telemetry


This kind of push is only going to make people want to disable telemetry even more. Privacy is sacrosanct and should be accepted as the norm, not something we need to opt-in.

Go already has some form of telemetry built-in (by way of a google proxy, I suppose) and adding an official one that is opt-out is just going to make me refuse to ever work with it.

Telemetry should always be opt-in, and only opt-in. We have so many issues with telemetry, privacy, and such because the big players and corporations insist opt-out is better (maybe you get more data, but you violate end users' trust as well). Is that really worth it?

There is blow-back and distrust in the industry as a result and it's only going to get worse the more you try to push for opt-out telemetry (or just assuming telemetry should be the default).


Have you read the articles? How is this in any way violating privacy?


Because it's impossible to get telemetry from any source without violating some aspect of the users' privacy.


So you see this as just the same, from a privacy perspective, as the way that the Go tool already dials out to the Go proxy by default? That is, if you're OK with that (I'd assume not, but it is at least existing functionality), you'd see the telemetry proposal as similar?

In other words, I think you object to the Go tool already, so this is really no different?


Telemetry in open source has existed for a long time. Debian has the popcon package that can be installed and reports weekly usage of the software packages. The telemetry data are published in the open. The Debian popcon FAQ could be used as a guideline for other telemetry needs. https://popcon.debian.org/


It does sound quite similar. But in addition to the crucial difference of opt-in vs. opt-out there's also an interesting contrast in how it's framed.

Debian talks about what you, the user, can do: help out, participate and vote. If you choose to do so.

The Go team talks about what the developers and their software will do to the user's machine, but the user is completely passive in their description. This is also reflected in the term "telemetry" itself: the software is not a tool in the user's hands but rather a remote-controlled probe in the user's habitat that pokes at the user to elicit interesting responses.


popcon should not be used as an example of how to do telemetry, as it is far worse for privacy than the Go proposal:

1. Sends names of private packages to the server, and publishes them.

2. Sends a unique identifier (a UUID stored in /etc/popularity-contest.conf) to the server, which is stored.

3. Doesn't use sampling, so if you use popcon you will be submitting a report once a week (Go's telemetry would average just one report a year).

4. Submits over plaintext protocols by default.

popcon may be opt-in (in the sense that the prompt during installation has "No" selected by default) but the prompt doesn't disclose the large privacy risks.

People are not appreciating the thought that has gone into the Go proposal to minimize the collection of private data, either intentionally or by accident, such as the client-enforced requirement that the names of counters be published in a tamper-proof log so anyone can verify that, for example, no private package names are being disclosed. Everyone is focusing on opt-in vs opt-out, but to me these other details are far more important.


> Debian has the popcon package that can be installed and reports weekly usage of the software packages.

Right. That's fully opt-in, to the point the package isn't even installed by default, which is the only moral way to do this.


I don’t really see why a classic community-driven open source project would care about what non-contributing users are doing with the software. In that case, helpful users come with built-in telemetry (pull requests).

But I guess this could be helpful for corporatized read-only repo projects, or other groups that aren’t sure if they are building a community or a customer base.


Because pull requests aren't the only reason you might do a community-driven open source project. Perhaps you're just altruistic, or want to popularize some technology, etc.


My new TV wouldn't work unless I agreed to it recording me and uploading those recordings to its servers, where they may be temporarily stored while the audio is transcribed to text for more permanent storage.

My TV is forcing me into an employment agreement of sorts, where I generate data to train their models or otherwise 'improve service'.

Data is so valuable that companies are risking a huge PR backlash. Data collection is the business model, and I assume the same ethos will make its way into open source.


I wish there was a standard way of disabling telemetry across software dependencies.

While I leave it turned on for personal projects, several projects at work require disabling it.

I have spent hours auditing through transitive dependencies to turn it off. It should not be this painful.


No reason they couldn't all aim to respect a `TELEMETRY=false` env var, akin to the web's 'do not track' request.
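As a sketch of what honouring that could look like in a tool's startup path: one such convention, DO_NOT_TRACK, already exists for console apps, while the generic `TELEMETRY=false` name here is the suggestion above, not an established standard.

```shell
# Check a generic opt-out before the tool-specific one. DO_NOT_TRACK is
# an existing community convention for CLI tools; TELEMETRY=false is the
# hypothetical generic variable suggested in this thread.
telemetry_enabled() {
  [ "${DO_NOT_TRACK:-0}" != "1" ] && [ "${TELEMETRY:-true}" != "false" ]
}

if telemetry_enabled; then
  echo "telemetry on"
else
  echo "telemetry off"
fi
```

Either variable being set to its opt-out value wins, so a user only has to learn one knob.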


Does anyone really respect DNT though?


A firewall is probably your best bet. Don't allow network traffic originating from anything other than a short whitelist.


There's a lot of strong reactions here, which I don't think are generally unfounded. Telemetry has certainly been misused and will continue to be, but it can also be an invaluable tool for product development.

For example, we had a CLI with many commands and flags, some of which were costly to maintain. By adding analytics, we were able to see that literally no one was using certain commands, and we could safely remove them without messing up workflows.

On each CLI invocation, we collected:

  - hash of user ID
  - which command is run
  - which flags were included
  - operating system (not version information, just mac/pc/linux)
This data wasn't used for marketing, had no identifiable information, and was disableable (but opt-out). You could also log exactly what was sent to the server, so you could see for yourself.
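A sketch of assembling that kind of record; the field names and values are illustrative, not any real tool's schema, and the hash keeps the raw username out of the payload:

```shell
# Illustrative telemetry record: hashed user ID, subcommand, flags, coarse OS.
user_hash=$(id -un | sha256sum | cut -c1-16)  # 16 hex chars, never the raw name
cmd="build"            # which subcommand was run
flags="--verbose"      # which flags were included
os=$(uname -s)         # mac/pc/linux level only, no version info
record=$(printf '{"user":"%s","cmd":"%s","flags":"%s","os":"%s"}' \
  "$user_hash" "$cmd" "$flags" "$os")
# Logging the exact payload locally is what lets users verify what is sent:
echo "$record"
```

Note that an unsalted hash over a small identifier space is only weakly anonymous; a real deployment would want a salt, or to drop the user field entirely.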

We could have collected some of this via occasional surveys, but the data would have been less useful and less accurate.

I didn't look into the details of what Go is proposing to collect, but treating all telemetry of any kind as a boogeyman isn't productive; you just have to do it the right way.


> Telemetry has certainly been misused and will continue to be, but it can also be an invaluable tool for product development.

This perfectly exemplifies the whole problem. Google views Go as a product which Google is responsible for.

Go is a programming language. Go is critical infrastructure. Go is not a product. Viewing it as such is a fundamental misunderstanding of what Go is to its users.

This will practically dictate my choice of programming language going forward. My question will no longer be, "is Go the right choice for this?", but rather, "is a Google product the right choice for this?". The answer is often yes to the former question, and no to the latter.


I guess that raises the question: if Go were an independent project (totally unattached from Google) and had the exact same telemetry plans, how would you feel?


Better, because most independent projects aren't the world's biggest advertising company whose business model entirely revolves around invading people's privacy as much as possible.

But that doesn't mean it would be good. The fundamental misunderstanding of Go's place as infrastructure rather than a product remains. And even though I acknowledge that spying on your users provides useful insight into their behavior, that doesn't mean spying on users is good.


The information and rate of upload as described seem reasonable.

Is the fear from most people that it will be a foot in the door, and a way for Google to collect more over time?

Note: I think Go is a regressive technology. It would have been great in the 1970s, not today. But that's a different topic. My point is that I tend to be biased very negatively against Go. But here I don't see something wrong.


Interested as to why you believe go is regressive, could you expand on that?


You can find plenty of well-articulated rants online. The gist is that Go ignores all research in programming languages. It is very hard to use properly. It is really hard to produce safe APIs. You are encouraged to pop threads (preemptive user threads) all over the place. And may the god of lost state save you from the hell of data races. Also, what is this slice/vector type thing? Don't forget to check `err`; nope, not this one, the other one.


> The gist is Go ignores all research in programming languages.

This is an assumption, and it's IMHO false. Go's creators watched all the research, probably closely, lived with its results, and were not happy with what they got. Their experience reminded them that fewer features in a PL make it arguably less expressive, sure, but at the same time faster to learn and master, and easier to read and debug. Writing tools is simpler if a parser for your language can be hacked together from scratch over a weekend, etc.

At the end of the day it's a matter of personal preferences, but it has, IMO, nothing to do with "ignoring all research in programming languages", quite the opposite.


You start with a dizzyingly large number of states (RAM, storage, network, threads...). The goal of programming languages is to give you a way to reduce that vast number of states as much as possible, without removing the ones you need to do the work. Each layer of abstraction reduces what is possible without impeding the work that ultimately needs to be performed.

In my humble opinion, you should learn a bit more about easy vs. simple, and complicated vs. complex.


Opaque telemetry can also be a barrier to adoption: my users’ IP addresses may legally be PII that I cannot disclose.


How transparent are Scarf's product adoption metrics for OSS projects? https://about.scarf.sh/

I follow them on Twitter but haven't looked much into it other than reading their documentation, which makes me think that most of their telemetry is done at the point of the package distribution system: https://about.scarf.sh/package-sdks


Is this even up for debate, or is this post more of a FYI?


No, it's not up for debate at all. Much like when Microsoft did this with .NET Core, the GitHub thread is clearly a misguided post by RSC expecting the community to conform or support it. They didn't, so now it's a damage-control exercise. It will happen.

Any corporate controlled project on this scale is prone to this failure mode.


From a legal point of view, how will companies react to this, be it default-on or default-off?

Some companies use Go for internal work, and I'm sure all of us know of cases involving a number of NDA-ed projects from third parties or outsourced companies that collaborate on them.

So, who is going to sue whom here when one party has disabled (or will disable) the telemetry and the other has it on by default, for whatever reason?


While this article points out all the right explanations of why telemetry is needed and how it can be made a little more transparent by the Go toolchain acting as an intermediary and publishing the telemetry data publicly, it fails to point out the disadvantages/risks of such a system. At the core, the issue is about trust and the user not having any incentive.


I haven't worked with golang in some time. How do golang devs generally obtain the compiler?

If you're getting it from distro repos, it should be straightforward to convince the distro package maintainer to disable the telemetry / patch it out.

Or is it a nvm/pyenv/rustup situation where you prefer to use bespoke toolchain managers to download upstream's compilers?


I mainly get it straight from golang.org, but this will be able to be disabled via environment variable just like the modules proxy stuff was. https://research.swtch.com/telemetry-design#opt-out


Running the command in that link just shows me a message of "go: unknown go command variable GOTELEMETRY".


Because this proposal has not yet been implemented.


So is there a way to disable this ahead of time? Or do I have to install the version with telemetry first?


Yes, you can set an environment variable at any point in time.


You can also

   echo GOTELEMETRY=off >> $(go env GOENV)


There are tons of cases where the person installing Go won't otherwise know that telemetry is enabled. For example, let's say you're at a bootcamp and the Go installation instructions from your teacher don't mention telemetry -- how would the person know to disable it? My concerns are around nation-state actors, domestic abuse, journalist privacy, and lawyer confidentiality, and I fully believe that this sort of telemetry can and will be abused in some way, somehow, eventually, and probably in some obscure fashion.

Would be nice if this system threw something to stderr at runtime every. single. time. unless the message was explicitly disabled. Something like:

  Go telemetrics are enabled! We are collecting {json:object} from your machine for X purposes. If you would like to opt out, run "echo GOTELEMETRY=off >> $(go env GOENV)". To disable this message, "echo GOTELEMETRYWARNING=off >> $(go env GOENV)"
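A sketch of that behaviour, using the GOTELEMETRY name from the proposal plus the hypothetical GOTELEMETRYWARNING switch suggested above (neither the message nor the second variable exists in the actual proposal):

```shell
# Print the notice to stderr (not stdout, so piped output stays clean)
# unless telemetry, or the warning itself, has been switched off.
# GOTELEMETRYWARNING is hypothetical, from the comment above.
if [ "${GOTELEMETRY:-on}" != "off" ] && [ "${GOTELEMETRYWARNING:-on}" != "off" ]; then
  echo "Go telemetry is enabled; run 'go env -w GOTELEMETRY=off' to opt out." >&2
fi
echo "...normal tool output..."
```

Treating "unset" as "on" mirrors the opt-out design: only an explicit off silences collection, and only an explicit off silences the notice.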


If you run your build inside a Docker container, wouldn't it be enabled by default as well? Docker containers do not inherit the environment from the host, as far as I know.


From the description of the implementation, it would only send telemetry data after seven days of being on. So, yes, as long as the Docker container is up for that long. I didn't see anything that mentions whether it would communicate with any telemetry server on initialization.
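The non-inheritance itself is easy to check, assuming GOTELEMETRY ends up being the variable name; a clean environment (which is roughly what a container starts with) drops the host's setting:

```shell
export GOTELEMETRY=off            # opted out on the host...
# ...but a process started with a clean environment does not see it:
env -i sh -c 'echo "clean env sees: ${GOTELEMETRY:-<unset>}"'
# So in an image you would need to set it explicitly, e.g. in the
# Dockerfile:   ENV GOTELEMETRY=off
# or per run:   docker run -e GOTELEMETRY=off ...
```

The same applies to CI runners and any other environment built from scratch rather than inherited from your shell.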


It depends on which compiler you want to use, but precompiled binaries include the gc compiler by default. If you want to compile from source yourself, you can use gc or gccgo assuming you already have the go toolchain, otherwise you would need to bootstrap from an existing binary.


This is a good plan, very simple and clear, and I like the list of system properties at the end. The solution is pretty tailored for the Go toolchain, which is a good strategy that has worked for them in the past.

A more general-purpose metrics tool I'm watching closely is Divvi Up https://divviup.org/, a research project by ISRG, the same org that runs LetsEncrypt. The basic idea is to divide up each metric into two parts and publish each part to separate collection servers (one run by you and the other by divviup). Then the servers separately aggregate their half and combine the results, the idea being that each half is useless on its own but when combined it's still useful.

I wouldn't suggest it for this application, but for the majority of typical apps it would be a vast improvement to privacy compared to the status quo.
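A toy illustration of the split-into-two idea, using plain additive secret sharing; Divvi Up's actual protocol adds verification on top, so this shows only the intuition, not its real scheme:

```shell
# Split a counter into two random-looking shares mod 2^32. Either share
# alone is uniformly random; only the recombined sum reveals the value.
metric=1234
mod=4294967296  # 2^32
share1=$(awk 'BEGIN { srand(); printf "%d", int(rand() * 4294967296) }')
share2=$(( ((metric - share1) % mod + mod) % mod ))
# Server A stores share1, server B stores share2; recombining:
echo $(( (share1 + share2) % mod ))
```

Because the shares are additive, each server can sum its shares across many users and the two aggregate sums still recombine into the true total, without either server ever seeing an individual value.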


  The Go team at Google would run a collection server. Each week, with 10% probability (averaging ~5 times per year) the user’s Go installation would download a “collection configuration” to find out which counter values are of interest to the server and at what sample rate.
If there's interest in using config files to determine how telemetry is done, why can't something similar be done for turning telemetry off? I don't want to deal with environment variables (for a gazillion reasons) and would prefer to just use a config file. Especially when it comes to sending arbitrary information from my system to another arbitrary host.
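As an aside, the quoted figures check out as back-of-the-envelope arithmetic: a 10% chance in each of 52 weeks gives about five downloads per year in expectation.

```shell
# Expected config downloads per year at a 10% weekly sampling probability.
awk 'BEGIN { printf "%.1f\n", 52 * 0.10 }'
```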

It's so strange to me that the configuration of telemetry has been escalated to uses-configs status while opting out hasn't. It really feels like opting out is an afterthought.


As designed, the system allows an opt-out _either_ by setting GOTELEMETRY=off in your environment or by running 'go env -w GOTELEMETRY=off' which writes a config file. (Specifically the one reported by 'go env GOENV'.) If you prefer to edit the config file directly, you are of course welcome to do that.


That's fair, but it doesn't take away from the deeply, deeply exhausting need to do this sort of niche configuration for everything out there, let alone even recognizing that it's needed. Never mind the undue need to read changelogs and blog posts to make sure things haven't changed since I last used them.

Have you considered using a more-generic opt-out environment variable that's not go specific? For example "USER_PREF_NO_TELEMETRY=true" would have the same effect, or would ensure that GOTELEMETRY=off is set. I have no idea if anything like that exists right now in other projects, but if it's not, then go is large enough and embedded across enough systems that it could be a good place to start.


Sure, and have to remember to do that every single time you mint a new Docker image for use in CI, every single time you spin up a new Cloud workstation, every single time you get a new physical workstation, every single time you have to run a job in a brand new environment, every single time you... etc, etc, etc.

Things that work inside the Googleplex often don't work for the rest of us. Not every company is a multi-billion-dollar behemoth that can afford to spend hundreds of millions of dollars annually to work exclusively on internal developer tooling.


This will collect zero telemetry from CI builds, so some data will need to be taken with a big grain of salt. I don't have data to prove it, but I would bet most cross-compiles happen in CI and not on the dev laptop.


This is really slimy, Google swung and missed and let Go of the bat here.


We need solutions in this space for open source projects, I've been monitoring https://divviup.org as an option too!


Sometimes the post sorting algorithm produces interesting results

  53. Transparent telemetry for open-source projects (swtch.com)
      224 points by trulyrandom 1 day ago | flag | hide | 265 comments
  54. Windows 11: a spyware machine out of users' control (techspot.com)
      419 points by jlpcsl 19 hours ago | flag | hide | 292 comments


I'm usually against telemetry but not only is the approach here somewhat reasonable, I think I actually trust Google more than, say, homebrew to not do something egregious with the data.

Google is at least as broadly compliant as one can be with various standards (of questionable value, natch) but is also on the hook socially and perhaps legally if they fuck this up.


Oh, hell no.


The only way I'd consider this is if the telemetry my app generated is available to me or can be rerouted to another target. If so, please add this ASAP. Otherwise, I'll stick with my own observability stack.


how long until ads?


> That's why opting out is just an environment variable (GOTELEMETRY=off) or a single command (go env -w GOTELEMETRY=off)


I have set DOTNET_CLI_TELEMETRY_OPTOUT=1 as an environment variable in my .profile file. What should I do for golang?


> To opt out, users would set GOTELEMETRY=off in their environment or run a simple command like go env -w GOTELEMETRY=off; The first telemetry report is not sent until at least one week after installation, giving ample time to opt out. Opting out stops all collection and reporting: no “opt out” event is sent. It is simply impossible to see systems that install Go and then opt out in the next seven days.

Source: https://research.swtch.com/telemetry-intro
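In .profile terms, the Go equivalent of the .NET opt-out would look something like this. (GOTELEMETRY=off is the name from the proposal, not a shipped flag; DOTNET_CLI_TELEMETRY_OPTOUT is .NET's documented variable.)

```shell
# ~/.profile — opt out of both toolchains' telemetry.
export DOTNET_CLI_TELEMETRY_OPTOUT=1   # .NET CLI's documented opt-out
export GOTELEMETRY=off                 # name per the Go proposal
echo "$DOTNET_CLI_TELEMETRY_OPTOUT $GOTELEMETRY"
# → 1 off
```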


"The system is on by default, but opting out is easy, effective, and persistent."


This is a really well-considered approach to telemetry. I wish all telemetry were done like this.


The only moral response is to send false data to the servers.


There's a lot of confusion in these comments about opt-out vs opt-in. The debate isn't settled, but a lot of the issues raised here have been addressed. Reposting Russ' comment:

>Longer answer about opt-out generally, copied from mail I sent to golang-dev.

> I wrote a little about this at https://research.swtch.com/telemetry-design#opt-out. Just to quote the beginning:

“An explicit goal of this design is to build a system that is reasonable to have enabled by default, for two reasons. First, the vast majority of users do not change any default settings. In systems that have collection off by default, opt-in rates tend to be very low, skewing the results toward power users who understand the system well. Second, the existence of an opt-in checkbox is in my opinion too often used as justification for collecting far more data than is necessary. Aiming for an opt-out system with as few reasons as possible to opt out led to this minimal design instead. Also, because the design collects a fixed number of samples, more systems being opted in means collecting less from any given system, reducing the privacy impact to each individual system.”

> To elaborate, one of the core things I believe about designing a system like Go is that it needs to ship with the right defaults, rather than require users to reconfigure the defaults to get best practices for using that system. For example, Go ships with use of the Go module mirror (proxy.golang.org) enabled by default, so that users get more reliable builds out of the box. Similarly, Go ships with the use of the checksum database also enabled by default, so that users get verified module downloads out of the box. We know that most users don't want to and probably won't spend time reconfiguring the system: they trust us to set it up right instead. Of course, that implies a responsibility to actually look out for users' best interests, and we take that very seriously. There are important privacy concerns about the module mirror and about the checksum database, despite their clear benefits, so we designed those systems to address as many of those concerns as possible. Among the decisions we made to improve privacy there: (1) GOPROXY can proxy both the module mirror and the checksum database, (2) we published a very clear privacy policy (proxy.golang.org/privacy), (3) we introduced the concept of a tiled transparency log to keep log fetches from exposing a potential tracking signal.

> Moving back to telemetry, enabling telemetry does not confer the same kind of direct benefits to users as the module mirror and the checksum database do. Instead the direct benefits it confers fall on other users: (1) allowing your Go installation to participate in the system means other installations participate just a little bit less, thanks to sampling, and (2) allowing your system to send usage information strengthens the signal from others with similar usage. There is still an important indirect benefit: one system opted out won't have much of an impact, but 99% of systems opted out has a huge impact, and that leads to mistakes like the ones I mentioned in the first blog post, which do make Go worse for you.

> Like with the module mirror and checksum database, there are good privacy concerns to telemetry despite the clear benefits, so the design of transparent telemetry aims to address as many of those as possible. The bullet list in the GitHub discussion (also at the end of the blog post) enumerates the most important ones.

> Most people leave defaults alone or make intuitive guesses about what they want. That's totally reasonable: no one wants to spend half an hour learning the details of each specific setting. But my goal for the system is that if I did spend half an hour explaining how the system worked, then the vast majority of users would agree with the default and see no reason to opt out. Of course, some people will always opt out on general principle, and perhaps there are others who would opt in to some systems but not this one. For those people, my goal is simply to make the opt-out as easy and effective as possible. That's why opting out is just an environment variable (GOTELEMETRY=off) or a single command (go env -w GOTELEMETRY=off), and there's a quiet period of at least a week after installation to give plenty of opportunity to opt out before there's any chance of data being sent.

> I expect that this will not change your mind, and that you and a few others will still believe the telemetry should be opt-in. I accept that: I don't expect to convince everyone about this point. But I hope this helps explain how I am thinking about the decision.


> The debate isn't settled, but a lot of the issues raised here have been addressed.

That's not addressing the issue as much as dismissing it.


@rsc, if you ever see this, your proposal here means that I will never use any software written in Go ever again, if at all possible.

What others have said in this thread about telemetry becoming an "accelerant" will happen. Abuse will happen. Data will be put up for sale. IP's will be logged because users can't verify that they're not.

The only thing users can verify is what is sent and to whom. And only if they run packet inspection. Most users don't.

(Edit: I just realized that users may not even be able to tell who data is sent to because of proxies or the original collector selling the data.)

I have no reason to believe your personal motives are anything but pure; however, this capability will not just be in your hands. It will be in the hands of anyone with less-than-pure motives.

I applaud your efforts to make telemetry more transparent, but they are destined to fail.

When it comes to figuring out how users use software, the only thing to do is legwork. Ask your users. Watch them if they'll let you do user studies. Pay non-users to use the software for a user study and put them through all situations, including rare ones.

This is the same thing we programmers tell the police to do when the police whine about end-to-end encryption: do old-fashioned legwork. Why should we, as programmers, demand that of police when we give ourselves tools to violate the privacy of users in the exact same way that police want?

Yes, that's right, the exact same way. Telemetry is a backdoor on a private conversation between a user and a machine.

Just do the work. I'm pretty sure Google has the money to do so.

You may respond that this is for Open Source developers to get data on their users. Well, if those developers are hobbyists, they don't have time to crunch data, and they're probably scratching an itch. If they are not hobbyists, they are paid and should do the legwork.

There is no excuse for telemetry. Just do the work.


> @rsc, if you ever see this, your proposal here means that I will never use any software written in Go ever again, if at all possible.

Have you actually read the articles?

The "data put up for sale" is to be made available publicly.

IP logging can already be done (the Go proxy is enabled by default).

All the source code is open.

What's your actual problem with this, beyond a knee-jerk reaction to the idea?


> The "data put up for sale" is to be made available publicly.

How can users verify this?

> IP logging can already be done (the Go proxy is enabled by default).

Sure, but more data will be attached to it. Also, in his proposal, he said that IP addresses will not be logged. I seriously doubt that.

> What's your actual problem with this, beyond a knee-jerk reaction to the idea?

Putting telemetry in a programming language. Working with a programming language is the number one thing I do on a computer. This means that, except for the fact that I don't work in Go, most of my private conversation with a machine could be backdoored.


> Sure, but more data will be attached to it. Also, in his proposal, he said that IP addresses will not be logged. I seriously doubt that.

I think it's worth quoting what Russ said in the article, which sounds very reasonable to me:

> The server would necessarily observe the source IP address in the TCP session uploading the report, but the server would not record that address with the data, a fact that can be confirmed by inspecting the reporting server source code (the server would be open source like the rest of Go) or by reference to a stated privacy policy like the one for the Go module mirror, depending on whether you lean more toward trusting software engineers or lawyers. A company could also run their own HTTP proxy to shield individual system’s IP addresses and arrange for employee systems to set GOTELEMETRY to the address of that proxy. It may also make sense to allow Go module proxies to proxy uploads, so that the existing GOPROXY setting also works for redirecting the upload and shielding the system’s IP address.

> This means that, except for the fact that I don't work in Go, most of my private conversation with a machine could be backdoored.

I don't get this. Given the design that the article is describing, how could most of your private conversation with a machine be backdoored? Specifically given that the Go tool is open source and used by millions already. Are you worried about sneaky code hidden inside that source code? If so, you should be worried already, because there's no reason that they couldn't already be doing that if they were so inclined.


> The server would necessarily observe the source IP address in the TCP session uploading the report, but the server would not record that address with the data

Users can't confirm this. In fact, this makes the next part a falsehood:

> a fact that can be confirmed by inspecting the reporting server source code (the server would be open source like the rest of Go) or by reference to a stated privacy policy like the one for the Go module mirror, depending on whether you lean more toward trusting software engineers or lawyers.

Sure, the source code of the server might be available, but you can't confirm that the server wasn't built with modified source code.

Second, as we've seen before, privacy policies are empty; companies violate them all the time.

IOW, I don't trust software engineers, and I don't trust lawyers, and I would bet my life savings that there will be instances of companies lying in the ways I mentioned above.

> I don't get this. Given the design that the article is describing, how could most of your private conversation with a machine be backdoored?

Counts are enough. He says that counts are the only thing that will be uploaded, but he forgot that timing will also come into play.

(A week's delay is only an offset to subtract, by the way.)

Here's how it works: the tool reports counts, maybe in batches per hour. The server logs the counts and the hour those counts came from.

Yes, there's already another piece of data they captured, even though the tool ostensibly only sent counts.

Then those counts plus their timings can be used to infer things. For an example outside of Go (this is one I saw somewhere else), imagine a person texting more and more as the weekend approaches, until they are texting frantically. Then they suddenly stop in the evening of Saturday.

You only get the report of counts and the hours they happened in. Can you give some plausible explanations?

I can. They were texting someone they were planning on meeting that weekend, and then meet them. Can you give a few guesses as to why they're meeting them?

I'll let you fill in the blank.

Sure, there might be other reasons, but I would bet there are not many. Enumerate them, and you already know more. Find the similarities between all possibilities, and you know even more.

People forget about side channels all the time. In this case, the side channel was timing, but it doesn't matter what the side channel is; data can be extracted from it. And companies will.

> Specifically given that the Go tool is open source and used by millions already. Are you worried about sneaky code hidden inside that source code?

Yes. Just because there are eyeballs on that code doesn't mean they won't put sneaky stuff in. For example, the counts could be packed in a different order to tell the server more information. Or the tool could time its uploads. Or it could batch some counts and not batch others.

I'm not smart enough to catch all of the tricks they might pull. Are you?

> If so, you should be worried already, because there's no reason that they couldn't already be doing that if they were so inclined.

It's Google. Of course, they are so inclined!


What world do you live in?


A custom-built Gentoo that uses the Awesome Window Manager for a minimal install, builds Firefox from source, and uses OpenSnitch to sniff everything.

My machine is locked down hard.

Oh, and I checked what depends on Go on my machine. The one kicker was libcap, which won't depend on Go if I tell it not to build captree. So I did that.

I uninstalled Docker.

That leaves:

* `arduino-builder` (for my custom keyboard).

* Hugo (for my websites).

* An unnamed program.

* Gitea.

Besides `arduino-builder`, I have a plan to get rid of all three of those. For two, Hugo and Gitea, I had already planned to. The unnamed program is harder, but someone has already done one. Unfortunately, it's in Go, so I'm going to have to do something else myself.

`arduino-builder`, though, that's tough.


[flagged]


It actually is, because I can utilize my machine better by having fewer processes running, and lighter ones at that. I can run ZFS easily. I can have a minimal kernel, reducing my attack surface.

I can customize installed packages, such as what I did above.

Also, it taught me system administration.

Totally worth the effort.

As for getting rid of Go, I'm surprised that I had so few Go programs, and like I said, I was already planning on replacing two with my own stuff.


All browsers have telemetry in them, but he probably used netcat to post that message.


Oh Google - never stop being you.

Not only is it going to be opt-out (because of course it would be coming from Google), I really like the whole "wait a week before sending telemetry" part that just coincidentally has the benefit of sneaking right past people that actively look for suspicious network activity when they've freshly installed something.

Am I being uncharitable?


Google is institutionally incapable of producing software that doesn't track its users over the internet.

One example is the stock calculator app on Android, which according to their privacy statement may track your app interactions, device id, and email. Like what if users actually subtract more than they add or something.

https://play.google.com/store/apps/datasafety?id=com.google....

Or the wallpapers they include on your phone - you guessed it!

https://play.google.com/store/apps/datasafety?id=com.google....

If that's the kind of environment you work in, I'm not surprised this proposal seems modest by comparison.


As the saying goes, companies ship their org chart, and Google's is what, 90% ads?


Very popular programming languages and IDEs have telemetry on by default: VSCode, C#, Java, etc.

People act like they discover telemetry in 2023.

I don't think it's a big deal. Ultimately it's to improve Go, and the proposal makes it very easy to disable (a single env variable).


What the hell? "People act like they discover telemetry in 2023"? Are we just going to ignore the fact that the issue of companies spying on their users has been a hotly discussed controversial topic since the practice began? Do you think objection to telemetry first appeared in 2023?


Totally agree. Telemetry has been around, has matured, and benefits users. I’m not sure the benefits for Go would be as significant as for other software but, really, why not?


> Telemetry has been around and matured and benefits users.

Does it? Telemetry mostly seems used to justify removing features I need on grounds that they’re little used.

As another user noted, if telemetry is your yardstick, the average backup software would have removed the “restore” feature because it’s barely ever used.


Microsoft deprecated the disk-image backup in Windows 7 because it was infrequently used... by random grandparents.

It was basically a "free" wrapper on top of the Volume Shadow Service (VSS) built into the operating system, but only IT professionals ever used it, so... it had to go.


There is a long list of use cases, which go far beyond "removing features": https://research.swtch.com/telemetry-uses

"Is it safe to remove support for X?" is one use case. Right now the strategy more or less amounts to "remove and see if anyone complains, possibly too late to change".


> Is it safe to remove support for X?

What the hell, it's a freaking compiler. What do you mean, "too late to change"? Failing the compilation is suddenly a showstopper bug?

If go wants to deprecate features, just follow the same procedure done by literally all other non-spyware compilers.


I don’t like this example in particular because observing too much “restore” activity is an excellent piece of information.


I fail to see what that’s got to do with it. “Too much” restore activity doesn’t tell you anything actionable about your software; if anything, it’s creepy as hell.


> VSCode, C#

These are both Microsoft products. Microsoft's position is well known.

> Java etc ...

Really? Which Java distribution?

It's definitely opt-in in Jetbrains IDEs.


I think they should all be opt-in as well. However, as a developer and pretend sysadmin, I am generally a nice guy about not turning off telemetry on software products with a user-facing UI that I use frequently.


Since you asked, yes, you are being uncharitable. It's rather hard to imagine that the people who are detail-oriented enough to look for suspicious network activity after installing something wouldn't notice the disclosure on the download page (edit: or the release notes). On the other hand, the explanation given by Russ for delaying a week (so people have ample time to opt out) makes sense.

Do you actually think Russ' explanation is just a pretext so they can evade detection by people who monitor for suspicious network activity (yet don't notice the disclosure on the download page)?


I am jaded and probably being a little uncharitable. However, I don't know Russ personally so I have no reason to place a high level of confidence that a Google employee isn't going to make decisions that align more with Google's interests vs privacy interests.

Regardless, there are plenty of ways to upgrade the Go toolchain (snaps, distro packages, fetching the latest via curl, etc.) that won't make the changes immediately visible. Given that, I think you are painting an overly optimistic picture of a world where everyone who cares about this is immediately aware that opt-out telemetry has been added, vs a lot of installs being silently swept up into this through sheer ignorance.

Also, this is going to require me to go and set environment variables in about a dozen environments to disable the collection and while I can pretty easily manage that task via ansible I'm not happy about having to jump through hoops to turn off telemetry for a freaking compiler tool chain.


> I am jaded and probably being a little uncharitable. However, I don't know Russ personally so I have no reason to place a high level of confidence that a Google employee isn't going to make decisions that align more with Google's interests vs privacy interests.

If the nature of this data were different, I would be suspicious too. But it's really hard for me to see how a set of counters (whose names have various protections to ensure they can't contain private information) being sent approximately once a year is going to help with Google's advertising interests (which is what I assume you meant by "Google's interests"; I think they also have an interest in making Go better and the telemetry proposal aligns with that). This is literally the first time I've been OK with telemetry.

> Regardless, there are plenty of ways to upgrade the Go tool chain (snaps, distro packages, fetching latest via curl, etc) that won't result in the changes being immediately visible. Given that, I think you are painting an overly optimistic picture of a world though where everyone that cares about this is going to be immediately aware that opt-out telemetry has been added vs a lot of installs being silently swept up into this by sheer ignorance.

I agree there will be people who won't notice the disclosure (which will also be in the release notes), but again I tend to think that the people sniffing network traffic after installing a program would also scrutinize release notes instead of just blindly installing upgrades, which is why I find it pretty improbable that Russ' explanation was a pretext.

> Also, this is going to require me to go and set environment variables in about a dozen environments to disable the collection and while I can pretty easily manage that task via ansible I'm not happy about having to jump through hoops to turn off telemetry for a freaking compiler tool chain.

I think the best suggestion I've seen is that there should be a single environment variable (e.g. $TELEMETRY) that all programs should respect, to avoid the need to do work for every application.


> I think the best suggestion I've seen is that there should be a single environment variable (e.g. $TELEMETRY) that all programs should respect, to avoid the need to do work for every application.

This is a nonstarter, as DNT demonstrated in spades.


> there should be a single environment variable (e.g. $TELEMETRY) that all programs should respect, to avoid the need to do work for every application.

There was a proposal for that some years ago, but it didn't really go anywhere, partially because of the author's rather unpleasant attitude towards the projects he wanted to adopt it, and an overly broad definition of "tracking" (which includes e.g. update checks).

Some discussions:

https://news.ycombinator.com/item?id=27746587

https://lobste.rs/s/htbkqd/console_do_not_track
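For reference, that proposal's convention was a single DO_NOT_TRACK environment variable. A hypothetical sketch of how a CLI might honor both the generic variable and a tool-specific one (function name and the combination logic are illustrative, not from any shipped tool):

```shell
# Hypothetical sketch: a CLI honoring both the generic DO_NOT_TRACK
# variable (the consoledonottrack.com convention) and a tool-specific one.
telemetry_enabled() {
  [ -n "${DO_NOT_TRACK:-}" ] && return 1      # generic opt-out wins
  [ "${GOTELEMETRY:-}" = "off" ] && return 1  # tool-specific opt-out
  return 0
}

DO_NOT_TRACK=1
if telemetry_enabled; then echo "telemetry on"; else echo "telemetry off"; fi
# → telemetry off
```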


And there it is. The real intentions of Google and the Go Programming Language.

Google really can’t help themselves; they’ll stick telemetry in anything.


They could allow public access to that data. That can help more people than just the Go team, and it would add transparency.


That's literally the plan if you read it.


Was hoping for a big highly designed webpage with "enter your github URL here". but alas

(it did say "transparent", like a service people opt into that could relate installations to github URLs)


I see a very frustrating pattern emerging in which $COMPANY asks its users if it can do something, the users say "no", and $COMPANY storms off under the guise that "the discussion is unproductive".

I am left with the impression that the decision has already been made, and that we are witnessing a PR strategy to make Google appear reasonable. I think that Mr. Cox, with all the respect I hold for him, is playing the part of the "useful idiot" here.



