Sure. But I have a question: why? Why should we opt out of the telemetry? To me, this idea seems to not just be admitting defeat, it's ensuring defeat right from the start.
Telemetry should always be opt-in. Yes, that means vendors will get much less data. It's on them to deal with it.
On a related note, I wonder how long it takes until one of the vendors of popular CLI tools or desktop apps gets fined for a GDPR violation. I wonder how much of existing telemetry already crosses the "informed consent" requirement threshold. I'll definitely be filing a complaint if I find a tool that doesn't ask for it when, by law, it should.
I'll play the devil's advocate. Most people will shoot support e-mails at you which are more or less "app crashes". If you have not already encountered the problem, you have to walk them through a tedious debugging process. If you collect crash reports, you have probably already fixed the problem.
For usage data, it allows developers to focus on features that matter and know which ones you can remove.
For example, I don't collect any data in my app, but that also means I'm afraid to remove any features that are slowing me down, because I have no idea how people use it.
As for why sometimes this would be better as opt-out, well, on iOS crash reports are opt-in, and only about 20-30% of users have them enabled. That is fine for huge programs or ones with little surface.
Your point is basically "surveillance data is useful". And, well, yes. There would be zero debate over surveillance if there were literally no desirable reasons to have it.
Calling it surveillance is misleading in most cases.
The most recent crash reporting system I worked with was little more than a stack trace, without any user data recorded at all. Not even the IP address the report came from.
We didn’t care who was crashing and we didn’t collect any PII at all. It was a simple report about where in our code the crash occurred.
It was very useful for fixing bugs. No surveillance or PII involved.
If that were the norm, then people would opt in. But the trust is gone now; bad actors have ruined it for everyone. And the solution to that is not to enable it by default.
I think this might be the winning argument. There may not be any meaningful reason for telemetry to outweigh the bad actors and damage they've done.
For me, I'd love to enable telemetry for some of my better-liked FOSS apps - but even with those, my question immediately arises: "What are you sending?"
Without some way to monitor it, are they sending filenames? Are they sending file contents? How much is it? Etc.
To satisfy my questions I'd need some sort of router-enabled monitoring of all telemetry-specific traffic, so I can individually approve types of info... and that seems difficult. But the days of blanket allowances from me are long gone due to the bad actors.
Excellent point. As a user, here are my requirements if you want me to opt into your data collection scheme:
1. All exfiltration of data must be under my direct control, each time it happens. You can collect all the data you want in the background, but any time it is transmitted to the company, I must give consent and issue a command to do it (or click a button).
2. All data that is exfiltrated must be described in detail before it is exfiltrated. "Diagnostic data" isn't good enough. List everything. Stack trace? Crash report? Memory dump? Personal info (list them all out)? Location information? Images (what are they, screenshots? from my camera?) Time stamps from each collection. If it's nebulous "feature usage data" then list each activity that is being logged (localized in my language). Lay them all out for me or I'm not going to press that Submit button.
3. I need to be able to verify that #2 is accurate, so save that dump to disk somewhere I can analyze later.
4. The identifiers used to submit this data should be disclosed. Is a unique user id required to upload? Do you link subsequent uploads to the same unique id? Is that id in any way associated with me as an account or as a person in your backend? I want you to disclose all of this.
5. For how long do you retain the data I sent? How is this enforced? Is there a way for me to delete the data? How can I ensure that the data gets deleted when I request it to be deleted?
6. Do you monetize the data in any way, and if so, am I entitled to compensation for it?
I don't know of many (if any) data collection schemes that meet this bar, yet.
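For illustration, here is a minimal Python sketch of what the first three requirements could look like in practice. Everything here (paths, names, the `send` callback) is hypothetical, not any existing vendor's flow:

    import json
    from pathlib import Path

    REPORT_DIR = Path.home() / ".myapp" / "pending-reports"  # hypothetical location

    def queue_report(report: dict, report_id: str) -> Path:
        """Collect locally only; nothing is transmitted from here."""
        REPORT_DIR.mkdir(parents=True, exist_ok=True)
        path = REPORT_DIR / f"report-{report_id}.json"
        path.write_text(json.dumps(report, indent=2))
        return path  # requirement 3: the raw payload stays on disk for inspection

    def submit_pending(send) -> None:
        """Show each payload in full, then ask before each transmission."""
        for path in sorted(REPORT_DIR.glob("*.json")):
            print(f"Pending report {path} contains exactly:")
            print(path.read_text())  # requirement 2: list everything being sent
            if input("Transmit this report? [y/N] ").strip().lower() == "y":
                send(path.read_text())  # requirement 1: per-event, explicit consent

The retention, identifier, and monetization questions (4-6) are policy rather than code, which is exactly why they need to be disclosed in writing.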
If you install an app via Google Play, crash reporting & some telemetry are silent and enabled by default. This is in addition to any crash reporting built into the app.
Analogy time: in the early days of the Internet, any IP address could send an e-mail by directly connecting to the mail exchange (MX) host listed for the domain of the To: address via SMTP. This was a good, nice thing.
Bad actors, spammers, ruined that. The trust is gone, and now it's a terrible idea to accept SMTP connections from any random IP address.
Legitimate senders have to "opt in" to the SMTP sending system by getting a static IP of good repute, or else using a forwarding host which has those attributes.
That, and additionally, if there's an opportunity to monetize data from (or derived from) telemetry further, companies won't hesitate to sell it (or access to it) to third parties.
That, and additionally, if the data is stored, it's a question of when, not if, it'll become a part of some data breach.
To developers and entrepreneurs feeling surprised by the pushback here, please take an honest look at the state of the software industry. Any trust software companies enjoyed by default is gone now.
I think the old crash-report screen will be the answer to the paranoid crowds now. The app crashes, a stack-trace window appears with the data and a button to click to email the stack trace to the developers. That used to be the norm before private data became a lucrative business.
It sounds like there needs to be a standard of privacy for certain apps.
I work with UL a lot and they have lists of standards and specifications that help us meet the safety requirements of electronic devices. These standards are then used to meet customers' demand for a high level of safety. Customers in my field do not even consider products that don't have UL. This strategy is good for better informing the consumer, while the standards are kept by an independent firm whose incentives are aligned to maintain their credibility.
I am not deep in the software field but I can imagine that groups like the EFF or similar orgs have a standard. The issue is that the consumers of these products don’t seem to care about this outside of the privacy advocate world.
On Linux, we use `abrt` for that, but the user needs to review and submit the bug report with the stack trace manually, so nobody complains about it. Yep, it's very useful.
How do users know if that's all the data that's submitted? Auditing every program to see exactly what gets sent (and repeating the process for every update) is way too much work; it's safer to just opt out by default.
Now that I think about it, it's safer to simply not use software that opts users into studies (like telemetry analysis) without informed consent.
> How do users know if that's all the data that's submitted?
That's the thing, isn't it? They'll never know. They can't; it takes deep technical knowledge to even be able to conceptualize what data could be sent, and how it could potentially be misused.
Which is to say, it's all a matter of trust. Shipping software with opt-out telemetry (or telemetry you can't disable at all) isn't a good way to earn that trust.
Even with deep technical knowledge and severely pared-down telemetry, PII embedded in side-channel-like outlets could be missed. Think a simple stack trace with no data is PII-free? Probably. But are you sure that the stack of functions called doesn't depend on the bytes in your name?
The anti-tracking point in the era of Do-Not-Track was “building advertising profiles that follow people around the web is evil.” Software telemetry is neither a personal profile, nor for advertising, nor correlated across more than one surface. As such I think it’s a little bit of a stretch to put this in the same bucket as 3rd party JS adtech crap.
> Your point is basically "surveillance data is useful". And, well, yes. There would be zero debate over surveillance if there were literally no desirable reasons to have it.
Well, not exactly. The original comment was that "surveillance data" is useful to the user and their installation. For instance, in getting a better response to your error report. Or, I'd say, in keeping your software patched and up to date (checking for updates is included in the OP as suppressed by 'do not track', since it necessarily reveals your IP address).
Even if none of these things that make it useful to the actual users were in effect... surveillance would still be happening, because it lets the vendor monetize the user's data.
Useful to whom and for what seems relevant, instead of just aggregating it all as "sure, it's useful in some way to someone or it wouldn't happen."
> For usage data, it allows developers to focus on features that matter and know which ones you can remove.
I find tools that don't do this are generally more powerful because they allow for deep expertise and provide a ton of payoff if you put in the effort.
E.g: Vim. 80%+ of users probably don't use macros. Hell, I use them <1% of the time. But I'm sure glad they're there when I need them.
No, you can't remove it. Even though I use it rarely, its existence might be the reason for me to use the tool at all, so that when I need it, the feature is available.
This came up with Audacity. There I have my set of standard filters I run all the time; even though they don't bring much benefit, they are there and nice. They would be at the top of any usage statistics. Then there are filters I need for a special effect or to repair something really broken. Those I use rarely, but when I do, they make the difference.
Or when talking command line: `ls` without options I use a lot (well, actually a lie, I have some aliases in my shell rc); sometimes I use `-a` or `-l`. This doesn't mean that maintainers should remove `-i`, since once per year or so I need it to compare inodes with log entries or something, and then it's important that the flag exists.
You need qualified information about what features are important. Not unqualified statistics.
Ah, so the question of whether to remove feature A or feature B is solved by simply not removing either. That's brilliant!
I get that this might not be a popular sentiment, but resources are finite. If we have a situation where we can't maintain both features, which one do we focus on? Usage metrics can absolutely be beneficial there.
I don't think it's usually a problem of "remove either A or B", it's usually "should we remove A or invest work in making it compatible with changes ahead?".
I see how usage telemetry can be useful in deciding whether or not it's worth it to keep supporting a feature, but I offer two counterbalancing points:
1. What people may be worried about - what I myself am worried about - is the methodology creep; it's too easy to end up having telemetry drive feature removal decisions, as in, "monthly report says feature X is used by less than 1% of users, therefore let's schedule its removal for the next sprint". The problem here is, telemetry alone will likely lead you astray. It's useful as a data source, not as the optimization function of product development.
2. If a feature you're worrying about has significant use, you most likely already know it without telemetry - all it takes is following on-line discussions mentioning your product (yes, someone might need to do it full-time). If removing the feature will have major impact on your maintenance budget, and non-telemetry sources don't flag this feature as being actively used, you can just axe it - revenue hit from lost userbase you've missed is unlikely to be big.
From this follows that the telemetry is most useful for deciding the fate of features that aren't used much, and don't cost much to maintain. At which point, I wonder, do you really have such low margins that you can't afford to carry the feature a while longer? I'm strongly biased here, because I'm going only by my personal experience - but I'm yet to see a software company[0] that doesn't have ridiculous amounts of slack. Between the complexity, management mess-ups, piles of technical debt and the nature of knowledge work being high-variance, having a feature slow your current development down by half[1] won't have much long-term impact.
--
The question thus is, are the gains from usage telemetry really worth the risk and potential ethical compromise? Would those gains be significantly lessened, if the telemetry was opt-in, and the company put more work into getting to know the users better? I suspect the answer is, respecting your users this way won't hurt you much, and may even benefit you in the long run.
--
[0] - Other than one outsourcing code farm I briefly worked in (my boss loaned me out for a couple weeks to a friend, to help him meet a tight deadline), but these kinds of companies don't make product decisions, they just close tickets as fast as possible.
[1] - And hopefully leading some devs to notice the need for a refactoring, so that the feature doesn't become a prolonged maintenance burden.
I'm a hundred percent certain that 1 is what Spotify does. It seems every release they remove or hide features that I use occasionally (e.g. clicking on the currently playing album picture to reveal the active playlist). It's extremely frustrating that, since power users are now in the minority, they are completely ignored.
At the first release like this they basically said "This new version has a lot less features, but you can vote on which ones we will add back!". They added none of the ones that got votes, and eventually removed the feature voting system altogether.
> Most people will shoot support e-mails at you which are more or less "app crashes". (...) If you collect crash reports, you have probably already fixed the problem.
Fair enough. Still, there are two separate steps here: collecting crash reports and sending them. What if the app asked if it can send the report, letting you optionally review it? Many programs today do that; I think it's an effective compromise. Additionally, the app could store some amount of past crash reports, and the places for users to get the support e-mail (a form, a button, an address in a help file...) could prompt you to check for, or automatically call up, those past crash reports, and give the user the choice to include them. The way I see it, the app should give users near-zero friction to opt in, but still have them opt in.
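As a rough sketch of that compromise for a terminal app (the helper names are hypothetical), the crash path could look like this in Python:

    import sys
    import traceback

    def save_report_locally(report: str) -> None:
        # Keep the report around for later support requests.
        with open("crash-report.txt", "w") as f:
            f.write(report)

    def send_report(report: str) -> None:
        ...  # hypothetical upload, e.g. an HTTPS POST to the vendor

    def crash_handler(exc_type, exc, tb):
        report = "".join(traceback.format_exception(exc_type, exc, tb))
        save_report_locally(report)
        print("The application crashed. A report was saved locally.")
        if input("Review the report before deciding? [y/N] ").strip().lower() == "y":
            print(report)
        if input("Send it to the developers? [y/N] ").strip().lower() == "y":
            send_report(report)

    sys.excepthook = crash_handler

The user sees exactly what would leave the machine, and opting in costs a single keypress.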
It won't solve the problem of bad support requests completely, but nothing ever does - random people will still write you with problems for which you have no data (e.g. network was down when crash occurred), or for which no data exists (because requester is a troll).
> For usage data, it allows developers to focus on features that matter and know which ones you can remove.
I accept this as an argument in favor, though personally, I don't consider it a strong one. I feel that "data-driven development" tends to create worse software, as companies end up optimizing for metrics they can measure, in lieu of actually checking things with real users, and thus tend to miss the forest for the trees.
Picking good metrics is hard, especially in terms of usage. The most powerful and useful features are often not the ones frequently used. Like, I may not use batch processing functionality very often, but when I do, it's critical, because it lets me do a couple days' worth of work in a couple minutes.
So, for me, can usage telemetry improve software? Shmaybe. Is it the only way? No, there are other effective - if less convenient - methods. Is the potential improvement worth sacrificing users' privacy? No.
> on iOS crash reports are opt-in, and only about 20-30% of users have them enabled. That is fine for huge programs or ones with little surface.
I feel the main reason this is a problem is because of the perverse incentives of app stores, where what you're really worried about is not crashes, but people giving you bad reviews because of them. Mobile space is tricky. But then, forcing everyone into opt-in telemetry doesn't alter the playing field in any way.
This would be tricky - the nature of software means some people would try to keep crashing their programs on purpose, sending repetitive crash reports, in order to make money. Developers would now have to deal with a flood of spam in their crash reports.
I think the only way crash reporting can work, outside of support contracts, is as a favor by the user to the vendor. But, to maximize the amount of such favors, the vendor would have to treat users with respect - which is pretty much anathema to the industry these days.
Valid concern, but bug bounties are a thing; it's up to the developer to decide if the bug is worthy of a payout. Maybe make it so that if it crashes, and they provide a useful log (or steps), then you pay out.
From what I know, bug bounties already have a spam problem. I definitely saw some devs in my circles complaining about people repeatedly sending garbage submissions in hopes of getting a payout.
What bug bounties also have is a big barrier to entry. You generally need to be at least marginally competent in software development, and do plenty of leg work, to make money with them. Turning regular crash reports into bug bounties removes that barrier, amplifying the spam problem.
One point here - support emails do not help you identify problems that less invested users may be having with your product.
For example, I used to work developer relations on TensorFlow. We wanted to make the framework accessible to enterprise data scientists. The problem was that these users were not familiar with the tools that we commonly used to get feedback - GitHub issues, the mailing list, etc.
Most of them were using TensorFlow on Windows via Jupyter, which wasn't well-represented among the users that we had frequent contact with.
It was really hard to understand the universe of issues that prevented most of these users from getting beyond the "Getting Started" experiences. Ultimately, these users are better served by easier to use frameworks like PyTorch, but I think a big reason that TensorFlow couldn't adapt to their needs is that we didn't understand what their needs were.
Another big problem is that it takes a certain level of technical sophistication to know how to send maintainers a useful crash report. If you rely on this mechanism, you will have a very biased view of your potential user base.
Sure, and I can spy on girls I like just so I can learn how to offer them a better experience when they meet me.
Having good intents does not justify skipping consent. The “opt-out” mentality is a very slippery slope, since you’re already stating that consent does not need to be explicit (hint: it’s not consent if not given explicitly AND freely).
I don’t think the simile fails for that reason. “Informed consent” is generally accepted as the standard, and it’s arguable that click-through EULAs are not in any way “informed consent” for the average person.
To bring the original simile full circle, let’s say a person signs a contract with another person, and then that person forces themselves on them. I doubt the presence of a clause in the contract saying “I agree to have sex with X” would absolve them of guilt.
There certainly is a difference in harm between forcing sex and sending unique user IDs, and it’s not unreasonable to make a contract about the latter if one wishes, is it?
This is rather a reason to prefer free software licenses. Culture has yet to catch up to this, but in the long run I hope the collective consciousness learns to distrust and avoid complicated proprietary software licenses.
Why not collect the data locally and only share it after a crash, once the user agrees?
Why not add a splash screen on program startup that informs your users of upcoming plans so they can intervene? Like "Hey, we are planning to remove feature X to speed up the program, do you agree?"
And this is because actual usage metrics don't really translate to opinions. I have features in programs that I use maybe once every two years, but then I really need them. Then there are other features I use daily and I really hate them with a passion.
That certainly is the devil's side. Unfortunately, too many firms have affixed phony halos and then exfiltrated the People's personal data. Opt-in is the only way the People will be able to choose whom they trust.
Telemetry alone doesn't tell you how valuable a given functionality is. A critical problem, one imposed through the tyranny of the minimum viable user,[1] is that the high-value creator community of any tool or service is going to be small. Quite often 1%, if not a very small fraction of that.
Your telemetry data will show you only that such features see little use. They won't tell you how much value is derived from that use, or what the effects of removing such functionality will be on the suitability and viability of your application.
Default telemetry is demonstrable harm for negative gains.
You have basically just given a justification that crash reporting can be conducted based on legitimate interest instead of consent, and as such does not require opt-in.
Many people mistakenly believe consent is the only possible justification for data processing under GDPR, whereas there are actually 6 possibilities, and you can ask a lawyer which one can apply for a given data processing flow.
Note that whereas I do believe that crash reporting can indeed be considered legitimate interest, I wouldn't consider plain telemetry ("phone home without a technical good reason") to fall under that umbrella...
Not everyone is using a flat rate, and thus both crash reporting and telemetry might cost the sender some money (or count against their high-speed quota). Some people seem to expect that data transfers are always free...
That's one reason (besides privacy) why I have Netguard running as a firewall on our Android phones and set to block traffic by default for each app, unless the app's creator convincingly explains why their app should be allowed to access the net.
Actually, this is why I want telemetry to be opt-in. I have a consistent policy of providing telemetry and I want the software to be biased to my needs. I want them to conclude that some feature used only by some privacy-conscious user is unused and should be axed because I want the software to be hyper-tailored to me.
I want the software to be streamlined, have no features except what I'll use, and for the community to be specifically people like me. I want other people to not use the software and use up dev bandwidth.
And I love it when telemetry biases the stats towards me. That way all devs will eventually be making software for people just like me.
I know you're sarcastic, but it wouldn't be a bad outcome. Sure, the vendor would have to be particularly dumb in their usage of telemetry[0], but the result would be... software that is useful to you. All the professional features you need would be in there, with none of the glitter.
Of course, I would opt-in too, with the same mindset but different use cases, and the software would provide equally for both of us. Add in a few more people like us, and we'd end up with a quality tool, offering powerful and streamlined workflows. Those who don't like it would start using a competing product, and tailor it towards their needs. Everyone wins.
Reality of course is not that pretty, but at face value, it still beats software optimized to lowest common denominator, serving everyone a little bit, but sucking out the oxygen from the market, preventing powerful functionality from being available anywhere.
--
[0] - It's a mistake that's much easier to make when you're flooded with data from everyone, rather than having a small trickle of data from people who bothered to opt in.
I am actually not being sarcastic. It is a life goal for me to have most policy organized to serve me or people like me (along whatever genetic/social/cultural/economic grouping is most likely to benefit me). i.e. I encourage communities not like me to refuse participation in medical research, forcing participants to be in my ethnic group; I encourage stringent data sharing norms and a culture of fear around what is done with health data in socio-economic and ethnic groups that are not mine; I encourage organizations to have strict opt-in requirements, in general, which I have no problem meeting, so that tools are built to be best used by me and adequate for others.
My dream is that everything is above the adequacy threshold for everyone else so that they don't build their own equivalent tool but that everything is also past my pleasantness threshold. I think the most effective means of doing this is to focus existing products into being past my pleasantness threshold while ignoring others since high switching costs keep most people on the same path they were before, and because things like medical research they don't really get to re-optimize.
I understand that this sounds sarcastic, but it is not.
Well... then I apologize for assuming. I'm not sure how such philosophy sits with me, I need to think about it more. One thing for sure, what you say makes you the model of an ideal free market participant :).
My experience is that third-party products/add-ins (specifically McAfee Enterprise Suite) cause most of the crashes I have with Office products. Most OS blue screens are non-Microsoft hardware drivers.
I was mainly referring to their apps on Mac and Android. They crash a LOT. Especially Teams for Mac and Outlook on Android. And both use lots and lots of telemetry. Lots and lots of traffic to hockeyapp.com (which is an MS telemetry platform) and various telemetry URLs at microsoft.com.
Yeah, I understand this becomes necessary to minimize maintenance effort on rarely used features, but it feels like even more often it's used to turn useful applications into barren wastelands. Firefox, perhaps the best example, is a fraction of what it used to be.
This may be a noble goal but then it may also lead to the program never being updated. If you need to do a major update (underlying system library no longer works, dependency is deprecated and has a security hole, a new direction requires overhaul of backend) you may need to prioritise what to keep and what to let go.
A problem I encountered was also localisation. Once you localise your program, adding any string means work multiplied across every supported language. In this case removing features can give you a lot of slack.
> For usage data, it allows developers to focus on features that matter and know which ones you can remove.
Sorry, but if devs require THAT tight a connection with end users to MAINTAIN software, they probably should stop and leave. It's impossible to figure out a new feature from such a reactive approach, so they would have to resort to more traditional ways of interacting with end users anyway. Thus... making coverage analysis a totally redundant thing.
Tighter user connection is suitable for enterprise software, not for general deployment.
And why so much worry about removing working(!) features?
I agree wholeheartedly with the idea that it should be opt-in but both approaches are equally unenforceable. The inverse of what's suggested in the article would be:
export DO_TRACK=1
Project owners that want to track you simply won't take any notice of these flags anyway.
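For what it's worth, the check itself would be trivial to implement honestly - a sketch in Python, with the variable name standing in for whatever a hypothetical standard settles on:

    import os

    def telemetry_enabled() -> bool:
        """Opt-in: telemetry stays off unless the user explicitly set DO_TRACK=1."""
        return os.environ.get("DO_TRACK", "0") == "1"

    if telemetry_enabled():
        ...  # only here may any report be transmitted

Which is exactly the problem: the flag is only as trustworthy as the project that implements it.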
I agree. The approach I'd like to see is, standardizing on some kind of DO_TRACK for convenience [0], and then doubling down on legal enforcement of opt-in telemetry. Project owners should be incentivized to seek consent, by threat of legal action from data protection authorities - and then, standardizing on some sort of DO_TRACK flag would be a no-brainer for them.
As it is, letting the industry standardize on a DNT opt-out is just making telemetry more established as a standard practice, making it harder to argue that it should, in fact, be opt-in.
The problems we have with tracking on the web are in big part because it was an established practice before appropriate legislation against it was drafted. In the CLI space, we have an opportunity to nip it in the bud, because it's not - as of yet - standard practice for console tools to silently spy on you.
--
[0] - And while we're at it, standardizing on a browser-provided consent UI, instead of each site providing its own, with its own dark patterns. It's the same idea.
> And while we're at it, standardizing on a browser-provided consent UI, instead of each site providing its own, with its own dark patterns. It's the same idea.
We've already been there and it was basically shelved because users were indifferent and companies wanted the data regardless.
> Project owners should be incentivized to seek consent, by threat of legal action from data protection authorities
Definitely agreed, and I'd want to have some form of strict liability for data breaches, based on what kind of information has been leaked. Currently, a company holding data about me (e.g. name, email address, phone number, credit history) causes a large amount of risk to me, but themselves carry no risk in case of a data breach. They are the ones who can decide to collect less information, keep shorter retention policies, or restrict access to prevent a breach, but they have no incentives to do so.
Yes, this would be a complete up-ending of many business models, but if your business model relies on collecting data without collecting the associated risk, it's a business model that society shouldn't allow to exist.
The problem is a simple one. Most telemetry tools (Mixpanel, Sentry, etc.) don't give the developers who are adding them into their products the ability to quickly add respectful consent flows.
This really needs to be a feature of the telemetry tools in the first place. Because, ultimately, most telemetry is being implemented by startup engineers who are burning the midnight oil to complete the telemetry JIRA ticket before going back to the long list of other stuff they have to implement.
I have experienced this from all three sides - as a software engineer implementing telemetry, as a product manager consuming telemetry, and now as a founder who is building a tool to collect telemetry in the most respectful manner possible.
> now as a founder who is building a tool to collect telemetry in the most respectful manner possible.
Thank you for taking being respectful to users seriously.
I'd be very interested in learning how your consent flows look, and what other aspects of your product are driven by the goal to "collect telemetry in the most respectful manner possible". I couldn't see much on it on the landing page, so if you have a moment, could you provide additional information, either here or in private?
Consent is calculated at the time each report is sent back. This means that your users can grant and revoke their consent on a per-report basis, which is the only respectful way to do things.
We are also building programs which will deidentify reports on the client side, before any data is even sent back to our servers. This work is still in the early stages, but here's v0.0.1 of the Python stack trace deidentifier: https://www.kaggle.com/simiotic/python-tracebacks-redactor/e...
Besides Python, we also support JavaScript and Go, and we added Java support last week.
I would really love to hear any feedback you have.
How is that consent requirement enforced, though? That is, are you relying entirely on trust and/or contractual obligations, or do you have some means of enforcing that the user of your SDK isn't cheating?
> Consent is calculated at the time each report is sent back. This means that your users can grant and revoke their consent on a per-report basis, which is the only respectful way to do things.
Correct. I like how you think about this. I assume the SDK user will be ultimately responsible for prompting the end-user for consent; I wonder if you have any "best practices" documents for the software authors, so that they don't have to reinvent respectful consent flow UX from scratch?
> We are also building programs which will deidentify reports on the client side, before any data is even sent back to our servers.
I don't see any code in that Kaggle notebook you linked (I'm not very familiar with Kaggle, I might be clicking wrong). Should I assume your approach is based on training a black-box ML model? Or do you use some heuristics to identify what data to cut?
Thanks for looking at the code, and for your feedback!
Here is a recipe for adding error reporting (reporting of all uncaught exceptions) in a Python project. The highlighted line shows that, when you instantiate a reporter, you have to pass a consent mechanism:
https://github.com/bugout-dev/humbug/blob/main/python/recipe...
We allow you to create a consent mechanism that always returns true:
> consent = HumbugConsent(True)
Of course, someone can always create their own subclass of HumbugConsent which overrides that check. We don't have a good way to prevent this, nor would we want to restrict anyone's freedom to modify code.
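For example, here is a rough, untested sketch of such a subclass, wiring consent to the DO_NOT_TRACK variable discussed elsewhere in this thread (it assumes the consent check lives in a method named `check`, which may not match the real API exactly):

    import os

    from humbug.consent import HumbugConsent

    class EnvVarConsent(HumbugConsent):
        """Consent that is withdrawn whenever DO_NOT_TRACK is set."""
        def check(self) -> bool:  # method name assumed from the discussion above
            if os.environ.get("DO_NOT_TRACK", "0") != "0":
                return False
            return super().check()

    consent = EnvVarConsent(True)  # opted in unless DO_NOT_TRACK says otherwise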
Our emphasis is on building simple programs that we can reasonably expect to run on any reasonable client without using an exorbitant amount of CPU or memory. For this reason, we aren't using black box ML models. Rather, we analyzed the data and came up with some simple regex based rules on how to deidentify stack traces for our v1 implementation.
We are in the process of doing this for more languages and building this into a proper deidentification library that can be imported into any runtime - Python + Javascript + Go + etc.
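To give a flavor of what regex-based deidentification looks like (these patterns are illustrative only, not our actual v1 rule set):

    import re

    REDACTIONS = [
        (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "<email>"),        # e-mail addresses
        (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<ip>"),        # IPv4 addresses
        (re.compile(r"(/home|/Users)/[^/\s]+"), r"\1/<user>"),       # *nix home dirs
        (re.compile(r"C:\\Users\\[^\\\s]+"), r"C:\\Users\\<user>"),  # Windows home dirs
    ]

    def deidentify(trace: str) -> str:
        """Strip common PII carriers from a stack trace before it leaves the client."""
        for pattern, replacement in REDACTIONS:
            trace = pattern.sub(replacement, trace)
        return trace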
"If you want me to test your app(lication), pay me."
- so-called "power user"
What many of today's software authors want/expect is free testing.
"To me, this seems to not just be admitting defeat, it's ensuring defeat right from the start."
While I do not use any of the example programs mentioned, it seems like these environment variables would be appropriate if the user wants to toggle between tracking and no tracking. However, for users who would never want to enable tracking, "no tracking" should be a compile-time option. It would not surprise me if that is not even possible with these programs. How is the user supposed to verify that "Do Not Track" is being honoured?
> What many of today's software authors want/expect is free testing.
Many apps and tools are open-source and free. While I assume everyone wants to provide the best experience, it's hard for me to justify being angry about bugs and problems in tools that I got for free rather than bought.
Secondly, the industry realized that going fast, releasing often, measuring results, and improving over time is a winning strategy. No matter how often we as users complain that "they changed something again", we still want to get things fast. Deploying a new version once per year is not something we would really like in most cases.
And a fast development cycle inevitably comes with bugs, but they can be fixed quickly, not in the next year. Because even if you spend 2 months testing your app, it will still contain bugs that surface the moment the first real user touches the app.
Let's imagine the ls command with telemetry. What happens when you make an error like this?
$ ls all-the-pr0n
ls: cannot access 'all-the-pr0n': No such file or directory
Um, what did I just tell the ls vendor? Who are they sharing that data with?
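To make the worry concrete, a naive usage-telemetry event for that invocation might capture something like this (a hypothetical payload, but typical of what "command usage" collection can include):

    event = {
        "tool": "ls",
        "args": ["all-the-pr0n"],  # the part you never meant to publish
        "exit_code": 2,
        "error": "cannot access 'all-the-pr0n': No such file or directory",
        "timestamp": "2021-04-18T12:34:56Z",
    }

Command-line arguments are user data: they contain filenames, hostnames, and sometimes even passwords pasted by mistake.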
> Telemetry should always be opt-in
Opt in needs to be very precise and spell out exactly what is being shipped. For a lot of command line tools, telemetry is going to create more problems than it is worth.
> I wonder how much of existing telemetry already crosses the "informed consent" requirement threshold.
OPT-IN is so much better for users. But if analytics is helpful, then opt-in probably means you get no data, unless you keep bugging people to opt in - which would be horrible too
> But if analytics is helpful, (...) [opt-in telemetry] would be horrible too
I don't question that analytics can be helpful. I do question the degree to which it is, relative to other methods of gaining the same insights (such as better QA, user panels, surveying people, etc.).
I also don't think it would be horrible too. Inconvenient, yes. But horrible? People used to ship working software before opt-out analytics became a thing.
What I meant was the horrible experience of CLI apps halting and asking on every execution, much like websites do with newsletters, cookie banners, or paywalls.
How can users be incentivized to opt-in, particularly with a free-to-use application? I can see a case for ad-supported software, a developer could reduce or eliminate ads for that user in exchange for telemetry data...
Ask nicely. After all, you want them to do you a favor.
If you're less into respect and more into manipulation, offer them a meaningless trinket. A sticker on the app's home screen saying "I helped", or something.
Yeah, this makes opting out my responsibility. If a company collects the names of all the files in my home directory, it's my fault for not setting some random variable correctly. Oh, and you did remember to also set it in your crontabs, right? If not, oopsie! You're gettin' spied on!
This proposal is terrible and comes at the problem from the exact wrong direction. If someone wants to come up with a "export GO_AHEAD_AND_SPY=yes" envvar that enables telemetry, fine.
If you think GDPR always requires consent, you would be wrong. Consent is just one of many possible legal bases, and usually the one you use only when you can't use any of the rest. In this case, and more widely, it's not at all clear which types of personal data processing do or do not require consent.
I know consent is only one of the possible bases. My comment was not about GDPR per se. I mentioned this law because of the spirit behind it - the law itself sets the bar very low (arguably below the point of what I'd consider ethical behavior). The other reason I mentioned it is because GDPR is currently the only stick we (at least, some of us) have to push back on the invasive telemetry. It's not nearly enough.
GDPR notwithstanding, I'm of the firm belief that any kind of telemetry in software should be strictly opt-in and require informed consent. I say "should" - it's an ethical view, not a legal one.
If the variable is widely implemented, then that provides the nice, single point of control that the users need.
In the FOSS world, we typically have distros between the applications and the users. If the applications honor the variable, then that's all the control that is required. A distro can implement an opt-in model by defining the variable with a value of 1 in the base system, so that it's present right from boot.
It's not a big enough of a change for people to actually choose Linux distributions over it - those who know about the variable will set it themselves, those who don't will be stuck with a bad default.
My issue isn't with the point of control - it's with the default. Telemetry of all kinds should be opt-in. People shouldn't have to worry that they're constantly being watched. They shouldn't have to hope that every single telemetry stream is operated by competent and careful software engineers, guided by honest and law-abiding managers. You know how this industry works; it's a rare case where a data collection scheme doesn't overreach, accidentally ingest too much, leak data, or turn malicious and pass it to bad actors.
I am installing Windows on a new machine right now. And here's the "Privacy Settings" setup step:
Title: "Diagnostic Data"
Explanation: "Send all Basic diagnostic data, along with info about the websites you browse and how you use apps and features, plus additional info about device health, device activity, and enhanced error reporting."
Right on point. Also, it's at the mercy of the application, and even then there must be a trusted third party who can certify that an application follows the spec. This is not practical.
The reliable and practical way is to have an ad blocker at the kernel level, similar to browser ad blockers.
I really do believe the best solution is a checkbox at install, but the checkbox starts filled. Yes, that's technically OPT-OUT, but it's extremely obvious and right in front of anyone who actually cares about opting out.
Maybe contractors should take whatever they want from your house while they are doing their job. If you don't like it you should monitor them more closely, and opt-out, or do business with others.
The software you write is yours. My data is not. You have every right to include or not include features, but you have no right to take my data without my permission. Your rights end where mine begin.
>> I wonder how long it takes until one of the vendors of popular CLI tools or desktop apps gets fined for a GDPR violation.
How would GDPR help with anonymous data? Say you have a CLI that sends back the frequency of usage for all top-level commands daily. If the user doesn't log into the tool, or that information isn't sent, then all the developer would have is the IP address. If they discard that, how would it land under the remit of GDPR?
I'm curious because I think it's easy for small developers to try and jump on this bandwagon. The big companies will all have vetted their telemetry strategy with their legal teams and have compliance reviews in place, as well as people who will handle cleanup from data spills. Bob is less likely to have this for his popular CLI tool.
> Say you have a CLI that sends back the frequency of usage for all top-level commands daily. If the user doesn't log into the tool, or that information isn't sent, then all the developer would have is the IP address. If they discard that, how would it land under the remit of GDPR?
I think it wouldn't, given proper handling of the IP address.
Where I'd expect your Bob to land in trouble is in mishandling crash reporting, in particular wrt. logging. It's very common for log files to accidentally acquire passwords or PII, or potentially other secrets protected by different laws. To be safe here, you'd have to ensure no user-provided data, or data derived from user input, ever enters the log files - which may include things like IP addresses and hostnames, names of files stored on the machine, etc.
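One common mitigation is a scrubbing filter in front of the logging pipeline, so records are redacted before they are ever written. A sketch with illustrative patterns:

    import logging
    import re

    class RedactingFilter(logging.Filter):
        PATTERNS = [
            (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<ip>"),    # IPv4 addresses
            (re.compile(r"(/home|/Users)/[^/\s]+"), r"\1/<user>"),   # file paths under home
        ]

        def filter(self, record: logging.LogRecord) -> bool:
            msg = record.getMessage()
            for pattern, replacement in self.PATTERNS:
                msg = pattern.sub(replacement, msg)
            record.msg, record.args = msg, None
            return True  # keep the record, now redacted

    handler = logging.StreamHandler()
    handler.addFilter(RedactingFilter())
    logging.getLogger().addHandler(handler)

Regex scrubbing is best-effort, though; the only safe default is to keep user-derived data out of the logs in the first place.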
Because massive chunks of the population will never turn it on not due to ideological commitment, but simply due to no knowledge that it exists. Furthermore, if we define “tracking” as broadly as “any crash reports and any checking for updates” that effectively means these features will not work, and the open-source maintainers will have a much harder time tracking down bugs and encouraging people to update to less buggy or more secure versions of their software.
Why not simply fork or choose not to install code you don’t like, rather than forcing your beliefs about what does or does not constitute acceptable code on the developers?
You are not entitled to crash reports - that applies to open source developers just as to anyone else. If you want crash reports, have some kind of wizard or command to submit them and point to that when a crash happens, but you must always gather informed consent before submitting that data.
I like how games typically handle this: if the application crashes, a dialog appears asking if you want to send a crash report, and it often tells you what's included in said report.
On a more CLI-oriented design, take a look at Debian's reportbug program. It's completely transparent, and still gathers enough information that most times a one-line description of the bug is enough for anybody to understand everything that happened on your system.
Many versions of Android do the same (although the details may differ; that's the beauty and the curse of open source). After a crash (on some devices, only after a few of them), you get a popup that says something like "<app name> appears to stop unexpectedly, would you like to report this to the developer?"
If you click on the details button, you can see almost everything outside of pure hexdumps of RAM.
There's no need for automatic sample submission if you respect your users' privacy.
If it’s so important to the project the devs can ask users, educate them, and receive informed consent. There are plenty of ways a dev can force a user’s attention for a few minutes to hear their “pitch” and if the users still don’t want to opt in after hearing the reasons perhaps the reasons aren’t nearly as compelling as the devs believe them to be.
I don't believe it's unreasonable for it to be opt-out. Building software is very hard, and even something as inane as "automatically report crashes to the developers so we can fix them quicker" or "tell us how many users are on each version so we can estimate the blast radius of some backward-incompatible change" would be categorized as tracking.
Here's the problem: people are idiots. You can manage or visit any Github issues page for a major project for ten minutes and recognize that even our industry is not immune from this. People also, overwhelmingly, use the defaults. When presented with the option to turn on tracking, most people won't, despite the fact that for most developers, its a legitimate good which benefits the user over the long-term.
You can say "well, if people want to be idiots, that's their right". Idiocy never remains in isolation. If they refuse to update the app, then update Windows and it stops working, Users don't throw their hands up and say "oh well that's my bad". They don't complain to Microsoft. They complain to AppDevs. That becomes a ticket, which is written far-too-often from the perspective of anger and hate. Its triaged by, usually, overworked volunteers.
Telemetry is not all bad. There is no "ensuring defeat" right from the start, as if it's some war. Most developers just want to deliver a working project; telemetry enables that. Giving users the ability to opt out, maybe even fine-grained control over what kinds of telemetry are sent, is fantastic.
> even something as inane as "automatically report crashes to the developers so we can fix them quicker" or "tell us how many users are on each version so we can estimate the blast radius of some backward-incompatible change" would be categorized as tracking.
The devil is in the details. Unless you are very careful, even a basic crash report may leak PII (commonly, through careless logging).
> Here's the problem: people are idiots.
I know what you're referring to, but I have a similarly broad and fractally detailed counter-generalization for you: companies are abusive. They consider individual customers as cattle, to be exploited at scale. They will lie, cheat and steal at every opportunity, skirting close to the boundaries of what's legal, and considering occasional breaches into outright fraud as costs of doing business.
Yes, I know not all companies are like that - just like not all users are technically illiterate. But the general trend is obvious in both cases.
What this means is, I don't trust software companies. If a company asks me to opt into telemetry, with only a generic "help improve experience" blurb, I'm obviously going to say no. It would be stupid to agree; "help us improve the experience" is the single most cliché line of bullshit in the software industry. There's hardly a week without a story of "some well-known company selling data to advertisers". Introduction of GDPR revealed the true colors of the industry - behind each consent popup with more than two switches there is an abusive, user-hostile company feeding data to a network of their abusive partners. So sorry, you have to do better than tell me it's in the long-term benefit of the users - because every scoundrel says that too, and I have no way of telling you and them apart.
And now for the fractal details part:
> Idiocy never remains in isolation. If they refuse to update the app, then update Windows and it stops working, Users don't throw their hands up and say "oh well that's my bad". They don't complain to Microsoft. They complain to AppDevs.
Yes, idiocy on both ends. There's a reason why users refuse to update the app. It's because developers mix security patches, bugfixes, performance improvements, and "feature" updates in the same update stream - with the latter often being a downgrade from the POV of the user. I'm one of those people who keep auto-update disabled, because I've been burned too many times. I update on my own schedule now, because I can't trust the developers not to permanently replace my application with a more bloated, less functional version.
(Curiously, if usage telemetry is so useful, why does software so often get worse from version to version?)
Secondly, if the user updates Windows and your app stops working, it's most likely your fault. Windows cares deeply about not breaking end-user software, historically it bent over backwards to maintain compatibility even with badly written software. It's entirely reasonable to expect software on Windows to remain working after Windows updates, or even after switching major Windows version.
> Most developers just want to deliver a working project; telemetry enables that.
Telemetry does not enable that. Plenty of developers delivered working projects before telemetry was a thing. What enables delivery of working projects is care and effort. Telemetry is just a small component of that, a feature that gives the team some data that would otherwise require more effort to collect. Data that can just as easily lead you astray as improve your product.
> Giving users the ability to opt-out, maybe even fine-grained control over what kinds of telemetry is sent, is fantastic.
Yes, and all that except making the telemetry opt-in is even more fantastic. You want the data? Ask for it, justify your reasons for it, and give people reasons to trust you - because the average software company is absolutely not trustworthy.
The standard of "care and effort" in software engineering today begins with observability.
An error-handling branch without a counter on it will not get through code review. That an incident was detected through user reporting and not telemetry/alerting is a deeply embarrassing and career-limiting admission in a postmortem. That logs were insufficiently detailed to reproduce the problem will be a serious and high-priority defect for the team. Something like an entire app without any crash reporting is gross negligence on the part of the Senior VP on whose watch it happened.
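For instance, the kind of thing reviewers look for (a sketch; the counter dict is a stand-in for whatever metrics client the team uses):

    import logging

    error_counter = {"process.timeout_errors": 0}  # stand-in for a real metrics client

    def process(request):
        raise TimeoutError("backend did not respond")  # simulated failure

    def handle(request):
        try:
            return process(request)
        except TimeoutError:
            error_counter["process.timeout_errors"] += 1  # the counter on the error branch
            logging.exception("timeout while processing %r", request)
            raise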
I'm not really remarking whether this is good or bad, you're free to think this is a bad move, but from my perspective it is definitely the way the industry moved. Among my colleagues, releasing without a close eye on thorough telemetry is some childish cowboy amateur-hour shit.
You consent to a scope of work. If you want line item control over exactly how the work gets done, what tools the workman gets to bring, what creature comforts he is and isn’t allowed on the jobsite, that’s something you can negotiate.
Like if you’re Amish or under some weird historic preservation regime or working near a delicate billion-dollar scientific instrument. Perhaps you really need carpentry done with hand tools. You can find a contractor who wants to do that. You don’t hire a normal firm and then get mad that they failed to seek consent before plugging in their table saw.
Not if users get something in return. Like with the windows insider program. Opt in to get betas and provide automated feedback. If MS turned off telemetry for regular users they'd be doing a great job.
"PRs and Status" is a very optimistic headline for a list of rejected pull requests.
I like the idea, but the execution leaves a lot to be desired. I can understand why some Homebrew devs think it's just an attempt from someone to pad their resume. It's essentially a single person setting up a website, then submitting a bunch of untested pull requests to a bunch of projects.
I imagine this would work much better if a large distro like Debian would adopt this first. They have the credibility and weight necessary for such a project, they can make it much more useful by asking for the desired setting during OS setup, and they can make sure it's universally respected via patches in the packaging process. From there it would have a chance at wide adoption.
Sprinkling the website URI into the source code of the PRs was definitely a bold move. I get the logic behind why it was done, but one website does not constitute a standard. If DO_NOT_TRACK were to be standardised, then adding a URI to the published standard would have been more obviously intended as a constructive comment. But when the URI is a hobbyist's personal website, as well intended as it probably was, I can completely sympathise with why maintainers were sceptical about the sincerity of the PRs.
That all said, there has to be a path for unknowns to contribute good ideas back to FOSS, and I do think this is actually a pretty good idea that deserves to gain attention.
well, good luck getting buy-in from the cli tool devs, then? the other option requires absolutely zero buy-in from homebrew, gatsby, dotnet, or any other cli.
> I can understand why some Homebrew devs think it's just an attempt from someone to pad their resume.
I can't. That suggestion seems the product of desperate reach for an ad hominem attack.
There are clear, plausible and obvious reasons for the project's existence that require no seeking of hidden motive to understand or explain.
I'd agree OS-level integration for this stuff would be better, supposing OS manufacturers would also elect to include their own telemetry under such an umbrella.
I'd love it to gain more traction. It was an idea, and I thought it would be better as an idea and a website than just an idea.
It was strange to see it get labeled as a marketing attempt during my attempts to gain some traction, considering I'm not selling a damn thing.
I have severe focus issues, so it had to be a one-day project unfortunately, which is why a couple of the patches weren't tested very well. Treasure your flow state when you get it.
Ultimately I didn't invest any more time into it as developers who put default-on spyware in their apps don't actually want more people opting out. It's a doomed concept.
Indeed, because it assumes developers who are doing opt-out tracking will respect this voluntarily.
I disagree with the poster above about Debian adopting this. They shouldn't adopt DO_NOT_TRACK, they should ban any package that does tracking by default from their repo; in distros that already keep non-free software out of the standard repository this wouldn't be that much of a leap. This seems necessary, as there's no sign that a "please don't track me?" flag will work better in the terminal than it does now in the browser.
> They shouldn't adopt DO_NOT_TRACK, they should ban any package that does tracking by default from their repo
Debian maintainers are amazing and they go one step further: they patch it out when they distribute it. They also patch out old version time bombs and all manner of other phone-home.
As an open source software developer, I do have some sympathy for the upstream devs here, and some frustration with distro maintainer policies. I'm not interested in getting a bunch of bug reports for issues that were fixed 4 years ago, or introduced because Debian maintainers "patched out" something they shouldn't have.
As a Debian Developer, I don't share your view. Debian users want a unified release cadence. That's why they're using Debian. For example, they specifically don't want random packages to update, changing behaviour, because upstream bundled feature changes with bugfixes.
I understand that upstreams might get frustrated that their bugfixes haven't made it to stable distribution releases, but it's important to understand that expecting otherwise (except with manual, per-fix intervention) is generally against the principle of having a stable distribution release in the first place, and exactly why users are choosing to use stable distribution releases.
> Debian maintainers "patched out" something they shouldn't have
Debian sets its own policy about what is and isn't acceptable, in order to give users consistent behaviour across all packages. Again: Debian users want this consistency; users who don't want this use other distributions (like Arch for example, which aims for the opposite). An example is this topic: Debian maintainers generally patch out telemetry-by-default.
> I'm not interested in getting a bunch of bug reports for issues that were fixed 4 years ago, or introduced because Debian maintainers "patched out" something they shouldn't have.
I agree with this part. Distribution users should report bugs to their distribution bug trackers in the first instance, and only send reports upstream in cases where the bugs are confirmed to be relevant to upstream.
That's unnecessarily harsh. A distro maintainer's primary responsibility is making the distro as a whole work together, and sometimes that means choices that are not optimal for individual programs/libraries on their own. But packaging itself already reduces the burden on upstream a lot by preempting build-related support requests from users, as well as many compatibility-related ones.
Sometimes upstream's interest is also not aligned with the user's interest (e.g. the topic of this thread), and there the distro will tend to choose the user's interests - that's a good thing.
As for time bombs specifically, those don't make much sense when the software is installed via a repository that has an update mechanism. Not wanting bug reports for old versions is no excuse for planned obsolescence.
If you think a distro is increasing your support burden it is quite acceptable to tell users from that distro to use the distro's bug tracker.
DNT has already failed. It is long gone. A lot of people spent a ton of energy trying to make it work but it isn't going to succeed. Discussion around it now is not productive and should instead be focused on new initiatives.
> I have severe focus issues, so it had to be a one-day project unfortunately, which is why a couple of the patches weren't tested very well. Treasure your flow state when you get it.
This isn't really an excuse. Imagine how somebody on the other side of a broken PR sees this. It 100% reads as "I don't care about your project enough to do the work and am doing this solely for my personal satisfaction".
I think there is useful discussion to be had as to why DNT failed. IMO it’s largely because people approached the problem from a purely tech-oriented standpoint and ignored any differing ideologies and incentives at play. An analogy to this project could be drawn.
From my perspective, a major issue with DNT was that it wasn't very clear to websites what a DNT request was asking for. The specification explicitly doesn't handle this: "This specification does not define requirements on what a recipient needs to do to comply with a user's expressed tracking preference" https://www.w3.org/TR/tracking-dnt/
Can’t say I’m surprised that putting one day’s work into getting the industry to adopt a proposed standard you came up with yourself didn’t really work out. Based on your other comments, you also seem to have some level of contempt for some of the projects you were trying to influence.
Yes, contempt is an accurate assessment. Shipping nonconsensual spyware is unethical, and there are too many devs in our industry who are happy to behave unethically so long as their boss tells them to.
Creating more social and reputational consequences for individual worker bees who make such commits on the job is also on my to-do list.
Ultimately the opt out vars are token efforts by developers anyway. These projects only do the bare minimum of opt-out-ability because they have to be able to point to the opt-out setting as justification for their shipping of spyware-by-default. Making it easier to opt out isn't something they want, regardless of how much I do or don't mask my contempt for such unethical, user-hostile practices.
The way the Audacity thing is playing out is instructive. Many devs simply feel entitled to take over your machine as if it is theirs, and your double-click is a blank check. My PRs against auto-updates have run into similar developer resistance.
> Creating more social and reputational consequences for individual worker bees who make such commits on the job is also on my to-do list.
Wow. This is a terrible approach for a worthy problem. You want to make the options into "lose my livelihood via being fired or lose my livelihood via social and rep consequences"? That is awful.
I'm looking forward to the day I'm cancelled because I left the default branch name as master on my public GitHub repos.
No joke, in my mind this "solution" is on the same level as this monstrous cancel culture. You put tracking in an OSS project that sends two anonymous IDs and maybe an IP address (which is easily spoofable anyway)? Time to cancel your career and your livelihood.
Historically, there have nearly always been negative consequences for doing things that infringe the rights of others.
If someone else telling the truth about one's actions on their webpage is a threat, perhaps people should be more measured in the actions they undertake.
What I'm describing is reporting, not cancellation.
But please explain how publicly calling out and potentially doxxing developers would help this issue? Do you really want to target single developers and unleash the fury of the HN crowd on them?
If we talk about morality and ethics, I think this is worse than implementing tracking, isn't it?
I know you guys want to change the world, and I absolutely agree that there is too much tracking in the world, but FFS, let's just take a step back and think about YOUR actions and their consequences.
Not everyone has the luxury of quitting a job whose morality they disagree with. I know we are talking about OSS, but a lot of developers live on donations and sponsorships, or even got bought out by larger companies, and to keep their jobs they have to do said implementations.
> It was strange to see it get labeled as a marketing attempt during my attempts to gain some traction, considering I'm not selling a damn thing.
Clicking on your HN profile, I see “I am available for hire”. It was clear to me that’s what the maintainers were referring to: you marketing yourself, which you’d have been able to do with greater effectiveness as the originator of a feature adopted by popular open-source projects (had it worked).
Note I’m not claiming that’s what you did, I’m explaining what you found strange.
I do not support this idea. The differences between the things they group together under one umbrella are wild - ads, usage reporting, automatic updates, and crash reporting are all very different things, and it's actively harmful to control all of them with a single switch.
At the very least, there should be two different switches - "ads" and "everything else".
If you don't want telemetry or crash reporting, that's fine - you may not care about helping the developers of an open-source project improve their software and that's your personal choice. Similarly, you may want to manually install security patches, while I want them to be automatically installed so I have one less thing to worry about.
There may even be some crazy person that wants ads. I don't, but I'm not going to try to take away their freedom to choose them, which is what this proposal would do - an all-or-nothing switch for non-essential network access.
Give me granularity. There's no (user-facing) reason not to.
Lots of people in the Free Software community think that programs should exist only to help the user, and should just do exactly what the user tells them to do and nothing else. I am one of those people. Showing ads, making network requests that aren't necessary to carry out the expected functionality, etc. all fall under violations of that principle. Some violations are worse than others, but I treat them the same way.
It's on the developers of the software to make these settings granular and off-by-default. If I want to help a developer, I'll go out of my way to do it and flip one of those switches if that's what it takes.
Personally, I don't think this goes far enough; I think programs should require explicit consent simply to connect to the network, and most should request permission to connect only to certain domains. But that's just me ;).
>it's on the developers of the software to make these settings granular and off-by-default.
why? privacy advocates in these discussions always jump to this position and assume everybody is going to agree just because they've invoked the magic p-word. telemetry is useful, and most people don't really care enough to change the defaults one way or the other.
assuming that "because privacy" is not an argument that will sway me, can you explain why i should default to the less-useful option just for the sake of appeasing the people most likely to change the defaults?
> assuming that "because privacy" is not an argument that will sway me, can you explain why i should default to the less-useful option just for the sake of appeasing the people most likely to change the defaults?
No, I can't. I think that ethical principles like respect for users' privacy are more important than collecting data to fix bugs/features. Shareholders may disagree, of course; this is where developer agency and collective bargaining can come in, but that's a longer discussion.
> telemetry is useful, and most people don't really care enough to change the defaults one way or the other.
This sentence is correct, but it isn't enough to justify making telemetry opt-out. I don't think this is a situation where the apathy of the majority can overrule the rights of the minority:
- People can't always choose to use or avoid software; they have schools, employers, an inability to make informed consent, etc. that can prevent them from using your software with consent to its terms.
- Privacy is often a need, not a want. People--including their future selves--are often at-risk, and need software to respect their vulnerability by default (as auditing all transmitted data for all software is beyond unrealistic).
- A person should not have to justify having privacy. Others should have to justify taking it. That's how rights like privacy work; privacy is something we have by default until it is infringed upon.
- Rights like privacy, speech, and information access without censorship (from books to newspapers to the Internet) aren't driven by will of the majority. They're driven by the fact that preserving them for the minority is necessary.
Users have lots of things that would be useful to developers, but that doesn't mean developers are entitled to them.
I'm just going to point out that you accused privacy advocates of magical assertions, then did the same thing :)
(there is a point of view which says that you make good products through thoughtful and rigorous design, rather than poring over databases of events to try and reverse engineer how people are actually using your product)
* Figuring out which features are not being used, and being able to change things accordingly. This also covers UI decisions - oh, we added this new toolbar; is anyone actually using it?
* Prioritizing localizations of a particular country / language.
* An actual positive feedback loop for products that don't generate revenue. Most open source project maintainers just receive bug reports, feature requests, and sometimes hate mail. Telemetry information which roughly shows that even though there are some people complaining loudly about the software, there are 50k other users happily using it, is very very motivating.
* Crash reporting / exception monitoring - Bugs are inevitable, and this gives so, so much information. In my experience, very few users actually report bugs. Many might even just rate your app badly, and then ignore any attempts at following up, especially because they have deleted the app.
> there is a point of view which says that you make good products through thoughtful and rigorous design, rather than poring over databases of events to try and reverse engineer how people are actually using your product
There are two problems with this view.
The first is that it's generally user-hostile. If you're meaning your software to be useful to users, then you need to see what your users' needs are, and how they're actually using your software - not how you think they should be. There are people who design pieces of software that are meant to be opinionated, beautiful gems that are not meant to be used by others (which includes software that pays lip-service to users but is so user-hostile that it's effectively not for them anyway) - and telemetry, crash reports, auto-updates, and more generally a feedback loop from user to developer are not for them.
The second is that it requires a lot of discipline. Look at how user-hostile most applications (both open-source and proprietary) are anyway - do you really think that their developers are going to have the discipline to carefully engineer a good user experience? As an idealist, I wish that they did, but as a realist, I know that in most cases that will never happen, and so telemetry and crash reports (crashing is part of a bad user experience) will get you a slightly better piece of software than nothing at all.
The problem with telemetry is it doesn't tell you what users' needs are. It tells you what they do with the software. Not why. Not how they feel about it. Not what they wish it would do.
Thoughtful and rigorous design includes user research and testing.
In my experience it takes more discipline to use telemetry responsibly. I've seen equivalent data used to justify removing 1 feature and making another more prominent. It seems to go hand in hand with UI churn. And it seems to lead to dismissing user feedback.
Crash reports and event streams are different. But there's no reason not to ask consent anyway.
as a developer, i find crash reports helpful when troubleshooting issues, and use device metrics like screen size to determine where to focus limited developer efforts. telemetry is useful because i use it. that's not a magical assertion.
It's not even about privacy, it's the basic ethical principle that you should ask for consent before doing something to me. It is a command line tool that I run on my machine; why should it be allowed to do what I didn't ask it to in the first place?
Developers might think telemetry is a good thing - then decent ones would take care to convince the user that it's a good thing too. Some users will agree, some will not, but asking is just a basic act of respect and decency.
Right, but they specifically listed "automatic update phone-home" as something to be suppressed too, as well as some other things that aren't ads; the proposal is about "telemetry", not just ads.
And the proponent of the standard got in an argument with homebrew about supporting the env variable. (Although not necessarily for suppressing automatic updates? Which might be a violation of the standard, to suggest you support it, but only support it incompletely?)
Perhaps the proposed standard needs some more consultation and fine-tuning (as is common with standards for a reason) before trying to strong-arm projects into adopting it.
The person you replied to didn't say that. And updating a package manager manually works fine. I consider forced updates and tracking different problems though.
The OP says that. That is the proposal in the OP that I thought we were discussing here, is why I was discussing it. The proposal in the OP for `DO_NOT_TRACK` does consider them part of the same problem all to be controlled by a `DO_NOT_TRACK` setting.
It may be that both you and I think that's not a great idea, or at least needs more fine-tuning as a proposed standard.
> This is a proposal for a single, standard environment variable that plainly and unambiguously expresses LACK OF CONSENT by a user of that software to any of the following:
> ad tracking
> usage reporting, anonymous or not
> automatic update phone-home
> crash reporting
> non-essential-to-functionality requests of any kind to the creator of the software or other tracking services
I thought we were discussing the comment you quoted and replied to. It has a different but similar suggestion. And Homebrew automatic updates aren't needed anyway.
I don't think fine tuning would help. People who think they're entitled to collect user data without consent don't want to make it easy to opt out.
You literally just said you considered them very different problems? But now you say you think they should both be handled per the OP with a single flag, and you're opposed to both of them on the same grounds -- doesn't sound like you do consider them very different problems?
This seems like one of those internet debates where what we're talking about keeps changing in pursuit of "winning" rather than enlightening.
Agreed with many other posters here - the execution was messy and definitely lacked the calm and collected attitude I'm used to seeing when looking at new standards being discussed.
I was also unimpressed with the tone of the issues/PRs opened. As an adult, one of the many lessons we learn is that if you want people to do something, you should be friendly, constructive, and positive - not call people out on bad behaviors. I definitely don't want to encourage ignoring issues and letting people get away with things, but in this context a change in attitude would have brought much more success IMO.
Also I found it really weird that this has just bubbled up to the front page when the original PRs for this project were opened in 2019[1]. Why is this popping back up now?
Someone probably went looking for telemetry opt-out stuff since the recent discussion of Audacity's telemetry and weird policy changes. Somehow they found this.
Unfortunately, the way this guy conducted himself was terrible. His tone and behavior in the issues and insistence that it's a standard just because he made a webpage were really off-putting. I can see why no one was champing at the bit to endorse this.
Unfortunately, opt-outs are purely perfunctory, as these kinds of developers would prefer they not exist at all. I now believe there was little/no hope of getting this sort of thing adopted in any case.
They don't want to make it easier to opt out; the opt out only exists to deflect blame.
> As an adult, one of the many lessons we learn is that if you want people to do something, you should be friendly, constructive, and positive - not call people out on bad behaviors.
The majority of "getting someone else to do something" in our society is done with negative "coercion" (for lack of a better term) rather than positive.
Most people go to work only because a policeman will come to their house and make them homeless at gunpoint if they don't.
The percentage used to be even higher, but excommunication doesn't carry the sting it once did.
Ultimately, the end result of all legislation and regulation of industry is the implied threat of negative consequences, up to and including state-sanctioned violence, for people who violate laws. Calling out bad behaviors is a common and traditional technique for getting new laws passed.
Shame and bad PR is a powerful tool. Last November I shamed Apple into dropping unencrypted OCSP. Perhaps one day I can shame developers into not spying on their users without consent.
Fact: the bad PR of shipping nonconsensual spyware is the only reason there is an opt-out lever at all. If we can amplify that, we can make it being enabled a default.
On linux you can use network namespaces to deprive the program of access to the network (shouldn't be necessary, but this is the world we live in.)
$ sudo ip netns add jail # create a new net namespace with no access
$ sudo ip netns exec jail /bin/bash # launch a root shell in the new namespace
# su - your-normal-username # become non-root
$ ping google.com # see if network is accessible?
ping: unknown host google.com
$ ifconfig -a
lo Link encap:Local Loopback
LOOPBACK MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
$ audacity & # See if you can track me without a network audacity...
[1] 4284
$ _
I really hope that someday someone builds some proper UX around network namespaces for desktop Linux. It would not be all that difficult to implement something akin to Qubes if you made NetworkManager namespace aware and integrated with dbus-launch to start the applications in the desired namespace.
Bad actors would ignore it wholesale anyway, so at best this gives one a false sense of privacy. It would probably be better _not_ to have it, since then people won't be misguided into thinking they're not being tracked.
An actual effective approach to privacy is to use a firewall to block all unwanted connections (allowlist only), and a DNS sinkhole like pi-hole.
IMO we should get on the more aggressive side when it comes to mitigating tracking, and use tools that actually are a threat to the status quo. Tools like AdNauseam[0] were blocked by Google[1] on their store because they actually worked - that says a lot more than an env var string they'll happily ignore.
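To make the allowlist idea concrete, here is a minimal iptables sketch; the destination address is a placeholder from the documentation range, and a usable ruleset would also need rules for DNS, established connections, and whatever else your system legitimately uses:

    # default-deny all outbound traffic, then allowlist explicitly
    iptables -P OUTPUT DROP
    iptables -A OUTPUT -o lo -j ACCEPT                              # keep loopback working
    iptables -A OUTPUT -d 192.0.2.10 -p tcp --dport 443 -j ACCEPT   # one allowlisted host (placeholder IP)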
I don't see why both solutions can't co-exist. If DO_NOT_TRACK suppresses the honest software (which should be the majority of what you have installed anyway) then you have fewer applications left to manage manually, and less network noise to identify and block.
Saying an imperfect solution is worthless is a little like throwing the baby out with the bath water.
I seriously doubt marketing and advertising departments will care about standards, if they would even notice. The industry can't even manage web standards; imagine trying to get scummy tracking companies to comply.
IMO it's not even an imperfect solution. I'd say performative, and past DNT efforts have failed, so why are we trying this again?
Maybe a tool that lets one easily report software for GDPR violations? That would have more teeth.
> I seriously doubt marketing and advertising departments will care about standards, if they would even notice.
The _vast_ majority of applications this relates to are not going to have marketing and advertising departments. They're unlikely to have more than a handful of maintainers. Most are likely just someone's side project.
> The industry can't even manage web standards; imagine trying to get scummy tracking companies to comply.
We're not talking about a new standard that's even remotely in the same league as web standards in terms of complexity, nor are we talking about getting every scummy company to comply. A lot of command line applications already have an opt-out, so all this proposes is making that opt-out option the same.
This isn't any different from other common environment variables like http_proxy -- not every application supports it, but enough do that it is still useful.
> IMO it's not even an imperfect solution. I'd say performative, and past DNT efforts have failed, so why are we trying this again?
As of yet, no effort has been made with regards to DNT in this field. You're conflating running untrusted web applications with running trusted (and often open source) applications locally on the command line. They're two very different fields, and as said already, DNT options already exist for the latter, but with every application having its own preferred variable name. All this proposal seeks to do is standardise that name.
> Maybe a tool that lets one easily report software for GDPR violations? That would have more teeth.
To reiterate my original comment: the existence of one doesn't prevent the existence of the other. Why pick when we can have both?
Ahh, my big issue is that I don’t mind tracking in genuinely honest software. Want to know how many people are still running your app on 32-bit machines and that’s it? Be my guest!
It does seem like DO_NOT_TRACK isn’t supporting honest software, though. I can’t enable it for Audacity and disable it for Homebrew, for example.
Seriously, your concern is true for any software you run. You install software--say AWS CLI--on trust. But maybe it also installs a non-CLI keylogger, right? You'd have to do some serious investigation to know.
The point, I think, is to have a standard to make it easier on the user to select the option across multiple applications.
It should, imo, be opt-in instead of opt-out (i.e. behave as if DO_NOT_TRACK=1 by default).
I define a bad actor here as software that makes non-consensual network connections.
Agree on the software installs, and the risks attached. Modern OSes sandbox software wrt IO permissions, IMO we should be doing that sandboxing at the network level too.
Disagree with you on the opt-in. It should be neither, the software vendor should be the one asking permission, either explicitly or as a prompt on a firewall.
I’ve tried Little Snitch, but was quickly annoyed at the high interaction required. Maybe I should give it another shot, especially since I use uMatrix on Firefox, which is a similar concept.
It's overwhelming for the first couple of days, as periodic jobs wake up and ask for permission to connect to the Internet. Once you're over that initial hump, it basically disappears until some new app wants to connect to something unexpected.
Silicon Valley software developers have a huge problem understanding consent. Look at most dialogs presented for things that companies want you to do:
Do you want to enable Feature X?
1. Yes
2. Ask me again later.
Somehow, "no" is just not in their vocabulary. Imagine if "Silicon Valley" was a guy asking a woman on a date, and the only options he understood were "yes" AND "ask again later".
I did not consent to being talked to by a stranger earlier today. I did not consent to any replies on this comment that might follow. I did not consent to the music from my neighbour this afternoon. I did not consent to being barked at by my other neighbour's dog yesterday. I did not consent to a list of things longer than I care to name.
It's all about expectations: consent for everything is unworkable.
Equating some basic telemetry to rape is idiotic and offensive.
The person who suggested the same thing in the Homebrew PR was blocked:
> devlinzed - We shouldn't accept a future where spyware is the norm and rely on this environment variable. We know how that turned out with HTTP Do Not Track. As long as spyware is bundled with Homebrew, Homebrew shouldn't be used or recommended or normalized. The solution is for Homebrew to remove its spyware by having explicitly opt-in telemetry only.
Why would a user ever opt-in to something that has zero immediate benefit? And for authors, that would be the same as permanently disabling tracking, nobody would adopt it. It’s the worst possible scenario.
What we should work towards is privacy-conscious tracking, used sparingly only for monitoring critical pieces of the software and not all user actions. Flag/reject software that violates this. Then there is no need to opt-out for privacy concerns.
Why would a developer be allowed to enable something that has zero immediate benefit for the user, yet erodes at the user's privacy?
Privacy-conscious tracking begins with asking for permission to disclose my personal information (e.g. my IP address), before anything ever goes on the wire.
They don’t need your IP address to track usage or health metrics. Most of it can be collected anonymously. We should encourage software to simply not collect personal information at all.
I think you're probably not. If you explain the details to random people on the street then I bet that a majority would probably be fine with it.
I don't even like to use the word "tracking" any more as it's lost all meaning. Not all "tracking" is identical: some is highly problematic, some is a little bit problematic, some is just fine. When you lose sight of any and all nuance and difference then the conversation becomes pointless.
It's just that these topics attract people on horses so high they need spacesuits, asserting all sorts of things in absolutes, so it appears you're in the minority.
This is a great point actually - but it probably depends on whether the data is regarded as "personal data" or not. IP addresses are considered sensitive, which would mean that if they're saving those it is probably not OK. I'm not a lawyer though :)
Storing IPs by themselves is not against the GDPR, and you do not really require consent for storing them for legitimate reasons (think nginx access logs, or rate limits on API endpoints / banning IPs abusing your service). [1] Pairing IP addresses with other potentially identifying information can also be a bit of a legal gray area (look at Fingerprint.js) if done for legitimate reasons (like fraud detection).
Though honestly most users do not really care about the checkbox that says "I agree to give you access to all my personal information and sell it to everyone" when they click install, and it's such a sad situation. GDPR had great potential; it's sad it was unable to do its best.
If you save IPs to use for fraud detection, then under the GDPR you can't use them for _ANYTHING_ else, and you need a sensible rule for how long you keep them.
Most of those checkboxes are not worth anything under GDPR, because people don't give a clear, informed consent when they have no chance of understanding what is being asked.
The law is not the problem. Lack of enforcement is.
In the cases you list, you have other legal basis for processing than consent – i.e. legitimate interest – but that doesn't mean it's not personal data.
Indeed, IP-addresses are considered [0] personal data in some cases – which only really means that you need to follow the GDPR: have a legal basis for processing, do not process the data for reasons other than that for which you have a legal basis, delete it as soon as you no longer need it, implement protective measures, etc.
> GDPR had great potential; it's sad it was unable to do its best.
Given the massive backlash against the GDPR and "cookie walls" by newspaper publishers, it's doing a pretty good job. Can you imagine a company like Apple whipping app vendors into shape regarding data collection without GDPR pressure?
I agree GDPR did make a lot of good change. I don't mean to say GDPR was a waste - it was awesome. What I mean is, some things (like cookie banners) kinda defeat half its purpose, and at times made browsing more annoying.
I'd love to see them do something about it. And I hope they do.
> And don’t such tools require GDPR consent to allow tracking or processing PII?
They actually do, yes. I can see an auto-update "phone home" being covered by legitimate interest, to be able to quickly revoke insecure software, but usage analytics are clearly opt-in only.
> Consent should be given by a clear affirmative act establishing a freely given, specific, informed and unambiguous indication of the data subject’s agreement to the processing of personal data relating to him or her, such as by a written statement, including by electronic means, or an oral statement. This could include ticking a box when visiting an internet website, choosing technical settings for information society services or another statement or conduct which clearly indicates in this context the data subject’s acceptance of the proposed processing of his or her personal data. Silence, pre-ticked boxes or inactivity should not therefore constitute consent. [...]
I see where you are coming from, but realistically this has better chances of being adopted. Also, the "do not track" term is already a thing for browsers, so it might be best to just stick with it.
I support this fully and I'm very disappointed that the maintainers of these projects are hesitant to adopt it, many citing that they'd be interested only if other major players adopt it first.
C'mon, it's such a minor change. Getting these PRs in means _you are_ a major player (at least to some) and you might be able to drive change in a positive way. Why not jump on the opportunity?
The best part of that discussion is how one Homebrew maintainer says "I'd like to see package managers implement this first", and when someone from Arch goes "we don't track at all" the big chief comes in and says "Run your package manager how you want/need and we'll do the same.".
The guy also seems quite rude for someone with "Rude people blocker" in his bio.
> I would rather see the different packager managers sit together and work on that standard.
And especially big players should get involved, like the debian/ubuntu apt people, the fedora people, the nixpkg people, pip, node and all the others.
It’s almost as if none of the mentioned package managers has opt-out tracking code embedded in their CLIs.
To be fair, Debian has popularity-contest. (You have to install the package manually – or via a prompt in the installer; if you don't, you don't even have the tracking code on your system.)
THIS is the way telemetry should be: a completely separate, opt-in-only add-on. I don't trust flags; I've accidentally forgotten to check them myself countless times.
1) It's not a standard at all. Some guy made a webpage and used the word "we". Even in the GitHub issues there was debate about what having this variable set would mean (e.g. should a call to a formatting service be blocked).
2) One maintainer pointed out that being an early adopter would force them into advocacy that they aren't prepared to do. Judging from how argumentative Sneak was being, there's probably an element of "we don't want to be the vanguard for this bully and his variable"
The maintainer just stated it without further explanation. Searching for "HOMEBREW_NO_ANALYTICS" does reveal some other references[1], and it seems that there are some other places where this may need to be added such as "Library/Homebrew/env_config.rb", but I don't know anything about homebrew.
I do think it's very much a minimal-effort PR. It's hardly a PR at all: at the very least the tests and documentation should have been updated, and actually writing out some more rationale would help as well. I certainly wouldn't send a PR like this, at least not without the text "this is a proposal, and if accepted I will update the tests and documentation as well".
Probably the Homebrew maintainers have hard-coded usage of “HOMEBREW_NO_ANALYTICS”. The PR likely only fixes one instance. The lack of unit tests in the PR supports this hypothesis.
Could be wrong though; the maintainer may simply have needed a vague excuse to not merge it. I haven’t done a deep dive - couldn’t be bothered to compare the repository at the time of the PR, two years ago, on mobile.
This is a great idea, but I feel like it needs an adoption and roll-out strategy so that it starts to snowball.
I think that opening PRs makes a lot of sense, but smaller projects, or projects that don't yet have an opt-out mechanism, are the most likely to adopt it in the early days. Then once you gather some momentum around those, bigger projects might follow suit.
This seems dead on arrival, because Homebrew (as they reference on-page) by necessity contacts a wide array of remote addresses, giving away the user’s IP repeatedly as part of normal operation. That conflicts with redefining “track” to be different from the original meaning in the DNT header, to now extend to include any remote network connections at all - not just “do not compile a user profile from this network connection” but instead “do not divulge the existence of a user”.
If this had restrained itself to the definition of tracking used by the original failed DNT effort, I might think very highly of it, but attempting to shoehorn the more stringent “offline by default” viewpoint into the framing stolen from DNT makes this proposal unlikely to be widely adopted.
It’s really unfortunate that this good technical idea was combined with that overreach, as once the idea is rejected by others for going too far, any similar idea will likely be rejected without consideration. Oh well.
Tangentially, do all the adblock extension users realize that their IP address is constantly divulged by all the rules updates their extensions are running in the background? This proposal would, if fully implemented through the entire software stack, result in browsers launched on the user’s machine refusing to allow any background network connections, which would break all ad blockers as they depend critically on that behavior. I’m not convinced that users prioritize “don’t divulge my IP address” over “block ads”, which seems to be a conceptual nail in the coffin of overreach here.
Expanding that to mean “do not reveal the user’s IP address” is a significant expansion of the original meaning, having little to do with the protection against being individually tracked that the DNT header was designed for.
Asking Homebrew to disable analytics has nothing to do with targeted advertising opt-outs, but supports the ideological goal of the authors in promoting their new idea that revealing the existence of the user is “tracking”. That expansion exceeds the scope of the definition of “tracking” as used to reference targeted advertising and marketing, something that is not applicable to the Homebrew project’s existing flag in any way.
Attempting to promote their beliefs by aliasing those two concepts together - tracking protection against targeted advertising, versus disabling telemetry and auto update - is therefore overreach to me, especially when presented without this context to projects such as Homebrew and others in pull requests, and I think it will ultimately lead to their effort’s failure.
To me, this is all a direct consequence of their attempt to ride the coattails of the failed DNT effort by reusing its key phrase “do not track”. They should have used a different name. In a few seconds, I can think of a pretty great one that still acronyms to DNT, so clearly creativity wasn’t the obstacle here. Instead they chose to anchor to “Do Not Track” and it’s too late now to change that. We’ll see what happens.
> the original mission of DNT, which was to request that sites not harvest a user’s data for targeted advertising
It was a lot more amorphous than that. Some of the people behind DNT did have that particular meaning in mind, but others had different sets of things they wanted to prevent. There was enough disagreement on this point that they didn't manage to get consensus on anything more than "this specification does not define requirements on what a recipient needs to do to comply with a user's expressed tracking preference" -- https://www.w3.org/TR/tracking-dnt/
That seems like particularly poor quicksand to build a new "do not track" effort upon, yes; adding yet another interpretation worsens things and ties this new effort to a failed effort's legacy.
This project looks especially relevant because of recent news about Audacity adding telemetry to their project [1] and because of the community's reaction and a fork [2].
I feel like many people aren't aware that popular open source tools already collect usage data.
I'm sure everyone doing tracking of console programs is just chomping at the bit waiting for the creation of an easy-to-remember way for users to opt out...
more useful would be something like this that lets users opt in or out of the various TYPES of tracking that are done.
for example, I don't care about application usage telemetry. if you want to know that it's hard for me to remember a particular subcommand and that I'm often looking up help for it, I don't care if you want to know that.
if you want to show me "we're hiring" horsepoo, well you can go away. opt out. same for "please donate, I'm not rich enough yet" messages. and straight ads can just die.
options to tweak those things granularly would be welcome, if we're just daydreaming about things that will never get adopted.
Why do you feel entitled to judge someone’s humble request for a donation, despite them developing the free software that you use often enough to be annoyed by it?
if they wanted to be paid, would they not release it as a product instead of releasing it as open source?
"hey I'm gonna give away my time and effort for free" is not how you get money. it's how you get geek cred or build up a resume or whatever. all fine, of course. do what you want.
if you want to charge for your code, charge for it and stop asking for donations every time I clone or build your stuff.
have a donation button somewhere so people CAN donate, of course. asking is very .. beggy and it drives me nuts, especially when it's from people who make $500k/yr at their day job.
This is a very steamed and unfair rant; it really seems like you don’t understand how little FOSS work makes, and how little money is ever donated to most projects, if any.
I don’t know who hurt you in FOSS, but you sound really mean.
I've attempted to convince maintainers in the past to not do user-hostile things [0]. They seem to think they are entitled to a record of how I use their software on my own systems.
It's not clear to me why they would think this, but I suspect it's because you always get this data if you offer software as a service, and many people who have worked on these projects come from that world, despite the vast majority of software not being offered this way (indeed, 96% of IT is on-premises spend [1]). I think those people have carried the attitude that usage information is their information over to other domains, without realizing that they are different domains.
For example, it claims one of the banned things is 'automatic update phone-home'. However, for brew they are only trying to disable analytics; brew will still "phone home checking for updates", and there is no way to disable that (at present).
Personally, I'd want to see something more fine-grained. I want automatic updates (or at least notifications that one is available); I can't imagine using a package manager that didn't provide that.
People say that crash reports & telemetry are useful for developers. But I imagine that they also create a lot of work for developers they would not otherwise have.
If a user cannot be bothered to open a ticket themselves, or at least opt in to telemetry to demonstrate the issue, then is it really an issue that developers should fix? If an app crashes in the woods and no one bothers to report it, is it a bug? (Joking, partly.)
For commercial apps the dynamic is a bit different, since customers can & do expect things to be fixed without any action on their part.
A lot of bugs are hard to reproduce, occur only on certain OS's, etc.
In these cases, filing a ticket is often pretty useless. What's needed are widespread stats to determine severity, frequency, correlations, etc.
In fact, I'd say the overlap between bugs that can be meaningfully reported in a ticket and those best detected through crash reports is pretty small. So you need both.
Why do you think developers shouldn't fix crashes? Shouldn't developers themselves decide what they want to fix?
This website strikes me as nothing more than an idea that someone is passing off as a movement. I feel like the negative reactions of the devs of the FOSS projects who got PRs from this person aren't that surprising. To me this doesn't seem to add much value and I would probably bounce it from my project as well.
The alternative approach (and maybe more realistic, from what I could see in the PRs they opened) would be to compile a list of all those variables and build a file one could source to disable all known telemetry (it could include that variable and allow things to migrate progressively).
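As a rough sketch of what such a sourceable file could look like (the variable names below are each tool's own documented opt-outs as best I recall; verify them against current docs before relying on this):

    # no-telemetry.sh - aggregate known per-tool opt-outs plus the proposed shared variable.
    # Source it from your shell profile:  . ~/no-telemetry.sh
    export DO_NOT_TRACK=1                   # the proposal itself
    export HOMEBREW_NO_ANALYTICS=1          # Homebrew
    export DOTNET_CLI_TELEMETRY_OPTOUT=1    # .NET CLI
    export GATSBY_TELEMETRY_DISABLED=1      # Gatsby
    export NEXT_TELEMETRY_DISABLED=1        # Next.js
    export SAM_CLI_TELEMETRY=0              # AWS SAM CLI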
But buy-in is worth it. Consider the related issue of every damn app not respecting a user's home directory and polluting it with its files. But that's a problem the XDG Base Directory Specification was designed to solve. And things are better now, though you often have to explicitly set the XDG variables to get some applications to respect the standard. But just having a standard pulls people and organizations in to follow it.
It’s disappointing, although not entirely surprising, how some projects met this idea with outright hostility. It’s an extremely simple idea. It seems obviously correct to me and I’m not sure what benefit any further effort at discussion or standardization would bring.
Controversially, I'd like them to collate what is being tracked and by whom on the command line; a bit of naming and shaming (if warranted) might go a long way to making the need for this moot.
Having used crash analytics to fix a production tool in the wild, I would say that not _all_ tracking is bad. But projects really need to be a little more up front about it.
First, this is backwards. As many have pointed out, any and all telemetry should be opt-in, not opt-out, so the variable should be named YES_PLEASE_TRACK_THE_LIVING_SHIT_OUT_OF_ME, and if and only if it is equal to 1 should the data be collected.
But what irks me more is this measly list of unsolicited PRs. Going over a list of projects where you are a nobody and injecting yourself with your "improvements", with the implication that if you don't accept the PR you're somehow an enemy, is far from a good way to advertise your cause. I remember Alex Gaynor spamming a lot of projects with PRs to amend the documentation to use gender-neutral language, and I remember Coraline Ehmke carpet-bombing projects into accepting codes of conduct. No matter how noble your ideas are, mass-harassing people into accepting them is a major dick move.
The thing is opt-in telemetry doesn't work. Few people will actively choose to enable it, even if they read the documentation and are aware of it.
I think we should distinguish benign uses of telemetry from tracking that collects invasive details about the environment the app is running in.
A popular CLI open source project I use has telemetry enabled by default. The authors are upfront about the data it collects and reference the source code for everyone to verify it. It collects generic things like the app version, operating system, CPU architecture, and a couple more datapoints about the app usage. The IP isn't logged, but is of course known at the time of submission.
This data is not sold or used for any nefarious purpose, but mostly to track which app versions and which features are most commonly used so that development efforts can be better planned. And there is of course a CLI flag and environment variable to disable it.
I'm perfectly fine with this level of telemetry, as long as the authors are open about what data is collected and how it's used. Making the app switch to opt-in would probably eliminate most submissions and would make it impossible to know how the app is used in the wild.
It's a good idea but it seems to me the name will hurt adoption.
Phoning home with telemetry data does not by definition imply tracking.
Yet a project implementing DO_NOT_TRACK is essentially saying that without it, it is tracking users. This isn't particularly attractive for a project that already has some OPT_OUT_TELEMETRY option.
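Supporting both names would cost a project almost nothing, though. A minimal sketch in shell, where MYTOOL_OPT_OUT_TELEMETRY is a made-up stand-in for a project's existing variable (the proposal's exact matching rule - strictly `1`, or any non-empty value - would need checking):

    # honor both the tool-specific opt-out and the shared DO_NOT_TRACK variable
    if [ "${DO_NOT_TRACK:-0}" = "1" ] || [ "${MYTOOL_OPT_OUT_TELEMETRY:-0}" = "1" ]; then
        telemetry_enabled=false
    else
        telemetry_enabled=true
    fi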
all the pull requests mentioned on that webpage are from 2019 and 2020, and all were rejected (well, technically the last one is still open, with a comment pointing to another place where it was rejected).
the "get involved" link points to a page that does not exist.
This is under-specified without a definition of "environment variable". For the purpose of this choice, "environment variable" should have the broadest possible interpretation, referring to any relevant space that supports named configuration parameters - basically any configuration variable that is relevant in the specific context. Lower-level system software which executes before there are environment variables should not be off the hook.
Their proposal is for console apps, and gives the example "export DO_NOT_TRACK=1". They are talking about your first bullet, and not any of the others.
Though the links on this site are now dead and incorrect for the repository on GitHub, it is good to see the author moved their repo to self-hosted Gitea (https://git.eeqj.de/sneak/consoledonottrack.com). It would be hypocritical to be against tracking and then host one's code on Microsoft's GitHub. Even better they released a blog post in 2020 about why the repos were moved (https://sneak.berlin/20200307/the-case-against-microsoft-and...).
Btw, I currently have a huge backlog of telemetry to add, so I wouldn't mind some help. If anyone is interested, have a look at the GitHub issues in this repo. The repo itself contains all the docs and examples to get you started.
I'm not sure "do not track" would even be the right terminology here. We want to prevent apps from calling home or calling some telemetry or error reporting endpoint, sure. But is that "tracking" in the same sense as we're being tracked from website to website? I'm not aware of any malicious use of that data yet so we might not want to throw them in the same group as google and other advertisers and user data aggregators.
How about "enable telemetry" to also make it opt in rather than opt out?
If a tool collects data on which command line arguments I use and how frequently (some people in the comments argued it's a useful thing), there is hardly a better word for it than tracking.
Does anyone else here run a commercial service with do-not-track functionality?
I have observed that users who enable do-not-track raise more support tickets and generate substantially less revenue than those who do not.
To the extent that those users are a net negative to my service. Even counting referrals of users who do enable tracking, even with generous assumptions about the metrics I can't see due to lack of data.
My tracking doesn't involve ads - this is pure analytics for reporting dashboards, crash analysis, etc.
Good lord. Please stop writing software that contacts servers I didn't ask it to contact and sends information I didn't ask it to send. It's extremely easy.
I've been wanting this for years. Why does this target console apps only? Everything should respect it. If I have it set, nothing should even nag me about opting in.
This is a great idea, but I want it to go one step further. Some projects I do want to send some telemetry. It would help if there was a standard way to name environment variables or config entries across projects so I can predict what I need to change for each one without digging through docs. Perhaps a per-account dotfile that lists projects with 0, 1, or 2 depending on no data, limited subset / anonymous only data, or full data.
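Nothing like that exists today as far as I know, but a hypothetical per-account dotfile along those lines might look like this, with tools reading their own entry and falling back to `default`:

    # ~/.telemetry-prefs (made-up format)
    # 0 = send nothing, 1 = anonymous/limited data only, 2 = full data
    default=0
    homebrew=1
    some-trusted-foss-app=2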
I'm probably an amateur, but until today I did not know the `aws` CLI sends telemetry data... It's probably in one of those "terms and conditions" that are way too long to read, so I'm only half pissed because I did not read it.
Indeed, sending such data should be opt-in, NOT opt-out
From today on I will make sure each and every CLI I use plays fair.
But actually what did I expect, since for Firefox telemetry data is on by default :/
But the developers of these tools probably don't want to do that, because then telemetry would be disabled by default. It would also be harder to allow tracking for individual tools only.
This proposal is an example of narrow thinking. Whoever came up with it would like to opt out easily as a user, which is understandable, but the world is more complicated than that.
A restrictive firewall (or unsharing to an empty network namespace) helps with non-malicious apps - no network access except for apps that may need it.
Anyway, tracking on Linux by normal userspace apps is just creepy. There's just too much access if you run these kinds of apps under your main user, let alone root, unlike browser-based tracking that happens inside a limited sandbox.
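For one-off commands there's also util-linux's unshare, which skips the `ip netns` setup shown earlier in the thread; a sketch (flag support varies by util-linux version, and some apps misbehave under the fake-root uid mapping that `-r` creates):

    $ unshare -r -n ping -c 1 1.1.1.1    # -r: map to root in a user namespace, -n: new empty network namespace
    ping: connect: Network is unreachable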
I doubt this will catch on. What's the incentive for the software companies to add it?
I like the `source do-not-track.sh` idea better, which defines a bunch of `WHATEVER_NO_TRACK=1` variables. It could even modify config files for software that doesn't use environment variables.
Or even better yet, just use a /etc/hosts file or a custom firewall.
Instead of hoping for various software projects to standardize on a single control, aggregate the various controls in one place, and make it easy to set them all at once.
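For the /etc/hosts route, a hedged example - the hostname is invented, and you'd have to discover each tool's real endpoint from its docs or by watching traffic. It also won't help when a tool contacts a hard-coded IP, or when the telemetry host also serves something you need (like updates); that's where the firewall comes in:

    # /etc/hosts - null-route a (hypothetical) telemetry endpoint
    0.0.0.0  telemetry.example.com
    ::       telemetry.example.com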
The assumption should be that the user does not want to be tracked. An env var stating that is pointless. Scripts that would adopt this are already respecting you by having you opt in to tracking. It does nothing, and promotes opt-out as the preferred way of showing consent, which is decidedly wrong.
It is so depressing that we need this thing in the first place. What next, CONFIG_DO_NOT_TRACK so that kernel modules don't send telemetry to hardware vendors?
Adtech "ethics" (lack of thereof) has become a huge problem for the community.
Perhaps a more pragmatic solution would be a notrack package (probably just a shell script) that could be executed from your .profile and sets the app specific variables?
i would think the only ones using this would be people who don't want to be tracked, so setting this globally would be great for them. However, what incentive would the product developers have to implement it? i guess the same as for browsers...
I am. The terminal is one of the few places I am not being tracked. It’s also a portal into my most private data and activity, so if there’s one place I don’t want to be tracked, it’s here.
I do not long for a future where the terminal ecosystem resembles the state of the greater internet with regards to privacy and tracking. We’ve collectively watched it happen to almost every other segment of technology in the past 20-odd years, so it’s not far fetched to believe it could happen here as well.
are there documented cases of these cli tools abusing their telemetry? are they entirely used to pinpoint performance issues and bugs within the tools that implement this telemetry tracking?
if it is the former, i can see there being cause for concern. if it is the latter, this is just pure fear-mongering.
opting out of checking for "updates" in homebrew would probably harm your homebrew use more than most users realize. You won't ever be getting new formulae etc.
I think that this approach kind of dissolves the power of the word "consent"
> plainly and unambiguously expresses LACK OF CONSENT by a user
in GDPR terms, consent can never be assumed. In order to be valid, the consent has to be a result of a conscious, explicit action from the user. Things that are opt-out cannot use "consent" as a legitimizing premise
The easiest "Do Not Track" is to stick with FSF approved software. I know it looks pseudo religious and a bit ridiculous but the more that happens the more I'm convinced they're actually right.
DO_NOT_TRACK is forever tainted by its initial Microsoft introduction and the massive opposition of the tech community at the time to this idea. The Apache web server to this day strips the DNT header in its default config.
Accepting DNT would imply that the tech community was wrong on this one, and that will never happen.
No, the real reason is because it's trying to use an optional technical flag to enforce a social/political policy, and as such was doomed to failure from the start. If I recall correctly, the "tech community" was largely in favour of DNT, and anti-advertising in any case. The advertising industry, however, was not, so why bother implementing something that goes against their own interests?
The only workable solution to that problem in practice is legal/political: legislation and sufficient regulatory abilities to effect compliance (i.e. GDPR/PECR and the like). In the end, the US stuck with permit-all-by-default and largely don't care, the EU/UK went deny-tracking-by-default with penalties to back it up.
The reality is that this proposal will fail for much the same reasons: either you have an opt-in mindset from personal beliefs or legal requirements, in which case the global off-switch is worthless to you; or you're pro-tracking, in which case the incentives are to ignore the switch. Or you're Debian etc. and you remove the tracking code outright...
This is a losing proposition that will only lend credence to the inherently violent opt-out approach to data collection.
Instead how about a do-not-hire-or-collaborate-with registry of the individual contributors participating in projects that employ those tactics and see how they like trying to opt out of it.
Negative punishment will not work for this, you would be going after the symptom instead of the problem. Don't make it harder for people that write these changes, make it harder for people to force others to. There will always be another developer, and there's no guarantee you'll know their identities. If you're looking to bring attention and make a statement at the potential expense of others that's one thing, but practically speaking this approach can't work.
Practically speaking, that's the only approach that can.
Although it's often hard to tell, most software developers aspire to be treated as professionals rather than specialized serfs, and part of being a professional is accepting responsibility for your work.
Of course it doesn't preclude holding their employers responsible as well.
Netlify deleted my issue comment naming the developer who put `force: true` in their CLI (sending a telemetry event when disabling telemetry), even though his name is still available via `git blame` in their public repo.
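For those who haven't seen it, the anti-pattern looks roughly like this (a hypothetical Go reconstruction; the names are mine, not Netlify's actual code):

```go
package telemetry

type Event struct {
	Name  string
	Force bool // if set, send even when the user has disabled telemetry
}

type Client struct {
	disabled bool // the user's recorded preference
}

func (c *Client) send(e Event) { /* ... transmit over the network ... */ }

// Track honors the user's setting unless the caller passes Force. The
// commit in question amounted to calling something like
// Track(Event{Name: "telemetry:disabled", Force: true}), so the very act
// of opting out was itself phoned home.
func (c *Client) Track(e Event) {
	if c.disabled && !e.Force {
		return
	}
	c.send(e)
}
```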
I completely disagree with your definition of spyware. A crash report, absent any PII, is not spyware to me, and does not require my advance consent to be sent.
You're entitled to your own definition, of course, but I hope that, for the good of society as a whole, you and people with your tyrannical/authoritarian attitude toward issues that can (usually) be resolved civilly will never hold positions of power.
Crash reports contain memory dumps, and private information.
Additionally, they disclose the client's IP address when submitted, which amounts to city-level geolocation of a user of a particular piece of software.
They're fine if you get advance consent from the user before transmitting. Sending the contents of memory (especially after a crash, where it contains by definition unexpected things) is a serious security issue/data leak, if done automatically.
My position is the opposite of authoritarian: it's that these sorts of interactions should happen only with the full, informed, advance consent of both parties involved. Authoritarian is a good way to describe devs who feel completely entitled to all information about their software running on computers which they do not own or have any rights to.
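A minimal sketch of that consent-first flow (in Go, with a hypothetical upload function): save the report locally, show the user exactly what it contains, and transmit only on an explicit yes.

```go
package crashreport

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// upload is hypothetical; it stands in for a POST to the vendor's endpoint.
func upload(data []byte) error { return nil }

// offerToSubmit shows the user the full report and sends it only after an
// explicit "y". The default, including EOF or a bare Enter, is "no":
// silence is not consent.
func offerToSubmit(reportPath string) error {
	data, err := os.ReadFile(reportPath)
	if err != nil {
		return err
	}
	fmt.Printf("A crash report was written to %s. Its full contents:\n\n%s\n", reportPath, data)
	fmt.Print("Send this report to the developers? [y/N] ")
	answer, _ := bufio.NewReader(os.Stdin).ReadString('\n')
	if strings.TrimSpace(strings.ToLower(answer)) != "y" {
		return nil
	}
	return upload(data)
}
```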
> Crash reports contain memory dumps, and private information.
No, they don't. You're ascribing characteristics to them that they do not have. Maybe your crash reports have memory dumps and private information in them, but that just means they're poorly engineered - there's nothing about a crash report that requires it to contain that information. This should be clear to you, because you're a programmer.
> Additionally, they disclose client IP when submitting, which is city-level geolocation of a user of a particular piece of software.
Most people, including me, do not care about city-level geolocation information - and for those who do, Tor or a pastebin service are options. Again, you're ascribing characteristics to crash reports that they do not have, and you should know better: you're a programmer, and being a privacy...enthusiast, you're certainly familiar with Tor.
> They're fine if you get advance consent from the user before transmitting.
...which leads to far less crash report data, insecure systems due to disabled auto-update, and selection bias in your telemetry data.
> Sending the contents of memory (especially after a crash, where it contains by definition unexpected things) is a serious security issue/data leak, if done automatically.
More assumptions that are false. Who said that a crash report has to contain a memory dump? A stack trace, complete with annotations that allow private data to be redacted, is both a useful crash report and will not leak any personal information.
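One way to get there, sketched with Go's runtime package: capture only symbol names and source positions, so there's nothing private to redact in the first place.

```go
package crashreport

import "runtime"

// Frame records only where the crash occurred. No function arguments, no
// locals, no heap contents: there is nothing user-specific to leak.
type Frame struct {
	Function string
	File     string
	Line     int
}

// stackOnly walks the call stack and keeps just symbol names and source
// positions, which is enough to pinpoint the crashing code path.
func stackOnly(skip int) []Frame {
	pcs := make([]uintptr, 64)
	n := runtime.Callers(skip+2, pcs) // +2 skips Callers and stackOnly itself
	if n == 0 {
		return nil
	}
	frames := runtime.CallersFrames(pcs[:n])
	var out []Frame
	for {
		f, more := frames.Next()
		out = append(out, Frame{Function: f.Function, File: f.File, Line: f.Line})
		if !more {
			break
		}
	}
	return out
}
```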
> My position is the opposite of authoritarian
I was referring to an earlier comment you made:
> That is another website on my to-do list: one that names and shames spyware developers who create these commits.
This is authoritarian. Or tyrannical, if you like. The word doesn't matter - what matters is that you're using your power as an individual to try to impose your own beliefs on the majority. Beliefs which, by the way, the majority does not hold.
> Authoritarian is a good way to describe devs who feel completely entitled to all information about their software running on computers which they do not own or have any rights to.
This is blatantly false. You're perfectly free to not run any of this software - there's nothing "authoritarian" going on. You are free to rewrite whatever software you want, or, because most of the interesting stuff is open source, fork it and remove the telemetry bits. Let me state it again: developers building some functionality into a tool is not authoritarian, as long as the user isn't compelled to use the software in the first place (which is true the overwhelming majority of the time).
You, meanwhile, are trying to force other people to change the software that they have been writing, according to your own whims. That's authoritarian.