I started poking around and noticed that Go 1.13 now defaults to the Go module proxy to fetch modules. This means a proxy, governed by the Google Privacy Policy, is now capturing everyone's module usage by default. Unless you change settings, this includes proprietary/corp stuff.
"go env -w GOPRIVATE=*.corp.example.com" was added to make it as easy as possible to configure private modules. If the environment is not set up, just the name of the module will reach the Google services, it will not be published, and an error will be returned.
Anything else would have made it possible to bypass the protections of the new Checksum Database (https://golang.org/design/25530-sumdb), a state-of-the-art authentication system that provides auditable proof that the operator is behaving honestly.
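To make the GOPRIVATE bit concrete, here's a minimal sketch; the patterns below are placeholders, not anything real:

  # treat modules matching these (hypothetical) patterns as private:
  # the go command will skip the proxy and the checksum database for them
  # (quoted so the shell doesn't try to glob-expand the asterisks)
  go env -w GOPRIVATE='*.corp.example.com,github.com/your-org/*'

  # confirm what was recorded
  go env GOPRIVATE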
For an example of why the Mirror is valuable, beyond the 10x performance gains (depending on the number of dependencies and connection latency) and security gains from not using git on the client, look no further than git.apache.org, which has been down for days, breaking a number of builds (https://status.apache.org/incidents/63030p4241xj).
I really think this architecture brings together the best of the decentralized module ecosystem that Go had (with its lack of regularly compromised registry accounts, and no mismatch between code and package origin) and the centralized registries of other languages (with their better performance and reliability).
Particularly proud of how the Checksum Database provides even more security than the centralized registries (because it's cryptographically accountable for its actions, while registries can sign anything they want) without requiring authors to manage keys, register an account, or change their workflow at all.
[Note that I work on the Go team, and co-designed the Checksum Database. But of course I don't speak for the company.]
EDIT to add: I should also mention that the sumdb allows the use of any untrusted proxy, easily configured with "go env -w GOPROXY", for users that can't or don't want to use the Google provided one. The proxy can also opt-in to mirroring the Checksum Database, in which case no user requests at all need to reach the Google services, while still providing all the integrity guarantees that proxy.golang.org users benefit from.
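For anyone wanting to try that, a rough sketch of the configuration; the proxy URL here is a made-up placeholder for whatever mirror you run or trust:

  # send module downloads through your own (or any third-party) proxy,
  # falling back to direct VCS access if the proxy can't serve a module
  go env -w GOPROXY=https://goproxy.internal.example.com,direct

  # the checksum database is still consulted unless you opt out of it too
  go env GOSUMDB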
This is the first programming language for which I've had to read and understand a privacy policy just to use it... and to consider that the policy may change in the future.
The idea that the language I'm programming in now reports anything back to Google is distressing... and I say that as someone who has been programming in Go for about 7 years. What right does Google have to collect usage information from modules hosted on GitHub (or elsewhere)?
I check out the code for the modules, review it, and commit it into my own repository. I don't need you to speed up the rare git checkout of a module. Not that I think some minor speed improvement is worth my privacy.
You talk about decentralization... while centralizing the entire thing around Google. You talk about git.apache.org being down, while you're willing to bring down the entire ecosystem across the board with your own special server.
This "feature" should have an option to disable it.. at the very least.
This is the first language in my 23 years programming that reports my usage back to the language authors. It's unprecedented. And the truth is, if this is ok to you, then there's really no limit what the code I write today, may report back to google tomorrow.
> This "feature" should have an option to disable it.. at the very least.
The Go 1.13 Release Notes lead off with documentation links for how to disable and otherwise configure this:
"See https://proxy.golang.org/privacy for privacy information about these services and the go command documentation[1] for configuration details including how to disable the use of these servers or use different ones. If you depend on non-public modules, see the documentation for configuring your environment[2]."
> This "feature" should have an option to disable it.. at the very least.
You should be able to. It's not the ideal solution (or recommended for most use cases), but you can, by setting GOPROXY=off, which disables module downloads entirely. GOPROXY=direct will force it back to its previous behavior of fetching straight from the source repositories.
The same should be true for the checksum DB with GOSUMDB=off. I don't remember specifically and can't find the page where its options were documented.
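For reference, and hedging a bit since I'm going from memory of the docs, the relevant knobs look roughly like this:

  # skip the mirror and fetch straight from the source repositories
  go env -w GOPROXY=direct

  # skip checksum database verification entirely
  go env -w GOSUMDB=off

  # or forbid network fetching of modules altogether
  # (only what's already in the local module cache can be used)
  go env -w GOPROXY=off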
IMO, using an environment variable for that is not robust. If the environment variables have been cleared (for instance, due to "su -" or similar), or if they have never been set (for instance, because you just checked out the code on a new machine and forgot to configure the environment variables there, or because you did remember to configure the environment variables but forgot that startup scripts do not take effect on an already running shell), the defaults will apply. Using a configuration file within the source directory would have been more robust.
You can use a shell script that you check into your repos that explicitly sets these environment values instead of using the go command directly. As long as you habitually never run "go get" or "go build" directly it ameliorates the issues you mentioned.
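Something along these lines, as a sketch (the script name and the values in it are made up):

  #!/bin/sh
  # build.sh - checked into the repo so the proxy/sumdb settings travel
  # with the code instead of depending on per-machine environment setup
  export GOPROXY=https://proxy.internal.example.com,direct
  export GOSUMDB=off
  export GOPRIVATE='*.corp.example.com'
  exec go "$@"

Then you'd run `./build.sh build ./...` instead of `go build ./...`.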
Indeed; if you're working with multiple developers and you need these policies enforced, you should use e.g. a makefile. I'm not sure if it's possible to force the use of that over straight `go` commands though.
go env -w allows you to configure your local Go installation: it sets the default value of a config entry, which is used when the corresponding environment variable is not set. The config entries have the same names as the environment variables.
From what I understood from the manual, the configuration stored by "go env -w" is per-machine (actually per-user), not per-repository. So if you check out on a newly installed machine and forget to do "go env -w" beforehand (from my experience, people forget even "git config --global user.email", so it's not unlikely to forget the "go env -w"), you'll be using the defaults.
I agree. I don't like the solution either, but it is what it is. What worries me is that if the envvars aren't set, there's a potential for leaking information.
But, it's still possible to disable these features, so it's not quite as bad as the OP suggested.
> This is the first language in my 23 years programming that reports my usage back to the language authors
I suppose that means you haven't used any of:
1. nodejs, which reports your usage back to npm Inc with the exact same amount of detail (names of dependencies, IP of the caller), and stores much more, since it publishes 'downloads per month', etc.
2. rust, which does the same with crates.io
3. perl, which does that with cpan
4. python, which does that with pypi
5. ... etc
All of those languages have a central registry of packages that has the same level of detail in the metadata it can potentially collect as the go proxy does.
> This "feature" should have an option to disable it.. at the very least.
It does. Multiple options. You can run your own proxy, disable it entirely, or opt out of the go mod experiment and never run go get, but rather vendor by hand with `git clone` or whatever.
> And the truth is, if this is ok to you, then there's really no limit what the code I write today, may report back to google tomorrow.
Very much a slippery slope. They've communicated clearly how to disable it and what it does, it's in-line with what other language ecosystems do in terms of metadata reported to a central package repository, and I think it's very easy for someone to be okay with this, but to not be okay with any step beyond this line.
> This is different. It’s a proxy in front of others repositories
It's not all that different.
It's perfectly possible to use gems in Ruby without ever touching the rubygems server, by pointing every gem in your Gemfile straight at its repository, something like `gem 'foo', git: "https://github.com/foo/bar"`.
However, people want to use this centralized repository in the ruby community because it allows for easily updating gems and for convenient caching.
The go proxy is not just a proxy. It also pre-parses out go.mod/go.sum to provide metadata faster and to provide security guarantees.
In a similar way to how centralized repositories are helpful in other languages, the go proxy adds actual legitimate features.
If you look at the public information on the development of this feature, all the motivations cited are about security, performance, and useful features, not information collection.
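To make the metadata point above concrete: the proxy speaks a small HTTP protocol (see `go help goproxy`), so the toolchain can fetch only the pieces it needs. A rough sketch, using github.com/pkg/errors purely as an arbitrary public example:

  # list the known versions of a module
  curl https://proxy.golang.org/github.com/pkg/errors/@v/list

  # fetch just the go.mod for one version, without cloning the repository
  curl https://proxy.golang.org/github.com/pkg/errors/@v/v0.9.1.mod

  # or download the full source zip for that version
  curl -O https://proxy.golang.org/github.com/pkg/errors/@v/v0.9.1.zip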
> This is different
There is a difference. Other repositories, like npm, require someone to opt in to code being available via them (e.g. by typing `npm publish`, whether the publisher owns the package or not), while the go proxy by default opts in everyone who publishes a Go package publicly and uses the newer Go toolchain.
In practice, with how the npm/rubygems/etc communities work, everyone does opt-in if they wish their code to be publicly used, so the difference doesn't seem meaningful to me.
Other than that, the difference seems vanishingly small. In both cases, a third party exists that serves downloads, and it exists to provide performance and discoverability benefits (which the go proxy also does, thanks to supporting version-related operations), and in some cases security benefits.
They're offering a cryptographically signed mirror - the first mainstream one available. You can either run your own personal mirror, run one for your company, or turn it off.
There is a difference between npm, Perl, and the others mentioned and Go. Go is backed by Google, a company with a diverse set of products and services. Many people who use Go build services that compete with Google's.
What company is using Python and building something in competition with the Python Software Foundation? Or with the stewards of the other languages?
This difference is worth taking into account. If Go were part of a software foundation like Python this would be a different story.
You can also pick your own proxy primary and fallbacks.
For example, this should in theory work to make the google-run proxy your fourth choice, with three other proxies attempted before the google-run proxy, and with direct (no proxy) access as the fifth and final fallback:
go env -w GOPROXY=gocenter.io,goproxy.io,goproxy.cn,proxy.golang.org,direct
There is some geographic diversity in that example as well -- I think two mirrors in that list are primarily run in China.
First, I think proxies have their place. Please don't mistake my comment for any issue with proxies themselves.
Second, the don't-need-Git part is a bit of misdirection. Other package managers (Composer for PHP comes to mind, but there are others, too) can pull the source at a version right from GitHub. This is much faster than using Git, and people have been doing it for some time. Typically it's pulled down as a tarball.
You don't need a proxy to have git out of the loop for downloads or install time stuff.
Third, when it came to making tradeoffs, one was made here. I don't know if it was explicit or not. Protections from the global checksum database as a default were put ahead of privacy as a default.
I wonder what other designs could have kept a certain amount of the global checksum protection while keeping privacy. I came up with a few ideas while considering my response here.
Fourth, to really speed up performance in things like CI it's much faster to cache the modules locally. This is something used across programming languages for dependencies and most of the major CI systems have this documented. I looked at using GoCenter months ago but didn't opt for it because caching in CI was so much faster.
We were looking at things holistically rather than just with a single tool.
As I said in the blog post, I appreciate the privacy policy being up front. It is the first link on the proxy website. And, I appreciate the configuration to customize the proxy.
A lot of this has to do with the default. Most people won't change it. A lot of people are going to leak proprietary information to Google without realizing it. Everyone here can decide if that is a good or bad thing for themselves. Hopefully, they will learn the config to customize things if they want.
> You don't need a proxy to have git out of the loop for downloads or install time stuff.
You do with go because go uses git tags for version information and must parse the `go.mod` file at various versions.
Go also supports git repositories hosted in places other than GitHub, where there's no standardized way to fetch a single file at a tag (the go.sum file), or to perform a suitable search for a version and its metadata.
> This [pulling github archives] is much faster than using Git
That's only true if you only ever look at a single version of a library. If you have a 20MB git repo at v1.2.3, and ~100KB of files change for the v1.2.4 patch release, using the zip/tar archive means you download roughly 40MB total to use it and then update to the new patch. Using the git repository in the first place means you only download about 20MB, since the follow-up `fetch` transfers effectively zero data, assuming you cache git stuff well.
I think this is a perfect example of a completely rational implementation that given all the facts almost all developers would agree to - except because of Google's reputation for pervasive tracking, abruptly sunsetting popular products, and general we-know-better approach to software, will become a point of flak for the core Go team.
To be abundantly clear, and because a lot of Google employees read this, the we-know-better comment isn't a jab - because often Google engineers DO know better, but it does occasionally mix with i.e. poor communication and create frustration for the many developers who use Google services and tools.
> This means a proxy, governed by the Google Privacy Policy, is now capturing everyone's module usage by default.
No surprise there; I was even called a 'privacy nut' on the neighboring Android 10 thread for highlighting the nature of Google's 'privacy' stance. So essentially they turn on tracking/analytics by default for module usage, without consent or any notice except for the lengthy privacy policy. This reminds me of when Microsoft's VC++ compiler embedded analytics in the binaries developers built with it. [0]
But then again, Google can do what it pleases since Golang is authored by Googlers.
> "without consent or any notice except for the lengthy privacy policy"
There's also this in the release notes:
> "The GOPROXY environment variable may now be set to a comma-separated list of proxy URLs or the special token direct, and its default value is now https://proxy.golang.org "
Had you read the linked release notes, you wouldn't have had to poke around to find that out: it's not some hidden thing. The second paragraph starts off with: "As of Go 1.13, the go command by default downloads and authenticates modules using the Go module mirror and Go checksum database run by Google."
It's so that you can have seamless module version checksums [1]. The checksum database ensures that you don't have to "trust on first use" whenever you update a module.
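Roughly speaking (paraphrasing the design doc, so take the details with a grain of salt), the go command pins hashes in your go.sum and can cross-check them against the public log; github.com/pkg/errors below is just an arbitrary public module:

  # ask the checksum database for the signed hashes of a specific version
  curl https://sum.golang.org/lookup/github.com/pkg/errors@v0.9.1

  # the same hashes end up recorded in your project's go.sum, in the form:
  # github.com/pkg/errors v0.9.1 h1:<base64 hash of the module contents>
  # github.com/pkg/errors v0.9.1/go.mod h1:<base64 hash of just the go.mod>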
The jFrog folks provided a good example with GoCenter. It was optimized for the performance use case. If you used their proxy, released at the end of last year I think, the time to pull a dependency was significantly less than to pull it from GitHub. This leads to things like faster CI builds.
Also, the dependency resolution really just relies on `go.mod` and `go.sum` - having to download the entire git tree just to access different versions of those files is a bit inefficient. (It also makes it harder for 3rd parties to serve different versions of a dependency to different people - and easier for Google to do the same)
From the Cargo docs, for comparison:
"The publish field (optional): The publish field can be used to prevent a package from being published to a package registry (like crates.io) by mistake, for instance to keep a package private in a company."
  [package]
  # ...
  publish = false
But I would be curious if someone more knowledgeable than myself would be interested in giving a quick summary of the approach?
Publish isn't necessary for the Go version, since the model is on-demand pull driven. godoc.org is the same way: there's no way for private code to end up there unless your private code is actually public.
The privacy concern is entirely that Google may see the name of some module you attempted to download.
I think they chose a default that was designed to provide a better user experience. If you can't even let Google know what your dependencies / stack are, then I think you can run your own proxy as well.
It would be interesting to test the security at some of these companies where this is management's worry. How do these companies worry about dependency leaks, but then leave huge S3 / remotely exploitable web holes totally open?
If google as an adversary is your worry:
What about all the folks your org emails with who use Google in various ways (including, perhaps, an email stack they've turned over to Google)? How many users are using Google single sign-on in lots of places? How many of their users are using Android with telemetry? Browsing websites using Google Chrome? Browsing websites that themselves use Google Analytics?
Google / Apple / AWS / Microsoft are at the root of trust for a lot of what happens online these days.
With different companies there are different agreements. For example, when a company stores something in a private S3 bucket that company has an agreement between it and AWS on who has access to that bucket and for what reasons. The agreements allow companies to know and document risk, SLA, and many other things.
Many companies have not turned their email over to Google. Many companies are in competition, or at least coopetition, with Google. Many are trying to get a competitive advantage these days.
All of this was a big enough deal for Mozilla that they got a special contract with Google before using Google Analytics that says Google cannot use the data collected on Mozilla sites for other purposes.
Some won't worry or care about the implications of this. There are a bunch who will. It is not an all-or-nothing thing. It depends on your business, if you are building stuff for a business.
https://codeengineered.com/blog/2019/go-mod-proxy-psa/