Google has been DDoSing Sourcehut for over a year (drewdevault.com)
489 points by Tomte on May 25, 2022 | 213 comments



If it were me, and I wasn't willing to just block the traffic, I might just set a 128 kbps limit on it and call it a day[1]. Eventually, the other side will figure out that their fetchers are all backed up and work out how to do their job without burning so much bandwidth.

[1] Yeah, that can be a bit of a pain to set up depending on the server settings, but some people have to pay for bandwidth and server resources, so it's probably worthwhile.
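For what it's worth, a cap like that can also be applied in the application layer rather than the network stack. A minimal sketch in Go, assuming you front the git HTTP endpoints with your own handler; the isGoModuleProxy check, the /srv/git path, and the exact numbers are all illustrative, not how Sourcehut actually works:

```
package main

import (
	"context"
	"net/http"
	"strings"

	"golang.org/x/time/rate"
)

// throttledResponseWriter caps how fast bytes are written back to the client.
type throttledResponseWriter struct {
	http.ResponseWriter
	lim *rate.Limiter
}

func (t *throttledResponseWriter) Write(p []byte) (int, error) {
	written := 0
	for len(p) > 0 {
		n := len(p)
		if n > t.lim.Burst() {
			n = t.lim.Burst()
		}
		// Block until the limiter allows n more bytes on the wire.
		if err := t.lim.WaitN(context.Background(), n); err != nil {
			return written, err
		}
		m, err := t.ResponseWriter.Write(p[:n])
		written += m
		if err != nil {
			return written, err
		}
		p = p[n:]
	}
	return written, nil
}

// isGoModuleProxy is a placeholder; real detection would more likely key on
// the proxy's source IP ranges than on the generic Go User-Agent used here.
func isGoModuleProxy(r *http.Request) bool {
	return strings.Contains(r.UserAgent(), "Go-http-client")
}

// throttleSuspectedBots leaves normal users alone and throttles suspected
// module-mirror traffic to roughly 128 kbps per response.
func throttleSuspectedBots(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if isGoModuleProxy(r) {
			// ~16 KiB/s sustained with a 4 KiB burst.
			w = &throttledResponseWriter{ResponseWriter: w, lim: rate.NewLimiter(16*1024, 4096)}
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.Handle("/", http.FileServer(http.Dir("/srv/git"))) // stand-in for the real git HTTP backend
	http.ListenAndServe(":8080", throttleSuspectedBots(mux))
}
```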


Why not just return a 429 (Too Many Requests) if the specific repo has been requested by Google not too long ago (e.g. within the last hour)?

It's a standard response code, and with a bit of luck google will scale the requests accordingly. If not, this will still allow the proxy to operate properly without burning too much server resources.

(I understand that this may leave some customers unhappy since the proxy may be up to 1 hour stale. If that's the case, the server could also add a check to see if there have been pushes since the last request, though this is quite a bit more complex.)
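A rough sketch of that idea as Go middleware, purely illustrative: the one-hour window, the in-memory map, the User-Agent check, and the git.sr.ht-style path parsing are all assumptions, not Sourcehut's actual stack.

```
package ratelimit

import (
	"net/http"
	"strings"
	"sync"
	"time"
)

var (
	mu         sync.Mutex
	lastServed = map[string]time.Time{} // repo -> last time we served it to the mirror
)

// repoFromPath extracts "~user/repo" from a git.sr.ht-style URL path (illustrative).
func repoFromPath(p string) string {
	parts := strings.SplitN(strings.TrimPrefix(p, "/"), "/", 3)
	if len(parts) >= 2 {
		return parts[0] + "/" + parts[1]
	}
	return p
}

// LimitProxyRefreshes answers 429 if the same repo was already served to the
// Go module mirror within the last hour.
func LimitProxyRefreshes(next http.Handler) http.Handler {
	const window = time.Hour
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Crude stand-in for real detection of the mirror's traffic.
		if !strings.Contains(r.UserAgent(), "Go-http-client") {
			next.ServeHTTP(w, r)
			return
		}
		repo := repoFromPath(r.URL.Path)
		mu.Lock()
		last, seen := lastServed[repo]
		tooSoon := seen && time.Since(last) < window
		if !tooSoon {
			lastServed[repo] = time.Now()
		}
		mu.Unlock()
		if tooSoon {
			// Tell the client how long to back off before retrying.
			w.Header().Set("Retry-After", "3600")
			http.Error(w, "too many requests for this repository", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```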


> It's a standard response code, and with a bit of luck google will scale the requests accordingly.

Google already says in the comments [1] that it would be "a fair bit of work" for them to read the standard robots.txt for "boring technical reasons". I would not necessarily rely on them to respect HTTP 429 either.

In fact, given the facts of the current situation, I would guess that they probably wouldn't handle it correctly, and would cite more "boring technical reasons" when asked.

----------------------------------------

[1] https://github.com/golang/go/issues/44577#issuecomment-11378...


> "a fair bit of work" for them to read the standard robots.txt for "boring technical reasons".

that's embarrassing. What other reasons should there ever be?

Stop taking their money, stop using their shit. Turn your attention to where it's beneficial.


Returning a 429 wouldn't just be hoping that the Go proxy would obey the response; it would also significantly reduce the workload the server has to do when it gets a request it doesn't want to serve.


Or you could mess with a random percentage of the requests: tarpit them, drop random packets, reply with malformed answers etc. If you keep the percentage low they might have a fun time debugging :)


While amusing, responding in this way seems overly antagonistic/escalatory. i.e. the behaviour of a poor citizen of the web.

It also would be likely to result in an increased number of requests?

Drew's actions so far (as far as I can tell) are not only inoffensive/reasonable but overtly good-willed and presumably result from a considerable amount of effort.


As if Google's response (that they, the largest tech company in the world, can do nothing about it) wasn't antagonistic…


Seems like there's no need for this. HTTP spec covers "too many requests" already. Why further antagonize things rather than clearly state the problem?


If the rate limit is high enough, and pooled across all matching requests, you'll get different effects depending on the (global) concurrency, which might be tricky enough. And it might lead people in the right direction anyway.


The rate limit is unlikely to cause a problem. Google has been crawling the web since its very start, and the internal services which fetch resources from external web servers are extremely resilient. Some request fails? Some request is slow? It's not going to slow down other requests. Maybe these services aren't being used for Go, but the expertise is on tap.

(These kinds of services are also supposed to rate-limit their requests.)


> The rate limit is unlikely to cause a problem.

If that's the case, then great. It solves the DDoS issue without impacting the Google side, and everyone can be happy.


This is a completely different architecture and different services, largely open sourced. This is limited to Go and its dev team.

Most likely these errors would manifest for the users of Go and those who have a dependency which lives on Sourcehut.


A big chunk of the Go core maintainers are Google employees, and they run Go services on top of Google infrastructure.


Being tied to Google has very little impact on the technical implementation of go modules fetching, proxy, & sumdb. It is intentionally isolated so that request data is not part of Google's data business. Go uses git under the hood for these requests, so there are more complications. How do you cache or respect robots.txt when you don't actually talk to the servers?


You would think so, but has someone made a zip bomb with Git packs yet? ;)



I guess this is the professional response ;) I would just blacklist the IP addresses, just to see what would happen.


Yeah, but that's not an option according to the article.

> I can’t blackhole their IP addresses, because that would make all Go modules hosted on git.sr.ht stop working for default Go configurations (i.e. without GOPROXY=direct)


That’s more of a “won’t” than a “can’t”


It would mean that his host is not an option for any Go developers packaging libraries. So it's not really an option for their business, I'm guessing.


A source code repo can't block access to code it hosts for a major language and maintain its users' goodwill. "Can't" sounds right.


It would mean his service would be useless for a significant portion of end-users. Not really an option for a site that wants to stay relevant.


You could easily proxy around it. It's inconvenient to set up, but it can definitely be a set it and forget it thing.


You can set GOPROXY=direct in your environment and not have to deal with Google's scumbag proxy service in the first place, but nobody seems interested in making this the default behaviour.
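For anyone who wants to opt out locally, it is just an environment setting; something like the following (note that turning off GOSUMDB also disables checksum-database verification, which is a security trade-off):

```
# Per shell/session:
export GOPROXY=direct   # fetch modules straight from their origin VCS
export GOSUMDB=off      # optional: also skip sum.golang.org lookups (disables checksum verification)

# Or persist it via the Go toolchain's own config (Go 1.13+):
go env -w GOPROXY=direct
```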


Keeping the rate-limited connection alive eats up resources. In this case, the organisation executing the DDOS attack has virtually limitless resources.


In theory what could Google SREs do to get back at OP for messing with their services? Could they be banned? Blacklisted?


Regardless of whether or not Drew is abrasive (I've never dealt with him so have no opinion), this is on Google.

If a single service (Go's module crawler) is doing this to multiple third parties (and it is, by design) then requiring Drew to implement a workaround is the same thing as relying on every (relevant to the issue) third party to do the same thing. Which is bad design.

This should be fixed at source or switched off.

As for the workaround itself it's true that going on a crawler exclusion list will not stop SourceHut being usable, only make it a little stale, but that doesn't matter. The problem is not at Drew's end. And once again for this workaround to be an effective solution every (relevant to the issue) third party needs to do the same thing. Which is bad design.

As a general rule, with excessive polling the system doing the polling is the one at fault. And as a general rule if one system impacts many fix at the originator. If Google devs don't know this, then why on earth are they getting the salaries they do?


Can you help me understand the problem with the workaround, which is precisely to have Google's proxy not excessively poll DeVault's service? It really seems like DeVault's real argument here isn't about the impact of this on his service, but that he doesn't like the design of the proxy. He has a lot of standing to complain about impacts on his service, but essentially no standing to complain about designs he finds sub-par.


I can't really speak for Drew's problem with the workaround, only my own from a design point of view - he may not even agree with my own issue at all for all I know, so I don't want to put words in his mouth.

From my own perspective, however, the issue is about impact and responsibility.

The excess traffic is their impact as it is caused by their design and is entirely of their choosing. Thus it becomes their responsibility.

The workaround they offer (as with so much in tech) means their stuff becomes opt-out and not opt-in. By which I mean their own decisions/designs cause an impact where they require everyone else suffering from it to specifically and individually opt-out of being excessively polled.

This is an outrageous requirement they can only impose due to the power imbalance.


It sounds like DeVault let them know their proxy was hitting his service too hard, and they immediately responded by saying they could dial it down. Seems like a pretty normal sequence of events? I feel like I have to be missing something here.


> It sounds like DeVault let them know their proxy was hitting his service too hard, and they immediately responded by saying they could dial it down

I'm not ignoring that Drew complained and they offered to dial it down. That's a reasonable response if it were an isolated case. The problem is that it isn't an isolated case. This is how the module cache works, which means it is doing that kind of load against other module sources too.

If you're going to impose these resource demands on others then they should be managed for everyone affected not just the one who complains about it.

- Repeatedly pulling an entire repo over and over again is an antisocial act that is unfairly imposing bandwidth/CPU costs on third parties

- They should not expect every impacted party to obtain an opt-out individually because of unasked-for extra resource demands

- The system that excessively polls is broken, not its victims


I can't speak for Devault, either, but I think that Sourcehut is intended to be self-hosted if you want. So it's probably too much pain to ask every language provider (there might be a lot) to dial down the traffic. Imagine you're hosting modules from ~200 languages.


This is probably 80% people's existing animosity towards Google & Go, so they're ready to hate, regardless of how the facts land.

I think it's primarily about the Go team having a different working philosophy than the standard HN software dev philosophy, and an intolerance of difference.


> This is probably 80% peoples' existing animosity towards Google & Go, so they're ready to hate, regardless of how the facts land.

Go is my second fave after C# and I've used it almost daily for many years. No animosity here.


The proxy _is_ pretty nasty design. Basically every time `golang` fetches modules, it uses Google's proxy out of the box.

This lets Google collect very granular data on who uses which modules, and also makes it opaque to webmasters who's using their services (it's all hidden behind google). It's honestly pretty nasty how Google has hidden this kind of "analytics" within a pretty dark grey area.

At the same time, the bot that handles the proxy seems to crawl everything, very often, and excessively, effectively DDOS'ing anyone hosting go modules.


If the workaround didn't degrade service for Drew's go users, google would just implement it for all repositories.


How exactly does this workaround degrade service for DeVault's Go users? I'm not saying it doesn't; I'm saying nobody has given a simple explanation of what the impact would actually be, except for some people who have given clearly false explanations.

It's OK to just not know, but if you don't know, it's weird to have strong opinions about it.


I'm more weirded out by the fact that you believe google is performing this DDOS for no actual benefit and are choosing to defend it anyway.


Is that an answer to some other question I asked? It doesn't seem to be responsive to what I just wrote.

I feel like what I'm sticking up for here is the practice of software development. Building an automated system that generates unexpectedly unwelcome load on someone else's service is... not exactly front-page news? It happens basically all the time? The idea that because the Go team is sponsored by Google, nothing like this should ever happen seems deeply unrealistic. "We can push a button to make this stop happening; the tradeoff is that other hosting services that don't care about this load might currently have fresher cache entries [whatever that means]" seems like a perfectly cromulent response.

He should just tell them to push the button. If you think he shouldn't, you should be able to say why.

I do have a mild rooting interest here: I think the Go module proxy is pretty neat, and does something interesting for the security of the ecosystem. DeVault disagrees. That's fine, disagreement keeps things interesting.


> He should just tell them to push the button. If you think he shouldn't, you should be able to say why.

Accepting that Drew should get them to just push the button is accepting that it is the responsibility of each victim individually to cope with the abusive load being sent their way by engaging directly with their abuser. Rather than the sender, who is truly the one responsible, fixing it for every target by reducing their polling at source.


More importantly, there is no threshold of usefulness where continuing the behavior is justified.

If the behavior is unimportant enough that drew should accept eliminating it, then its value does not justify the load.

If the behavior is important enough to justify the load, then eliminating it is no solution.


I don't understand how this logic is meant to hold. The behavior can be of enough value to justify it where the cost to source hosts is low, but not of enough value to justify it for all hosts. Which is where we are now.

It feels like some comments here are trying to reconstruct what happened axiomatically purely from comments on the thread, without any empirical input from, like, how the Go module proxy actually works, and ending up in weird places as a result.


Personally I'm trying to keep it tied down to an issue of responsibilities.

The abuser of resources should stop abusing. It isn't the fault of the victims and they shouldn't each individually need to address it. Stop the issue at source for everyone by stopping being anti-social/parasitical on the use of resources through excessive polling. I think that's the extent of my own argument here.


It's not a great argument? If Github and Gitlab are fine with the polling, and the polling has even marginal benefits for Go users, why should we care how they handle Github and Gitlab? It sure looks like nobody on the Go project knew that sr.ht would care about these module clones, and when they found out, they gave DeVault the option of stopping them. I'm still not clear what the issue is.

If they knew DeVault's host wasn't going to be able to handle repeated module clones, a priori, then doing it anyways would be a problem. It doesn't look like anyone expected this to be a problem. It turned out it was, and there's a fix.


> How exactly does this workaround degrade service for DeVault's Go users?

I think you don't always get the latest stuff due to caching or so.


You shouldn't have to opt-out of a DOS.


You don't; this isn't a DOS.


This is still occurring? I remember the original incident and just assumed that Google had sorted it out with Drew/Sourcehut.

This is absolutely shocking behaviour, and I’m mortified at the precedent that it sets.


They didn't sort it out, they just banned him from commenting. So kind of problem solved for Google?


> I was banned from the Go issue tracker for mysterious reasons, so I cannot continue to nag them for a fix.¹ I can’t blackhole their IP addresses, because that would make all Go modules hosted on git.sr.ht stop working for default Go configurations (i.e. without GOPROXY=direct). I tried to advocate for Linux distros to patch out GOPROXY by default, citing privacy reasons, but I was unsuccessful. I have no further recourse but to tolerate having our little-fish service DoS’d by a 1.38 trillion dollar company. But I will say that if I was in their position, and my service was mistakenly sending an excessive amount of traffic to someone else, I would make it my first priority to fix it. But I suppose no one will get promoted for prioritizing that at Google.

> [1]: In violation of Go’s own Code of Conduct, by the way, which requires that participants are notified moderator actions against them and given the opportunity to appeal. I happen to be well versed in Go’s CoC given that I was banned once before without notice — a ban which was later overturned on the grounds that the moderator was wrong in the first place. Great community, guys.


> In the meantime, if you would prefer, we can turn off all refresh traffic for your domain while we continue to improve this on our end. That would mean that the only traffic you would receive from us would be the result of a request directly from a user. This may impact the freshness of your domain's data which users receive from our servers, since we need to have some caching on our end to prevent too frequent fetches.

https://github.com/golang/go/issues/44577#issuecomment-85692...

> "EFAIL" is an alarmist puff piece written by morons to slander PGP and inflate their egos. [...]

https://github.com/golang/go/issues/30141#issuecomment-46427...

Disclosure: I was on the Go team at Google until earlier this month. Dealing with DeVault's bad faith arguments is one of the few things I won't miss of that job.


The second link is irrelevant to the issue at hand. I'm sure there are a lot of shitty Google devs who have behaved shittily with others. I don't think that's reason for, say, an ISP, to ignore any issues Google might face as an entity.

The relevant first link clearly lays out an implementation problem. It's not just git.sr.ht that's facing it, but another user also comes in to point out a tremendous amount of traffic. An amount of traffic that is fairly irresponsible to say the least.

The issue owner doesn't deny the problem. They simply say it requires work. It appears they are unwilling to do the work to resolve the unsavory behavior, and instead are asking the host of Go modules to disable the ability for their service to be used as a Go module host, or instead suck it up and deal with the cost and complications of Google not putting in the effort to fix their architecture's DDoSish behavior.

Google and Go have the market power to pull this off, but let's not pretend this isn't bad behavior. An appeal to Drew's terrible personal communication style does not change that.


> The second link is irrelevant to the issue at hand.

It's relevant to the issue of the ban, which is what the (sub)thread is about. I think the reason it was posted was to demonstrate a pattern, and I think that's pretty relevant.

Every single disagreement I've had with Drew escalated and I don't think that's my fault since it never happens with anyone else (Never? Well, hardly ever) and I've seen it happen with various other people too. Not that I'm perfect by any means or couldn't have done things better, but there's certainly a pattern here.

As for the actual issue at hand: I mostly agree with Drew, however, I don't really have enough information to be sure here, and Drew has misrepresented things in the past and does so here in this post (a reader unfamiliar with Go would be left assuming that Go "phones home" just for the sake of "phoning home" after reading this article, which is really a misrepresentation IMHO), so there's not a lot of trust here (again, a pattern).


> Disclosure: I was on the Go team at Google until earlier this month. Dealing with DeVault's bad faith arguments is one of the few things I won't miss of that job.

So does or does not the problem persist? Second, was he or was he not banned from commenting on the issue tracker? Third, does the CoC require that a person gets notified by the moderator, and was DeVault notified?

If the answers are yes to all those questions, I wonder who is making the bad faith arguments?

Note: I have absolutely no skin in this game, except for being a Sway and Gmail user.


The subthread we're commenting on is about the ban, not about the proxy.


The OP specifically said:

> Dealing with DeVault's bad faith arguments is one of the few things I won't miss of that job.

They didn't say bad faith argument about banning, they said argument_s_. So he is not just talking about a single one. Which are the bad faith arguments? I asked if any of the things were not true, nobody said they were untrue. How can any of the arguments (I did not just talk about the proxy, I also talked about the banning) be bad faith if they are true?


The bad faith arguments being referred to are ones made on the issuetracker, not in the article.


How are his arguments in bad faith if he is the one that gets DDoSed by your software for over a year, and still tries to be helpful?

Not sure if you realize the absurdity of this, but he has to pay traffic and server costs. Like everyone else, except probably Google as it seems!?

I mean, you didn't even consider implementing a simple fetch of an already cloned repository in your mirroring server code. So yeah, I'd argue that the bad faith part is actually justified.


> I mean, you didn't even consider implementing a simple fetch of an already cloned repository in your mirroring server code. So yeah, I'd argue that the bad faith part is actually justified.

https://github.com/golang/go/issues/44577#issuecomment-11378...

> We did consider caching clones, but it has security implications and adds complexity, so we decided not to. It is certainly not trivial to do and not something we are likely to do based on this issue.

Drew continues to act as though he is always correct, and any viewpoint that isn't his is just moronic. I've repeatedly seen this behavior from him in multiple venues over the years, and I'm happy to see the wider community start calling this out as childish.


I don't particularly care for Drew, but the issue he's reported here seems totally valid. And if he requested that he be excluded from getting hit by the crawler, wouldn't that mean it would be impossible for people to use packages from sr.ht unless they change their config?

Plus, it does seem reasonable to think that only one of the crawlers needs to hit the site. The global replication can happen at the FS level or, heck, the crawlers can just perform pulls from each other.


No. According to the Go project, adding his site to the exclusion list would reduce traffic to his site at the cost of freshness of the data the proxy collects; it would not make it "impossible" for people to use packages from sr.ht.

This is all in the thread that DeVault linked to from his post.


Which would still be far from great for any kind of source hosting website.


In what way would it be "far from great"?


Well, why have that proxy/functionality in first place if the best option is to disable it?


I don't even understand the question you're asking. Nobody is suggesting the proxy be disabled, including for sr.ht.

It's OK not to know the specifics of what this is about, but it's weird to have strong opinions about it if you don't.


Drew can often be very abrasive, but does it really matter in this case? His site is basically being DDoS'd.

Yes, there are decent arguments why the golang infra doesn't cache or respect typical norms like robots.txt, but they don't change the unreasonableness of the underlying situation. Surely some mitigation could have been worked out in the year since the ticket was filed?


They offered to turn off refreshing of his domain it appears on Jun 8, 2021: https://github.com/golang/go/issues/44577#issuecomment-85692...


That doesn’t seem like a solution at all and is actually kind of punitive, as that would make srht bad for hosting Go.

I think this is just an example of Google being a jerk and not caring enough to do proper software engineering.

Go seems really interesting but I have avoided using it because it’s so tied to Google. And I don’t trust Google to make good decisions for developers or users.


It looks like a solution to me: Google stops proactive refreshing, and so users get data that is fresh up to the cache timeout.

Users who can't wait that long can disable the proxy, and SourceHut can recommend users do that.


Perfect succinct response. It is a 100% viable workaround.


Can you articulate why it isn't a solution, and how it would be punitive? There are people on this thread who appear to believe Google's workaround would mean that repositories hosted on sr.ht would be unusable as Go modules, which is not at all the case.


Drew articulated very well why Google's offer doesn't help at all.

https://github.com/golang/go/issues/44577#issuecomment-85693...

A full git clone, just to check if the user experience is still first-class and to fill a proxy, amounts to DDoSing the host; that is not an acceptable solution for a module host who has to pay the hosting bills himself.

If they want to know if their proxy is still up to date, a cheap latest-change request 8x/hour would be appropriate.

> Have you considered the robots.txt approach, which would simply allow the sysadmin to tune the rate at which you will scrape their service? The best option puts the controls in the hands of the sysadmins you're affecting. This is what the rest of the internet does.

> Also, this probably isn't what you want to hear, but maybe the proxy is a bad idea in the first place. For my part, I use GOPROXY=direct for privacy/cache-breaking reasons, and I have found that many Go projects actually have broken dependencies that are only held up because they're in the Go proxy cache — which is an accident waiting to happen. Privacy concerns, engineering problems like this, and DDoSing hosting providers, this doesn't looks like the best rep sheet for GOPROXY in general.


You didn't answer my question. What's the problem with the Go team's workaround? I get that DeVault would like to redesign the Go modules system to suit his own preferences, but that's not on the table.


The issue isn't the Go module system but rather their proxy which is not part of Go and should respect server resources but doesn't. The workaround makes any 3rd party host for go modules a bad choice as packages will always be stale.

The issue has nothing to do with DD other than him raising the issue and being ignored. The fault is with google.


How exactly does the workaround make packages "always be stale"?


What would be the problem with the Go team doing a quick `git ls-remote` instead of jumping to a full clone? All it would take is tracking the last `ls-remote` result in any of Google's many options for databases, and only doing a clone when the remote updates.
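As a sketch of the kind of cheap freshness check being suggested here (not how the module mirror actually works): `git ls-remote` only transfers the advertised refs, so a crawler could fingerprint them and clone only when the fingerprint changes. The repository URL and the in-memory "database" below are made up for illustration.

```
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os/exec"
)

// refsFingerprint runs `git ls-remote` (refs only, no objects transferred)
// and hashes the advertised refs into a single fingerprint.
func refsFingerprint(repoURL string) (string, error) {
	out, err := exec.Command("git", "ls-remote", "--heads", "--tags", repoURL).Output()
	if err != nil {
		return "", fmt.Errorf("ls-remote %s: %w", repoURL, err)
	}
	sum := sha256.Sum256(out)
	return hex.EncodeToString(sum[:]), nil
}

// needsRefresh compares against the fingerprint seen on the previous poll;
// lastSeen stands in for whatever persistent store a crawler would use.
func needsRefresh(repoURL string, lastSeen map[string]string) (bool, error) {
	fp, err := refsFingerprint(repoURL)
	if err != nil {
		return false, err
	}
	if lastSeen[repoURL] == fp {
		return false, nil // nothing changed: skip the expensive clone
	}
	lastSeen[repoURL] = fp
	return true, nil
}

func main() {
	seen := map[string]string{}
	// Made-up repository URL, purely for illustration.
	refresh, err := needsRefresh("https://git.example.org/~someone/somerepo", seen)
	if err != nil {
		fmt.Println("check failed:", err)
		return
	}
	fmt.Println("full clone needed:", refresh)
}
```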


Maybe there's no problem? It's totally fair to critique the design of the current module proxy. The odds of them developing precisely the right proxy were low; of course we'll be able to come up with things they can do better. That's how open source works.

It's when we turn this into a morality play that we go off the rails.


Considering they don't want caching, i.e. what is considered basic rudimentary politeness (and noting that the change increased traffic, so the proxy was intentionally doing busywork that no one wanted or needed), the odds of them developing an unacceptable proxy were 100%.

It's akin to putting up an open, exploitable DNS resolver in 2022 despite 2 seconds of research showing the entire world telling you not to do that.

No one needs to do a full clone twice a minute just to check if a security update exists.

one of the top money generating machines does not need a devil's advocate


Yeah so basically none of this is true? "Caching" is not considered basic rudimentary politeness, and what they did is not at all like putting up an "open exploitable DNS resolver" (also: putting up an "open exploitable DNS resolver" is pretty much still an industry norm). The "money generating machines" stuff doesn't add to the credibility of this argument.

I get that you feel like you could design a better Go module proxy. It would be weird if you couldn't, because you have the benefit of seeing what happened when we deployed this one†. Congratulations? It's an achievement, I guess?

For my part, I'd be thrilled if just a single person could articulate exactly what the impact to DeVault's code hosting service would be if he took the Go project's offer up on just not having them clone modules hosted on his service so often. Anybody at all, if you could just explain what the problem would be here, I'd be forever grateful.

(to wit: Go got a lot better, and apparently just 2 source hosts on the entire Internet had a problem with it, one of whom was fixed directly and the other of whom is apparently still mulling whether they want to burn the extra bandwidth for a cacheing benefit that literally nobody on the Internet seems to be able to describe).


> "Caching" is not considered basic rudimentary politeness

You don't consider this a truth, I do. By google's standards it is a truth, https://cloud.google.com/web-risk/docs/caching for one example

> I get that you feel like you could design a better Go module proxy.

I add "sleep 1" at minimum to all my scraping loops, and refuse to use clients that don't keepalive and cache (if I'm hitting the same endpoint), I've done this always, so yes I could have designed a better proxy today, and a year ago, and two years ago, and three years ago, et, al.

> Money generating

They can pay people, I'm not doing it for free for a language I don't use

> if he took the Go project's offer up on just not having them clone modules hosted on his service so often.

Do you know that he hasn't tried, considering how well Google responds to emails?

Some people have principles that you might not understand, I too would not ask someone to not DDoS me, because I feel that is an absurd request.

So not an impact on the technical aspects but on the administrative aspects. People are paying Devault to run a service as Devault would; it would be improper for him to not run the service as expected of him.

> burn the extra bandwidth for a cacheing benefit that literally nobody on the Internet seems to be able to describe

bandwidth and CPU time both cost money

- - -

There are alternative approaches that existed before 1.16. Deno, with its pull-once-then-pull-only-when-told approach, has not had a security crisis. Every other language works on demand; NPM at most checks a dedicated security-issues API designed for mass consumption. These things existed; none of them were referenced.


> Do you know that he hasn't tried, considering how well Google responds to emails?

You can't reasonably build an argument by just making stuff up. Maybe, against all evidence, Google offered to stop hitting sr.ht with Go proxy cache refresh requests and then ignored DeVault when he took them up on it. But I'm not required to assume that very weird situation is what actually happened.


In this case, it's really hard to see thrashing other people's servers relentlessly to collect data you already have as anything but incredibly, incredibly poor engineering. Y'all should write him a check for that much resource waste.


This comes to mind: https://news.ycombinator.com/item?id=31496063

>At Google we were told to stop thinking about all this stuff, that the storage hardware and software people were responsible for hiding things like wearout from application developers.

Something tells me this team was told to "stop thinking about all this stuff, that the network people were responsible for hiding things like speed, latency and cost from application developers." aka network is infinite, keep pounding that repo and we will scale accordingly (our side of the equation, sucks to be other people)


without knowing anything about this situation outside of this thread and the post it links to, it comes across as willful negligence to screw over someone who was a bother in past community transgressions


That's a risible suggestion. Even DeVault doesn't say that.


It's less than $200 per month to send out 4G daily. If his business can't afford that, there is something else going on.

What is the total daily bandwidth that sourcehut uses anyway? What percentage is go module fetching?


The 4G daily was a different user who hosted a Go module where he was the single user on his own server; this was not DeVault.

I'd be pretty pissed if I hosted a Go module essentially for myself and suddenly I have a $200 bill, because Google decided to clone my repository 500 times a day. If it doesn't bother you, how about you donate $200 a month to a charity of my choosing, because it doesn't matter to you.


Self-hosting costs money; for this one user it would seem that blocking or other options are more tenable.

If money was a problem, I'd expect this individual to have rectified it on their end.


So tell me why do people use DDoS protection? It's just money. If you run a server you should be able to eat all the cost!

Seriously do you follow through what your arguments actually mean if applied in general?


There was no actual DDoS, so no need to compare

Should every language be responsible for paying the bandwidth bills for dependencies?

You might look at the most recent comment from the Go team on the issue; there have been no additional requests or events since they last resolved it for both of the affected parties.


Plenty of bootstrapped businesses have better things to spend $200 / month on, let alone the time spent trying to figure out where all the anomalous traffic is coming from. As I understand it, it's not simple file fetches either. It's cloning a repo, which involves two-way communication, consumes CPU and RAM, and causes disk seeks. You're not slapping it on CloudFront and calling it a day. Finally, it looks to me like the costs are going to scale the more people he has using sourcehut and writing Go modules.

I don't really understand turning this around on him. Why should he have to subsidize Google? If it's not a problem, why do we have robots.txt at all? Just let bots hammer your site and cope with it.

The current situation can't be the optimal solution. It wasn't even present prior to Go 1.16. Only one company has the ability to change that. What should he do differently here? Why should he have to spend any of his time or money working around an issue he didn't create?


That was a different user. The fact that a user not running a git hosting service is potentially eating $200 a month should clue you in to the fact that the cost to Drew is likely drastically higher than that.

Google should be sending reimbursement checks for the damage done here on this issue.


Drew is running a code hosting business and this is a cost of providing a feature to the users. He can pass the costs on if it is a problem. He has lots of options and his competitors are not making a big deal out of this.

I suspect he's drawn his line in the sand and wants to keep it going rather than finding a solution that works without requiring upstream changes.


If I provide a paid service and someone abuses it I must deal with it because my larger competitors deal with it? It's good to know that small businesses have no place in the modern world.


> but it has security implications and adds complexity

Read: we prefer to use your servers for caching. Not good enough. Maybe the issue is people making silly evasive arguments like these while the server load piles on?


> and still tries to be helpful

"Assuming everyone else have exactly same design choice and architecture as yourself, making suggestion on this ground and calling other people crazy because they can't implement what you suggest them to do" is not trying to be helpful.

Well, or maybe I'm just frustrated reading his repeated "please keep a copy 'locally' somewhere and run git fetch". Just like how I'm frustrated arguing with him on HN about whether sending patches to a mailing list is better than GitHub pull requests.

Don't get me wrong, I understand this is a real Google-scale system v.s. individual code hosting website issue, I just don't see how his "please stop being Google and instead try my works-fine-on-one-box solution" take is helpful.


I often don't agree with DeVault's opinions, however the design decision here is actually costing other people real money and I have not seen a convincing argument why it is necessary. Instead they acknowledge it's an issue, but then continue that behavior for nearly a year, which IMO shows complete disregard for smaller services or individuals hosting their own repositories.


Suggesting ideas is not the same as assuming everyone else should have the exact same design choice and architecture. The problem is that the actual issue owner on the Google side does not appear to have any solutions (or is burying solutions because they may be too complex).

It's pretty clear that the issue owner agrees there is something wrong here. That's why the issue is kept open and their only reason for not fixing it is that it's a complicated fix which requires effort.

As a workaround, they offer Drew the opportunity to essentially disable his service from being used as a useful source of Go modules.


Cloudflare manages to cache things and then serve them to lots of people, without having to request the same content 50 times an hour. Maybe Google should move some of their services over to a better content delivery platform? o.O

This is a scenario where anyone who isn't a Googler can see that "being Google" is not a justification for bad engineering. Do better.


Just about every CDN out there manages to cache at a global scale from a single hit, so I don't understand what's so vastly different about the go mirror system that it can't do the same under the hood.


bad faith refers to his behavior on other issue threads. also he (used to) spam the issue tracker with ads for his services


> also he (used to) spam the issue tracker with ads for his services

Do you have links to those ad spams? I couldn't find obvious ones when I was using GitHub's issue search field.

Edit: It seems ddevault opened and/or replied to only 10 issues, which I would hardly call "spam".


> also he (used to) spam the issue tracker with ads for his services

If there's any entity I'm totally ok with anyone spamming with ads, it's Google.


It's not "Google" you'd be spamming, it's the Go language community, most of which has nothing to do with Google. Please don't make high-drama arguments like this.


“DDoSed”

He complains about outgoing traffic of 4GB/day. On a server. In 2022.

If this is an issue he should not be hosting Go modules on his own servers.


4 GB/day is a different person facing the same issue who is hosting a single module they claim is likely only used by them.


Still, what's 4 GB a day?

I really really don't get the issue.

Is he I/O bound? Is there no reasonable way for git hosting services to actually cache git checkouts?

DDoS / DoS should only be used, in my opinion, when the server goes down completely, not when it has a little bit of load, and I still haven't seen anything which indicates that this is a real issue.

And I don't want to come across negatively, I just don't get the issue.


Given that Google decides to do that effectively out of bad design.

What if your website hosted 100 Go modules and it scales linearly, that's 400GB, still not a problem? What if every programming language did that?

I don't really understand how that's not an issue. Sure, it's not a big deal if just one service does that, but it's still something that should be fixed. If you are Google, it should be expected that a fix like this does not take over a year.


I see the fairly civil communication you had with Drew in the first link, but your inclusion of the second (given it's an unrelated issue) just feels like you're throwing mud in order to minimize the technical concerns he raised.

It seems like the only solution suggested there is one that makes the "small fish" service less useful as a go repository. I'm not surprised he didn't like it.


It speaks to why he might have been banned, which is the context of this thread.


We shouldn't have to speculate as to why he was banned, as the CoC requires communication to him about why. If they're breaking CoC to get rid of the guy (and likely because the CoC prevented them from doing so in the past), it's hard to justify no matter the reason.

And to be frank, Filippo's contribution to this thread provides a clear reason why: Drew was incredibly rude in the second link, but his arguments were not bad faith. He clearly meant everything he said, and I would even argue that he was right, although his method of engagement was unacceptable.

More importantly, the exchange was 3 years ago, and the more recent exchange showed him engaging exactly how I would expect a professional to. If the 3-year-old exchange is the best Filippo could find, it indicates that Drew has changed his tactics for the better. So why the ban?


Because Google's full of assholes who just don't give a damn that the rest of the world doesn't have the unlimited bandwidth and storage of the Big G.


I do not understand why the issue is not discussing DeVault's straightforward robots.txt suggestion:

> Have you considered the robots.txt approach, which would simply allow the sysadmin to tune the rate at which you will scrape their service? The best option puts the controls in the hands of the sysadmins you're affecting. This is what the rest of the internet does.

The only explanation I see later in the thread is:

> For boring technical reasons, it would be a fair bit of extra work for us to read robots.txt, so rather than going to a bunch of work to do that, we implemented a trivial list and offered to add sr.ht to it.

This is...not a satisfactory technical explanation. Perhaps the Go team should consider providing more openness and transparency about why it's "a fair bit of extra work" to implement an internet standard like DeVault suggests.

Honestly, DeVault is an abrasive person and not necessarily someone I would go out of my way to work with, but I don't see how the Golang team isn't at least somewhat at fault here for brushing off a community member with "Eh, it's too complicated for you to understand". That's not how you build a thriving community around a language.


They didn't brush him off. They gave him an immediate workaround, which --- contra some messages on this thread --- did not entail making sr.ht unusable for Go projects. At the time he posted this, he had not taken the Go team up on that workaround; doing so appears to involve only DeVault saying "go ahead" to the Go team.


> They didn't brush him off.

I disagree, the "for boring technical reasons..." is as close to a brush-off as I can see. This is a technical issue tracker, why not be open and honest about the reasons? I feel like these days people just seem to take it on faith that "Oh, it's Google, surely they know best when they say it's a mysterious technical issue that's too hard to solve".

Also, banning someone from the issue tracker in violation of their own CoC does not seem like good faith behavior either.

Note that the message from 'FiloSottile does not clearly spell out whether DeVault's ban followed a CoC process; it just obliquely quotes some nasty stuff that he's apparently said in the past. I think other commenters are correct to call it irrelevant; it's about as relevant as it would be if I started randomly quoting Rob Pike's notorious comments about syntax highlighting [1].

Overall, I totally acknowledge the Golang team is providing a workaround that will solve the (admittedly abrasively spoken) user's problems. What I'm saying is that that is not enough. If you want an open language that will not turn into .NET, you have to do better by your community and also be transparent as to how and when you're planning to solve your users' problems. And preferably also be transparent about CoC bans, especially if there is a technical discussion involved.

For example, this sounds like a reasonable response to me:

"We have written up a task to make our proxy code parse robots.txt correctly, but there are N other tasks above it, and it looks like we won't be able to work on it for approximately the next six months. Until then, a workaround is..."

----------------------------------------

[1] https://groups.google.com/g/golang-nuts/c/hJHCAaiL0so/m/E2mQ....


This is how every issue tracker works. It takes time to explain things (and more time to generate all the facts needed to put together an accurate explanation). What matters is that they got him a workaround --- apparently, almost immediately. He hasn't taken them up on it; it's unclear why.

There's no indication I can see that DeVault's "banning" from the issue tracker has anything to do with any of this.

There's always going to be some standard the Go team (or the Rust team, or the Clojure team, or the Scala team, or any other team down the line until you just decide to do your own language and hope for the day you get to occupy the gratifying job of fielding these kinds of tickets) doesn't meet, for someone on some message board somewhere.


This claim is outrageous- if disabling sourcehut is a valid workaround, then google should end the refresh behavior on the proxy since it isn't providing any valuable service. If the proxy provides a valuable service, then disabling sourcehut would degrade service for sourcehut users.


That logic doesn't even cohere. If Github and Gitlab can sustain more aggressive queries from the module proxy, they may have fresher cache entries in that proxy. That's what being more capable gets you. The Go team isn't obligated to operate at the lowest common denominator of all source hosts.

If the Go team's workaround was to make sr.ht not viable for Go modules, that would be deeply problematic. But it isn't; in fact, nobody who's upset about this seems to be able to describe what the Go team's workaround would even impact. Can you?


So what is your opinion on the proxy behaviour then?

I know it's not like knowing what Google thinks about this, but I'm curious about how something like what is described in the post is allowed to happen.


I would be surprised if their opinion differs from that of the first link they sent.


Seems like recrimination to bring up unrelated items.


> Disclosure: I was on the Go team at Google until earlier this month. Dealing with DeVault's bad faith arguments is one of the few things I won't miss of that job.

Speaking of bad faith arguments, aren't Google and its employees the ones that:

- Claimed that scrolling screenshots on Android are "infeasible" [INFEAS] despite Samsung, LG, HTC, etc. already implementing the feature in their forks/distributions of Android?

- Claimed "[t]he generic dilemma is this: do you want slow programmers, slow compilers and bloated binaries, or slow execution times?" as an excuse for not implementing Generics in Go?

- After years of claiming this nonsense finally implemented Generics in exactly those ways? [GoLang_GenImpl]

- Even when implementing Generics in exactly those old and previously known ways claim that "Russ Cox famously observed that generics require choosing among slow programmers, slow compilers, or slow execution times. We believe that this design permits different implementation choices." [GoLang_43651]

- "Loses" or "loses track" of their customer's mobile phones, and when said customers cancel the credit charges for the phones they never received or otherwise disappeared, block their Google accounts, including their access to email? [Google_Criminals]

Drew DeVault certainly deserves some or even a lot of criticism, but Google and its employees attacking others for "bad faith"? This should not be acceptable in a civilized society.

[GoLang_43651]: https://go.googlesource.com/proposal/+/refs/heads/master/des...

[Google_Criminals]: For example: https://www.reddit.com/r/GooglePixel/comments/7nrx07/google_... and https://www.reddit.com/r/GooglePixel/comments/84sysx/update_... There have been other documented cases of the same so this is not a single uncharacteristic incident.

[GoLang_GenImpl]: https://go.googlesource.com/proposal/+/e0113ba8479092562cf9d...

[INFEAS]: https://issuetracker.google.com/issues/80491647#:~:text=Stat...


Guy’s public behavior is constantly abrasive, I’m honestly surprised at the minor following he has.

Being right isn’t the only important thing in the world, nor does it wash away all transgressions.


> The Go team holds that this service is not a crawler, and thus they do not obey robots.txt

This seems wrong. I guess I always assumed that robots.txt applied to non-humans.


I think they are right not to obey robots.txt in this case. If I tell Go to download a module it shouldn't follow robots.txt, because I am a human and I requested it. This is similar to how, if you had private iCal URLs on your server, the right thing would be to deny them in robots.txt (in case a crawler found a leaked link), but a service that is monitoring a specific iCal calendar should still fetch it.

Basically robots.txt is more about what should be crawled than how. In this problem it appears that the traffic is desired in general, but it is being done far too often. robots.txt does have primitive rate limiting configs but that seems to be a minor part of the file.

Of course like anything there is nuance and there is definitely some middle ground between crawlers and humans.
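For reference, the rate-limiting knob in question is the (non-standard but widely recognized) Crawl-delay directive. A robots.txt along these lines is roughly what a host could publish if the fetcher honored it; the user-agent name is an assumption, since the module mirror does not identify itself as a distinct crawler.

```
# Hypothetical robots.txt: the Go module mirror does not currently read this
# file, and "GoModuleMirror" is an assumed user-agent name.
User-agent: GoModuleMirror
Crawl-delay: 450    # at most one fetch every 450 seconds, i.e. ~8 per hour

User-agent: *
Disallow:
```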


Precisely, here the service is fetching stuff regularly even if no one is asking anything, if I understand correctly.


It apparently pulled a repo only used by the author 500 times in a day.


Even the main Google search crawler does not respect Crawl-Delay from robots.txt (as of today), so it's a moot point anyway.

I suspect it's just a cultural thing inside Google.


I don't think so. When I worked at Google most things that talked to the outside world were carefully rate limited with quite generous backoffs if there was signs of the receiving side being harmed (errors or slowdowns). Especially crawling and mail sending. But Google is a big company and it is very likely that the Go team didn't think of these things that are probably top-of-mind for someone on the crawling, ads or mail teams.


I'm sure the reasoning would be something along the lines of these requests being made in response to user activity of some sort.


From https://github.com/golang/go/issues/44577#issuecomment-85107...

> Yesterday, GoModuleMirror downloaded 4 gigabytes of data from my server requesting a single module over 500 times (log attached). As far as I know, I am the only person in the world using this Go module.

From https://github.com/golang/go/issues/44577#issuecomment-78924...

> yes we make a fresh clone every time

I like golang as a developer, but this is a terrible implementation. I'm somewhat tempted to say that blocking the Google IP addresses is the correct answer in that it will force some sort of wider action (linux repos setting `GOPROXY=direct`, Google fixing their code, or unfortunately, golang modules moving off sourcehut).


From https://github.com/golang/go/issues/44577#issuecomment-11378...

> Anyone who's receiving too much traffic from proxy.golang.org can request that they be excluded from the refresh traffic, as we did for git.lubar.me. Nobody asked for sr.ht be added to the exclusion set, so as far as it's concerned nothing has changed.


sounds totally scalable especially coming from a company that is notorious for having zero support. /s


one can definitely see what people are promoted for at Google :)

> the Go Module Mirror runs some crawlers that periodically clone Git repositories with Go modules in them to check for updates.

>The service is distributed across many nodes which all crawl modules independently of one another, resulting in very redundant git traffic.

basically slapping together a very inefficient alpha, and obviously people aren't promoted for fixing such glaring inefficiencies to make it even into a half reasonable beta.

And that is just hilarious, as if people at Google have never heard of CDNs, HEAD requests, git fetch, etc. Of course they know; it is really just the arrogance of an 800 lb gorilla toward a "small fish" (https://github.com/golang/go/issues/44577#issuecomment-85692...), with passive-aggressive blackmail as the cherry on top:

>In the meantime, if you would prefer, we can turn off all refresh traffic for your domain while we continue to improve this on our end. That would mean that the only traffic you would receive from us would be the result of a request directly from a user. This may impact the freshness of your domain's data which users receive from our servers


It sucks having to work around something like this, but maybe the following would work: only allow the first checkout from a given Go node, and blackhole later accesses. If the repository is modified or a certain amount of time has elapsed, reset and allow a download again.
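A rough sketch of that first idea as an HTTP filter, with everything here assumed for illustration (how "Go nodes" are identified, the reset window, and the lastModified lookup):

```
package gate

import (
	"net"
	"net/http"
	"sync"
	"time"
)

type key struct {
	client string // remote address of the suspected mirror node
	repo   string
}

// Gate allows the first download per (node, repo) and rejects repeats until
// the repository changes or the window expires.
type Gate struct {
	mu     sync.Mutex
	seen   map[key]time.Time
	Window time.Duration
}

func New(window time.Duration) *Gate {
	return &Gate{seen: map[key]time.Time{}, Window: window}
}

func (g *Gate) allow(client, repo string, modTime time.Time) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	k := key{client, repo}
	last, ok := g.seen[k]
	if !ok || modTime.After(last) || time.Since(last) > g.Window {
		g.seen[k] = time.Now()
		return true
	}
	return false
}

// Middleware wires the gate in front of the git HTTP backend. lastModified is
// however the host tracks when a repository last changed.
func (g *Gate) Middleware(next http.Handler, lastModified func(repo string) time.Time) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		host, _, _ := net.SplitHostPort(r.RemoteAddr)
		repo := r.URL.Path // illustrative; real code would parse out the repo name
		if !g.allow(host, repo, lastModified(repo)) {
			// A true blackhole would silently drop the connection; an empty
			// 429 is the simple version.
			http.Error(w, "", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```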

Also, if you want to escalate, I wonder if there is a way to create a fake git repository that expands to a huge amount of data when cloning, but uses minimal bandwidth on the server side. Set up a bunch of those on some other host, and use them from go, and wait until google notices. Something like this: https://github.com/ilikenwf/git-zlib-bomb


The problem is that a) you want it to work properly for users of Go modules hosted on your Git forge and b) you don't want to effectively waste time on working around a problem that the GoProxy engineers could just solve with a more intelligent (but maybe more complex) design.


Malicious behavior would come back to bite the author and his users & business.


Not necessarily. I was thinking of this story: https://news.ycombinator.com/item?id=26074139

> At the height of the browser wars I once woke up to Microsoft hotlinking a small button for downloading our software from the MSN homepage. I tried to reach someone there for hours but nobody cared enough to do something about it. The image was small (no more than a few K), but the millions of requests that page got were enough to totally kill our server.

> Finally, I replaced the image on there with a 'Netscape Now' button. Within 15 minutes the matter was resolved.

And there was another case where some Indian app was using excessive traffic from Wikimedia, which was solved similarly. Sometimes you have to do something to get the attention of the right people. (And I could imagine if you do it right and are not too malicious, you might actually get a bug bounty on top.)


Yes, I also hope EU breaks Google apart.


I'm wondering how much load this is sending to github. Github has a ton of golang packages, including many non-popular ones that wouldn't otherwise get much traffic. A refresh job running full clones many times a day must be burning up bandwidth and compute over there as well. I suppose it's a drop in the bucket for Github's usage, but it's got to be a huge number.


I wouldn't be surprised if half the value of the Microsoft acquisition was to get GitHub's infra closer to the super-scale pipes and peering that Azure/Office/Update/etc leverage. I would expect they probably run tens to hundreds of PBs per month.


Or they have custom code for bigger services.


Probably the most egregious part is the footnote:

```

I was banned from the Go issue tracker for mysterious reasons[1], so I cannot continue to nag them for a fix.

[1]In violation of Go’s own Code of Conduct, by the way, which requires that participants are notified moderator actions against them and given the opportunity to appeal. I happen to be well versed in Go’s CoC given that I was banned once before without notice — a ban which was later overturned on the grounds that the moderator was wrong in the first place. Great community, guys. ↩

```

What happened to "don't be evil" I wonder? (And I know, it's that Google has now become a big grown up corporation)


I'm curious if the Go crawler would correctly back-off with "429 Too Many Requests" responses. Either way it could save the bandwidth.


Of course not, the crawler doesn't even read robots.txt because it's "too difficult to implement"


What I find shocking is not Google doing this, or even how the Go team didn't care about the issue and apparently banned him, but how many people here think this is acceptable behavior and how many take the time to defend Google and justify this behavior by blaming not Google but Drew because "it is hard to deal with him".


The fact that a programming language calls home to Google by default should make it a non-starter for most sane developers. The fact that it calls home so it can DDoS other sites is low-key hilarious. And you'd think Google would know how to, like... operate an efficient CDN, perhaps?

Like, if this was managed by a competent company, you'd think this service would be akin to putting Cloudflare in front of your servers: It should minimize the heck out of your traffic because Google is able to cache it and serve it to the masses more efficiently. But it's Google, so it Googles.

Being blocked from the issue tracker for not being okay with being DDoS'd is peak Google. I am sure you were accused of a Code of Conduct violation, because declaring a CoC violation is much, much cheaper than fixing their infrastructure.


You can phone anywhere you want; the proxy is configurable, and there are (from what I can see) several independent implementations of the proxy itself. Go defaults to Google's proxy; few people change the default because Google's proxy is very good.
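
For example, opting out is a one-liner with the standard Go environment knobs (git.sr.ht here is just the host under discussion):

```
# Skip Google's proxy entirely and fetch modules straight from their origin:
go env -w GOPROXY=direct

# Or keep the proxy for public modules, but bypass it (and the checksum
# database) for modules under a particular host:
go env -w GOPRIVATE=git.sr.ht
```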


> Go defaults to Google's proxy; few people change the default because Google's proxy is very good.

What percentage of users do you think actually know that the proxy is even there?


This article strongly suggests Google's proxy is not "good", even if it "works". It sounds atrociously inefficient.


It brings a lot of security to the Golang dependency system. It is more than just a dumb proxy or cache


Smarter than your average proxy, but apparently dumb enough it doesn't actually cache at all, according to the reports.


Obviously it does cache, because the simplest solution offered by the Go team is to exclude sr.ht from the cache refresh that DeVault is complaining about, and he hasn't taken them up on that.


If it's such a widespread problem that there is a process to exclude domains from the cache refresh, doesn't that point to it being an issue that should be fixed by the party responsible instead of everybody else separately?


So far as I know, DeVault and the one other person who chimed in on his ticket are the only people who have ever availed themselves of this process, which may well have been created specifically for them. So, no, I don't think that's a valid point. Also: it is being fixed by the party responsible.


was being fixed. They just marked the issue as resolved and blocked any more comments on the thread.


They closed it to comments because they started getting toxic; such is the modern internet mob mentality


Go was originally advertised as Google "sponsoring" Rob Pike & Co. to create a programming language. For a while it even sounded like the Java, Python, and C++ devs at Google didn't like it.

The website focused on the programming language and was basically in Plan 9 style: no nonsense. The language itself was sane and very cross-platform.

Things changed. Now one has to scroll past big company branding and marketing blabla to even get a glimpse of the language; there are big Google logos and everything is JavaScript-heavy. The really well written tutorials[1] got hidden, and the newer tutorials contain mistakes and bad code that were pointed out[2] but marked "won't fix". A language whose community used to admit that new() is kind of redundant and maybe wasn't the best idea now pulls in random programming language features[3]. On top of that, Go started out embracing the fact that there are many different operating systems, officially supporting even Plan 9, DragonFly, etc.; but after hollowing that support out into tiers, it seems they now want to shift away from it completely[4]. And then of course there are things like the module proxy causing issues.

People got their hopes up, but it almost feels like Google is pulling an embrace, extend, extinguish on Go - or at least its original design. But I still hope that I am just seeing things that aren't there.

[1] https://go.dev/doc/effective_go

[2] https://groups.google.com/g/golang-dev/c/kC7YZsHTw4Y/m/u0_d9...

[3] https://github.com/golang/go/issues/21498#issuecomment-11322...

[4] https://github.com/golang/go/discussions/53060


> The fact that a programming language calls home to Google by default should make it a non-starter for most sane developers.

How is it any different from Rubygems, NPM, PyPi, or any other package repository? In most you can bypass it by using git repos, but almost no one does that. And the GOPROXY does offer real benefits, such as preventing left-pad problems.

As others have said, if you really think this is a huge problem for you then it's easy to disable, which is actually easier than what most other package managers offer, but I don't really see the harm in the first place.


Are you going to get upset at node for calling home to Microsoft (npm owned by github owned by microsoft) when using the supplied package manager too?


If you're pulling down a package with npm that isn't part of the npm registry, the request isn't going through a Microsoft proxy.


Same for Go, only that Go has an opt-out "registry" rather than an opt-in one.


I mean, I strongly just feel the npm ecosystem is a dumpster fire of sadness. Though there is a fundamental difference in that if you use NPM, you are choosing to download packages from NPM. But if you specify a bunch of repos you want to download Go packages from, and you get them from Google anyways, that's pretty uncomfortable. Especially if Google is also DDoSing those repos in return.


No, I did not "choose to download packages from NPM". Choosing means there is an alternative. And besides that, the registry part is invisible in your package file (a hardcoded default, like GOPROXY).

If you look at it, it's pretty similar. The part before the first slash is invisible:

npmjs.com/org/package

proxy.golang.org/org.com/package

The difference with Go is that you can actually change the first part if you want, or even disable it.
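
Concretely, the proxy just maps the module path into its own URL space; something like the following, where the module and version are chosen only as a familiar example and the endpoints are the documented module-proxy protocol as I understand it:

```
# Versions the proxy knows about for a module:
curl https://proxy.golang.org/github.com/pkg/errors/@v/list

# Metadata and the actual source archive for one version:
curl https://proxy.golang.org/github.com/pkg/errors/@v/v0.9.1.info
curl -O https://proxy.golang.org/github.com/pkg/errors/@v/v0.9.1.zip
```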


How could node not do that? NPM hosts all the packages so of course it does.


They could do it similarly to Go. Instead of "facebook/react" it could be "facebook.com/react" in "package.json".


Well yes, but my point here is that NPM's entire design is around not doing that - whereas Go's design isn't


IPFS/bittorrent for P2P


Yes. I find all of the modern "always connected, always call home" behaviors quite annoying in my programming tools.

But, to be fair, Node was a dumpster fire long before being bought by Microsoft.


The problems with npm are completely orthogonal to what the author is describing here.


I mean, yes, it would be valid to complain about that as well


that's what big organisations do, it's their DNA to flood and exhaust smaller ones. They just don't care about anybody not their scale. Ignorant egoists.


I'm not a Go user so I don't really know how the Go proxy works, but if webmasters had a way to opt out via robots.txt or specific status codes (429), and the proxy at Google then sent the client a response meaning "try it yourself" (i.e., try again in direct mode), that could have been an adequate solution for this scenario.


I don't get it. Without the proxy, wouldn't all the (potentially larger) traffic just hit the server directly anyway?


My read is that there's a refresh job that's doing hundreds of full git clones even for packages that don't have users.

Package with approx. zero users: https://github.com/golang/go/issues/44577#issuecomment-85107...

Full clones: https://github.com/golang/go/issues/44577#issuecomment-78924...

They disabled refreshes (for someone else) to fix the issue: https://github.com/golang/go/issues/44577#issuecomment-86271...

So maybe there is some caching happening between the user and the mirror, but the mirror is hammering the origin with much more load than should be needed.
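
To make the gap concrete (the repository URL below is hypothetical): a freshness check does not need a clone at all, only the advertised refs.

```
# Full clone: re-transfers the entire object database every time.
git clone https://git.example.org/~someone/somerepo

# Checking whether anything changed only needs the ref listing:
git ls-remote https://git.example.org/~someone/somerepo HEAD

# Updating an existing mirror only transfers the new objects:
git -C somerepo fetch origin
```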


Athens, a self-hosted free Go proxy implementation, implements rate-limiting for GitHub (with a pretty horrific GitHub-specific behavior). This means GitHub does implement rate-limiting to cope with aggressive Go proxies.

https://github.com/gomods/athens/blob/723c06bd8c13cc7bd238e6...

Food for thought.
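
Not Athens' actual code (that lives in the linked file), but the general shape of client-side throttling in Go looks roughly like this; the URL and rate are made-up placeholders:

```
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"

	"golang.org/x/time/rate"
)

// politeClient spaces out outbound requests to an upstream forge. This is a
// generic sketch of client-side throttling, not Athens' implementation.
type politeClient struct {
	client  *http.Client
	limiter *rate.Limiter
}

func (p *politeClient) Do(req *http.Request) (*http.Response, error) {
	// Block until a token is available, or the request's context is cancelled.
	if err := p.limiter.Wait(req.Context()); err != nil {
		return nil, err
	}
	return p.client.Do(req)
}

func main() {
	pc := &politeClient{
		client:  http.DefaultClient,
		limiter: rate.NewLimiter(rate.Every(2*time.Second), 1), // roughly 0.5 requests per second
	}
	// Hypothetical upstream; any forge URL would do.
	req, err := http.NewRequestWithContext(context.Background(), http.MethodGet,
		"https://git.example.org/~someone/somerepo/info/refs?service=git-upload-pack", nil)
	if err != nil {
		fmt.Println(err)
		return
	}
	resp, err := pc.Do(req)
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```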


There is a very simple way to get them to stop sending the 0.5 qps that is described as a DDoS.

The linked bug shows another user successfully saying "please opt me out", and Google building the feature to do that in a week.

Drew has for some reason chosen not to ask for an opt out, even though it appears trivial and would probably be fixed by the weekend if he asked for it.


Calling it 0.5 QPS to downplay the severity is willfully ignoring the complaint in the article. It's not just a query, it's a full Git clone of the entire repo with its history. That's a huge difference.


That's what a "Q" means in this context. It's a git hosting service.


True, though I don't think a "Q" from a client doing a fetch involves as much work as a "Q" from a client doing a full clone.

It seems like this Git hosting service's workload-per-Q has increased due to the Go proxy change, which (IMO) is a cause for concern.


I think he's been banned from the issue tracker.


Possibly, but he was actively participating in https://github.com/golang/go/issues/44577 up until the week it was fixed.

If that's the root of the issue though, most of the article ("no one will get promoted for prioritizing that at Google.", "the go team has not prioritized it", etc.) is wrong. They may not have addressed the issue in the precise way he wanted, but I also think it's rather unreasonable to expect someone to not only address your issue, but address it in the precise way you suggest.

That's rarely globally optimal and feels rather entitled.


So the solution is that everyone who does not want that behavior should request that they turn off refresh traffic for their domain? How long would it take until they would complain about people spamming their issue tracker if everyone who self-hosts a go module would start requesting this?

Maybe they should request 500 times a day because that's what the proxy is doing.


> How long would it take until they would complain about people spamming their issue tracker if everyone who self-hosts a go module would start requesting this?

This same question comes up for ~any CI service or any number of other services that regularly build from source. There's a reason that github maintains a cache and that repo clones are served from that instead of by invoking the git command on the server.

A git hosting service that can't handle a trivial clone qps isn't actually a git hosting service.


The guy who had 4G of traffic a day was self-hosting his go-module where he was the only user. Google downloaded his repo 500 times a day, that's some serious traffic for a single person, and no he was not a git hosting service.

Also traffic is not free, so even if they maintain a cache somewhere there is a cost to this.


> The guy who had 4G of traffic a day was self-hosting his go-module where he was the only user. Google downloaded his repo 500 times a day, that's some serious traffic for a single person, and no he was not a git hosting service.

Ok so let me answer your previous question more explicitly:

> So the solution is that everyone who does not want that behavior should request that they turn off refresh traffic for their domain?

Yes.

> How long would it take until they would complain about people spamming their issue tracker if everyone who self-hosts a go module would start requesting this?

We have an answer to this. It is, empirically, a very long time, as there seem to be a handful (possibly only 2, certainly less than 5) users impacted negatively enough to report an issue. It does not usually make sense to solve an issue that doesn't exist. Presumably dozens of users reporting the same issue would cause a reprioritization, but if no one is complaining, there's no reason to change priorities.


> We have an answer to this. It is, empirically, a very long time, as there seem to be a handful (possibly only 2, certainly less than 5) users impacted negatively enough to report an issue. It does not usually make sense to solve an issue that doesn't exist. Presumably dozens of users reporting the same issue would cause a reprioritization, but if no one is complaining, there's no reason to change priorities.

You have enough information to understand that it is a problem, regardless of how many people are actually complaining to you. I self-host git, and if Google were chewing up 120 GB a month of my bandwidth I'd definitely feel it, but I wouldn't easily figure out what was going on, let alone figure out how to complain and stop it.

The behavior is rude and bad engineering. Just because people aren't yelling at you for leaving dog poop in their yard doesn't mean it isn't a problem for them too.


I think the problem with the proposed solution (https://github.com/golang/go/issues/44577#issuecomment-85720...) is that the refresh is used to keep the cache from getting too stale. Would it be reasonable for Go modules on Sourcehut to be served from a stale cache? The problem is that the refresh does a full clone, which is too heavyweight.

The other user who accepted that proposed fix had a single module that had a single user, so they can tolerate some weird caching behavior for their toy project.


If I'm reading correctly, it would not be stale, but be directly proxied, which would mean that a popular module could result in a higher overall qps (but this is also true if people clone directly from sourcehut). On the other hand, Drew's suggestion would actually result in stale caches.

There comes a point (and it isn't a particularly far-off point) where disabling this would result in significantly increased traffic to sourcehut. That's a problem, but it's a problem with sourcehut's scalability.


> This may impact the freshness of your domain's data which users receive from our servers, since we need to have some caching on our end to prevent too frequent fetches.

They mention that requests still are served from the cache when refresh is disabled.


Wait... Why not black hole them? Make it their problem and they'll eventually do the correct thing.


" I can’t blackhole their IP addresses, because that would make all Go modules hosted on git.sr.ht stop working for default Go configurations (i.e. without GOPROXY=direct)"


This seems like a solution, right? People will ask why their stuff is breaking, and either drop Go for being unreliable or get Google to fix its behavior. Win/win


Or switch hosting providers. Lose.


Because I'm assuming that he does actually want his website to work with go modules.


Why not just stop using Go? Why not Nim? Crystal? why not even Haxe? Why not stop using things built by and built for EvilCorp (TM)?


So this has been going on for a year, and Drew DeVault has actively tried and explored several sane options and channels. That's quite a bit of patience and goodwill before making a public blog post.

There are basic mechanisms in place both on the git side and on the protocol side that can prevent this kind of wasteful, digital harassment.


Drew has built himself a reputation for complaining loudly about stuff that most users don't care about, and for generally (and unnecessarily) increasing the burden on open source maintainers. In this particular case I think he has a valid point, but his history is probably getting in the way of successfully getting people to see it.

The lesson here is that if you make everyone dislike dealing with you, then you'll find it very hard to get them to work with you when you need them to.


[flagged]


This is a consistent trend on threads that are critical of Google and Apple.


You cannot write comments like this on HN.


Stop this go madness.


Are there any ethical, reliable, and privacy-respecting implementations of Go? The reference implementation at go.dev is clearly a piece of malware with a built-in compiler.


With rubygems.org or hex.pm, the package maintainer must explicitly publish a package to the registry. The registry never polls the original source, so the whole issue would not exist in the first place. I found Go's package registry very strange and limited:

* What if I am using a source control program other than git? Mercurial?

* What if I have files in my git repo that I don't want to show to the world, like tests?


I’m surprised. Most DDoSes come from Microsoft these days. Azure Cloud is the #1 go-to place for botnets to attack the Internet. I suspect it’s allowed on an official level; it’s too flagrant to be done without their knowledge or consent.


It's not DDOS. That's "rhetorical flair" from the author.


No offense, but you built a git hosting website, and now you don't like the (pre-existing) ecosystem of things that use it?

Maybe just block go packages altogether if it's that big of a deal


There's a difference between making something available and being okay with wasteful and excessive usage.


Have you not tried to get in contact with anyone else before? "mysterious reasons"

Did you try to get in contact with them directly and clarify it? Did you talk to a moderator?

Besides that, you are hosting code, right? How is 70 GB of traffic a DDoS?

I have to say, traffic numbers like you posted, would not concern me at all. I'm very curious why your underlying git usage can't do better caching.


> Did you try to get in contact with them directly and clarify it? Did you talk to a moderator?

As noted in the article, he did, and then Google banned him without warning.


That's not what the article says. It says he doesn't know why he was banned; it doesn't establish causality.


The comment above does not claim causality, only a suggestive ordering of events.


I was not able to find anything in this regard. I only saw a quite proactive ticket thread.



