GitHub Says ‘No Thanks’ to Bots — Even if They’re Nice (wired.com)
130 points by cyphersanctus on Dec 29, 2012 | 95 comments



I'm quoted in the article and I wanted to clear one thing up that was missing from it.

When bots get reported to us by people using GitHub, our support folks reach out to the bot account owner and encourage them to build a GitHub service[1] instead. As a service, the same functionality would still be available to everyone using GitHub, but it would be opt-in instead.

A few months ago we heard from some developers of service integrations that beyond the existing API features, it would be handy to be able to provide a form of "status" for commits. We added the commit status API [2] in September to accommodate that. We're always open to feedback on how the API and service integrations can improve.
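For anyone curious what a status update looks like, here's a rough sketch against the commit status API using Python's requests library; the owner/repo/SHA, token, and URLs are placeholders, not a real integration:

    import requests

    # POST /repos/:owner/:repo/statuses/:sha marks a single commit as
    # pending, success, error, or failure (all values below are placeholders).
    resp = requests.post(
        "https://api.github.com/repos/someowner/somerepo/statuses/abc123",
        headers={"Authorization": "token YOUR_OAUTH_TOKEN"},
        json={
            "state": "success",
            "target_url": "https://ci.example.com/builds/42",
            "description": "The build passed",
        },
    )
    resp.raise_for_status()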

The point is, GitHub services are a much better way to build integrations on GitHub.

[1] https://github.com/github/github-services [2] https://github.com/blog/1227-commit-status-api


Well, devs are always going to create bots. They're a fact of life.

Why not establish an opt-out convention similar to robots.txt? The idea is that people who want to opt out would create a ".robots" file in their repo, with "none" in it. Any bot that doesn't respect the .robots file is hellbanned.

The problem with opt-in is that people won't use it unless they (a) know it's available, (b) know how to get it, and (c) actually go get it. So people don't really do that. But establishing an opt-out convention like this solves the problem entirely, and it's simple.
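If a convention like that caught on, a well-behaved bot could check it before doing anything. A rough sketch, assuming the proposed ".robots"/"none" semantics and using the GitHub contents API; none of this is an existing feature:

    import base64
    import requests

    def bot_allowed(owner, repo):
        """Honor the proposed .robots convention: skip repos that opted out."""
        url = f"https://api.github.com/repos/{owner}/{repo}/contents/.robots"
        resp = requests.get(url)
        if resp.status_code == 404:
            return True  # no .robots file, so the repo has not opted out
        resp.raise_for_status()
        text = base64.b64decode(resp.json()["content"]).decode()
        return text.strip().lower() != "none"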


Opt-out sucks because you are forced to deal with it. Opt-in makes the most sense because there are no hurdles to jump in order to not be bothered.

If no one finds or wants to use your service without it being forced on them, it can't be that great a service.


I'd prefer opt-in, and a nice directory of robots you could essentially subscribe to. I did a little thinking about this a while ago when the Gunio robot was annoying some people about whitespace:

https://github.com/hartez/vcs-robots


you could have a single opt-in to bots-as-a-whole (as a github account setting rather than per-repo), and then blacklist individual bots if they end up being spammy. that seems like a nice balance between "do nothing by default" and frictionless adoption of the feature.


While this is a fine idea, why can't that be "opt out of bots as a whole (as a GitHub account setting rather than per-repo), and then whitelist individual bots if you want to find out whether they provide some utility"? That seems like the proper balance between "do nothing by default" and the least friction required to use bots, without requiring those who don't want to be involved to do anything at all.

If it's about discovery of bots, then come up with some meaningful way to make them discoverable (presumably better than the Apple App Store zing) without people needing to be exposed to them by default.


whitelisting individual bots is not low-friction. ideally i want the bots to discover me, not vice versa.


You do, but no one else does.

How about a header in email that the sender can include that forces the email to the top of your inbox and makes it undeleteable? You can opt-out by having an email address of user+optout@example.com all the time.


You do, but no one else does.

How small-minded. There are many of us who want to be discovered by bots. We just keep quiet because people like you have such strong opinions and aren't afraid to be mean about them.

Wouldn't it be ironic if your opinion was in the minority?


The people who want opt-in are not the bad guys here, and neither are the people who want opt-out. The bad guys are those who poisoned the well (a tragedy of the commons) by abusing the system. Unfortunately, there is a greater chance of abuse than there is of utility, and it's more difficult for everyone to manage opt-out on an individual-occurrence basis than it is to manage opt-in centrally by trying to ban bots across the board. And neither approach is actually all that successful, despite individual successes in some communities.


Yes, I want bots too.


Reddit's user-made bots are an awesome addition and I'm very glad they are not opt-in.

I think there is room in every community for automated users, if the tools for managing misbehaving users are strong enough. GitHub, however, has been known not to have strong moderation tools, so perhaps a temporary opt-in policy could be used until those tools improve?


I don't want to create yet another file in my repo. I'm already plenty happy with a .gitignore and a Makefile/SConstruct file for my C projects, and a setup.py + associated files for my Python projects.

This "add a file to your repo" is starting to get annoying, there is even a project for it, called Filefile[1].

[1]: https://github.com/cobyism/Filefile


Someone should write a bot to submit pull requests with these metafiles.


Banning them and establishing services solves the problem for everyone but the bot developer as far as I can tell...


I'm with thwarted on this one. I don't want to put my repo on a service where I have to opt-out of things based on extraneous files in my repo. That's extra clutter in my repo for something that could easily be handled in other ways (not to mention the fact that repos would be opted-in by default).


I don't like this, because then it means I have to change my repo to make a Github feature work, when Github has whole configuration UIs on their site for both my account and my repos.

Also I don't see the problem with opt-in. If your bot/plugin/service is worth its salt you'll be putting some effort into marketing it (which includes explaining how to use it).

The alternative is Github marketing the opt-out, which seems a bit strange if they're launching a feature and advising users how to not use it so they don't get spammed.


One could regard it as the first implementation of a robots.txt-like standard for public code: the existence/contents of .robots in a repo/directory implies that the owner wants control over the type of mechanical contributions received, no matter where the code is hosted.

It could be opt-in (initially at least), i.e. a .robots containing "all", "typo", or "whitespace", etc. would allow bots of the given type (an empty or missing file would imply "none").
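A sketch of how a bot might interpret that; the type labels are just the examples above, and none of this is an existing convention:

    def allowed_bot_types(robots_text):
        """Parse proposed .robots contents into the set of allowed bot types."""
        if robots_text is None:
            return set()          # missing file implies "none"
        words = robots_text.lower().split()
        if not words or "none" in words:
            return set()          # empty file or explicit "none": no bots
        return set(words)         # e.g. {"all"} or {"typo", "whitespace"}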


Why hellban? What's wrong with a normal ban?


Because then they'll know they need to use a different proxy and change the bot behavior to avoid detection.


What if you accidentally added a bug in your bot? What if the policy guard has a bug? What if the robots policies changed and you didn't have time to keep up with them? What if there's an admin with a personal vendetta against some bot? (The first three questions also show why opt-out is a bad idea compared to opt-in.)

You don't think it would be appropriate to warn the bot-owner about this first? Hellbanning must always be a last resort, not something you throw around as standard procedure for even the smallest misdemeanours.

The recent spread of hellbanning on internet forums really is a plague. I've been hellbanned on several sites for no obvious reason at all. On Reddit, for example, it was because their spam bot detected that I posted two posts containing the same link within too short a time -.-


> The problem with opt-in is that people won't use it unless they (a) know it's available, (b) know how to get it, and (c) actually go get it. So people don't really do that. But establishing an opt-out convention like this solves the problem entirely, and it's simple.

I don't see any of those things as "problems". IMO, it should be opt-in, on a per-project basis. There could be another checkbox in each project's "Settings" tab. The default should be that people are left alone.


We got a lot of angry feedback about the whitespace bot that was roaming GitHub for a while. We tried to sit back and let people deal with it themselves (e.g. send feedback/patches to the bot owner).

We're not opposed to bots or services. We encourage them, and use one ourselves. The key is making it opt-in so it doesn't bother people who don't want it.

Travis CI is a popular add-on, but they don't have a bot that runs tests and tries to get you to set up their service. They just focus on providing a badass service that you _want_ to set up.

Edit: You 'opt in' to a bot in one of two ways:

1. You add their GitHub Service to your repository (see the Service Hooks tab of your Repository Settings). This is how Travis CI started out.

2. You set up an OAuth token with the service. Travis does this now, and provides a single button to enable CI builds for one of my repositories.
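(For completeness: the Service Hooks route can also be done through the API rather than the Settings UI. A rough sketch with placeholder names; a specific GitHub Service would use its own service name and config keys instead of the generic "web" hook shown here.)

    import requests

    # POST /repos/:owner/:repo/hooks attaches a hook to your own repository.
    resp = requests.post(
        "https://api.github.com/repos/someowner/somerepo/hooks",
        headers={"Authorization": "token YOUR_OAUTH_TOKEN"},
        json={
            "name": "web",                      # or a service name instead
            "active": True,
            "events": ["push", "pull_request"],
            "config": {"url": "https://ci.example.com/github", "content_type": "json"},
        },
    )
    resp.raise_for_status()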


The whitespace bot was written by someone I know. He's an incorrigible troll and did not do it to be helpful.


As some have pointed out, many of the requests they provide are quite helpful, and it could be useful to have an interface for browsing a selection of optimizations that have been offered up by bots.

What about splitting them off into a separate interaction lane to eliminate the noise they create elsewhere? A "Bot pull requests" tab. Make it passive, so it does not trigger notifications, emails or other active communication, but is available and streamlined.


I distinctly remember a Travis bot sending me like 4 pull requests that added a `.travis.yml` file...


That was a troll bot someone not affiliated with travis wrote, as I recall.


How do I opt in? I'm curious to see what will come up.


[deleted]


So I would have to add another file to my repository that has nothing to do with my repository's content? Thanks, but no, I think we have enough of those already.


> But here was a pull request from a GitBot. Bots don’t debate. “It’s like the first time you see a self-driving car on the road,” Michaels-Ober says.

Good thing he likened it to something we can all relate to.


In case any GitHub people are reading this: you also have an annoying approach to web crawling "robots". Your /robots.txt is based on a whitelist of user agents, with a human-readable comment telling the robot where to request to be whitelisted. Using robots.txt to guide whitelisted robots (like Google and Bing) is against the spirit of the convention. This practice encourages robot authors to ignore the robots.txt and will eventually reduce the utility of the whole convention. Please stop doing this!
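To illustrate the pattern being described (this is a generic sketch of a whitelist-style robots.txt, not a copy of GitHub's actual file): named crawlers get blanket permission while everyone else is shut out.

    # Whitelist pattern: listed agents may crawl, everyone else may not.
    User-agent: Googlebot
    Disallow:

    User-agent: bingbot
    Disallow:

    User-agent: *
    Disallow: /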


Robots.txt is a suicide note.

http://www.archiveteam.org/index.php?title=Robots.txt

My personal server returns a 410 to robots.txt requests.


I have no clue as to why the author of that shit is as angry as he is, but I have zero interest in his opinion until such time as he learns to show me the issues, and not just blindly assume that anybody who is not as enlightened as him is a blind idiot.


Okay.


Git bots may not be that impressive right now. But imagine a future where an incredibly knowledgeable "programmer" is working with you on every project, doing lots of the busy work, and even code reviewing every commit you push. Except that programmer is a bot. This future is possible - but we need to encourage it, and not shut down the precursor at the first "sign of life".

If someone has a good track record of useful pull requests, would you mind if they contributed to your project? Would you care if it was really easy for them to write that helpful code because they've crafted the ultimate development environment that practically writes the code for them? So why do you care if the editor actually writes all the code for them?

That's essentially what's happening when someone writes a bot and it makes a pull request.

Sure, it sucks if there are unhelpful bots or people spamming up a storm of pull requests. But the solution to this problem is not to ban all bots or all people - it's to develop a system that filters the helpful "entities" from the unhelpful ones. This might be hard in some fields like politics and education, but in software development this is tractable, right now.

I sincerely hope that this is what actually happens. This is one of the first steps towards a world where common vulnerabilities are a thing of the past because whenever one is committed, it is noticed and fixed by the "army of robots". When an API is deprecated, projects can be automatically transitioned to the new version by a helpful bot. Where slow code can be automatically analyzed and replaced.

There are details to be figured out, an ecosystem to be constructed, perhaps more granular rating systems to be made for code producing entities (human or bot). Because it's "easier" for a bot to send a pull request, the standard of helpfulness could perhaps be higher. Communication channels need to be built between coding entities, and spam detection will become more important. But simple blocking and a cumbersome opt-in system is not a good solution.

This might be a stopgap until better systems are built, but it is not something we should be content with.


You need to clearly make the difference between:

1. real people (they can make regular pull requests)

2. bots advertising a code/assets improvement service to you (this should never be in the form of pull requests; you should see these as ads, you should have the opportunity to disable them, and GitHub could try to get some revenue by taxing the guys advertising through this)

3. smart "code bots" that could actually do what you say: maybe at first start by doing code reviews, then static code analysis, then even start refactoring your code or writing new code, who knows... but you would have these in a different tab, like "robots pull requests", at least until we have human level general AI :) ...for the same reason that you have different play/work-spaces for adults and children and animals (you don't want your son and your neighbors' pets running around your office or bumping into you in the smoking lounge of a strip-club!).

EDIT+: What the bot owner did in this case was to advertise without paying the guy on whose land he placed the billboard (and on whose land he himself stays without paying rent), except that it's much more intrusive than a regular billboard you can ignore!


Those categories are artificial. What about a bot that finds patches to send but has a human review them? Or an army of humans sending spam PRs, the way they create fake accounts on Facebook?

The GP's solution seems more adaptive and open to the unknown.


(3) is artificial, at least for now. But I will always want to see the difference between:

1. Pull requests or issues filed by a real human being for non-advertising purposes (using the equivalent of a "spam filter" for them)

2. Any other stuff! I want this labeled as "something else", regardless of whether it's useful or spammy, real bots or "human-bots" sending me ads.

What the GP suggests is a great future, and I want it, but for now I want a clear distinction between "ham" and "spam". For now it's probably better to separate "genuinely human-made content that's not advertising" and call everything else "possibly spam". If the need appears, they can start filtering the real spam. For now I just want everything that doesn't come directly from a human labeled as "bot pull requests" or "bot issues" or anything else, but labeled!


A bot which does lossless compression on images in open source projects and only submits a pull request (with all the relevant details) if there was a > X percent filesize savings? That's not spam, that's just helpful...
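The threshold check itself is cheap. A rough sketch using optipng (one of the lossless optimizers linked elsewhere in this thread); the 5% cutoff and the tool choice are arbitrary:

    import os
    import shutil
    import subprocess
    import tempfile

    def worth_a_pull_request(png_path, min_savings=0.05):
        """Recompress a copy losslessly and report whether it beats the cutoff."""
        with tempfile.TemporaryDirectory() as tmp:
            candidate = os.path.join(tmp, os.path.basename(png_path))
            shutil.copy(png_path, candidate)
            subprocess.run(["optipng", candidate], check=True)  # optimizes in place
            before = os.path.getsize(png_path)
            after = os.path.getsize(candidate)
            return (before - after) / before > min_savings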


Potentially, yes, but what if the idea catches on and you have swarms of overlapping bots submitting pull requests? And what about bots that are well-intended, but dubiously helpful?

You might log in one day and find that your repo has pull requests from fifteen image optimizing bots, thirty-eight prettifying bots for different languages, four .gitignore patching bots, seven <!DOCTYPE inserting bots, eight JS semicolon removers, nine JS semicolon inserters, twenty-four subtly broken MySQL query sanitizers, and seventy-nine bots fighting over the character encoding of the readme file.


What about the well-intentioned but dubiously helpful PR from someone who just doesn't know what they're doing? What if that were to catch on and you have swarms of overlapping non-programmers submitting PRs?

These slippery slope arguments are a bit silly. If you're running an open-source project, you can either accept PRs or not, and if you're accepting them, you can review the code and approve it or not approve it. A PR from a bot is the same as a PR from anyone else, it's either helpful or not helpful. It's not currently a problem, and it's too early to speculate about worst-case future scenarios.


>A PR from a bot is the same as a PR from anyone else, it's either helpful or not helpful.

You're missing a key point here. The difference between PRs from bots and PRs from people is that a person has to invest effort to open one, and a bot does not. PRs take time and effort to evaluate on the part of the repo owner. A PR from a person has a higher chance of being a meaningful change because the person had to spend their own time and effort to offer it. There is also the nature of PRs from bots to consider: they will necessarily be of a certain class of actions and in the vast majority of cases will be low-effort, low-impact changes. Having these compete with PRs from actual people is not a good direction for GitHub.

Also, the slippery slope retort is getting tired. We can and should use our reasoning skills and historical precedent to evaluate likely usages of new rules (you'd be negligent if you didn't). In the case of bots, the possibility of having repo owners spammed with low-value changes is enough to disallow it.


>What if that were to catch on and you have swarms of overlapping non-programmers submitting PRs?

Humans can easily look at the existing pull requests and see if their work is redundant. Bots can't. And as hackinthebochs said, human-submitted pull requests involve effort, which limits them.

It's not really a slippery slope argument. It's more an application of what we can see having happened with bots in the past. Email spam for legitimate offers is, after all, just about as annoying as email spam for scams.


Why can't bots look at existing pull requests?


They can, but it's unrealistic to think that they will do so one percent as intelligently as a human contributor.


> It's not currently a problem, and it's too early to speculate about worst-case future scenarios.

Is that what they said about SMTP and spam?


A "mark as spam" button would work as long they could reflect the user's preference toward bots in general.


As the article says, I wouldn't want hundreds of them. Imagine uploading a simple website, only to have your issue tracker harassed by JSLint bots, HTML validators, and pull requests correcting your indentation, replacing <b>s with <em>s, etc.


Honestly? That sounds amazing (as long as there's a way to manage the noise).


So some sort of "plugin" interface where you can opt in to certain bots on a per-repo basis? I could really get behind that! 100% automated and unsolicited? No thank you!


Hey, check the Service Hooks tab of your Repository Settings.


It would be really helpful if each had a short summary when you hover over it, as it's an awful lot of names that don't mean much. I came across Travis elsewhere but never would have guessed it was CI from this list.


yeah, +1. I actually cloned the repo so I could peek at the source for the hooks to all these services I'd never heard of. Was easier (lazier) than googling.

https://github.com/github/github-services/tree/master/servic...


Exactly. I think bots are exactly the wrong shape for this task. What would be better would be a framework for automatically processing the files in a git repository and then generating a commit. GitHub could have an interface for uploading them and for applying them to repositories. You could then apply them yourself, or someone else could fork, apply, and then submit a pull request.
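Something along these lines, very roughly (the transform is whatever file-rewriting step you plug in; nothing here is an existing GitHub feature):

    import subprocess

    def apply_transform_and_commit(repo_dir, transform, message):
        """Run a file transform over a local clone and record it as a commit.

        `transform` is any callable that edits files under repo_dir in place.
        The caller can then push the branch and open a pull request themselves
        if the resulting diff looks worthwhile.
        """
        transform(repo_dir)
        subprocess.run(["git", "add", "-A"], cwd=repo_dir, check=True)
        subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, check=True)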


I'd much rather use those tools at my own discretion than have them clutter an area of the site where I expect to see feedback from my human users.

It would also make GitHub a much more intimidating place to submit code. I don't want those things for my smaller projects.


It's not always helpful. Consider code with a similar purpose on github (transforming images, compressing them) with an accompanying image test suite, where modifying these images will break tests.


The problem is that you will also get these: https://github.com/ajaxorg/node-github/pull/45

The account in question has been closed now, but it submitted thousands of pull requests to random projects to advertise their build service.


That was also a troll/spam bot that had no affiliation to Travis CI. Their GitHub integration has always been classy.


I'm pretty sure with a community like GitHub, those advertisements would turn around and hurt their service, instead of help it.

Even if it was the best build service in the world, as a hacker I wouldn't personally use it.


As technoweenie said:

That was also a troll/spam bot that had no affiliation to Travis CI. Their GitHub integration has always been classy.


Not if you don't want your images compressed, e.g. if they are test cases for a face recognition algorithm or indeed for an image compression service. Not everyone is using GitHub for the same stuff, you know.


That seems a pretty rare circumstance. I doubt a single pull request will cause that much bother; just deny it. Then hopefully the bot is smart enough not to resend, and if it isn't, block the bot's user from communicating with you again.


Actually I think the instances where this bot would be useful are the rare ones. Most images on GitHub are probably for the project's logo or gh-pages branch. It's simply not important that those be compressed, and getting pull requests on things that are not core to the project's purpose is distracting.


Yes. A bot could try to identify what kind of project it was (are there HTML and CSS files, etc.). That might be an interesting project in itself: trying to classify GitHub projects into libraries, web sites, documentation, standalone apps, etc.


Those images don't sound big enough to trigger the bot. So we're back to it being usually useful.


Explain to me how a better png is going to interfere with facial recognition? Are you suggesting some kind of algorithm that finds faces based on Huffman trees or related compression metadata? It sounds unimaginably fragile.


That wasn't a very good example. But you never want to modify any test cases.


That's certainly helpful but there's no doubt that someone would use a GitBot for nefarious purposes disguised as something useful.

Either they insert something in your pictures that you don't realize is there, or just good ol' spam (like Travis4all), or a backdoor that looks like a fix…


I'd like to optimise my images. (The images on my website.) I looked at https://github.com/imageoptimiser but didn't see which tool would do that, or any way to contact the author. Is there an image optimisation tool in there somewhere?


If you're on a Mac, ImageOptim is the perfect tool that combines a bunch of open projects to crush down images. I use it daily, it's an incredible (free) tool.

http://imageoptim.com/

If you're not on a Mac, the individual tools are still quite usable. Here's some of them.

http://advsys.net/ken/utils.htm (pngOUT)

http://optipng.sourceforge.net/

http://pmt.sourceforge.net/pngcrush/

https://github.com/kud/jpegrescan

http://freecode.com/projects/jpegoptim

http://www.lcdf.org/gifsicle/


Thanks a lot, @nwh!


Wouldn't that project also benefit from jpegtran?


ImageOptim includes it, I just couldn't remember the full name at the time.


I'm the guy who made the bot. I'm looking at relaunching this in a way that GitHub are happy with so that you can at least request an optimisation on your repos. In the meantime, check out http://imageoptim.com/


I wouldn't mind bots that fix spelling mistakes in comments or even actual bugs in code. But why not let github projects be configured to allow certain kinds of bots?


Consider bots as "plugins" that you activate on a per-repo basis. I like this idea!


Isn't that exactly what Service Hooks or OAuth-authed systems provide?


I think that's what jevinskie was implying: this is a non-issue since service hooks are already established, so there is no reason bots should be allowed; they should plug into the correct API.


> I think that's what jevinskie was implying

I'm not sure he'd have written that he "likes the idea" and would have failed to mention Service Hooks if he did.


I read his comment as either sarcasm or passive aggressiveness.

"Wouldn't it be a great idea if there was a 'hook' mechanism you could opt into that provides a way to add additional functionality to their site from third parties?"

Or maybe not. I don't know, text doesn't convey emotion and body language.


Right now they can be an annoyance, but this is something that could easily become a great feature of GitHub, the same way that @tweets and #hashtags emerged as innovations from the Twitter community.

I would love for github to make bots something that you can subscribe to on a "bot subscription page". I think they can be incredibly useful so long as they aren't promiscuous, unwelcome and frequent enough to be seen as spam. You should be able to handle these the same way you handle permissions for third-party apps on Facebook or Twitter. The subscription page could also provide bot ratings and suggest bots that are likely to be useful for your project.

This approach would also create a way where these apps could be useful for private repos as well.


Sounds like a debate between opt-in and opt-out. Why not both? Do an A/B test of a bot vs. a service. In some cases, opt-in is good (see: organ donors), in other cases it's bad (see: Internet Explorer).

What if there was a community-vote that turned a bot and a particular version of said bot from Opt-Out (app style) to Opt-In (bot style)?

I, for one, welcome our bot-coding overlords that clean up my code and optimize it on each commit. Might save me a lot of time and a lot of power and thought... if it's peer reviewed, like all open source software.


> opt-in is good (see: organ donors)

I prefer when people have to specifically say 'yes, I want my dead body to go to waste instead of saving lives'.


Hey, when you put in price-controls don't complain about a lack of supply.


I personally would use gists a lot more if they were indexed by google. As it is I feel like I'm putting code down a black hole when I create a gist.


Good idea, but maybe only the ones with a title and description, to index the ones that have a clear purpose and not some random code with no indication of how to use it or what it's for.


Question to the Github team:

Nuuton is currently crawling the web. The plans include crawling GitHub (actually, GitHub has a specific and exclusive crawler built for it). Is that permitted? If so, what are the rules? If not, to whom may I speak regarding it? I know DuckDuckGo does it, but I don't know if they are crawling your site or just using what the Bing index currently has.


Not connected to github, but look at https://github.com/robots.txt, specifically the first 2 lines.
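A crawler can check those rules per user agent before fetching anything; a minimal sketch with Python's standard robotparser (the agent names are just examples):

    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("https://github.com/robots.txt")
    rp.read()

    # A whitelisted agent and an arbitrary one will get different answers.
    print(rp.can_fetch("NuutonCrawler", "https://github.com/github/github-services"))
    print(rp.can_fetch("Googlebot", "https://github.com/github/github-services"))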


So yes, but only if you change your bot name.


I do think bots can be a great part of software development. I love the likes of Travis CI and Code Climate integrating with GitHub; GitHub just needs to build a better app to deal with them. I assume private repos don't have bots bothering them, but maybe they want to allow some? Checkboxes for the types of bot services you would like to allow per project?


We have GitHub Services: https://github.com/github/github-services. Anyone can submit one. We'll probably accept it as long as the code is decent, is tested and documented, and is for a stable service. If you're running some custom build on a personal hosting account, use the web hooks. You can attach web hooks or services to any of these events: http://developer.github.com/v3/activity/events/types/
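On the web hooks side, the receiving end is just an HTTP endpoint that parses the JSON payload for whichever events you attach. A minimal sketch (assumes the hook is configured with a JSON content type; the port and the push-event fields shown are only illustrative):

    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class HookHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length))
            # For push events the payload carries the ref and the commits.
            print(payload.get("ref"), len(payload.get("commits", [])), "commit(s)")
            self.send_response(200)
            self.end_headers()

    HTTPServer(("", 8080), HookHandler).serve_forever()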


I've been annoyed by GitHub bots and enjoyed their contributions. IMO, GitHub could/should have taken this opportunity to solve a problem and (once again!) change how people code for the better through collaboration.

Perhaps now that they've taken money, they aren't as interested in tackling new problems. Perhaps that's reasonable, since they'll need a lot of that money to hire and keep operations folks who can keep the site up.


I heard that Google is a "bot".

Do they say "No Thanks" to themselves?

Maybe the title should read: Google Says "No Thanks" to Other Bots


The title has nothing to do with Google.



