Tell HN: GitHub is blocking search unless you are logged in
131 points by nancyp 7 months ago | 71 comments
Sign in to search code on GitHub. Before you can access our code search functionality please sign in or create a free account.



It's not only code search that is affected. Issue and pull request search are affected as well.

Here's a screen recording of that happening:

https://imgur.com/a/BT6uRIe

It's been quite a while since GitHub started gating code search behind a login. However, they recently started gating other types of search as well. The worst part is that you don't notice it immediately, for two reasons. First, it doesn't happen on all repositories. I haven't experienced this yet with NixOS/nixpkgs, for example, possibly because the high volume of activity there prevented the switch. Second, the search results do show up right after you hit the search button. However, a login screen appears as soon as you start to navigate around the search results. I can't help but feel like they're testing the waters by not making it immediately obvious that this is happening.

The inconvenience doesn't stop there. As you can see in the screen recording, once you get shown the login page, GitHub will continue to show the login page even if you hit the back button. On top of that, the search query is sometimes not included in the URL, making search results difficult to share.

I get GitHub wanting to require logins for code search, since it takes up computing resources. However, there's something to be said about gating search for issues and pull requests of open source projects without the project maintainers being notified.


This has been the case since the new code search replaced the old one ~9 months ago. The new code search is more resource intensive so Github chose to only make it available to users. I agree that it sucks to not have code search available when logged out, but it's not a new change and I don't think it was done with malicious intent.


I don't actually care whether intent is malicious or not. GitHub has not cared about me or individual devs like me for some time now. Time to drop it.


How does that work? You don't want to sign in to the site so now you will replace it with another site. Presumably you don't want to sign in to that other site as well, so what are you using the service for in the first place?


> now you will replace it with another site.

Or you can self host! Github's changes pushed me to self host recently.

https://voussoir.net/writing/git_dot_voussoir_dot_net


Oh I'm happy to stay logged in to SourceHut!


Did they add code search now?


No, doesn't look like it. (I had to check.)


any requirement to log in is done with malicious intent.

what other purpose could it hold other than to harvest your data for their own undisclosed purposes?

Maybe anti-bots or something, but there are other ways to do that. Besides, a bot might just make an account.


> any requirement to log in is done with malicious intent

Wild!


I was very unhappy when they did this. The new search btw is shittier than the old one and this was a classic case of breaking the rule of "If it ain't broke, don't fix it".


I miss the old search, it was able to sort results.


Despite their shitty rug-pull <https://github.com/sourcegraph/sourcegraph/pull/53345>, I do really like Sourcegraph and one doesn't (currently?!) need to be logged in to use it: https://sourcegraph.com/search and they have a handy rewrite pattern such that one can just plug the repo path into the URL for quick searching e.g. https://sourcegraph.com/github.com/JetBrains/intellij-commun...
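
A minimal sketch of the URL pattern described above, assuming only what the comment shows: the repository's host and path are appended to https://sourcegraph.com/. The function name and the example owner/repo are hypothetical, and this is not a Sourcegraph API.

```python
from urllib.parse import quote


def sourcegraph_repo_url(repo_path: str) -> str:
    """Build a Sourcegraph URL by appending a repo path such as
    'github.com/OWNER/REPO' to the Sourcegraph host.

    The pattern comes from the comment above; this helper is hypothetical.
    """
    # Accept pasted URLs too: drop the scheme and surrounding slashes.
    for scheme in ("https://", "http://"):
        if repo_path.startswith(scheme):
            repo_path = repo_path[len(scheme):]
    repo_path = repo_path.strip("/")
    if not repo_path:
        raise ValueError("repo_path must not be empty")
    # Keep '/' separators and percent-encode anything unusual.
    return "https://sourcegraph.com/" + quote(repo_path, safe="/")


# Hypothetical repository, used only to show the shape of the result.
print(sourcegraph_repo_url("https://github.com/example-owner/example-repo/"))
```

Pasting a GitHub repo URL or a bare `github.com/OWNER/REPO` path both work, since the scheme is stripped before the path is appended.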


Oh you better believe it's only currently. At least if they grow as successful as they would like to.


I’ve been running into this too and it’s very annoying. Although I don’t think it’s particularly new either.

Just feels like an unnecessary step. Sure I have an account and can log in, but why? I just want to know which file has X function, so I can read the implementation. I don’t want to have to download the repo or sign in.


Most likely anti-scraping measure. So they can detect and shut down bots or really anyone they feel like if activity looks nonstandard. Not suggesting it’s good, but it’s consistent with the in vogue trend to lock down recipient public APIs nowadays.


As an illustration, near the end of last year, bots from a renowned Email API provider spotted in less than 1 hour the leak of a public key from my public GitHub code repo. My account got suspended on their platform. It was stunning to see the speed at which they acted and automated the process to "lose" and "recover" reputation.



thank you for the enlightenment. now it all makes sense in my mind.


Wouldn't a scraper just grab whole repos?

If necessary, let's focus on the use case of searching a single repo.


if you log in, you've lost.

they have to serve the source without being logged in, otherwise gpl projects would just move (and we know gpl projects are the opensource trend setters).

so they will always allow you to download the source. and that's what i do all the time. git clone, grep, rm.

or if you are logged in, do it anyway, checking code out with ssh, which is more expensive for them.

remember kids, after Microsoft bought it, a github account is a social network account.
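
The "git clone, grep, rm" routine mentioned above could be sketched roughly like this. It assumes `git` is on the PATH and the repository can be cloned anonymously; the function names are hypothetical, and the shallow clone (`--depth 1`) is my own addition to keep the download small, not something the comment specifies.

```python
import re
import shutil
import subprocess
import tempfile
from pathlib import Path


def grep_tree(root: Path, pattern: str) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, line) for every matching line."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(root.rglob("*")):
        # Skip directories and git's own metadata.
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((path.relative_to(root).as_posix(), number, line))
    return hits


def clone_grep_rm(repo_url: str, pattern: str) -> list[tuple[str, int, str]]:
    """Shallow-clone repo_url into a temp dir, search it, then delete it."""
    workdir = Path(tempfile.mkdtemp(prefix="clone-grep-"))
    try:
        # --depth 1 fetches only the latest commit, enough for a search.
        subprocess.run(
            ["git", "clone", "--depth", "1", "--quiet", repo_url, str(workdir / "repo")],
            check=True,
        )
        return grep_tree(workdir / "repo", pattern)
    finally:
        shutil.rmtree(workdir, ignore_errors=True)  # the "rm" step
```

Usage would look like `clone_grep_rm("https://github.com/OWNER/REPO.git", r"def my_function")`, with OWNER/REPO and the pattern replaced by whatever you are looking for.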


I think this will be the new normal.

There are a lot more AI projects hungry for data to train their models on. This puts content companies in an uncomfortable situation: trademark infringement claims, loss of intellectual property, and more.


But Github is not a content company and they don’t really own copyright to almost anything hosted there.


That's true, but there's an interesting parallel with GitHub's corporate parent, Microsoft, and Microsoft's other platform company LinkedIn[1]. LinkedIn sued scrapers for retrieving data from the site.

LinkedIn isn't a content company either, nor do they really own any content posted there (they don't right?), but a large part of their business moat comes from the network of people posting content there. Scrapers and bots undermine this, something the AI boom facilitates.

1: https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn


There is a cost to serving up all that content, and if hundreds of AI start ups are all trying to pull in data, that can add up fast. It’s not typical user behavior.


If it’s just static content it wouldn’t really be that expensive. In reality egress traffic is extremely cheap compared to what Azure/AWS etc. are charging.


How often are you not signed in to GitHub? You’re presenting this as a practical pain, but I can’t remember the last time I wasn’t signed in to GitHub.


I don't have a work GitHub account and I keep personal stuff out of my work computer. The number of times I've wanted to search an open source codebase without cloning it yet is significantly nonzero.

I've also wanted to do code search on my phone, where I have no need to be logged in so I'm not.


So register a throwaway for work and your phone. Seems easily solvable in a few seconds!


Seems like a good time to mention that (despite anecdotal lack of enforcement) GitHub's terms of service continue to officially forbid having more than one free account for manual use. https://docs.github.com/en/site-policy/github-terms/github-t...


Can they let their business customers know that? Companies making their employees create new accounts for their org is not uncommon. I’m not talking about Enterprise instances.


My account through work uses SSO. I didn’t make my account, it authenticates through AD. If I leave the company, I lose access to the account.

Telling employees to make their own personal account to do company work seems like a bad idea.


That rule sucks. I want to keep work and personal separated.


Work account should be pro and paid by the company then.


If I'm considering reporting a bug I came across in a free-software project while at work, that's already something I probably couldn't justify with a strict cost-benefit calculation for my employer.

In practice my employer would be happy for me to spend my time doing it anyway, because it's the right thing to do. But asking them to pay Github for the privilege is pushing it.


I don't want to login to github.com when I'm on my work computer. That's going to take me one step closer to uploading company internal stuff by accident.


It's especially bad because I don't really remember my passwords, so I always have to reset my password when logging in again, and that refreshes the dev keys so my terminal git push also stops working - a complete PITA.


This is nuts, dude! Get a password manager or something.


That could happen on any other website; it has nothing to do with GitHub.


I have my cookies autodeleted on a regular basis because I prefer my browsing sessions and activity to not be linked for surveillance purposes.

The use of a password manager makes re-logging-in effortless.


I have a specific firefox container group for github and a few other applications so that it's not leaking cookies. No need to relogin.


GitHub still knows all of your individual visits to GitHub, and which repos you viewed. Most of the time I don’t want to be browsing GitHub itself whilst logged in. I don’t like or trust Microsoft with my location or waking hours or browsing history.


And it's not blocking search just for non-logged users, but also for users whose accounts got 'flagged'.

My account got flagged.


They have required being logged in for code search for years now. The rest (repos, issues, etc.) is still searchable as is. At least where I am.


For several months now I have refused to stay logged in to GitHub. I log in in an incognito window when I need something, and log out when I'm done. GitHub JavaScript is disabled outside of incognito windows.

This is the same playbook I have followed with success to disconnect from Facebook, Linkedin, Google, Reddit and Twitter. (And Quora. And Medium..) When it's inconvenient it just makes me try to avoid these sites.


It's unfortunate but it's not as bad as Twitter/other forced logins for features. The difference is that making an account is free, you don't have to pay anything and there are no ads, whereas Twitter has incentive because it needs ad revenue and wants to look like it has more people on it


> not as bad as Twitter

Can't help but feel that's what we're moving towards.


Oh, we're at the extinguish phase.


Github was rate-limiting me from doing just basic searches. I hit them, paging through results too fast, and it times me out. My account is a few years old, but admittedly, it lacks engagement à la social media (favorites, repos, forks, etc).

Anyway, I don't buy Github's excuse that the new search 'takes more CPU power'; this must be to prevent scraping data for LLMs. Have you hit the new search rate limits?


On another computer where I was not logged in, something else happened. It didn't say log in, but the search results were displayed and then one second later it removed them and displayed an error message (which did not explain the problem, but it did not ask you to log in, either). I was able to view the search results by pausing execution of scripts using the debugger, though.


here's the real kicker to me: you can't sort the search results these days by date.

whoever fucked this up thinking he's doing good for humanity deserves to be hit with 65,535 lightning bolts in exponentially increasing amperages.


That’s why I stopped using it at work (on my company’s laptop I don’t want to log in to github using my personal credentials)


I’ve always found GitHub to be quite generous with what functionality they give free users, especially visitors that don’t have an account. They kind of paved the way in that regard…

Some sites would probably put everything behind a login-wall. I can imagine some alternative version of git hosting with the following message:

    Sign in to clone this repo


Cloning without logging in can be seen as important:

- It makes it simpler for millions of CIs to check out the code unauthenticated,

- It enables use cases such as “Here is public data as JSON, it’s our list of IP addresses, just integrate that into your firewall”,

- NPM. NPM entirely.

- Isn’t it required for open-source? If they restricted it behind a login, could we still say we deliver the code to our customers?


I just create a new account every time I want to search for something


Need to search on GitHub? Make an account! Don’t want to make an account? Leave!


This will greatly reduce GitHub's ability to be an App Store. Reminder that GitHub isn't banned* or censored by Microsoft in China, so it's a good way to download VPNs and other banned apps.

*There are frequent outages on an individual level, but if you try enough, it works.

EDIT : This doesn't affect search of repos. My bad.


This is old news


They also completely removed the activity from people you follow from your homepage, claiming everything should be under "explore", while promoting their recommendations provided by an algorithm. They claimed it "used too many resources".

Everyone complained about this decision, fast-forward 6 months, Microsoft doesn't give a fuck.

I used the homepage to learn about new projects people I follow are working on or interested in, but now it's nearly impossible to find out.

Enshittification is real.


It's time we all moved away from GH


self hosted gitea is really a pretty thorough drop in replacement IMO


We said that GitHub would become trashed once MSFT got their grubby little fingers into it.


GitHub considered harmful.


I believe that Cory Doctorow would suggest that the enshittification of GitHub is now in full swing.


what a shame


That's okay, GitHub's search is atrocious anyway.


At least for issues, their search got a lot worse recently - it doesn't do substring search in the same way (or maybe at all - I haven't checked in detail), so you can't search for non-initial words in compound words. This makes searching through German issues an exercise in frustration...


Agreed, it's really bad.

I think they really improved the code/repository search, but finding issues got a lot worse.

And IMO finding issues easily should be a top priority for a platform like GitHub.


The old search was much better than the new search. The new search can never find exact strings in my repos, even when I have copy-pasted those strings from my repo to the search bar!


It got a lot better. That’s kind of the point behind it being behind a login wall now.


What? I think it is one of the strengths of GitHub compared to other platforms.



