While I'm a little annoyed by this (it seemed unlikely to me that GitHub would shut it down, so I actually used it), I can live with it. The "was only lightly documented" comment is pretty unnecessary and just adds insult to injury though. What's that supposed to mean? It feels like such an "it's kinda your fault for putting too much faith in us" insinuation. GitHub literally blogged about the feature [1], did they expect people not to use it because it wasn't "heavily" documented? Should their users ignore their blogs from now on, because apparently it's an irresponsible thing to take their blogs at face value?
Oh, and I don't think they've sent anyone any notifications about this, like the repo owners. Not sure how people are supposed to find out; are all users expected to stay up to date with GitHub's blog... which GitHub is telling them not to trust?
The way I understood "was only lightly documented" was that they were blaming themselves, as in "we didn't put much work into it and didn't document it very well".
I’m in general not a fan of sunsetting services which are part of greater infrastructure. The demise of jcenter, for instance, is still haunting me because of cache misses etc. And I had the same thoughts about the documentation comment.
For me this means that GitHub has put itself in the long line of service companies who have no issues with pulling the plug (because this is mostly the cheapest solution) on services they don’t want to maintain, for whatever reason.
But you don't need to double the problem by making a link rely on two domains and shortening its life.
What's more, if you have the complete original URL, you might be able to use an archive service that might have taken a snapshot, or at least try to guess what was behind the link by looking at it. Shortened URLs are usually inscrutable.
> What's more, if you have the complete original URL, you might be able to use an archive service that might have taken a snapshot, or at least try to guess what was behind the link by looking at it
This is the most important part. Now some smart crawlers have figured it out and will show you the archived version of the redirect if you're lucky, but so much has been lost because more or less everyone found long links aesthetically unpleasing to look at. I suppose we weren't aware of the trade-off.
Both the Wayback Machine and archive.today handle redirects by recording the redirect under the redirecting URL and recording the actual snapshot under the final URL (neither perfectly, though—see below).[0][1] And a short URL that occurs many times in the web will be picked up by crawls in the same way as other URLs. Thus, the effect of short URLs on archiving isn't entirely disastrous. Still, the cost is paid in the long tail of short URLs that will never be archived.
Both archive services have shortcomings of their own related to URL shortening:
- archive.is ironically encourages the use of shortened URLs to its own archived snapshots![0] Its long URLs, which I've never seen used in the wild, are for some reason hidden away in the 'share' menu.
- archive.is stores the original submitted URL in the final snapshot and shows it in the header if it is not the same as the ultimately archived URL,[0] but WBM does not.[1]
- WBM mysteriously doesn't work on bit.ly (returning only the message 'Job failed.'[2]), but does work on tinyurl.com. Unfortunately, opaque errors like this are not uncommon with the WBM, errors sometimes as serious as snapshots becoming completely blank years after being taken! Somewhat concerning considering its role as an archive, I'd say. If only the Internet Archive had open-sourced their server like Wikimedia did from the start…
Aside: It's a bit interesting, considering both current events and archive.is's notoriously unclear ownership (the only commonly cited info being the domain's registration to an individual in Prague), that two of the five social media sites in the 'share' menu are VK and LiveJournal. Are those commonly used outside of Russia in recent years?
(For the unaware, the Internet Archive's Wayback Machine and archive.today (they've got many TLDs) are well established as the two major general-purpose web archives.)
Only partially: to other archiving services, a shortened URL from a shortener that also happens to be an archiving service is no different from one produced by a shortener that doesn't provide archival.
The only solution I see is that archive services resolve these shortened URLs when they archive a page and archive the result with it. The second best and probably more reasonable solution is that they maintain a directory (mapping) of shortened URLs to real URLs assuming the mapping does not change.
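To make the directory idea concrete, here's a rough sketch (assuming the third-party `requests` library and made-up short links): ask the shortener for its redirect without following it, and record the Location header it returns.

```python
# Rough sketch of the "directory (mapping)" idea: ask the shortener for its
# redirect without following it and record the Location header it returns.
# Assumes the third-party `requests` library; the short links are made up.
import requests

def resolve_short_url(short_url: str) -> str | None:
    resp = requests.head(short_url, allow_redirects=False, timeout=10)
    if resp.is_redirect:
        return resp.headers.get("Location")
    return None

mapping = {}
for short in ("https://git.io/example1", "https://git.io/example2"):  # made-up codes
    mapping[short] = resolve_short_url(short)
print(mapping)
```

Of course, this only works while the shortener is still answering, which is exactly the window that's about to close here.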
The problem isn't that web sites or domains expire. It's that adding another layer of redirection, like a URL shortener, adds another point of failure, and a single shortener becomes a point of failure for links across many sites. It also makes it more difficult for archival services to keep accurate links to those expired sites, since they then need to keep track of every shortener on every website.
It also hides what was being "shortened". If I find the expired URL science.com/astrologist-discovers-alien-life, I know who owned it and can guess what the content was. That doesn't happen with random characters or third-party short names.
If the URL shortener is part of the site, that's not an issue. Another point: you can (in theory) change the destination of a shortened URL, so the shortener link can actually outlive the original link.
That is the theory. But who actually does that? Who actively monitors whether the link's destination is dead and then tries to find the correct location the content has moved to?
The only possible solution right now would be to point to the dated content at the Web Archive.
And now you've got two problems. If you see an upvoted short link, how do you know the party controlling the link hasn't changed the destination away from a site everyone trusts? The person controlling the short link may not even be the person sharing it on a platform that trusts them.
With normal sites and content manipulation there's at least a reputational aspect that tends to stick.
Most registrars will only let you register a domain out to an expiration date at most 10 years from today, so after an initial 10-year registration you'll be able to renew for at most 9 more years at a time.
Digital Object Identifiers (DOIs) are nowadays academia's variant of URLs. DOIs are somewhat similar to URNs: they don't resolve directly to somewhere on the web but are more of an identifier. There are various tools for resolving them, see for instance https://www.doi.org/tools.html
Platforms such as https://zenodo.org/ allow for uploading digital artifacts such as publication attachments, supplementary code or measurement/survey data and permanently assigning them a DOI.
Zenodo may go down, but maybe some hard disk survives and resolving those DOIs will be possible in some future web system. DOIs are built to last!
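For readers who haven't used them: a DOI is resolved through the public https://doi.org/ proxy, which answers with an HTTP redirect to wherever the DOI currently points. A minimal sketch, assuming the third-party `requests` library and a placeholder (not real) DOI:

```python
# Minimal sketch of DOI resolution via the public https://doi.org/ proxy,
# which answers with an HTTP redirect to the current landing page.
# Assumes the third-party `requests` library; the DOI below is a placeholder.
import requests

def resolve_doi(doi: str) -> str | None:
    resp = requests.head(f"https://doi.org/{doi}", allow_redirects=False, timeout=10)
    return resp.headers.get("Location")

print(resolve_doi("10.5281/zenodo.0000000"))  # placeholder, not a real DOI
```

The point being that the identifier stays stable even if the landing page moves, because the registrant can update where the DOI resolves to.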
I'm confused, what's the relationship between DOIs and git.io? Are you suggesting people should be using DOIs to link to source code from within papers?
The git.io link in at least the first paper that comes up on your search (https://git.io/sosr15-int-demo) appears to have been archived by archive.org, which shows the redirection and then follows it to the archived version of the final github page.
I don't know how common it is for these links to have been caught by the archive, but potentially all is not lost.
I have no idea how to submit shortened URLs to URLTeam, but if anyone can find out I'll be happy to scrape URLs out of that list of Google Scholar PDFs.
I mean that it is hard to scrape these git.io links used in the research papers to build the archive. Unless, of course, GitHub provides a DB dump, which would help everyone a lot.
It would be good to write a script to fetch all of those and log their destinations to a CSV file and put that online somewhere. You could even make PRs to all of those projects replacing links with un-shortened versions.
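A rough sketch of what such a script could look like, assuming the third-party `requests` library, a plain-text file of git.io links, and that the redirects are still live (the file names here are made up):

```python
# Rough sketch: read git.io links from a text file, record where each one
# still redirects, and write the pairs to a CSV. Assumes the third-party
# `requests` library; the file names are made up.
import csv
import requests

with open("gitio_links.txt") as f:
    links = [line.strip() for line in f if line.strip()]

with open("gitio_destinations.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["short_url", "destination"])
    for link in links:
        try:
            resp = requests.head(link, allow_redirects=False, timeout=10)
            writer.writerow([link, resp.headers.get("Location", "")])
        except requests.RequestException as exc:
            writer.writerow([link, f"ERROR: {exc}"])
```

The hard part, as noted above, is collecting the links in the first place, not resolving them.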
I hope github will replace the redirects with a holding page linking where they used to redirect to, rather than just delete them. That way people can report the broken links to the original site and still get to the destination they were intending.
> Out of an abundance of caution due to the security of the links redirected with the current git.io infrastructure, we have decided to accelerate the timeline. We will be removing all existing link redirection from git.io on April 29, 2022.
> Today, git.io is increasingly being used for malicious purposes.
So my guess is they observed an increasing number of existing git.io URLs (perhaps where the destination URL was squatted on) being actively used for "malicious" things.
Perhaps they are selling the domain, or planning to use it for a new service that allows users to create arbitrary URLs. Both of those scenarios could help an attacker take over old git.io URLs.
More likely IMO the backend is some old hacky thing that nobody has been maintaining and someone found a vuln or some software stopped getting security updates.
It's basically unimaginable that they would sell the domain. Whatever amount of money they'd get for it wouldn't be worth it given the potential risks to their customers (since it was formerly a URL shortener), and owning git.io is pretty on-brand for GitHub.
Annoyingly, the CodeQL analysis GitHub Actions make use of git.io as a short link to relevant documentation inside action comments.
Even right now, the action is still using git.io, so regardless of this advisory, that seems like something that should have been fixed some time ago, when git.io was initially put into read-only mode.
Presumably they were created some time ago and/or GitHub created them internally. They're hardcoded links that appear as comments in the actions yml file template and have probably existed for quite some time.
git.io has its place in the world, namely for linking in code comments to immutable GitHub permalinks (that is, with commit SHAs) that would otherwise exceed a conventional column limit and make linters complain.
I probably have created dozens of such comments over the years. Now people following those will find nothing.
Folks at GitHub should really reconsider their position. Don't expect loyal consumers to keep their trust when you suck at your very job, namely keeping an immutable historic record.
The hard column limit and linter complaints are in the wrong here. Long, unbreakable strings exist and are reasonable (with URLs generally being the most common, but not the only), and any linter that insists they not exist is bad.
I would also note, for this particular style of URL, that the long URLs are much more useful, as you can read the ref and the path directly, and perhaps thereby bypass the web entirely. The short URL has to be followed to discern its target.
That’s a bit over the top, as you did have a hand in this: rather than fixing the linters or ignoring those self-inflicted problems, you opted to use a URL shortener and eventually got burned like everyone else who has ever done this. Their job is not to create an immutable historic record. You were never a customer of git.io. The goal was to shorten links because it was trendy.
After a couple of issues with broken links in code comments, I think the best approach is to link to an archive for the page in question (if public) since that has the original URL and makes the original content at the time of the link available as well.
You can't escape the legal reality that you're demanding perpetual operation for no consideration merely by clarifying that you mean "contracts (in the API sense)". That was not part of your contract in either sense. If you wanted it to be forever, you should have paid for it.
Don’t most URL shorteners die in under 10 years? I see the value of not breaking API contracts, but some services are more known for breaking their contracts than others.
No. Use your own URLs or face the very obvious predictable consequences. There is absolutely no reason to add another layer of redirection just to get around your 2-bit linter config.
That sucks, and I know it doesn’t help fix your current situation, but I’ve found it useful to put more context and longer explanations in commit messages than in code comments. If your team knows how to use git blame for archeology, the information tends to stay more relevant to the code, while code comments often go stale.
Interesting, I have always viewed those as very different things.
Comments explain non-obvious things. For example, "this is a custom search because it was found to be 2.3x faster than .find() for this use case" or "this is sent as a string for backwards compatibility with ExternalRandomApi v1.2" or just walk through a complex algorithm.
I typically only go into archeology mode for two reasons:
1) I found the source of a bug and want to see when it was introduced and why. This is most helpful to prevent reintroducing an old bug, especially in old code lacking unit tests.
2) Someone wrote code that should have been commented, but wasn't.
I generally find code comments much more useful than commit messages. They are of course not mutually exclusive and you can (and should) do both.
9 times out of 10, "why are things the way they are now?" is what I'd like to know.
Occasionally, of course, I would like to know "why things were the way they were x commits ago" and that's when commit messages come in handy. But that's the minority use case for me.
It depends on what kind of commit messages your team writes, and how much "churn" a line of code gets with style changes, whitespace, etc. It requires making smaller, focused commits so the comment is directly relevant to the change, and writing good commit messages that tells you why something is the way it is _now_.
I always use `git add -p` to only stage relevant portions of my current work, and then usually write a detailed message like this:
Decrease reprojection error threshold to 1.0
With improved calibration, RANSAC has been able to find
a similar number of inliers even with a tighter
threshold, resulting in better triangulations.
<maybe attach some output showing improved reprojection error stats>
Now when someone comes upon
threshold=1.0
and wants to know why this number is what it is, they can git blame it. The latest commit should be enough, but it might also be useful to know why that threshold has changed over time.
The problem with churn can be mitigated by telling git to ignore whitespace changes, but it's not perfect. Maybe more helpful is to have a style guide and make it part of your review process so you don't end up with multiple developers doing things like reformatting a multi-line function call to their liking over and over.
Interesting. I'm basically the opposite. I use `git blame` heavily and find that I can rapidly understand the intent behind some code by doing so, assuming the commit messages are well-crafted. Comments are great, where appropriate, but to achieve the same density of information you'd have to add a comment with every commit, which would probably result in a difficult-to-read codebase.
> I use `git blame` heavily and find that I can rapidly understand the intent behind some code by doing so, assuming the commit messages are well-crafted
Me too. Once editors started displaying inline git blame information, it was pretty transformational.
But, this bogs down pretty quickly for me. Great for a single line of code, but if I'm trying to understand something that spans multiple methods or multiple files that's a lot of spelunking around the commit messages, which are a mix of whys and whats.
At their best, code comments are a really focused distillation of the current whys and should never ever contain whats.
This is largely a problem of GitHub and other code UIs making it awkward to see anything other than the most recent commit in blame view.
With `git blame` you can use `-w` to ignore whitespace changes and `-M` to detect moved/copied code; a lot of the time this will help to rapidly find the relevant commit rather than a lint change.
You can also use `git log` on a range of lines of a file: `git log -L 15,23:src/main.rs` and if you add `-p` you'll see diffs as well.
Comments can go out of date, commit messages (if used properly) provide a genuine timeline of the evolution of the code.
They don't; they want to take down the 99% of git.io links that are pointing to malware / phishing, and they don't want to spend all their time manually culling them.
GitHub has been quite thoughtful about its deprecation timelines before, e.g. with long notice periods and brownouts. They're handling this quite badly.
Personally I don't see why they would care that it is used to shorten links to malicious redirects or whatever. But given that they do, the easy fix is to drop the github.io domain but keep the github.com domain. And in the interest of letting people update their URLs, returning the redirect URL as a response body instead of a 302.
This made me nostalgic, I remember when vanity URL shorteners were the coolest thing you could have. A new one popped up every few days and every company land grabbed for something custom they could use for social media.
Then Twitter updated to make URLs use fewer characters by shortening them on the fly (and showing the original URL to viewers). The URL shortener craze died quickly after.
Sourcegraph CTO here. Does anyone know someone at GitHub or Microsoft we could get in touch with to take over maintenance of git.io? There are hundreds of thousands of references in open source, blog posts, and academic papers that will 404 if git.io is taken down. Sourcegraph would be happy to take over maintenance to preserve these links.
Playing the ball back: I was trying to export all git.io links through your sourcegraph code search, but the CSV export doesn't work for those purposes (it times out). Is there a chance you can publish this dataset now such that crawling git.io is easier?
As an owner of http://git.io/way and https://git.io/no, it's a good thing I noticed this on the Hacker News front page. Otherwise I would've never known! I just changed all of my git.io links where I could find them. I wish they would send out emails for this, but as I remember git.io is an anonymous service.
Glad I made the decision to use my own 5-character domain for URL shortening years ago. Anyone can write a simple URL shortening service in less than half an hour, and you gain flexibility and peace of mind with that little bit of effort.
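For what it's worth, the "half an hour" claim isn't much of an exaggeration. A toy sketch of a self-hosted shortener using only the Python standard library, with an in-memory map and no persistence, auth, or input validation, so purely illustrative of the shape of it:

```python
# Toy sketch of a self-hosted shortener: an in-memory map from short codes
# to long URLs, answering with 302 redirects. Standard library only; no
# persistence, auth, or input validation, so purely illustrative.
import secrets
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

SHORT_MAP = {}  # short code -> long URL

class Shortener(BaseHTTPRequestHandler):
    def do_GET(self):
        parsed = urlparse(self.path)
        code = parsed.path.lstrip("/")
        if parsed.path == "/shorten":
            # e.g. GET /shorten?url=https://example.com/some/very/long/path
            target = parse_qs(parsed.query).get("url", [""])[0]
            new_code = secrets.token_urlsafe(4)
            SHORT_MAP[new_code] = target
            self.send_response(200)
            self.end_headers()
            self.wfile.write(f"/{new_code}\n".encode())
        elif code in SHORT_MAP:
            self.send_response(302)
            self.send_header("Location", SHORT_MAP[code])
            self.end_headers()
        else:
            self.send_response(404)
            self.end_headers()

HTTPServer(("127.0.0.1", 8080), Shortener).serve_forever()
```

A real deployment would add persistent storage and some access control, but the core really is this small; the hard part is keeping the domain and the service alive for decades.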
Still, losing all git.io links hurts. I’m certain I have a few old projects with issue explanations, download links, etc. based on git.io. This is handled so poorly it’s ridiculous.
This is why you should never use url shorteners for publicly addressable content. Using url shorteners contributes to ‘link rot’ and after the domain expires or the service reaches its end of life the link is broken and cannot be resolved anymore.
imo, it’s fine for linking to ephemeral content such as submission forms or anything that will not make sense after a certain time. Do not use it for articles and other content, because it will make the resource inaccessible after the service is no longer operational.
I will continue my attempt to rescue the word “deprecate” from destruction.
> Git.io deprecation
> As notified in January, we shared our plans to deprecate the service.
This is not deprecation. This is discontinuation.
Deprecation is when you say “we recommend not using this, but it’s still working for now”; at time of deprecation, a schedule of when it will cease to work may be provided, or it could be that it will continue to work indefinitely.
The announcement in January was the deprecation. What they’re doing now is the final discontinuation.
If you want to be pedantic you need to also be fully correct.
This announcement in itself is a deprecation, just like the last one, that additionally announces a future discontinuation.
There is no requirement a deprecated service still works.
If something is half broken, you can deprecate it for those relying on the half-broken mess and eventually discontinue it (which is what GitHub is doing).
Take both notices together and you can see that they’re clearly using the wrong sense of the word deprecate. From the January notice:
> Existing URLs will continue to be accessible, but we encourage using one of the many URL shortening services that are available, instead of git.io, as we will be deprecating the tool in the future.
Semantically, that was clearly a deprecation of the service: it keeps working, but they discourage its use as it will be discontinued in the future. (It was also notice of shutdown of the write parts of the service.)
It is not reasonable to consider today’s notice to be a deprecation (interpreting that as a change in status).
> There is no requirement a deprecated service still works.
This is flatly wrong. The agreed meaning of the word “deprecate” requires that it still works. See https://en.wikipedia.org/wiki/Deprecation for a good overview of the term’s proper usage and accepted meaning.
It's not flatly wrong, you're just stuck on a tautological use of the word "work"... after all how do you deprecate something that doesn't exist?
I used the example below: you can stub out a method with a print statement that says "don't use this" and still deprecate it.
The function still "works" in the most literal sense... it just doesn't do any useful work. It doesn't work.
> Semantically, that was clearly a deprecation of the service: it keeps working, but they discourage its use as it will be discontinued in the future.
You can't call the original a deprecation and say this one is not. Git.io still works (in a reduced but extremely meaningful capacity... git.io links all still work) and they're recommending alternatives.
Your other hang-up is that they said they will deprecate something during a deprecation.
That's perfectly allowed, even if you're going to be this pedantic: saying "I will run" while I'm running doesn't make it a false statement.
Also to be clear I'd just rather people not be as pedantic as the grandfather comment. Sometimes it's good to hold up a mirror to that end.
> you can stub out a method with a print statement that says "don't use this" and still deprecate it.
That’s not deprecation. That’s breaking the method.
To define “works” somewhat more specifically: deprecation means that functionality is not materially altered from how it was before the deprecation. (You might devise some way of notifying users—e.g. DeprecationWarning in Python, #[deprecated] in Rust—but this does not materially alter functionality.)
The January notice was a deprecation (even if they mislabelled it). This notice is not a deprecation, because the service was already deprecated. This notice defines a sunset timeline, but does not alter the parameters of the deprecation.
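To make the distinction concrete, a tiny hypothetical Python example (the function and domain are made up): it is deprecated, it warns, and yet its functionality is not materially altered.

```python
# Hypothetical illustration of the distinction: the function is deprecated
# (it warns) but its functionality is not materially altered.
import warnings

def shorten(url: str) -> str:
    warnings.warn(
        "shorten() is deprecated; link to the full URL instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return f"https://sho.rt/{abs(hash(url)) % 10_000}"  # still does its job

print(shorten("https://example.com/some/very/long/path"))
```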
> That’s not deprecation. That’s breaking the method.
It's both. You can even break a feature and deprecate it for that reason. Deprecation comes down to communication.
> deprecation means that functionality is not materially altered from how it was before the deprecation.
No it doesn't. Maybe you really want it to, but it doesn't.
> This notice is not a deprecation, because the service was already deprecated.
That's not how that works... you can deprecate something and deprecate it again. Each notice is a deprecation; if you don't realize this, I don't know what you're still going on about...
The feature can be a stub that prints "no pls" and you're still free to mark it as deprecated (that's more common than one might expect because ABIs are a thing)
Off topic, but I have really grown to despise the phrase "out of an abundance of caution". It's grown so popular during the pandemic when people exercise the absolute bare minimum amount of caution.
[1] https://github.blog/2011-11-10-git-io-github-url-shortener/