4 years ago, Archive Team backed up "nearly all" of Vidme (according to Jason Scott). It was uploaded to the Internet Archive here https://archive.org/details/archiveteam_vidme so you can fix your vid.me URLs by pointing them to the Archive.
(And you can help keep them there, too https://archive.org/donate But honestly, using the Archive more keeps the bigger donors involved, so don't feel guilty or anything. Just use the Archive!)
The search results counter says 12,385... is that really all (or "nearly all") of what Vid.me had for content? I mean, I know it failed at dethroning YouTube, but 12,000+ videos is barely anything.
I read a significant percentage of HN submissions through archive.org. Saves me DNS lookups. Recent pages look exactly the same to me as their "live" versions, and the retrieval time is rarely bad.^1
It's probably the best "CDN", as it has the largest amount and variety of content.
There should be more than one archive.org (and more than one ArchiveTeam). These projects work and they are standing the test of time. archive.org is older than Google. Much, much older in "internet time".
1. I use a text-only browser though, so I'm not sure how tolerable it would be through a JavaScript-enabled graphical browser.
Was hoping there was maybe an opportunity for corresponding donations that maximize deductions.
Looks like OpenBSD isn't deductible either for Canadian donors. Theo's registered address has lots of cool wifi links though. I count 4! (one is on the gutter)
I think I'm not understanding something. I believe https://archive.org/details/googlevideo2011 is supposed to be the videos they downloaded but I'm not getting any results no matter what search term I put in. Is there a searchable archive of Google Video content?
Wikipedia has a bot that the Internet Archive collaborated on [1], where rotten links are updated to point to the Internet Archive and new links are queued for retrieval [2]. There should probably be a similar effort for CMSes like WordPress and such. The code to do this is semi-trivial.
But the links here haven't rotted, in the sense that they won't return a 404 or whatever. They will still load something; the videos have just been replaced with porn. So you need more code to detect all vidme links as something to fall back to the Archive for.
In a sense, all of the links for the domain in question have rotted, and must be replaced. To your point, your code could replace wholesale based on the domain.
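A minimal sketch of that wholesale replacement (Python; the regex, the sample HTML, and the 2017 timestamp hint in the Wayback prefix are illustrative, and a real bot would also query the Archive's Availability API to confirm a snapshot exists):

    import re

    # Any link on the dead domain is rotten by definition, so rewrite
    # wholesale: point every vid.me URL at the Wayback Machine instead.
    WAYBACK_PREFIX = "https://web.archive.org/web/2017/"

    VIDME_URL = re.compile(r"https?://(?:www\.)?vid\.me/[^\s\"'<>]+")

    def rewrite_rotten_links(html: str) -> str:
        """Replace every vid.me link with its Wayback Machine equivalent."""
        return VIDME_URL.sub(lambda m: WAYBACK_PREFIX + m.group(0), html)

    print(rewrite_rotten_links('<a href="https://vid.me/abc123">my video</a>'))
    # <a href="https://web.archive.org/web/2017/https://vid.me/abc123">my video</a>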
We can't and shouldn't expect people to keep their old domains forever. We need a way for pages to be signed and hyperlinks to enforce authorship. When we link to stuff, we should have a way to say whose stuff we're linking to. It's no different from installing signed software and using trusted repositories.
This is one of the reasons I created a proof-of-concept web extension that verifies links and pages using PGP. On a mismatch, it flags the page and offers a web archive link instead.
It was pretty fun to make, but the WebExtensions API currently doesn't provide the features to do this performantly. Firefox provides just about enough additional APIs to hack it together.
One idea that's been around for a while is to identify files by their hash. That has pros and cons. The good side is that the file is immutable; you can't accidentally link to something else unless someone can manufacture a hash collision somehow. The down side is that if the file is corrected in some way, you don't get the fixes.
In a lot of the peer-to-peer distributed hash table designs, all you need to retrieve a file is its hash.
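A toy sketch of that content addressing (a Python dict stands in for the DHT; real systems distribute the key space across peers):

    import hashlib

    # The hash IS the address: store and retrieve purely by digest.
    store: dict[str, bytes] = {}

    def put(data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        store[key] = data
        return key

    def get(key: str) -> bytes:
        data = store[key]
        # Self-verifying: re-hash on retrieval; a tampered blob can't match.
        assert hashlib.sha256(data).hexdigest() == key
        return data

    key = put(b"some video bytes")
    assert get(key) == b"some video bytes"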
Problem is, you have to download the entire video to check the hash. That's not how video embedding works; the client browser is just handed some link, and it obtains pieces of the video, rendering them as they arrive.
Basically, little segments of the video have to have a signature which is continuously validated. Or something like that.
That's not really a problem. You don't hash the entire video, but do something resembling a Merkle tree. E.g. look at a torrent: it's identified by a hash, but you can download and verify a random chunk.
Right, Merkle tree! OK, so the embedding site only stores a single hash: the root one, which is the hash of the remaining hashes. The first thing we fetch from the video is those hashes, and if their hash doesn't match the root, we flag/ignore the video and refuse to play.
Multiple levels of the tree can be stored throughout the video file. The first level after the root can be for major sections, like 5-minute segments. The next levels are then at the start of each 5-minute segment, giving hashes for one-second chunks.
If the root hash checks out, we get the 5-min hashes. If they check out, we get the hashes for the first 5-min block, and if those hashes check out, we start to play the video, validating every second of it against a one-second hash from the 5-min block. Then we get the next 5-min hash block and so on.
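A compact sketch of that two-level scheme (plain SHA-256 over in-memory chunks; the 300-chunks-per-block grouping mirrors the 5-minute/1-second split described above, and a real format would store the intermediate hash lists inside the file):

    import hashlib

    def h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def build_tree(chunks: list[bytes], block_size: int = 300):
        # One hash per 1-second chunk, grouped into 5-minute blocks,
        # one hash per block's chunk-hash list, and a root over those.
        chunk_hashes = [h(c) for c in chunks]
        blocks = [chunk_hashes[i:i + block_size]
                  for i in range(0, len(chunk_hashes), block_size)]
        block_hashes = [h(b"".join(b)) for b in blocks]
        return h(b"".join(block_hashes)), block_hashes, blocks

    def verify_chunk(chunk, i, root, block_hashes, blocks, block_size=300):
        # Validate top-down: root -> block hashes -> chunk hashes -> chunk.
        if h(b"".join(block_hashes)) != root:
            return False
        block = blocks[i // block_size]
        if h(b"".join(block)) != block_hashes[i // block_size]:
            return False
        return h(chunk) == block[i % block_size]

    chunks = [bytes([i % 256]) * 16 for i in range(600)]  # fake 10-min video
    root, block_hashes, blocks = build_tree(chunks)
    assert verify_chunk(chunks[450], 450, root, block_hashes, blocks)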
In actual HLS streaming, videos are served in 2-10 second segments, a size small enough that you realistically can hash and verify each segment. You'd have to implement it as an extension of the HLS protocol, so probably as m3u8 with additional fields, maybe as comments.
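For instance, a made-up #EXT-X-SEGMENT-HASH comment tag (not part of the real HLS spec) could carry the SHA-256 of the segment that follows it, and the player would check each segment as it arrives:

    import hashlib

    playlist_lines = [
        "#EXTM3U",
        "#EXT-X-SEGMENT-HASH:9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
        "#EXTINF:6.0,",
        "seg0.ts",
    ]

    def expected_hashes(lines: list[str]) -> dict[str, str]:
        # Pair each (made-up) hash tag with the segment URI that follows it.
        hashes, pending = {}, None
        for line in lines:
            if line.startswith("#EXT-X-SEGMENT-HASH:"):
                pending = line.split(":", 1)[1]
            elif line and not line.startswith("#"):
                if pending:
                    hashes[line] = pending
                pending = None
        return hashes

    def verify_segment(data: bytes, uri: str, hashes: dict[str, str]) -> bool:
        return hashlib.sha256(data).hexdigest() == hashes.get(uri)

    # sha256(b"test") is the digest listed in the playlist above.
    assert verify_segment(b"test", "seg0.ts", expected_hashes(playlist_lines))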
I don't have expertise in video codecs or file formats, but couldn't you hash the first N bytes of a stream? Stream those N bytes to the client; if the hash matches, start the video, else stop the download and don't start the video.
The most egregious problem: if I'm an attacker and I have the file you requested, I can keep the portion you'd use to verify it but fill the rest with junk or exploits. You'd receive the file, it would emit the correct hash, yet it would not be what you were expecting.
For video especially, what you receive isn't necessarily predictable by the client. With HLS or MPEG-DASH streaming, the video you receive could be one of a number of different encodings, e.g. lower or higher bitrates to deal with changing network conditions. The actual m3u8/mpd file you receive could change arbitrarily as the video provider adds or drops encodings. The hash of such a file today isn't guaranteed to match the hash tomorrow, for entirely banal non-malicious reasons.
Fun fact: the UUHash algorithm used by the FastTrack network (Kazaa, Morpheus, etc.) only hashed the first portion of a file. Hashing a large file took forever on hardware of the day; even hashing small files was non-trivial. The RIAA, through various fronts, would insert spoofed files where the first portion was legitimate but the rest was junk or annoying sounds. The files would be named like any other MP3 someone was searching for and even have seemingly good ID3 tags.
That's already how it works in IPFS or Freenet. The hash is not for the data itself, but for a metadata document that contains a list of hashes to the data broken up into smaller blocks.
I had a personal project that I got bored with, so I let the domain expire. Then I got emails from former users saying that the domain was now hosting malware. So yeah, I would like all the old links to the site to somehow know that the owner has changed. Not sure what a reasonable solution would look like, though.
One solution is to have a unique URI per file that is independent of DNS. Decentralized file storage (e.g. FileCoin / IPFS) might serve this purpose...
Definitely but I think we need something that will work with the web we currently have while these bigger ideas are fleshed out and adopted.
Also, while IPNS covers the issue of linking to dynamic content, it's worth mentioning that IPFS will have similar DNS issues, since DNSLink and similar domain-driven solutions are used to cover its usability problems (long, random URIs).
IPFS works pretty well in the style of progressive enhancement with the existing web, for static content specifically. If you want to link to a resource that's available through IPFS, you make the link point to an IPFS gateway that you trust and expect to stay online (possibly your own, on your own domain), like https://example.com/ipfs/Qm_IPFS_HASH_HERE/.
Anyone with a browser that has direct IPFS support (either because they're using an IPFS extension or a browser with built-in support, which might get more popular if IPFS takes off) will have their browser recognize the URL format and just fetch the file by hash directly from IPFS, and it won't matter if example.com is still up and serving the file. For everyone else, the link will work as long as example.com is still up and acting as an IPFS gateway.
If example.com ever goes down, users can make the link work by installing an IPFS extension or manually replacing the example.com domain in the link with the domain of any still-active IPFS gateway, and any admin in control of the page could fix the link similarly.
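A sketch of that last-resort rewrite (the /ipfs/<cid> path convention is what makes gateways interchangeable; dweb.link here is just an example gateway):

    import re

    # Swap the gateway domain, keep the content-addressed path.
    GATEWAY_PATH = re.compile(r"https?://[^/]+(/ipfs/\S+)")

    def reroute(url: str, gateway: str = "https://dweb.link") -> str:
        m = GATEWAY_PATH.match(url)
        if not m:
            raise ValueError("not an IPFS gateway URL")
        return gateway + m.group(1)

    print(reroute("https://example.com/ipfs/Qm_IPFS_HASH_HERE/video.mp4"))
    # https://dweb.link/ipfs/Qm_IPFS_HASH_HERE/video.mp4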
SRI/hashing works for static content. Though it's worth mentioning it's a SUB-resource feature (images, scripts, etc.). It doesn't work for hyperlinks to other pages. Even if it did, it's a different use case.
Say I link to an article by Author A that has comments in it (or even a footer, relative timestamp, sidebar, etc.). Hashing won't work, as the page is always changing. I want the link to always go to Author A, but I don't care if the content changes. That's the sort of use case signed webpages and hyperlinks with enforced authorship cover. It's less about what's on the page and more about who created it.
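For the static-subresource case, the integrity value is just a base64 digest with an algorithm prefix; a minimal sketch of computing one (the widget.js filename is illustrative):

    import base64
    import hashlib

    def sri_hash(content: bytes, alg: str = "sha384") -> str:
        # Subresource Integrity value: "<alg>-<base64 digest>".
        digest = hashlib.new(alg, content).digest()
        return f"{alg}-{base64.b64encode(digest).decode()}"

    script = b'console.log("hello");'
    print(f'<script src="widget.js" integrity="{sri_hash(script)}" '
          f'crossorigin="anonymous"></script>')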
Good points. I would guess that for something to be implemented, it would have to be easy for browsers and API tools to check once per domain and cache the response, and it should probably build on something that already exists and has been adopted. Maybe a page could have a meta tag or header that contains a hash of the destination site's DANE signature? Something like "targetref:somedomain.tld expectsig:39726a2fe2bb052cf00e6b95a8385f7", based on tools like danecheck [1], or maybe DNSSEC, but that is very poorly adopted.
The solution I was going for with WebVerify is more web-centric rather than domain-driven, which I think is a better fit for webpages. It can be enforced at the hyperlink-level for shared domains (like GitHub Pages, University web spaces) and works for static resources without needing to configure external resources. The only really complicated part is PGP but that can be solved with better tooling (as seen with Keybase).
Finding a way to embed the domain registration date might be sufficient; that would cover most of the expiry situation.
There was an IETF or similar registry that used your domain and registration date to carve out your namespace, i.e. dns.2021.07.26.com.example would be your prefix. Pretty robust. Can't remember what it was anymore.
Could URL authorship confirmation be implemented on top of TLS? If someone takes over a domain, the final certificate in the chain will be issued to another entity, and that could be enough to trigger a notice. Could be achievable with a centralized registry/crawler like Internet Archive, but one that only keeps track of domain:certificate mapping.
Of course, the devil’s in the details (infrastructure/organizational changes can trigger false positives; shared hosting setup can cause false negatives; it presumes that if the original entity abandons a domain they’d revoke the cert; etc.), but IMO it wouldn’t be worthless as it is.
Most certificates (including all free ones) are domain-validated only, so the entity being certified is just the domain and will not change with a new owner.
If you’re talking about shared hosting, yes, it wouldn’t be covered by this model.
But other than that: if I have a cert for xyz.com and I abandon the domain, even if you buy it you'll be forced to get another cert issued for it.
If it was recorded somewhere that xyz.com = my cert, that could serve as a mechanism to verify that a given URL is at least supposed to be under my control, and a warning could be shown if another certificate is being served now.
Kind of like HPKP, but with a longer lifetime (longer than the domain name registration term) and a centralized registry tracking certificate:domain mappings rather than each individual user agent's cache.
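A sketch of computing such a pin (assumes the third-party `cryptography` package; the pin is RFC 7469 style, base64 of the SHA-256 of the certificate's SubjectPublicKeyInfo):

    import base64
    import hashlib

    from cryptography import x509
    from cryptography.hazmat.primitives import serialization

    def pin_sha256(pem: bytes) -> str:
        # HPKP-style pin: base64(SHA-256(SPKI DER)).
        cert = x509.load_pem_x509_certificate(pem)
        spki = cert.public_key().public_bytes(
            serialization.Encoding.DER,
            serialization.PublicFormat.SubjectPublicKeyInfo,
        )
        return base64.b64encode(hashlib.sha256(spki).digest()).decode()

    # The registry would just record {domain: pin} at publication time;
    # a different pin later means a different key holder (or a rotation).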
Obviously, no one would adopt it due to being a devops nightmare.
Right, you can pin the certificate or the public key in it - but that's much more specific than the entity that the cert is issued to and as you correctly noted is not practical for browsers to do automatically since keys and certs do get rotated without a change in owner.
One of the services I've sold more than once was handling the "offlining" of a domain: basically, provide a 307/404/410 service and make sure it works for a long time before the name gets released. It helps clean up on the way out.
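A minimal sketch of such an offlining service (Python stdlib only; the body text is illustrative):

    from http.server import BaseHTTPRequestHandler, HTTPServer

    class GoneHandler(BaseHTTPRequestHandler):
        # 410 Gone: the resource existed, was removed deliberately,
        # and is not coming back; crawlers treat it as a strong signal.
        def do_GET(self):
            body = b"This site has been retired.\n"
            self.send_response(410)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    HTTPServer(("", 8080), GoneHandler).serve_forever()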
why 307 (temporary) and not 308 (permanent)? (curious about this as when I used to work on a ton of company websites and migrate from old sites, there'd always be a big process of setting up a ton of 301 (permanent) redirects)
As both the post topic and the parent comment topic are about domain changes, wouldn't that basically direct users of the old site on the domain to a totally different domain when they want to access the new site on the same domain?
Very amusing, but I take issue with the article's claim that "Here's (yet another) argument against using third-party embeds" - this might be an oversimplified perspective, when it's actually a good argument for using subresource integrity (essentially cryptographic pinning of third-party embedded content). I am unsure if this extends to every kind of resource that you could have in a web page (ideally it should), but I could imagine a JS shim being used to cover some edge cases.
Subresource integrity would prevent the porn from showing up, but the videos would still be broken. Hosting the videos yourself is the option that would keep things running.
True, and HTML5 video tends to "just work," however I've found that many people have practical issues with hosting their own video. For example, not understanding video conversion, hosting behind bandwidth or request rate limitations, or not having anycast/CDN set up to hold the site together when a page unexpectedly goes viral. There are reasons that hosting videos yourself might not be an effective use of the resources or time available to you, especially based on technical skills, in which case third party hosting with pinning remains a better option.
Video embeds tend to be especially fragile however since they're often implemented by gluing together a relatively large number of third party services. Any one of the services goes down or changes the API and the whole thing breaks.
Ask any uMatrix user how much fun it is to get a video to play on some websites and just how much external crap you have to allow before you see the first frame.
That and Facebook Container. I could not get Slack to work until I allowed Slack access to Facebook, because I had to re-authenticate. Even though I don't use Facebook, I have to allow Slack to access it, at least to be able to authenticate to a different authentication source.
I'd get it if it was Goatse, but this is just regular porn. If they simply wanted to earn money from ads, serving something more milquetoast would make more sense, because now there's a rush to remove old embeds.
> Here’s (yet another) argument against using third-party embeds on your respectable website
Well, it's an argument against using embeds without having any way to validate their authenticity.
This is analogous to having a software distro (e.g. package manager) which downloads upstream tarballs or git repos without checking any hashes.
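A sketch of the missing check (the filename and lockfile hash are illustrative; this is roughly what a package manager does before unpacking anything):

    import hashlib
    import sys

    def verify_tarball(path: str, expected_sha256: str) -> None:
        # Refuse to install anything whose digest doesn't match the
        # hash recorded alongside the dependency.
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(1 << 16), b""):
                digest.update(block)
        if digest.hexdigest() != expected_sha256:
            sys.exit(f"checksum mismatch for {path}: refusing to install")

    # verify_tarball("upstream-1.2.tar.gz", "<hash from the lockfile>")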
Is there a solution for this? Say you want to embed a video from some third-party site; what tools are there for ensuring that the embed will somehow lapse if the video at that URL has been replaced or altered?
Ideally, you'd be notified if that happened. Not showing porn is good, but videos not working is bad. You can't be checking an entire site all the time for non-working external content.
You could pre-pay for X years, where X is long enough to ensure you'll be dead when it expires.
Edit: Seems like there's a 10 year limit in many places? I wonder if that's broad convention or an actual rule.
Okay, apparently an ICANN limit for .com domains:
"The expiration date of the domain registration is extended by the number of years, up to a maximum of ten years, as specified by the Registrar's requested Extend operation."
10 years is the ICANN limit but Network Solutions offers 100 years. According to the fine print what they do is actually register it for 10 years and then every year for the next 90 years they register it for an additional year.
They are betting that investing your up-front 100-year payment will earn more than enough to cover future increases in the cost of those one-year extensions.
Their customers are betting that Network Solutions or some successor will be around long enough, and that domain names will work like they do now for long enough, that this will be worth it.
Most registries limit domain lifetimes to 10 years. You can't renew a domain if doing so would put the lifetime beyond 10 years -- the registry will refuse to perform the operation.
The weird thing about this to me is: in order to get their plan working so quickly, wouldn't the company or person who scooped up the domain have had to set up a site perfectly beforehand so the embeds would work as desired, and then just sat there waiting, ready to hit the button to scoop up the domain at just the right time, hoping nobody beat them to it?
The embeds don't actually "work", per se. When the browser tries to load the embed into an <iframe>, it gets redirected to the home page of the porn site, and ends up displaying the upper left corner of that page in the space where the video embed was supposed to go. It all looks rather more accidental than purposeful.
What's the win for the new owner? Do they make money from linking to the porn somehow? Sure, it could also be just for the lulz, but there are many equally or more lulzy possibilities, whereas porn often seems to be coupled with economic gain.
> What's the win for the new owner, do they make money from linking to the porn somehow?
They are the porn (the new owner is a porn firm), so, yes. I haven't seen the actual linked content, but I assume it's something like free samples with directions telling people where to get more; it's a move that gets porn ads placed for free in a lot of places that would never choose to allow porn ads.
Why isn't this considered criminal vandalism and hacking?
Intent matters.
Owning the domain doesn't give them a right to intentionally interfere with the requested content; they should simply decline to serve the expired URLs.
Based on another comment, that's what they do. The dead URLs are redirected to their main page, which gets embedded in place of the original videos. I don't think they should inherit the maintenance cost of the incoming links, as long as they are not maliciously swapping the content.
Domains are an area where blockchain technology could help a lot.
A domain could cost a fixed amount of X per month. So you could pay 100 years upfront and be sure to not lose it in that timeframe.
To move a domain, both the registrar and the owner could have to sign the move. So it would no longer be possible to lose a domain due to the registrar making a mistake.
The "Ethereum Name Service" is doing this, and honestly, aside from gas fees it's not a bad price for permanent real estate (as permanent as the Ethereum Virtual Machine, at least).
Well, if it's solved via blockchain, there's an opportunity for speculation among early adopters, which I think is the major selling point for most blockchain "solutions"
Don't normal domains work the same - you pay a fee and you get the domain for X years? In both situations the domains expire and a porn site can nab them.
With normal domains, the registrar can move your domain without your approval.
So they could do so out of malice or because they get tricked into believing you gave them the go or because they got hacked or got ordered to do so by some governmental institution or or or ...