Old Vidme embeds turn into porn after domain purchase (theverge.com)
203 points by ibaikov on July 23, 2021 | 121 comments



4 years ago, Archive Team backed up "nearly all" of Vidme (according to Jason Scott). It was uploaded to the Internet Archive here https://archive.org/details/archiveteam_vidme so you can fix your vid.me URLs by pointing them to the Archive.

(And you can help keep them there, too https://archive.org/donate But honestly, using the Archive more keeps the bigger donors involved, so don't feel guilty or anything. Just use the Archive!)


The search results counter says 12,385... is that really all (or "nearly all") of what Vid.me had for content? I mean, I know it failed at dethroning YouTube, but 12,000+ videos is barely anything.


It looks like each of those 12,385 items is a big 10.4GB archive of thousands of videos.


That is the number of megawarc items. Each one of these items contains many videos.


That's roughly one hour of new uploads on YouTube as of 2016.


As with everything, the great majority are junk and/or niche.


That really is an amazing organization.


There should be more than one archive.org

Why not? Is it really impossible?

I read a significant percentage of HN submissions through archive.org. It saves me DNS lookups. Recent pages look exactly the same to me as their "live" versions, and the retrieval time is rarely bad.[1]

It's probably the best "CDN", as it has the largest amount and variety of content.

There should be more than one archive.org (and more than one ArchiveTeam). These projects work and they are standing the test of time. archive.org is older than Google. Much, much older in "internet time".

[1] I use a text-only browser though, so I'm not sure how tolerable it would be through a JavaScript-enabled graphical browser.


weird flex, but I needed to get over a threshold to 'earn' a credit card promo for spending enough, and a donation here will do it!

(It's too bad the tax deduction isn't eligible in Canada :( )


I have the opposite problem... I give money to OpenBSD but can't claim it on my taxes since they're Canadian :-(.


Are they a registered non-profit now? I donated when I used OpenBSD but don't think I got a tax receipt, just my name on one of the releases.


I was hoping there was maybe an opportunity for corresponding donations that maximize deductions.

Looks like OpenBSD isn't deductible either for Canadian donors. Theo's registered address has lots of cool wifi links though. I count 4! (one is on the gutter)

https://www.google.com/maps/place/812+23+Ave+SE,+Calgary,+AB...


Oh that's clever, I totally forgot about this. Archive is amazing.


Not all public goods are flashy, but most are necessary.


I wish they'd been doing that when Google Video shut down. Lost a lot of good content I didn't download.



I think I'm not understanding something. I believe https://archive.org/details/googlevideo2011 is supposed to be the videos they downloaded but I'm not getting any results no matter what search term I put in. Is there a searchable archive of Google Video content?


I helped download a few gigs of Google Video at the time, so I know some of it is up there.


Late edit: never mind, that was Yahoo video.


So uhhhh, anyone want to write the recursive replacement regex that'll do that across an entire filesystem?


find | xargs awk?
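
Or a Python sketch along the same lines (the web root, file extensions, and Wayback prefix here are assumptions; run it against a copy first):

    import os
    import re

    ROOT = "/var/www"  # hypothetical web root
    # Point vid.me URLs at the Wayback Machine instead.
    PATTERN = re.compile(r'https?://(?:www\.)?vid\.me/[^\s"\'<>]+')

    def to_wayback(match):
        return "https://web.archive.org/web/2017/" + match.group(0)

    for dirpath, _, filenames in os.walk(ROOT):
        for name in filenames:
            if not name.endswith((".html", ".htm", ".md")):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as f:
                text = f.read()
            fixed = PATTERN.sub(to_wayback, text)
            if fixed != text:
                with open(path, "w", encoding="utf-8") as f:
                    f.write(fixed)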


Wikipedia has a bot that the Internet Archive collaborated on [1], where rotten links are updated to point to the Internet Archive and new links are queued for retrieval [2]. There should probably be a similar effort for CMSes like WordPress and such. The code to do this is semi-trivial.

[1] https://meta.wikimedia.org/wiki/InternetArchiveBot

[2] https://en.wikipedia.org/wiki/Wikipedia:Link_rot


But the links here haven't rotted, in the sense that they will not return a 404 or whatever. They will still load a video; it will just be replaced with porn. So you need more code to detect all vidme links as something to fall back to the Archive for.


In a sense, all of the links for the domain in question have rotted, and must be replaced. To your point, your code could replace wholesale based on the domain.


We can't and shouldn't expect people to keep their old domains forever. We need a way for pages to be signed and hyperlinks to enforce authorship. When we link to stuff, we should have a way to say whose stuff we're linking to. It's no different from installing signed software and using trusted repositories.

This is one of the reasons I created a proof-of-concept web extension that verifies links and pages using PGP. On a mismatch, it flags the page and offers a web archive link instead.

https://webverify.jahed.dev/

It was pretty fun to make, but currently, due to performance, the Web Extensions API doesn't provide the features to do this perfectly. Firefox provides just about enough additional APIs to hack it together.


One idea that's been around for a while is to identify files by their hash. That has pros and cons. The good side is that the file is immutable; you can't accidentally link to something else unless someone can manufacture a hash collision somehow. The down side is that if the file is corrected in some way, you don't get the fixes.

In a lot of the peer-to-peer distributed hash table designs, all you need to retrieve a file is its hash.

https://en.wikipedia.org/wiki/Content_addressable_network
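
The core idea in miniature (a toy sketch, not any particular network's protocol):

    import hashlib

    store = {}  # hash -> data; in a real DHT this mapping is spread across peers

    def put(data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        store[key] = data
        return key  # the hash *is* the address

    def get(key: str) -> bytes:
        data = store[key]
        # Retrieval is self-verifying: recompute before trusting the bytes.
        assert hashlib.sha256(data).hexdigest() == key
        return data

    key = put(b"some video bytes")
    assert get(key) == b"some video bytes"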


Problem is, you have to download the entire video to check the hash. That's not how video embedding works; the client browser is just handed some link, and it obtains pieces of the video, rendering them as they arrive.

Basically, little segments of the video have to have a signature which is continuously validated. Or something like that.


That's not really a problem. You don't hash the entire video, but do something resembling a Merkle tree. E.g., look at a torrent: it's identified by a hash, but you can download and verify a random chunk.


Right, Merkle tree! OK, so the embedding site only stores a single hash: the root one, which hashes the remaining hashes. The first thing we fetch from the video is those hashes, and if their hash doesn't match, we flag/ignore the video and refuse to play.

Multiple levels of the tree can be stored throughout the video file. The first level after the root can be for major sections, like 5 minute segments. The next levels are then at the start of each 5 minute segment, giving hashes for one second chunks.

If the root hash checks out, we get the 5-min hashes. If they check out, we get the hashes for the first 5-min block, and if those hashes check out, we start to play the video, validating every second of it against a one second hash from the 5 min block. Then we get the next 5-min hash block and so on.

Kind of thing.
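
A rough Python sketch of that scheme (the section/chunk sizes and layout are illustrative, not any real player's format):

    import hashlib

    def h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    # Publisher: hash 1-second chunks, group them into 5-minute sections,
    # hash each section's chunk-hash table, then hash those section hashes
    # to get the single root hash the embedding site stores.
    def build(chunks, per_section=300):
        chunk_hashes = [h(c) for c in chunks]
        sections = [chunk_hashes[i:i + per_section]
                    for i in range(0, len(chunk_hashes), per_section)]
        section_hashes = [h(b"".join(s)) for s in sections]
        root = h(b"".join(section_hashes))
        return root, section_hashes, sections

    # Player: verify each level before trusting anything below it.
    def verify_root(root, section_hashes):
        return h(b"".join(section_hashes)) == root

    def verify_section(section_hash, chunk_hashes):
        return h(b"".join(chunk_hashes)) == section_hash

    def verify_chunk(chunk_hash, chunk):
        return h(chunk) == chunk_hash  # checked as each second plays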


In actual HLS streaming, videos are served in 2-10 second segments, a size small enough that you realistically can hash and verify each segment. You'd have to implement it as an extension of the HLS protocol, so probably as m3u8 with additional fields, maybe as comments.
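
For illustration, a hypothetical playlist along those lines; the #EXT-X-SEGMENT-HASH tag is invented here and is not part of the real HLS spec (compliant players ignore tags they don't recognize):

    #EXTM3U
    #EXT-X-VERSION:3
    #EXT-X-TARGETDURATION:10
    #EXTINF:9.009,
    #EXT-X-SEGMENT-HASH:sha256=3a7bd3e2360a...
    segment0.ts
    #EXTINF:9.009,
    #EXT-X-SEGMENT-HASH:sha256=9f86d081884c...
    segment1.ts
    #EXT-X-ENDLIST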


I don't have expertise in video codecs or file formats, but couldn't you hash the first N bytes of a stream? Stream those N bytes to the client, and if they match, start the video; otherwise stop the download and don't start the video.


This has a number of problems.

The most egregious is that if I'm an attacker and I have the file you request, I can serve the portion you'd use to verify it unchanged but fill the rest with junk or exploits. You'd receive the file, it would produce the correct hash, yet not be what you were expecting.
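
A toy demonstration (N and the file contents are made up):

    import hashlib

    N = 1024  # suppose the scheme only hashes the first N bytes

    real = b"\x00" * 64 + b"genuine video data " * 500
    fake = real[:N] + b"attacker junk " * 1000  # same first N bytes

    def prefix_hash(data: bytes) -> str:
        return hashlib.sha256(data[:N]).hexdigest()

    assert prefix_hash(real) == prefix_hash(fake)  # both "verify"
    assert real != fake                            # but they differ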

For video especially, what you receive isn't necessarily predictable by the client. With HLS or MPEG-DASH streaming, the video you receive could be one of a number of different encodings, e.g. lower or higher bitrates to deal with changing network conditions. The actual m3u8/mpd file you receive could change arbitrarily as the video provider adds or drops different encodings. The hash of such a file today isn't guaranteed to match the hash tomorrow, for entirely banal, non-malicious reasons.

Fun fact: the UUHash algorithm used by the FastTrack network (Kazaa, Morpheus, etc) only hashed the first bit of a file. Hashing a large file took forever on hardware of the day. Even hashing small files was non-trivial. The RIAA, through various fronts, would insert spoofed files where the first portion of the file was legitimate but the rest of the content was junk or annoying sounds. The files would be named like any other MP3 someone was searching for and even have seemingly good ID3 tags.


> Fun fact: the UUHash algorithm used by the FastTrack network (Kazaa, Morpheus, etc) only hashed the first bit of a file.

The first 300KiB plus a series of 300KiB chunks at exponentially increasing offsets, per Wikipedia. But still a small fraction of the file.


Well explained. Thank you!


If you only hash/check the first N bytes of the video stream, the remainder of the video could be anything.


Merkle trees basically accomplish this with separate hashes over every N bytes, so that the content can be verified continuously as it's downloaded.


They will keep the first few seconds or minutes of the original video, bit-exact, and then switch to porn. The player needs to validate every section.


Presumably for SHA256 you only need to hash ~256 bits; what's anyone gonna do, try all possible combinations to find a collision?


256 bits is only 32 bytes, and most file formats have standard stuff right at the start. Collisions would be very common.


That's already how it works in IPFS or Freenet. The hash is not for the data itself, but for a metadata document that contains a list of hashes to the data broken up into smaller blocks.


Assuming there is no malicious intent behind the embeds, you could have the hash in the header of the video.


I think that, in the context of this submission, "replacement by porn" is being regarded as malicious intent.


Sounds like something subresource integrity[1] could be expanded to include.

[1]: https://developer.mozilla.org/en-US/docs/Web/Security/Subres...
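
For static subresources, the integrity value is just a hash-algorithm name plus a base64-encoded digest. A sketch of generating one in Python (the file name and URL are hypothetical):

    import base64
    import hashlib

    def sri(data: bytes) -> str:
        # SRI integrity value: algorithm name, a dash, base64 of the digest
        digest = hashlib.sha384(data).digest()
        return "sha384-" + base64.b64encode(digest).decode()

    with open("player.js", "rb") as f:  # hypothetical embedded script
        print(sri(f.read()))

    # Used as:
    # <script src="https://cdn.example/player.js"
    #         integrity="sha384-..." crossorigin="anonymous"></script>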


This is where a signature would come in handy, if you want the resource to be updated. Just keep track that it was signed by the same private key.


I had a personal project that I got bored with, so I let the domain expire. Then I got emails from former users saying that the domain was now hosting malware. So yeah, I would like all the old links to the site to somehow know that the owner has changed. Not sure what a reasonable solution would look like, though.


One solution is to have a unique URI per file that is independent of DNS. Decentralized file storage (e.g. FileCoin / IPFS) might serve this purpose...


Definitely, but I think we need something that will work with the web we currently have while these bigger ideas are fleshed out and adopted.

Also, while IPNS covers the issue of linking to dynamic content, it's worth mentioning that IPFS will have similar issues with DNS, as DNSLink and similar domain-driven solutions are used to cover its usability issues (long, random URIs).


IPFS works pretty well in the style of progressive-enhancement with the existing web for static content specifically. If you want to link to a resource that's available through IPFS, then you make the link point to an IPFS gateway that you trust and expect to stay online (possibly your own on your own domain), like https://example.com/ipfs/Qm_IPFS_HASH_HERE/. Anyone that has a browser with direct IPFS support (either because they're using an IPFS extension or they're using a browser with built-in support, which might get more popular if IPFS takes off) will have their browser recognize the URL format and just fetch the file by hash directly from IPFS, and it won't matter if example.com is still up and serving the file. For everyone else, the link will work as long as example.com is still up and acting as an IPFS gateway. If example.com ever goes down, then users can make the link work by installing an IPFS extension or manually replacing the example.com domain in the link with the domain of any still-active IPFS gateway, and any admin in control of the page could fix the link similarly.


I think long, random URIs are fine for embedded content, actually. Most embeds are short, random URIs prepended with a trendy domain name.

If you could "permalink" certain content for embeds, that'd probably solve the issues, right?



SRI/hashing works for static content. Though it's worth mentioning it's a SUB-resource feature (images, scripts, etc.). It doesn't work for hyperlinks to other pages. Even if it did, it's a different use case.

Say I link to an article by Author A that has comments in it (or even a footer, relative timestamp, sidebar, etc.). Hashing won't work, as the page is always changing. I want the link to always go to Author A, but I don't care if the content changes. That's the sort of use case signed webpages and hyperlinks with enforced authorship cover. It's less about what's on the page and more about who created it.


Good points. I would guess that for something to be implemented, it would have to be easy for browsers and API tools to check once per domain and cache the response, and it should probably be something that already exists and has been adopted. Maybe a page could have a meta tag or header that contains a hash of the destination site's DANE signature? Something like "targetref:somedomain.tld expectsig:39726a2fe2bb052cf00e6b95a8385f7" based on tools like danecheck [1], or maybe DNSSEC, but that is very poorly adopted.

[1] - https://github.com/vdukhovni/danecheck


The solution I was going for with WebVerify is more web-centric rather than domain-driven, which I think is a better fit for webpages. It can be enforced at the hyperlink-level for shared domains (like GitHub Pages, University web spaces) and works for static resources without needing to configure external resources. The only really complicated part is PGP but that can be solved with better tooling (as seen with Keybase).


Finding a way to embed the domain registration date might be sufficient; that would cover most of the expiry situation.

There was an IETF or similar registry that used your domain and registration date to carve out your namespace, i.e. dns.2021.07.26.com.example would be your prefix. Pretty robust. Can't remember what it was anymore.


Could URL authorship confirmation be implemented on top of TLS? If someone takes over a domain, the final certificate in the chain will be issued to another entity, and that could be enough to trigger a notice. Could be achievable with a centralized registry/crawler like Internet Archive, but one that only keeps track of domain:certificate mapping.

Of course, the devil’s in the details (infrastructure/organizational changes can trigger false positives; shared hosting setup can cause false negatives; it presumes that if the original entity abandons a domain they’d revoke the cert; etc.), but IMO it wouldn’t be worthless as it is.


Most certificates (including all free ones) are domain-validated only, so the entity being certified is just the domain and will not change with a new owner.


If you’re talking about shared hosting, yes, it wouldn’t be covered by this model.

But other than that, if I have a cert for xyz.com and I abandon the domain, even if you buy it you'll be forced to get another cert issued for it.

If it was recorded somewhere that xyz.com = my cert, it could serve as a mechanism to verify that a given URL is at least supposed to be under my control, and a warning could be shown if another certificate is being served now.

Kind of like HPKP, but with longer lifetime (longer than domain name registration term) and a centralized registry tracking certificate:domain mappings rather than each individual user agent cache.

Obviously, no one would adopt it due to being a devops nightmare.


Right, you can pin the certificate or the public key in it, but that's much more specific than the entity the cert is issued to, and, as you correctly noted, it's not practical for browsers to do automatically, since keys and certs do get rotated without a change in owner.


It's not a solvable problem.

Domains need to be short to be memorized, which makes them scarce and valuable.

And having two forms of URLs is undesirable; just look at AMP.



One of the services I've sold more than once was to handle the "offlining" of a domain: provide a 307/404/410 service and make sure it works for a long time before the name gets released. Basically, to help clean up on the way out.


Why 307 (temporary) and not 308 (permanent)? (Curious about this, as when I used to work on a ton of company websites and migrated from old sites, there'd always be a big process of setting up a ton of 301 (permanent) redirects.)


Sorry, just a typo from the brain. It's a config option, with a few in the 300 series, mostly choosing a permanent option.


Ahh cool, I see! All good, was curious if there was some new special technique maybe I hadn't heard about, haha :)


As both the post topic and the parent comment topic are about domain changes, wouldn't that basically direct users of the old site on the domain to a totally different domain when they want to access the new site on the same domain?


It could... it all depends on how the redirects are set up.

You can use a 301/308 redirect for either a new page/path or a new domain, or both. It's pretty flexible. If you're interested in learning more, Google has a pretty good "best practices" page at https://developers.google.com/search/docs/advanced/crawling/...
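
A minimal sketch of such an offboarding redirect in Python (the new origin is hypothetical; a real setup would more likely be a few lines of web-server config):

    from http.server import BaseHTTPRequestHandler, HTTPServer

    NEW_ORIGIN = "https://new-domain.example"  # hypothetical new home

    class Redirect(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(308)  # permanent; retired paths could send 410
            self.send_header("Location", NEW_ORIGIN + self.path)
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("", 8080), Redirect).serve_forever()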


Very amusing, but I take issue with the article's claim that "Here's (yet another) argument against using third-party embeds" - this might be an oversimplified perspective, when it's actually a good argument for using subresource integrity (essentially cryptographic pinning of third-party embedded content). I am unsure if this extends to every kind of resource that you could have in a web page (ideally it should), but I could imagine a JS shim being used to cover some edge cases.


Subresource integrity would prevent the porn from showing up, but the videos would still be broken. Hosting the videos yourself is the option that would keep things running.


True, and HTML5 video tends to "just work"; however, I've found that many people have practical issues with hosting their own video: for example, not understanding video conversion, hosting behind bandwidth or request-rate limitations, or not having anycast/CDN set up to hold the site together when a page unexpectedly goes viral. There are reasons why hosting videos yourself might not be an effective use of the resources or time available to you, especially depending on technical skills, in which case third-party hosting with pinning remains a better option.


Video embeds tend to be especially fragile, however, since they're often implemented by gluing together a relatively large number of third-party services. If any one of the services goes down or changes its API, the whole thing breaks.

Ask any uMatrix user how much fun it is to get a video to play on some websites and just how much external crap you have to allow before you see the first frame.


That and Facebook Container. I could not get Slack to work until I allowed Slack access to Facebook, because I had to re-authenticate. Even though I don't use Facebook, I have to allow Slack to access it at least to be able to authenticate to a different authentication source.


Don't get me started on comment section plugins that are basically just Facebook threads on a webpage.


I'd get it if it was Goatse, but this is just regular porn. If they simply wanted to earn money from ads, serving something more milquetoast would make more sense, because now there's a rush to remove old embeds.

Why?!


> serving something more milquetoast

Never gonna give you up...


They just bought the domain; they are not actively serving the embed URLs.


Now everyone knows about it, and vidme's ad space is more valuable.

Genius


Or 3rd-party JS libraries. Instead of https://code.jquery.com/, your admin 'accidentally' added https://code.jqeury.com/, which embeds a crypto miner.


I use Domainy

https://www.domainy.io

to monitor interesting domains as they expire and also to find interesting domains that are available.

twitter.accountant? what about twitter.beauty?

Domains are not exactly available when they expire, but this helps me to check if any of them become available.


> Here’s (yet another) argument against using third-party embeds on your respectable website

Well, it's an argument against using embeds without having any way to validate their authenticity.

This is analogous to having a software distro (e.g. package manager) which downloads upstream tarballs or git repos without checking any hashes.

Is there a solution for this? Say you want to embed a video from some third-party site; what tools are there for ensuring that the embedding will somehow lapse if the video at that URL has been replaced or altered?

Ideally, you'd be notified if that happened. Not showing porn is good, but videos not working is bad. You can't be checking an entire site all the time for non-working external content.
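
One low-tech way to automate that notification, sketched in Python: record a digest when you add each embed and re-check it periodically (the URL and digest are placeholders):

    import hashlib
    import urllib.request

    # URL -> SHA-256 hex digest recorded when the embed was first added
    EMBEDS = {
        "https://example.com/videos/intro.mp4": "9f86d081884c7d65...",
    }

    for url, expected in EMBEDS.items():
        try:
            data = urllib.request.urlopen(url, timeout=30).read()
            ok = hashlib.sha256(data).hexdigest() == expected
        except OSError:
            ok = False  # unreachable counts as broken too
        if not ok:
            print(f"ALERT: {url} changed or is gone; swap in a fallback")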



Genuinely curious: how does a company manage their domains after they shut down their services to prevent this kind of stuff?


You could pre-pay for X years, where X is long enough to ensure you'll be dead when it expires.

Edit: Seems like there's a 10 year limit in many places? I wonder if that's broad convention or an actual rule.

Okay, apparently an ICANN limit for .com domains:

"The expiration date of the domain registration is extended by the number of years, up to a maximum of ten years, as specified by the Registrar's requested Extend operation."

https://www.icann.org/en/registry-agreements/com/com-registr...

Though, I can see that navy.us has an expiry in 2053, so it's likely per registry.


Network Solutions will sell you service for 100 years.

Edit: CSC, Markmonitor and similar shops will contract to keep your name available for long periods, too, but that's a bit different.


Or for the life of their company, whichever is shorter.


That's why I said 'sell' instead of 'provide'.


For me I feel like a 10 year pre-pay would be worse. I'll forget about the fact that I need to pay for it again in 2031.

Yearly at least keeps me on my toes a bit.


You still have to renew it yearly to keep the expiration 10 years away.

Also, most registrars will send you an email when it's about to expire. If it does get dropped, there's a grace period to recover it.


I can at most pay for 10 years in advance. Is this a limit of my registrar?


On mine, I can pay for up to 10 years, but I think I can just pay twice and get 20.

Edit: It seems to vary. Some places cite an ICANN limit of 10 years.


10 years is the ICANN limit, but Network Solutions offers 100 years. According to the fine print, what they do is register it for 10 years, and then every year for the next 90 years they extend it by an additional year.

They are betting that they will earn enough from investing your up-front 100-year payment to more than cover future increases in the cost of those one-year extensions.

Their customers are betting that Network Solutions or some successor will be around long enough, and that domain names will keep working like they do now for long enough, that this will be worth it.


Most registries limit domain lifetimes to 10 years. You can't renew a domain if doing so would put the lifetime beyond 10 years -- the registry will refuse to perform the operation.


Clearly there are companies that offer extension as a service: you pay 20 times as much, and they promise to extend your domain 10 times.


That just moves the problem, though: how likely is it that that company will be around in 90 years to perform the last extension?


That may vary by registrar. My .com domain is about 2 years out from renewal and the most it will let me add is 8.


Likely the registry, not registrar. When I worked at one 10 was the limit. Unsure if that was handed down by ICANN, though.


Aren't most domains limited to 10 years max?


I would think the same way a person would handle it posthumously: let it expire and let the natural order take over, or delegate ownership via legal/custodial means.

Sorry that doesn’t answer your question more than “it depends on the entity”.


They don’t. They have shut down. Why would they care what happens?


The weird thing about this to me is: in order to get their plan working so quickly, wouldn't the company or person who scooped up the domain have had to set up a site perfectly beforehand so the embeds would work as desired, and then just sit there waiting, hoping, ready to hit the button to scoop up the domain at just the right time, praying nobody beat them to it?

Isn't that a lot of work for almost no gain?


The embeds don't actually "work", per se. When the browser tries to load the embed into an <iframe>, it gets redirected to the home page of the porn site, and ends up displaying the upper left corner of that page in the space where the video embed was supposed to go. It all looks rather more accidental than purposeful.


Ah, good to know. That explains it. I feel like this aspect was lost in reporting this. It was all made to sound much more deliberate.


Depending on how vidme stored their videos, it could be as simple as answering any request for a video file with whatever they were showing.


What's the win for the new owner, do they make money from linking to the porn somehow? Sure it could also be just for lulz but there are many equally or more lulzy possibilities whereas porn often seems to be coupled with economic gain.


> What's the win for the new owner, do they make money from linking to the porn somehow?

They are the porn (the new owner is a porn firm), so, yes. I haven't seen the actual linked content, but I assume it's something like free samples with directions telling people where to get more; it's a move that gets porn ads placed for free in a lot of places that would never choose to allow porn ads.


It might also be an accident, if all the embeds are iframes that redirect to the homepage.


Thanks, I didn't find it clear from the article and had no desire to go looking for content.


The trend toward embeds rather than screenshots or direct copying has long struck me as ill-conceived.

At least it's only pr0n. As a vector for malware / spyware injection, this could be even more interesting.

Relevant xkcd, of course: https://xkcd.com/1698/

h/t Elda King @ Mastodon https://weirder.earth/@eldaking/106626603001624730


That’s kind of hilarious but gross all at the same time


There were non-porn videos on the original vid.me? TIL.


I can't even imagine the amount of exposure to minors this has caused. But that's nothing new for the porn industry.


Why isn't this considered criminal vandalism and hacking? Intent matters.

Owning the domain doesn't give them a right to intentionally interfere with the requested content; they should simply decline to serve the expired URLs.


Based on another comment, that's what they do. The dead URLs are redirected to their main page, which gets embedded in place of the original videos. I don't think they should inherit the maintenance cost of the incoming links, as long as they are not maliciously swapping the content.


Any discussion beyond "lol damn" is unnecessary.


I guess I submitted too early :-) https://news.ycombinator.com/item?id=27925605


Domains are an area where blockchain technology could help a lot.

A domain could cost a fixed amount of X per month. So you could pay 100 years upfront and be sure to not lose it in that timeframe.

To move a domain, the registrar and the owner could both have to sign the move. So it would no longer be possible to lose a domain due to the registrar making a mistake.


"Ethereum Name Service" is doing this, and honestly, aside from gas fees, it's not a bad price for permanent real estate (as permanent as the Ethereum virtual machine, at least).


Can you explain in detail what problem blockchain solves here and why it can't be solved without blockchain?


Well, if it's solved via blockchain, there's an opportunity for speculation among early adopters, which I think is the major selling point for most blockchain "solutions".


The problem that someone else can move your domain.

The same problem it solves for finance: That someone else can move your money.


Don't normal domains work the same - you pay a fee and you get the domain for X years? In both situations the domains expire and a porn site can nab them.


With normal domains, the registrar can move your domain without your approval.

So they could do so out of malice or because they get tricked into believing you gave them the go or because they got hacked or got ordered to do so by some governmental institution or or or ...



