YouTubeDrive: Store files as YouTube videos (github.com/dzhang314)
756 points by notamy on May 24, 2022 | 298 comments



Hey everybody! I'm David, the creator of YouTubeDrive, and I never expected to see this old project pop up on HN. YouTubeDrive was created when I was a freshman in college with questionable programming abilities, absolutely no knowledge of coding theory, and way too much free time.

The encoding scheme that YouTubeDrive uses is brain-dead simple: pack three bits into each pixel of a sequence of 64x36 images (I only use RGB values 0 and 255, nothing in between), and then blow up these images by a factor of 20 to make a 1280x720 video. These 20x20 colored squares are big enough to reliably survive YouTube's compression algorithm (or at least they were in 2016 -- the algorithms have probably changed since). You really do need something around that size, because I discovered that YouTube's video compression would sometimes flip the average color of a 10x10 square from 0 to 255, or vice versa.
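A minimal Python sketch of the spatial scheme just described. It assumes one bit per RGB channel per cell, row-major cell order and MSB-first bit order; the project itself is written in Mathematica and its exact ordering isn't stated here, so treat this as an illustrative reconstruction. Decoding averages each 20x20 square and thresholds at the midpoint, which is why a handful of corrupted pixels inside a square goes unnoticed.

```python
# Illustrative reconstruction, not the project's Mathematica code.
GRID_W, GRID_H, SCALE = 64, 36, 20      # 64x36 cells, each blown up to 20x20 px
BITS_PER_FRAME = GRID_W * GRID_H * 3    # one bit per RGB channel per cell


def bytes_to_bits(data: bytes) -> list[int]:
    """MSB-first bit list (assumed ordering)."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def bits_to_bytes(bits: list[int]) -> bytes:
    """Inverse of bytes_to_bits; a trailing partial byte is dropped."""
    return bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[k:k + 8]))
        for k in range(0, len(bits) - len(bits) % 8, 8)
    )


def encode_frame(bits: list[int]) -> list[list[tuple[int, int, int]]]:
    """Pack up to BITS_PER_FRAME bits into a 1280x720 frame of (R, G, B) tuples."""
    bits = bits + [0] * (BITS_PER_FRAME - len(bits))  # zero-pad a short last frame
    cells = [
        tuple(255 * bits[3 * (gy * GRID_W + gx) + c] for c in range(3))
        for gy in range(GRID_H) for gx in range(GRID_W)
    ]
    # Upscale: every pixel takes the colour of the cell it falls in.
    return [
        [cells[(py // SCALE) * GRID_W + px // SCALE] for px in range(GRID_W * SCALE)]
        for py in range(GRID_H * SCALE)
    ]


def decode_frame(frame) -> list[int]:
    """Average each 20x20 square per channel and threshold at the midpoint."""
    bits = []
    for gy in range(GRID_H):
        for gx in range(GRID_W):
            sums = [0, 0, 0]
            for py in range(gy * SCALE, (gy + 1) * SCALE):
                for px in range(gx * SCALE, (gx + 1) * SCALE):
                    for c in range(3):
                        sums[c] += frame[py][px][c]
            bits.extend(1 if s / (SCALE * SCALE) >= 128 else 0 for s in sums)
    return bits
```

Each frame carries 64 x 36 x 3 = 6912 bits, i.e. 864 bytes of payload.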

Looking back now as a grad student, I realize that there are much cleverer approaches to this problem: a better encoding scheme (discrete Fourier/cosine/wavelet transforms) would let me pack bits in the frequency domain instead of the spatial domain, reducing the probability of bit-flip errors, and a good error-correcting code (Hamming, Reed-Solomon, etc.) would let me tolerate a few bit-flips here and there. In classic academic fashion, I'll leave it as an exercise to the reader to implement these extensions :)
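To make the error-correcting half of that concrete, here is Hamming(7,4), the simplest of the codes named: 4 data bits become 7, and any single flipped bit per codeword is corrected. This is a textbook sketch, not something YouTubeDrive does; a Reed-Solomon code would likely suit the clustered errors a video codec produces better.

```python
def hamming74_encode(nibble: list[int]) -> list[int]:
    """Encode 4 data bits as a 7-bit codeword (positions 1..7, parity at 1, 2, 4)."""
    d1, d2, d3, d4 = nibble
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code: list[int]) -> list[int]:
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the flipped bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]
```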


One more thing: the choice of Wolfram Mathematica as an implementation language was a deliberate decision on my part. Not for any technical reason -- YouTubeDrive doesn't use any of Mathematica's symbolic math capabilities -- but because I didn't want YouTubeDrive to be too easy for anybody on the internet to download and use, lest I attract unwanted attention from Google. In the eyes of my paranoid freshman self, the fact that YouTubeDrive is somewhat obtuse to install was a feature, not a bug.

So, feel free to have a look and have a laugh, but don't try to use YouTubeDrive for any serious purpose! This encoding scheme is so horrendously inefficient (on the order of 99% overhead) that the effective bandwidth to and from YouTube is something like one megabyte per minute.


As far back as the late 1970s, a surprisingly similar scheme was used to record digital audio to analog video tape. It mostly looks like a kind of stripey static, but there was a clear correlation between what happened musically and what happened visually, so in college (late 1980s), when one of my friends came into possession of one of these, we'd keep it on the TV while listening to whole albums. We had a simultaneous epiphany about the encoding scheme during a Jethro Tull flute solo, when the static suddenly became just a few large squares.

Can see one in action here

https://www.youtube.com/watch?v=TSpS_DiijxQ


Nice, thanks! This answered my biggest question, which was "will it survive compression/re-encoding?" (Yes, it will.) Very cool idea!


Do you have any idea how many more bits you'd be able to use if you applied any of the encoding transformations?


I'd estimate that there's an easy order-of-magnitude improvement (~10x) just from implementing a simple error-correction mechanism -- a Reed-Solomon code ought to be good enough that we can take the squares down to 10x10, maybe even 8x8 or 5x5. Then, if we really work at it, we might be able to find another order-of-magnitude win (~100x) by packing more bits into a frequency-domain encoding scheme. This would likely require us to do some statistical analysis on the types of compression artifacts that YouTube introduces, in order to find a particularly robust set of basis images.
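The capacity side of that estimate is easy to check. Assuming the same 3 bits per square and a 1280x720 frame, shrinking the squares alone gives 4x at 10x10 and 16x at 5x5, before subtracting whatever the error-correcting code costs:

```python
def bits_per_frame(square: int, width: int = 1280, height: int = 720,
                   bits_per_square: int = 3) -> int:
    """Payload bits in one frame when each square carries one bit per RGB channel."""
    return (width // square) * (height // square) * bits_per_square


for size in (20, 16, 10, 8, 5):
    bits = bits_per_frame(size)
    print(f"{size}x{size}: {bits} bits/frame "
          f"({bits / bits_per_frame(20):.2f}x the 20x20 scheme)")
```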


> that we can take the squares down to 10x10, maybe even 8x8 or 5x5

16x16, 8x8, or 4x4 would be the way to go. You'd want each RGB block to map to a single H.264 macroblock.

Using sizes that aren't powers of 2 means that individual blocks don't line up with macroblocks. Having a single macroblock hold 1, 4, or 16 RGB blocks would be ideal.

In fact, I bet modifying the original code to use a scaling factor of 16 instead of 20 would produce some significant improvements.
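A quick way to see the alignment argument: count how many squares along one axis cover pixels from more than one 16x16 macroblock. This only looks at grid geometry and ignores everything else an encoder does (motion search, transform size choice, deblocking).

```python
MACROBLOCK = 16  # H.264 macroblock edge length in pixels


def straddling_fraction(square: int, length: int = 1280) -> float:
    """Fraction of squares along one axis whose pixels span more than one macroblock."""
    starts = range(0, length - square + 1, square)
    straddling = sum(
        1 for s in starts if s // MACROBLOCK != (s + square - 1) // MACROBLOCK
    )
    return straddling / len(starts)
```

With a scale of 20, every square straddles a macroblock edge; 16, 8 and 4 never do, and 10 straddles on exactly half the squares.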


There's also the chroma subsampling issue. With the standard 4:2:0 ratios, you'll get half the resolution for the two chroma channels, and if I'm not mistaken, they are more aggressively quantized.

It would be better to use YUV/YCbCr directly instead of RGB.
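A sketch of what the encoder sees, assuming full-range BT.601 coefficients as in JPEG/JFIF (YouTube's actual pipeline may use limited range or BT.709) and plain 2x2 averaging as a simplified stand-in for 4:2:0 chroma downsampling:

```python
def rgb_to_ycbcr(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Full-range BT.601 (JFIF) RGB -> YCbCr for 8-bit values."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def subsample_420(plane: list[list[float]]) -> list[list[float]]:
    """Average each 2x2 block, halving chroma resolution in both directions."""
    return [
        [
            (plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
            for x in range(0, len(plane[0]) - 1, 2)
        ]
        for y in range(0, len(plane) - 1, 2)
    ]
```

Run over the eight 0/255 colours, rgb_to_ycbcr gives eight distinct luma values at least about 29 apart, so in principle a decoder could lean on the full-resolution Y channel rather than on the halved chroma.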


I'm not sure if your examples are sticking to 0 or 255 RGB. If they are, you might get a win by using HSL to pick your colors. If you change the lightness dramatically every frame, maybe colors won't bleed across frames. Then perhaps you can encode 2+ bits in hue and another 2+ in saturation, getting a win, and another minor one using 1+ bit on lightness (i.e. the first frame can be 0% or 25%, the next frame 75% or 100%). I'm not too familiar with encodings though, or with how much this would interfere with the other transforms.
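A minimal sketch of the hue/saturation half of this idea, using Python's standard colorsys module. The four hue and four saturation levels are assumptions. Lightness is fixed at 50% here because at 0% or 100% lightness every hue and saturation collapses to pure black or white, so those levels cannot carry hue or saturation bits. Note also that hue and saturation live mostly in the subsampled chroma planes mentioned above.

```python
import colorsys

HUES = [0.0, 0.25, 0.5, 0.75]         # 2 bits of hue (fractions of the colour wheel)
SATURATIONS = [0.25, 0.5, 0.75, 1.0]  # 2 bits of saturation
LIGHTNESS = 0.5                        # fixed; 0.0 or 1.0 would erase hue and saturation


def symbol_to_rgb(symbol: int) -> tuple[int, int, int]:
    """Map a 4-bit symbol (high 2 bits: hue, low 2 bits: saturation) to 8-bit RGB."""
    h = HUES[symbol >> 2]
    s = SATURATIONS[symbol & 0b11]
    r, g, b = colorsys.hls_to_rgb(h, LIGHTNESS, s)
    return round(r * 255), round(g * 255), round(b * 255)


def rgb_to_symbol(rgb: tuple[int, int, int]) -> int:
    """Snap the measured hue and saturation to the nearest levels."""
    h, _, s = colorsys.rgb_to_hls(*(c / 255 for c in rgb))

    def hue_distance(i: int) -> float:
        d = abs(h - HUES[i])
        return min(d, 1 - d)  # hue wraps around the colour wheel

    hue_idx = min(range(len(HUES)), key=hue_distance)
    sat_idx = min(range(len(SATURATIONS)), key=lambda i: abs(s - SATURATIONS[i]))
    return (hue_idx << 2) | sat_idx
```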


YouTube's 1080p60 is already at a decimation ratio of about 200:1, and then you have to consider how efficient P and B frames are with motion/differences. If your data looks like noise, you're gonna be completely screwed, since the P and B frames will absolutely destroy the quality.

There are a bunch of other things too, like YUV420p and the TV colour range (16-235), so you only get about 7.8 bits per luma sample.
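The arithmetic behind that figure, assuming 8-bit limited-range video (luma codes 16-235, chroma codes 16-240):

```python
import math

# 8-bit limited ("TV") range: luma codes 16..235, chroma codes 16..240.
luma_levels = 235 - 16 + 1      # 220 usable values
chroma_levels = 240 - 16 + 1    # 225 usable values

print(f"luma:   {math.log2(luma_levels):.2f} bits per sample")
print(f"chroma: {math.log2(chroma_levels):.2f} bits per sample")
```

That works out to roughly 7.8 bits per sample, a little under the nominal 8.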

If anything, you would want to encode your data in some way that abuses the P and B frames and the 16x16 macroblock size.

Coding theory for the data output at your end is only one side of the coin; the VP9 codec's stupidly good compression is a completely different game to wrangle.

And I kinda doubt you'll get much better than your estimate of 1% from the original scheme.

https://www.youtube.com/watch?v=r6Rp-uo6HmI


Back in the day when file sharing was new, I won two rounds of beer from my friends in university. The first was after I tried what I dubbed hardcore backups: I tarred, gzipped and PGP'd an archive, slapped an AVI header on it, renamed it britney_uncensored_sex_tape[XXX].avi or something similar, then shared it on WinMX, assuming that since hard drive space was free and teenage boys were teenage boys, at least some of those who downloaded it would leave it shared even if the file claimed to be corrupt.

It worked a charm.

Second round? A year later, when the archive was still available from umpteen hosts.

For all I know, it still languishes on who knows how many old hard drives...


Poor guys, still looking for the right codec to play the britney tape they downloaded 28 years ago.


Disturbing, she'd have been 12 at that time.


Hm, I thought she was popular in the mid '90s, but maybe it was more like the '00s?


No I'm pretty sure you're right on the timing. I was a teenager in the mid to late 90s and britney tapes were extremely common then, to the point that it was often used as a joke (much like in your story!)


WinMX was released 21 years ago, and Britney Spears definitely didn’t break out until around 1999, about the same time as Napster. The difference between 1994 and 1999 is quite a bit in show biz/pop culture, and an absolutely huge difference in public uptake of the internet.


That's right, 1999 was the release of her first album (Baby One More Time). I think the reason some feel like it must have been the mid-90s is that she went from that to the first major scandals (marriage, divorce and a shift in her musical style) as well as her first Greatest Hits album by 2004.

Her next album after that (along with the head shaving incident and being placed under the custody of her manager-dad) was 2007, so most people's memories of her as a "sexy teen idol" are likely from her 1999-2003 period, which in retrospect probably felt a lot longer, especially with the overlap of other young women pop stars in the same period (e.g. Christina Aguilera started in 1998).


I think she also did some kind of Disney kids singing and talent show before that right?


The Mickey Mouse Club, yes. Alongside Justin Timberlake, Christina Aguilera and others.


For the record, neither I nor the person I replied to said 1994 (unless they edited their post). I was thinking 97 or 98, so was still off, but at this point in my life being off by a year or two feels pretty damn close ;-)


The person you replied to said in the prior post: “ to play the britney tape they downloaded 28 years ago.”


Which Britney are they referring to?


Britney Spears


Your story reminds me of a Linus quote.

"Real men don’t use backups, they post their stuff on a public ftp server and let the rest of the world make copies." -Linus Torvalds


You devil! I'm pretty sure I remember running into a file that looked like that and a quick poke around showed it wasn't anything valid.

Funny how these things work, since I'm pretty sure I remember running into it around 2008 (I'm a few years younger).

I think I just deleted it though, since I was suspicious of most strange files back then; I was the nerd who didn't have friends, so I used to trawl forums for anything I could get my hands on.


"running into it"... Yeah. Right. ;)


Not sure how WinMX works, but back in the old DC++ days, you could not only search for things but also navigate directory structures directly from users connected to the same hub. It was not uncommon for me to browse through some of the biggest/most interesting users and see what they were sharing.

By doing that, I'm sure I stumbled upon my fair share of sketchy stuff unintentionally, so it's not hard to imagine the same for others :)


Oops my finger slipped on the download button and I accidentally waited 2 days to let it finish. Clumsy me


That's a perfect college CS story. Beer and bastardized files - what a combo!


Ah hell, you're the one who made my computer crash when I tried to open that, and made me panic? Damn you, man.


Before broadband was widely available, TiVo used to purchase overnight paid programming slots across the US and broadcast modified PDF417 video streams that provided weekly program guide data for TiVo users. There's a sample of it on YouTube https://www.youtube.com/watch?v=VfUgT2YoPzI but they usually wrapped a 60-second commercial before and after the 28-minute broadcast of data. There was enough error correction in the data streams to allow proper processing even with less-than-perfect analog television reception.


That is really interesting. I wonder if there were any other interesting uses of paid programming to solve problems like these around that time?


Videonics DirectED, a video editing system, conveniently loaded its software from... VHS tape. Here are some details: https://twitter.com/foone/status/1325945997160165376 and apparently you can still buy it well-preserved as new old stock, complete with the VHS software tape: https://www.ebay.com/itm/124380109086

A little less crazy and more straightforward (software on audio tape was super common after all): Radio stations and vinyl discs that transmitted programs to the microcomputers of the time (C64, TRS-80 etc.) have quite a long tradition. Some examples:

http://www.trs-80.org/basic-over-shortwave/

https://www.youtube.com/watch?v=6_CZpFqvDQo&t=2s


Not paid for programming but essentially the same tech: VHS games used to encode data in exotic ways so that the content was both viewable on regular TVs with a regular VHS player, but also had some kind of playable content.

https://youtu.be/WI133HNGNfk


Not quite paid programming, but Scientific Atlanta had a Broadcast File System that would send data to set top boxes over coax QAM channels used for digital TV. It would loop through all the content on the "carousel" repeatedly so all the boxes connected to that head end would eventually see the updates.


I made something like this for live streaming encrypted audio/video, but for the web, if you are interested: http://pitahaya.jollo.org


If I were to gamble, I would say that analog TV can store more data. Compression algorithms usually work at, say, a 1:200 compression ratio; they're extremely destructive. A raw 1080p60 stream in yuv420p is about 187MB/s, whereas a decent equivalent video on YouTube is about 1MB/s.
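Checking the raw figure, assuming 8-bit yuv420p (full-resolution luma plus two quarter-resolution chroma planes, so 1.5 bytes per pixel on average). The 1MB/s YouTube rate is the comment's ballpark, not a measured value:

```python
# Raw 8-bit yuv420p: luma at full resolution, Cb and Cr at quarter resolution each,
# so 1 + 0.25 + 0.25 = 1.5 bytes per pixel on average.
width, height, fps = 1920, 1080, 60
bytes_per_frame = width * height * 3 // 2
raw_rate = bytes_per_frame * fps            # bytes per second

youtube_rate = 1_000_000                    # the comment's ~1MB/s ballpark (assumed)
print(f"raw: {raw_rate / 1e6:.1f} MB/s, about {raw_rate / youtube_rate:.0f}:1 vs. YouTube")
```

This gives about 186.6MB/s and a ratio of roughly 187:1, in line with the 1:200 figure.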


I remember seeing this first discussed on 4chan's /g/ board as a joke about whether or not they could abuse YouTube's unlimited uploads, which then escalated into the proof of concept shown in the repo :)


This is a tangent. I must have been maybe 15-16 at the time, so somewhere around 20 years ago: One of the first pieces of software I remember building was a POP3 server that served files, that you could download using an email client where they would show up as attachments.

Incredibly bizarre idea. I'm not sure who I thought would benefit from this. I guess I got swept up in RFC1939 and needed to build... something.


At my first job (at the beginning of the millennium), there was a limit on the size of files you could download, something around 5Mb. If you wanted to download something bigger, you had to ask the sysadmins to do it and wait... That was really annoying. So my colleague and I ended up writing a service that could download a file to local storage, chop it into multiple 5Mb attachments, and send them as multiple emails to the requester.

After some time the limit on a single file was removed, but a daily limit of 100Mb was set up. The trick was that POP3 traffic wasn't counted, so we continued to use our "service".
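A rough Python sketch of such a splitter, using the standard library's email.message.EmailMessage. The original service isn't described in detail, so the subject format, the part naming and the reading of "5Mb" as 5 MiB are all assumptions; actually sending the messages (e.g. with smtplib) is left out.

```python
from email.message import EmailMessage

CHUNK_SIZE = 5 * 1024 * 1024  # "5Mb" read as 5 MiB (assumption)


def file_to_emails(data: bytes, filename: str, to_addr: str,
                   chunk_size: int = CHUNK_SIZE) -> list[EmailMessage]:
    """Split data into numbered chunks, one attachment per message."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    messages = []
    for n, chunk in enumerate(chunks, start=1):
        msg = EmailMessage()
        msg["To"] = to_addr
        msg["Subject"] = f"{filename} part {n}/{len(chunks)}"
        msg.set_content(f"Part {n} of {len(chunks)} of {filename}")
        msg.add_attachment(chunk, maintype="application", subtype="octet-stream",
                           filename=f"{filename}.{n:03d}")  # zero-padded so names sort
        messages.append(msg)
    return messages


def emails_to_file(messages: list[EmailMessage]) -> bytes:
    """Reassemble by concatenating the attachments in part-name order."""
    parts = []
    for msg in messages:
        for attachment in msg.iter_attachments():
            parts.append((attachment.get_filename(), attachment.get_content()))
    return b"".join(content for _, content in sorted(parts))
```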


That sounds suspiciously similar to how I used to download large files on a shared 2GB/month data plan. My carrier didn't count incoming MMS messages towards the quota, and conveniently didn't re-encode images sent to their subscribers via their email-to-MMS gateway. So naturally, I'd SSH into my server, download what I wanted to download, and run the bash script I wrote, which split the downloaded file into MMS-sized chunks, and prepended a 1x1 PNG image to them, and then sent them sequentially through my carrier's gateway. This worked surprisingly well, and I had a script on my phone which would extract the original file from the sequence of "photos". It may still work, but I've since gotten a less restrictive data plan.
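A Python sketch of the same trick; the original was a bash script, so this is an assumed reconstruction. It builds a valid 1x1 greyscale PNG from scratch and prepends it to each chunk. Most PNG decoders stop at the IEND chunk and ignore trailing bytes, which is what lets each part pass as a photo. The chunk size is a parameter because the carrier's MMS size limit isn't given.

```python
import struct
import zlib


def _png_chunk(kind: bytes, payload: bytes) -> bytes:
    """Length, type, payload, then CRC-32 over type + payload."""
    crc = zlib.crc32(kind + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + kind + payload + struct.pack(">I", crc)


# 1x1 black greyscale PNG: signature, IHDR (8-bit, colour type 0),
# one scanline (filter byte 0 + one pixel), IEND.
TINY_PNG = (
    b"\x89PNG\r\n\x1a\n"
    + _png_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
    + _png_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
    + _png_chunk(b"IEND", b"")
)


def split_into_fake_photos(data: bytes, chunk_size: int) -> list[bytes]:
    """Each part is a real PNG followed by a slice of the payload."""
    return [TINY_PNG + data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def reassemble(photos: list[bytes]) -> bytes:
    """Strip the PNG prefix from each part (in the order sent) and concatenate."""
    out = bytearray()
    for photo in photos:
        if not photo.startswith(TINY_PNG):
            raise ValueError("part does not start with the expected PNG prefix")
        out += photo[len(TINY_PNG):]
    return bytes(out)
```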


I couldn't download .exe files at some $CORPORATION. They had to be whitelisted or something, and the download just wouldn't work otherwise. But once you had the .exe you could run it just fine. You just had to ping some IT person to be able to retrieve your .exe.

Of course it was still possible to browse the internet and view arbitrary text, so splitting the .exe into base64-encoded chunks and uploading them to GitHub from another computer worked perfectly fine... I briefly argued against these measures, given how unlikely they are to prevent any kind of threat, but they're probably still in place.
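The base64 workaround fits in a few lines of Python. The chunk size and the SHA-256 check on both ends are assumptions added for illustration, not part of the original story:

```python
import base64
import hashlib


def to_text_chunks(data: bytes, chars_per_chunk: int = 50_000) -> list[str]:
    """Base64-encode the file and cut the text into pasteable pieces."""
    text = base64.b64encode(data).decode("ascii")
    return [text[i:i + chars_per_chunk] for i in range(0, len(text), chars_per_chunk)]


def from_text_chunks(chunks: list[str]) -> bytes:
    """Join the pieces in order, then decode; chunk edges need not align to 4 chars."""
    return base64.b64decode("".join(chunks))


def fingerprint(data: bytes) -> str:
    """SHA-256 hex digest to compare on both machines."""
    return hashlib.sha256(data).hexdigest()
```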


We still cannot email each other .py files where I work. But .py.txt is of course fine...


Apparently e-mail is not very reliable for storing/keeping files. There have been cases where an old email with an attachment would not load correctly because the servers had simply erased the attachment file.


This was a custom email server though, there never were any emails, it just presented files as though they were so that a client would download them.

Actually caused some problems for email clients, as they usually assumed emails were small. I got a few of them to crash with 200 Mb "attachments" (although this was in the early 00s, 200Mb was bigger than it is today).


I'm still confused about how this worked. Did you email some address and get a reply with the attachment?


Since GP says it was a POP3 server, I suppose you would set up an email account in your client with its inbox server pointing to that POP3 server. When the client requests the content of the inbox, the server responds with a list of "emails" that are really just files with some email header slapped on; so your email client's inbox window essentially becomes a file browser.
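The POP3 side of that is mostly framing. Below is a minimal sketch of the two multi-line responses such a server would need, following RFC 1939: LIST returns one "number size" line per message, RETR returns the message itself, and any line starting with "." gets an extra "." (byte-stuffing) before the closing CRLF.CRLF. Authentication and the other commands (USER/PASS, STAT, DELE, QUIT) are omitted, and messages are assumed to already use CRLF line endings.

```python
def format_list(messages: list[bytes]) -> bytes:
    """Multi-line LIST response: one 'number size-in-octets' line per message, then '.'."""
    lines = [f"+OK {len(messages)} messages".encode("ascii")]
    lines += [f"{n} {len(m)}".encode("ascii") for n, m in enumerate(messages, start=1)]
    return b"\r\n".join(lines) + b"\r\n.\r\n"


def format_retr(message: bytes) -> bytes:
    """Multi-line RETR response with byte-stuffing of lines that start with '.'."""
    lines = message.split(b"\r\n")
    if lines and lines[-1] == b"":
        lines.pop()  # a trailing CRLF leaves an empty last piece
    stuffed = [b"." + line if line.startswith(b".") else line for line in lines]
    status = f"+OK {len(message)} octets".encode("ascii")
    return b"\r\n".join([status] + stuffed) + b"\r\n.\r\n"
```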


Yeah, that's basically it.


Interestingly, if you take a look at your emails from a few years ago, most of the non-attached images will fail to load now.


They also experimented with encoding videos and arbitrary files into different kinds of single (still) image formats, some of them able to be uploaded to the same 4chan thread itself, with instructions on how to decode/play it back. Examples:

https://dpaste.com/HFTKAPM5V

https://github.com/fangfufu/Converting-Arbitrary-Data-To-Vid...

https://github.com/rekcuFniarB/file2png

https://github.com/nzimm/png-stego

https://github.com/dhilst/pngencoder

https://github.com/EtherDream/web2img


I only looked at the example video, but is the concept just "big enough pixels"?

Would be neater (and much more efficient) to encode the data such that it's exactly untouched by the compression algorithm, e.g. by encoding the data in wavelets and possibly motion vectors that the algorithm is known to keep[1].

Of course that would also be a lot of work, and likely fall apart once the video is re-encoded.

[1] If that's what video encoding still does, I really have no idea, but you get the point.


Agree it would be cool to be "untouched" by the compression algorithm, but that's nearly impossible with YouTube. YouTube encodes down to several different versions of a video and on top of that, several different codecs to support different devices with different built-in video hardware decoders.

For example, when I upload a 4K vid and then watch the 4K stream on my Mac vs my PC, I get different video files solely based on the browser settings that can tell what OS I'm running.

Handling this compression protection for so many different codecs is likely not feasible.


Yes, but nothing is saying this has to work for every codec. Since you want to retrieve the files using a special client, you could pick the codec you like.

But (almost) nothing prevents YouTube from dropping that particular codec. This still pretty much falls under the "re-encoding" case I mentioned, which would make the whole thing brittle anyway.

But it's indeed cool to think about. 8)


How about a Fourier transform (or a cosine transform, whichever works best), keeping the data as frequency-component coefficients? That’s the rough idea behind digital watermarking. It survives image transforms quite well.


Just as an aside, it's absolutely astounding how much hardware Google must throw at YouTube to achieve this for any video anybody in the world wants to upload. The processing power to reencode to so many versions, and then to store all of those versions, and then make all of those accessible anywhere in the world at a moment's notice. It really is such an incredible waste for most YouTube content.


Keep in mind that they can very well predict which videos will have any meaningful amount of views <99% and just encode on demand the ones that won’t.


I didn't know that, and am glad that they do transcode on the fly when appropriate. Very impressive that it's seamless to the end user, I've never sat waiting for a YouTube clip to play even when it's definitely something from the 'back catalog' like a decade old video with a dozen views.


I'm not sure if it's a waste, as the video needs to be reencoded somewhere.


What if you have an ML model that produces a vector from a given image. You have a set of vectors that correspond to bytes - for a simple example you have 256 "anchor vectors" that correspond to any possible byte.

To compress an arbitrary sequence of bytes: for each byte, you produce an image that your ML model would convert to the corresponding anchor vector for that byte, and add the image as a frame in a video. Once all the bytes have been converted to frames, you upload the video to YouTube.

To decompress the video you simply go frame by frame over the video and send it to your model. Your model produces a vector and you find which of your anchor vectors is the nearest match. Even though YouTube will have compressed the video in who knows what way, and even if YouTube's compression changes, the resultant images in the video should look similar, and if your anchors are well chosen and your model works well, you should be able to tell which anchor a given image is intended to correspond to.
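The decoding step can be sketched without any actual model. Here 256 random unit vectors stand in for the anchor vectors, and a noisy copy of an anchor stands in for "model output on a compressed frame". Both are hypothetical placeholders, so this only illustrates the nearest-anchor lookup, not how well a real model would survive YouTube.

```python
import math
import random

DIM = 32                  # hypothetical embedding size
_rng = random.Random(0)   # fixed seed so encoder and decoder agree on the anchors


def _unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


# Hypothetical anchors: one random unit vector per byte value.
ANCHORS = [_unit([_rng.gauss(0, 1) for _ in range(DIM)]) for _ in range(256)]


def nearest_byte(embedding):
    """Return the byte whose anchor is closest to the embedding (squared Euclidean distance)."""
    return min(
        range(256),
        key=lambda b: sum((e - a) ** 2 for e, a in zip(embedding, ANCHORS[b])),
    )


def decode_frames(embeddings):
    """Decode one byte per frame embedding."""
    return bytes(nearest_byte(e) for e in embeddings)
```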


Why go that way? I’m no digital signal processing expert, but images (and series thereof, i.e. videos) are 2D signals. What we see is the spatial domain, and analyzing pixel by pixel is naive and won’t get you very far.

What you need is to go to the frequency domain. From my own experiments back in university, most of the significant image information lies in the lowest frequencies. Cutting off frequencies above the lowest 10% leaves a very comprehensible image with only wavy artifacts around objects. You have plenty of bandwidth to use even if you want to embed info in existing media.

Here, though, you have the full bandwidth to use. Start in the frequency domain, set expectations for the lowest bandwidth you’ll allow, and set the coefficients of the harmonic components. Convert to the spatial domain, upscale, and you’ve got your video to upload. This should leave you with data encoded in a way that survives compression and resizing. You’ll just need to allow some room for that.
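A minimal Python sketch of the frequency-domain idea on a single 8x8 block, using an orthonormal 2-D DCT. Eight low-frequency coefficients (the DC term is left alone) each carry one bit as +A or -A, and decoding just reads the signs back. The block size, coefficient positions and amplitude are assumptions; upscaling and tiling blocks across a frame are left out; and the uniform noise is only a crude stand-in for real codec damage, which tends to quantize high frequencies more coarsely than low ones.

```python
import math
import random

N = 8              # block size (assumed)
AMPLITUDE = 40.0   # coefficient magnitude per bit (assumed): bigger is sturdier but more visible
# Low-frequency (u, v) positions carrying one bit each; (0, 0) is the DC term and is not used.
BIT_POSITIONS = [(0, 1), (1, 0), (1, 1), (0, 2), (2, 0), (1, 2), (2, 1), (2, 2)]


def _c(k):
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def _basis(x, u):
    return math.cos((2 * x + 1) * u * math.pi / (2 * N))


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block."""
    return [[_c(u) * _c(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                 for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coeffs):
    """Inverse of dct2 (orthonormal 2-D DCT-III)."""
    return [[sum(_c(u) * _c(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def embed_bits(bits):
    """Write one bit per BIT_POSITIONS entry as +/-AMPLITUDE around a mid-grey block."""
    coeffs = [[0.0] * N for _ in range(N)]
    coeffs[0][0] = 128.0 * N  # DC term giving a mean pixel value of 128
    for bit, (u, v) in zip(bits, BIT_POSITIONS):
        coeffs[u][v] = AMPLITUDE if bit else -AMPLITUDE
    # Round and clamp to 8-bit pixels, as the frame handed to an encoder would be.
    return [[min(255, max(0, round(p))) for p in row] for row in idct2(coeffs)]


def extract_bits(pixels):
    """Read the bits back from the signs of the same coefficients."""
    coeffs = dct2(pixels)
    return [1 if coeffs[u][v] > 0 else 0 for u, v in BIT_POSITIONS]


def add_noise(pixels, amount, rng):
    """Crude stand-in for codec damage: uniform noise in [-amount, amount] per pixel."""
    return [[p + rng.uniform(-amount, amount) for p in row] for row in pixels]
```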

You could slap error correction codes on top.

If you think about it, you should consider video as - say - copper wire or radio. We’ve come quite far transmitting over these media without ML.


We started with that approach, by assuming that the compression is wavelet based, and then purposefully generating wavelets that we know survive the compression process.

For the sake of this discussion, wavelets are pretty much exactly that: A bunch of frequencies where the "least important" (according to the algorithm) are cut out.

But that's pretty cool, seems like you've re-invented JPEG without knowing it, so your understanding is solid!


That's essentially a variant of "bigger pixels". Just like them, your algorithm cannot guarantee that an unknown codec will still make the whole thing perform adequately.

Even if you train your model to work best for all existing codecs (I assume that's the "ML" part of the ML model), the no free lunch theorem pretty much tells us that it can't always perform well for codecs it does not know about.

(And so does entropy. Reducing to absurd levels, if your codec results in only one pixel and the only color that pixel can have is blue, then you'll only be able to encode any information in the length of the video itself.)


It's not guaranteed to perform well with unknown or new codecs - true. But, the implicit assumption is that YouTube will use codecs that preserve what videos look like - not just random codecs. If that assumption holds then the image recognition model will keep working even with new codecs.


That's the thing though, "looks like" breaks down pretty quickly with things that aren't real images. It even breaks down pretty quickly with things that are real, but maybe not so common, images: https://www.youtube.com/watch?v=r6Rp-uo6HmI

So one question would be: Does your image generation approach preserve a higher information density than big enough pixels?


Why would you assume that the images in my algorithm aren't real images? For example, you could use 256 categories from imagenet as your keys. Image of a dog is 00000000, tree is 00000001, car 00000010, and so on.


I'm not assuming they are not real images. I'm questioning whether, to reach any information density that even outperforms "big enough colorful pixels", you might get into territory where the lossy compression compresses away what you need to unambiguously discriminate between two symbols.

And to get to that level of density I do wonder what kind of detailed "real image" it would still be.

If your algorithm were, for example, literally the example you just gave, then the "big enough colorful pixels" from the example video would already massively outperform it. Of course it won't be exactly that, but you are assuming that the video compression algorithm somehow applies the same meaning of "looks like" in its preservation efforts as your machine learning model does, down to levels where the differences become so minute that they beat what you can do with colorful pixels that are just big enough to pass the compression unscathed, maybe with some additional error correction (i.e. exactly what a very high density QR code would do).


Or, film pieces of paper in succession, in a clear enough manner that they're still readable even when heavily compressed.


OH, i get it :)


Back in the day, VCRs were commonly used as tape backup devices for data.

Now studios are using motion-picture film to store data, since it's known to be stable for a century or more.


YouTube lets you download your uploaded videos. I’ve never tested it, but supposedly it’s the exact same file you uploaded.[a] It probably wouldn’t work with this “tool” as it uses the video ID (so I assume it’s downloading what clients see, not the source), but it’s an idea for some other variation on this concept.

[a] That way, in the future, if there’s any improvements to the transcode process that makes smaller files (different codec or whatever), they still have the HQ source


They may retain the original files, but they don't give that back to you in the download screen. I just tested it by going to the Studio screen to download a video I uploaded as a ~50GB ProRes MOV file and getting back an ~84MB H264 MP4.


YouTube doesn’t serve ProRes though. What if you try uploading an h264 video and re-download?


YT might still recompress your video, possibly using proprietary algorithms that are not necessarily DCT based


As said, falls apart with re-encoding. But is a bit more interesting than what is more or less QR codes.


I find it a bit more interesting to have something that actually works on youtube, even if only as a proof of concept.


Could youtube-dlp and YouTube Vanced now be hosted on.. YouTube?

I wonder how long it'd take for Google to crack down on the system abuse.

Is it really abuse if the videos are viewable / playable? Presumably the ToS either already forbids covert channel encoding or soon will.


It's one of those problems that resolves itself.

The process of creating and using the files is prohibitively unusable and so many better solutions exist that YT doesn't need to worry about it


Probably breaks TOS under video spam


Just gotta add some good 'ol steganography


This brings up an interesting question: what is the upper-bound of hidden data density using video steganography? E.g. how much extra data can you add before noticeable degradation? It's interesting because it requires both a detailed understanding of video encoding and also understanding of human perception of video.


I've seen drone metal videos where the video and audio could both be 90% steganography and I wouldn't know the difference.


I'd expect you could store more data steganographically than the raw video data.

You can probably do things like add frames that can't be decoded and so are skipped by a decoder; that effectively allows arbitrary added hidden data. That's maybe cheating.

If you stipulate that you can't already have a copy of the unaltered file, and the data has to be extractable from a pixel copy of the rendered frames ... that becomes more interesting, I think.


Youtube doesn't give you the raw video back, it does transcoding to their given standard bitrates/resolution sets.

You'll notice this if someone has just uploaded a video to Youtube and the only version available for playback is some 360p/480p version for a few hours until Youtube gets around to processing higher bitrates.

So whatever you're encoding has to survive that transcode process.


A pretty massive amount I imagine. I attended a lecture on single image steganography and they were able to store almost 25% of the image's size and it was barely visible. Even 50% didn't look too bad.

Extending that into video files and it would likely be pretty massive, although you'd have some interesting time with youtube's compression algorithms


Good luck preserving it through YouTube's video compression. It's super lossy with small details, in bad cases the quality can visibly degrade to a point it looks more like a corrupted low-res video file for a few seconds (saw that once in a Tetris Effect gameplay video).


I mentioned it in another comment, but while that does lower the bandwidth of a single frame, it's not actually an issue. There are several DRM techniques that can survive a crappy camera recording in a theater.

"compression resistant watermark" turns up some good resources for it. QR codes are another good example of noise tolerant data transmission (fun fact - having logos in a QR code isn't part of the spec, you're literally covering the QR code but the error-correction can handle it).

The best way I can describe it is that humans can still read text in compressed videos. The worse the compression/noise the larger the text needs to be, but we can still read it.


Add a music track, it is now a psychedelic art video.


A music track in which the music happens to be FSK data disguised as chiptune.
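For the FSK half of the joke, here is a minimal binary FSK modem in Python: each bit becomes a 10 ms tone burst at one of two frequencies, and the receiver compares the energy at both with the Goertzel algorithm. The 1200/2200 Hz pair is borrowed from Bell 202 for illustration; the sample rate and baud rate are assumptions, and making it sound like chiptune is not attempted.

```python
import math

RATE = 8000                 # samples per second (assumed)
BAUD = 100                  # bits per second (assumed)
MARK, SPACE = 1200, 2200    # tone for 1 and for 0, as in Bell 202
SPB = RATE // BAUD          # samples per bit (80)


def modulate(bits):
    """Continuous-phase FSK: one tone burst of SPB samples per bit."""
    phase, samples = 0.0, []
    for bit in bits:
        step = 2 * math.pi * (MARK if bit else SPACE) / RATE
        for _ in range(SPB):
            samples.append(math.sin(phase))
            phase += step  # carrying the phase over avoids clicks between bits
    return samples


def goertzel_power(block, freq):
    """Signal power at one frequency via the Goertzel recurrence."""
    k = 2 * math.cos(2 * math.pi * freq / RATE)
    s_prev = s_prev2 = 0.0
    for x in block:
        s = x + k * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - k * s_prev * s_prev2


def demodulate(samples):
    """Per bit period, pick whichever tone carries more energy."""
    bits = []
    for i in range(0, len(samples) - SPB + 1, SPB):
        block = samples[i:i + SPB]
        bits.append(1 if goertzel_power(block, MARK) > goertzel_power(block, SPACE) else 0)
    return bits
```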


Then how is Roel Van de Paar allowed to be on youtube?


yeah wonder how long until the ban, also bans all of your descendants for 10 generations?


If you put youtube-dlp on youtube as a video, make sure to use youtube-dlp to it up.


>Is it really abuse if the videos are viewable / playable? Presumably the ToS either already forbids covert channel encoding or soon will.

If creators start encoding their source and material into their content Google would probably be fine with that because it gives them data but also gives them context for that data.

Edit: I meant like "director's commentary" and "notes about production" type stuff like you used to see added to DVDs back in the day. Not "using youtube as my personal file storage". Why is this such an unpopular opinion?


> If creators start encoding their source material into their files Google would probably be fine with that

it'd depend, as I don't think people using YT to store files would watch a lot of ads


If creators use it like the appendix in a book I can see people watching ads on their way to it.


> If creators start encoding their source material into their files Google would probably be fine with that

Not true at all, lol. Google has a paid file storage solution. YouTube is for streaming video and that's the activity they expect on that platform. I couldn't imagine any service designed for one format would "probably be fine" with users encoding other files inside of that format.


I think the parent comment is limiting themselves to the embedding of metadata specific to the containing file. It would be like adding a single frame, but would potentially give useful information to Google. In those limited circumstances I think the parent is correct.


This reminds me of an old hacky product that would let you use cheap VHS tapes as backup storage: https://en.wikipedia.org/wiki/ArVid

You would hit Record on a VCR and the computer data would be encoded as video data on the tape.

People are clever.


Early games and software would be delivered on audio cassettes that would then have to be 'played' in order to load your software temporarily into the device, which could take minutes

edit: Video from the 8-bit Guy on how this worked - https://www.youtube.com/watch?v=_9SM9lG47Ew


Wow, 2GB on a standard tape. For the time, that's incredibly efficient and cheap.


Yeah. Video, even old grainy VHS, had a pretty high bandwidth. Even much more so with S-VHS, which did not become super popular though. (I'm actually wondering whether the 2GB figure was for S-VHS, not VHS. Didn't do the math and wouldn't be surprised either way, though.)


A normal VHS encodes about 100 million scan lines over 2 hours. 20 bytes per scan line sounds feasible, since there's somewhere around 200-300 'pixels' of luma available in each scan line.


Thanks, that's a very reasonable back of the envelope calculation.

There are many fun details about VHS, its chroma resolution, and especially some weirdness around the PAL delay line, but none of them really matter for this.

Wikipedia says there is about 3MHz of bandwidth, so ~200-300 "pixels" seems like a very good ballpark (just going by the fact that a normal PAL signal has about 6 MHz and is commonly digitized as 720x576, 3MHz about halves the horizontal resolution and taking some pixels off for various reasons makes sense).
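
The scaling argument in that paragraph, as a tiny sketch. The linear relationship between bandwidth and horizontal resolution is the commenter's simplification, not a model of the actual VHS signal.

```python
# Proportional estimate of usable horizontal "pixels" on VHS: scale PAL's
# usual 720-sample digitization by the ratio of bandwidths.
PAL_BANDWIDTH_MHZ = 6.0      # figure quoted in the comment
PAL_SAMPLES_PER_LINE = 720   # common PAL digitization width (720x576)
VHS_BANDWIDTH_MHZ = 3.0      # approximate VHS bandwidth, per the comment

def scaled_samples(bandwidth_mhz: float) -> float:
    """Samples per line, assuming resolution scales linearly with bandwidth."""
    return PAL_SAMPLES_PER_LINE * bandwidth_mhz / PAL_BANDWIDTH_MHZ

upper_bound = scaled_samples(VHS_BANDWIDTH_MHZ)
print(upper_bound)  # 360.0
```

Shaving some samples off that 360 ceiling for the "various reasons" above is what lands in the 200-300 range.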


This is old school. When I first wrote code back in the Stone Age we used to store our stuff on cassette tape.


You had cassette tape?? Lucky... I had to write my 1's and 0's in the dirt with a stick.

Damn rain.


You guys had dirt?


You guys had atoms? When I was a lad, there were only photons.


You guys had ? When I was, we weren't.


You guys will have spacetime and causality? I won't have that when I will be a young lad.


I still have my Atari 400 and tape drive!


My family had an Atari 400 with a tape drive. I remember buying a tape with a game. We also used it for the BASIC programming language and the Asteroids game using a cartridge.


Yep, I had the BASIC cartridge and used the tape drive almost exclusively for that. Coded up all sorts of little projects on that machine. I hated the membrane keyboard, but it worked!


Ha ha, when I was a kid with my C64, I used my mom's old reel-to-reel tape deck to store data.

I still have a C64 and tape drive.

There was a magazine in the 80’s where you could scan in the code with a bar code scanner.


The Alesis ADAT 8 track digital audio recorders used SVHS tapes as the medium - at the end of the day, it's just a spooled magnetic medium, not hugely different conceptually than a hard drive.


That's not really that hacky, audio cassettes were used forever, it's just a tape backup.


Yes! There were many such systems, LGR made a video for one of them, also showing the interface (as in: hardware and GUI) for the backup: https://youtu.be/TUS0Zv2APjU


I remember a similar solution that was marketed in a German mail order catalogue in late 1990s. It could have been Conrad, but I'm not 100% sure. I recall it being a USB peripheral, though. (Maybe I could find more about it in time...)


Reminds me of a guy who stored data in ping messages https://youtu.be/JcJSW7Rprio


Back in the day, when protocols were more trusting we would play games by storing data archives in other people's SMTP queues. Open the connection and send a message to yourself by bouncing it through a remote server, but wait to accept the returning email message until you wanted the data back. As long as you pulled it back in before it times out on that queue and looped it back out to the remote SMTP queue you could store several hundred MB (which was a lot of data at the time) in uuencoded chunks spread out across the NSFNet.
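
The commenter doesn't describe their tooling, so this is a minimal, hypothetical sketch of just the encoding side: Python's `binascii` uuencode helpers split a payload into message-sized, 7-bit-safe bodies. Relaying through remote SMTP queues is not shown, and `to_messages`/`from_messages` and the 45,000-byte message size are illustrative choices.

```python
import binascii

UU_LINE_BYTES = 45  # binascii.b2a_uu encodes at most 45 bytes per line

def to_messages(data: bytes, message_bytes: int = 45 * 1000) -> list[bytes]:
    """Split data into message-sized pieces, each uuencoded as a text body."""
    messages = []
    for start in range(0, len(data), message_bytes):
        piece = data[start:start + message_bytes]
        lines = [binascii.b2a_uu(piece[i:i + UU_LINE_BYTES])
                 for i in range(0, len(piece), UU_LINE_BYTES)]
        messages.append(b"".join(lines))  # each line already ends in b"\n"
    return messages

def from_messages(messages: list[bytes]) -> bytes:
    """Decode the bodies in order and concatenate the original bytes."""
    out = bytearray()
    for body in messages:
        for line in body.splitlines(keepends=True):
            out += binascii.a2b_uu(line)
    return bytes(out)
```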


I watch these things and I begin to realize I'll never be as intelligent as someone like this. It's good to know that no matter how much you've grown, there is always a bigger fish.


I agree that there will always be smarter fish, but you can definitely be this smart; it just takes the proper motivation (or weird idea) to wiggle its way into your brain.


What part of the video discusses this? :D So far it’s about juggling chainsaws

Edit: OK, I see where this is going. Lol


This reminds me of SnapchatFS[1], a side project I made about 8 years ago (see also HN thread[2] at that time).

From the README.md:

> Since Snapchat imposes few restrictions on what data can be uploaded (i.e., not just images), I've taken to using it as a system to send files to myself and others.

> Snapchat FS is the tool that allows this. It provides a simple command line interface for uploading arbitrary files into Snapchat, managing them, and downloading them to any other computer with access to this package.

[1]: https://github.com/hausdorff/snapchat-fs

[2]: https://news.ycombinator.com/item?id=6932508


How much data can you store if you embedded a picture-in-picture file over a 10 minute video? I could totally see content creators who do tutorials embedding project files in this way.


Back of the envelope estimate:

4096 x 2160 x 24 x 60 is your theoretical max in bits/second, 12.7 billion.

Assume that to counter YouTube's compression we need 16x16 blocks of no more than 256 colors and 15 keyframes/second; that reduces it to

256 * 135 * 8 * 15 = 4.1 million bits/sec.

That's not too awful. Ten minutes of this would get you about 300MB of data, which itself might be compressed.
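
The same estimate as a script, so the assumptions (16x16 blocks, 256 colors, 15 keyframes/s, all the commenter's guesses about what survives YouTube's compression) are easy to tweak:

```python
# Reproduces the back-of-the-envelope estimate above.
WIDTH, HEIGHT = 4096, 2160
BITS_PER_PIXEL, FPS = 24, 60

raw_bits_per_sec = WIDTH * HEIGHT * BITS_PER_PIXEL * FPS

BLOCK = 16               # 16x16-pixel blocks
BITS_PER_BLOCK = 8       # 256 colors -> log2(256) = 8 bits per block
KEYFRAMES_PER_SEC = 15

blocks_per_frame = (WIDTH // BLOCK) * (HEIGHT // BLOCK)  # 256 * 135
usable_bits_per_sec = blocks_per_frame * BITS_PER_BLOCK * KEYFRAMES_PER_SEC
ten_minutes_mb = usable_bits_per_sec * 600 / 8 / 1e6

print(f"{raw_bits_per_sec:,}")     # 12,740,198,400
print(f"{usable_bits_per_sec:,}")  # 4,147,200
print(f"{ten_minutes_mb:.0f} MB")  # 311 MB
```

Running it confirms the ~4.1 million bits/s and ~300 MB figures; the raw ceiling comes out to about 12.7 billion bits/s.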


To do PiP (picture in picture), you would be restricted to a much smaller size, but otherwise good calculations.


4k video is almost always 3840x2160


4K consumer video is 3840x2160, 4K Cinema video is 4096x2160.

Just like 2K consumer video is 1920x1080 and 2K Cinema video is 2048x1080


sure, youtube videos are consumer though


Not necessarily — YouTube supports DCI 4K as well and videographers sometimes upload in DCI 4K.


“hope you enjoyed this video. btw, the source code used in this tutorial is encoded in the video.”


Would storing data as a 15 or 30 FPS QR code "video" be any more useful? At a minimum one would gain a configurable amount of error correction, and you could display it in the corner.


Yeah seems way easier than adding a link in the description


Links die. As long as the video exists, the files that the video uses will always exist.


As if videos don't die...


well the point is the video and the file are linked - the video cannot exist without the file


This is a classic case of overengineering a solution to a nonexistent problem.

On YouTube, the video and the description are also linked. They exist on the same page always.

And even if the concern this solution is covering is what if the video is somehow shared without the description, away from YouTube, then the video could just as easily contain the description or URL or QR code pointing to the file.

This is just a horribly unusable QR code.


The description is not big enough to hold practically any data. You would need to link to it from there.... at which point the two are no longer linked into existence. Links go down insanely often.


I'm not saying embed the file in the description. Add a URL that points to the asset like we've been doing forever... This solves nothing.


>add a URL that points

This works really well until it doesn't. I have seen so, so, so many videos have linked content in the description that links to sites which don't work anymore.


> links to sites

Wait so your expectation is that instead of Youtubers using URLs to link to websites, you would prefer and expect that they download and embed those websites into their videos for you? Like, a zip file of the whole site? For.... convenience?

Have you considered using archive.org or mirrors instead?

There are an infinite number of better solutions for this...


Turns out any site that allows users to submit and retrieve data can be abused in the same way:

- FacebookDrive: "Store files as base64 facebook posts"

- TwitterDrive: "Store files as base64 tweets"

- SoundCloudDrive: "Store files as mp3 audio"

- WikipediaDrive: "Store files in wikipedia article histories"


I wrote one of these as a POC when at AWS to store data sharded across all the free namespaces (think Lambda names), with pointers to the next chunk of data.
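
A minimal sketch of the "pointers to the next chunk" layout, assuming a generic key/value namespace. Here a plain `dict` stands in for the free namespaces (it is not any AWS API), and the key scheme and JSON record format are hypothetical.

```python
import hashlib
import json

def store_linked(namespace: dict[str, str], data: bytes, chunk_size: int = 32) -> str:
    """Store data as a singly linked list of chunks; returns the head key."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    next_key = None
    # Write back to front so each record can embed its successor's key.
    for chunk in reversed(chunks):
        record = json.dumps({"data": chunk.hex(), "next": next_key})
        key = "chunk-" + hashlib.sha256(record.encode()).hexdigest()[:16]
        namespace[key] = record
        next_key = key
    return next_key

def load_linked(namespace: dict[str, str], head: str) -> bytes:
    """Follow the `next` pointers from the head until the list ends."""
    out = bytearray()
    key = head
    while key is not None:
        record = json.loads(namespace[key])
        out += bytes.fromhex(record["data"])
        key = record["next"]
    return bytes(out)
```

Reading only needs the head key; everything else is discovered by walking the chain.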

I like to think you could unify all of these into a FUSE filesystem and just mount your transparent multi-cloud remote FS as usual.

It's inefficient, but free! So you can have as much space as you want. And it's potentially brittle, but free! So you can replicate/stripe the data across as many providers as you want.


I was an eng manager on Lambda for a time, and we definitely knew people were doing this, and had plans to cut it out if it ever became a problem. :D


Yeah, you'd need to find some sort of auto-balancing to detect this kind of bitrot from over-aggressive engineering managers & their ilk and rebalance the data across other sources. I think the multiple-shuffle-shard approach has been done before, maybe we could steal some algo from a RAID driver, or DynamoDB.


Back in the day when @gmail was famous for their massive free storage for email, ppl wrote scripts to chunk large files and store them as email attachments.


I used this as a backup target for the longest time. Simply split the backup file into 10 MB chunks and send as mails to a gmail account. Encrypted so no privacy problems. Rock solid for years.

And as it was just storing emails, it was even using Gmail for its intended purpose, so no ToS problems.


Yup, did the exact same thing to back up all of the Wordpress installs on a free server I ran for friends.


People did this on AOL in the 90s as well!


Did you manage to get on the latest Mass Mail going out tonight?


With AOL, in the early 90’s you didn’t even need to do that. You could just reformat and reuse the floppy disks they were always sending you for free storage.


I know someone who published an academic paper on doing exactly this.


Doesn't sound very noteworthy tbh. It's obviously possible and the implementation is straightforward.


sounds like 99% of academic papers


Most papers at least sound like they're notable!


The less jam you have, the more you spread it out.

The opposite is also true. Brilliant ideas have led to papers that can read as obvious and terribly unremarkable.


See also https://github.com/qntm/base2048. "Base2048 is a binary encoding optimised for transmitting data through Twitter."


Still need around 30,000 more unicode characters for this to work.


Sorry, I edited the post concurrently with your comment - it now points to Base2048, the link I meant to post, which actually should work - rather than https://github.com/qntm/base65536 (which I think you're commenting on).


> For transmitting data through Twitter, Base65536 is now considered obsolete; see Base2048.

Source: https://github.com/qntm/base65536
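
To see where the density comes from: Base2048 maps 11 bits to each character, versus 6 for Base64. The sketch below packs bytes into 11-bit symbols and round-trips them. It is not qntm's implementation: the 2048-character alphabet (starting at U+4E00) is an arbitrary placeholder rather than Base2048's real repertoire, and the single end-marker bit is this sketch's own way of handling the final partial symbol.

```python
# Placeholder alphabet of 2048 distinct characters (U+4E00..U+55FF).
# NOT Base2048's actual repertoire.
ALPHABET = [chr(0x4E00 + i) for i in range(2048)]
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}

def encode11(data: bytes) -> str:
    """Pack bytes into 11-bit symbols, one character per symbol."""
    bits, nbits = int.from_bytes(data, "big"), 8 * len(data)
    # Append a single 1 bit as an end marker, then zero-pad to a multiple of 11.
    bits, nbits = (bits << 1) | 1, nbits + 1
    pad = -nbits % 11
    bits, nbits = bits << pad, nbits + pad
    return "".join(ALPHABET[(bits >> shift) & 0x7FF]
                   for shift in range(nbits - 11, -1, -11))

def decode11(text: str) -> bytes:
    bits = 0
    for ch in text:
        bits = (bits << 11) | INDEX[ch]
    if bits == 0:
        raise ValueError("missing end marker")
    # Strip the zero padding and the 1-bit end marker.
    trailing_zeros = (bits & -bits).bit_length() - 1
    bits >>= trailing_zeros + 1
    nbits = 11 * len(text) - trailing_zeros - 1
    return bits.to_bytes(nbits // 8, "big")
```

At 11 bits per character, a 280-character post holds 3,080 bits (385 bytes) versus 210 bytes with Base64, assuming every character counts once toward the limit; the end marker here costs at most one extra character.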


My friends and I had a joke called NSABox. It would send data around using words that would attract the attention of the NSA, and you could submit a FOIA request to recover the data. I always found it amusing.


There's a feature in Emacs that does that (unsurprisingly.)

It's called `M-x spook'. It inserts random gibberish that NSA and the Echelon project would've supposedly picked up back in the 90s.


spook.el was "introduced at or before Emacs version 18.52". And 18.52 was released in 1988. And spook.el in a comment says

    ;; Created: May 1987
So the things that the NSA and ECHELON would have picked up on back in the 1980s, not the 1990s :)


I've heard the LOIC (Low Orbit Ion Cannon) DoS tool described as a shortcut to getting sent to jail. This sounds similar.


Big difference. LOIC actually impacts a target.


This is pretty tame compared to some actual, practical ones such as https://github.com/apachecn/CDNDrive

For people who don't read Chinese: it encodes data into ~10M blocks in PNG and then uploads them (together with a metadata/index file as an entry point) to various Chinese social media sites that don't re-compress your images. I know people who have used it to store* TBs upon TBs of data on them already.
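
CDNDrive's actual formats aren't described here, so this is only a hypothetical sketch of the split-plus-index idea: blocks addressed by SHA-256, with a JSON manifest serving as the entry point. The PNG wrapping and uploading are omitted, and `fetch` is a placeholder for whatever downloads a block.

```python
import hashlib
import json

BLOCK_SIZE = 10 * 1024 * 1024  # ~10 MB per block, as described above

def split_with_manifest(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into blocks and build the index (manifest) that points at them."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    manifest = {
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "blocks": [hashlib.sha256(b).hexdigest() for b in blocks],
    }
    return json.dumps(manifest), blocks

def reassemble(manifest_json: str, fetch) -> bytes:
    """Rebuild the file; `fetch(digest)` returns the block with that SHA-256."""
    manifest = json.loads(manifest_json)
    parts = []
    for digest in manifest["blocks"]:
        block = fetch(digest)
        if hashlib.sha256(block).hexdigest() != digest:
            raise ValueError(f"corrupted block {digest[:12]}")
        parts.append(block)
    data = b"".join(parts)
    if len(data) != manifest["size"] or hashlib.sha256(data).hexdigest() != manifest["sha256"]:
        raise ValueError("reassembled file does not match manifest")
    return data
```

Per-block hashes also catch a host that silently re-compressed or truncated something.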

*Of course, it would be foolish to think your data is even remotely safe "storing" them this way. But it's a very good solution for sharing large files.


What a great time to write botnets


I made a tool that lets you store files anywhere you can store a URL: https://podje.li/


Is there an import URLs button? Otherwise, how does one reassemble the original?


Click them; it's really for things that fit into one or two URLs, like small text files. I've used it for config files that were getting formatted incorrectly over corporate email, which ate them as attachments.


GitHub repos make for a pretty good key-value store.

It even has a full CRUD API, no need for using libgit.


I wonder if we could use this technique in places where the government censors sensitive data uploaded to streaming sites, like mainland China or North Korea (they do have streaming sites, right?)

Although for propaganda use, shortwave / satellite TV is a much, much simpler way to distribute information to places like that, but I believe it's now hard for anyone to get a SW radio.


Reminds me of when I tried to Gmail myself a zip archive, and it was denied because of security reasons iirc. I then tried to base64 it, and it still didn't work, same with base32, until finally base16 did work.
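
For reference, the size cost of each fallback, using Python's standard `base64` module (which check Gmail was actually applying isn't known, so this only shows the overhead):

```python
import base64

payload = bytes(300)  # any 300-byte blob; the contents don't affect the size
sizes = {name: len(fn(payload)) for name, fn in [("base64", base64.b64encode),
                                                 ("base32", base64.b32encode),
                                                 ("base16", base64.b16encode)]}
print(sizes)  # {'base64': 400, 'base32': 480, 'base16': 600}
```

Base64 inflates data by 4/3, Base32 by 8/5, and Base16 (hex) doubles it.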


I found some pirates uploading videos to Prezi so they get free S3 video hosting.


At one point there was a piece of software called deezcloud which exploited Deezer's user uploaded MP3 storage, allowing it to be used as free CDN cloud storage for up to 400GB of files. I don't think it works anymore, and I'm not sure if it ever worked well (I never tried it).


I wonder if access permissions would be easier to maintain using Facebook...


Until one day your base64 ciphertext just so happens to contain a curse word and you get banned for violating "community standards"


I think I've seen similar blog posts about doing the same with the DNS and BGP networks



We need an HNShowDeadDrive


also Telegram


I remember my friend did something like this on an old unix system.

Users were given quotas of 5Mb for their home directory. He discovered that filenames could be quite large, and the number of files was not limited by the quota, so he created a pseudo filesystem using that knowledge, with a command line tool for listing, storing and retrieving files from it. This was the early 90s.
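
A hypothetical reconstruction of the trick, where the data lives entirely in the names of zero-byte files. The friend's actual tool isn't described; the hex encoding, 240-character names, and sequence-number prefix are assumptions, and whether names count against a quota depends on the filesystem.

```python
import os
import tempfile

NAME_HEX_CHARS = 240  # 6 + 1 + 240 = 247 chars, under the common 255-byte name limit

def store_in_filenames(directory: str, data: bytes) -> int:
    """Encode data into the names of empty files; returns how many were created."""
    hexdata = data.hex()  # hex avoids '/' and case-insensitivity problems
    pieces = [hexdata[i:i + NAME_HEX_CHARS] for i in range(0, len(hexdata), NAME_HEX_CHARS)]
    for seq, piece in enumerate(pieces):
        # A zero-padded sequence number keeps the pieces in order when sorted.
        name = f"{seq:06d}_{piece}"
        open(os.path.join(directory, name), "w").close()  # zero bytes of content
    return len(pieces)

def load_from_filenames(directory: str) -> bytes:
    names = sorted(n for n in os.listdir(directory) if "_" in n)
    return bytes.fromhex("".join(n.split("_", 1)[1] for n in names))

def roundtrip(data: bytes) -> bytes:
    """Usage demo: store and reload data in a throwaway directory."""
    with tempfile.TemporaryDirectory() as d:
        store_in_filenames(d, data)
        return load_from_filenames(d)
```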


Years ago when Amazon had unlimited photo storage, you could “hide” gigabytes of data behind a 1px gif (literally concatenated together) so that it wouldn’t count against your quota.
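
The concatenation trick, sketched: image decoders generally stop reading at the image's own end marker, so bytes appended afterward are ignored by viewers. The footer format (`MAGIC` plus an 8-byte length) is invented here so the payload can be found again without parsing the image.

```python
import struct

MAGIC = b"HIDE"  # hypothetical 4-byte footer marker for this sketch

def append_payload(cover: bytes, payload: bytes) -> bytes:
    """Concatenate payload after the cover image, plus a footer with its length."""
    return cover + payload + struct.pack(">Q", len(payload)) + MAGIC

def extract_payload(blob: bytes) -> bytes:
    """Read the footer from the end and slice the payload back out."""
    if blob[-4:] != MAGIC:
        raise ValueError("no payload footer")
    (length,) = struct.unpack(">Q", blob[-12:-4])
    return blob[-12 - length:-12]
```

The cover image bytes are left untouched at the front, which is why the file still opens as a normal picture.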


They still do if you pay for Prime. I was surprised to see that even RAW files (which are uncompressed and quite large) were uploaded and stored with no issues. Not the same as "hiding" data but might still be possible.


In the interest of technical correctness, RAW files are frequently compressed and even lossily compressed. For example, Sony's RAW compression was only lossy until very recent cameras.

Given that there are the options for uncompressed, lossy compressed and lossless compressed, I'd say RAW files differ in the stage of the data processing where capture is being done and doesn't imply anything about the type of compression.

What is relevant is that the formats vary widely between manufacturers, camera lines and individual cameras, so unlike JPEG, it's really hard to create a storage service that compresses RAW files further after uploading in a meaningful way. So anything they do needs to losslessly compress the file.


Interesting, so are you saying that the RAW signal coming from the hardware is already often compressed even before hitting the main software compression?


Oh, no. What I'm saying is that cameras often take the raw signal from the hardware, but then the camera software frequently compresses that signal before writing it to a raw file (.cr2, .arw, .dng, whatever). This compression can be lossy or lossless. It's important not to confuse the raw signal with the RAW file (an actual format, often specific to the camera manufacturer). Just by saying RAW file, assuming it's lossless or uncompressed is false. So it should be specified - uncompressed RAW (lossless almost by definition), lossy compressed, lossless compressed.


I guess you can store 24 bits of data as the R,G and B components of a pixel of an "image", and store it as a lossless image...
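
A minimal standard-library sketch of that idea: pack the bytes into an 8-bit RGB PNG, 3 bytes per pixel. The 4-byte length prefix, the default 256-pixel width, and filter type 0 on every row are choices made for this example, and the reader only understands files this writer produced.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def _chunk(kind: bytes, body: bytes) -> bytes:
    # Each PNG chunk: 4-byte length, 4-byte type, body, CRC-32 of type + body.
    crc = zlib.crc32(kind + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + kind + body + struct.pack(">I", crc)

def bytes_to_png(data: bytes, width: int = 256) -> bytes:
    """Pack data into an 8-bit RGB PNG, 3 bytes (R, G, B) per pixel."""
    payload = struct.pack(">I", len(data)) + data   # length prefix survives padding
    row_bytes = width * 3
    height = -(-len(payload) // row_bytes)           # ceil division, at least 1 row
    payload = payload.ljust(row_bytes * height, b"\x00")
    # Every scanline starts with filter type 0 ("None"), then raw RGB bytes.
    raw = b"".join(b"\x00" + payload[y * row_bytes:(y + 1) * row_bytes]
                   for y in range(height))
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)  # 8-bit, type 2 = RGB
    return (PNG_SIGNATURE + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(raw, 9)) + _chunk(b"IEND", b""))

def png_to_bytes(png: bytes) -> bytes:
    """Inverse of bytes_to_png; only handles files written by it."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG")
    pos, width, idat = len(PNG_SIGNATURE), 0, bytearray()
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        kind, body = png[pos + 4:pos + 8], png[pos + 8:pos + 8 + length]
        if kind == b"IHDR":
            (width,) = struct.unpack(">I", body[:4])
        elif kind == b"IDAT":
            idat += body
        pos += 12 + length  # length + type + body + CRC
    raw = zlib.decompress(bytes(idat))
    row_bytes = width * 3
    payload = b"".join(raw[i + 1:i + 1 + row_bytes]
                       for i in range(0, len(raw), row_bytes + 1))
    (n,) = struct.unpack(">I", payload[:4])
    return payload[4:4 + n]
```

Since PNG is lossless, this round-trips exactly as long as the host doesn't re-encode the image.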


Shhhh, I still do this with encrypted database backups.



This is great. I did something very similar with a laser printer and a scanner many years ago. I wrote a script that generated pages of colored blocks and spent some time figuring out how much redundancy I needed on each page to account for the scanner's resolution. I think I saw something similar here or on github a few years ago.
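
The commenter's exact redundancy scheme isn't given; the simplest version of the idea is a repetition code, where every bit is written several times and read back by majority vote. A sketch:

```python
def repeat_encode(bits: list[int], copies: int = 3) -> list[int]:
    """Repetition code: write every bit `copies` times (use an odd count)."""
    return [b for b in bits for _ in range(copies)]

def repeat_decode(coded: list[int], copies: int = 3) -> list[int]:
    """Majority vote over each group of `copies` received bits."""
    return [int(sum(coded[i:i + copies]) > copies // 2)
            for i in range(0, len(coded), copies)]
```

Tripling each bit tolerates one flipped copy per group at the cost of 3x the space; proper error-correcting codes get much better ratios.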


Reminds me of "Cauzin Softstrip", the format some computer magazines used back in the day to distribute BASIC programs, or even executables.

Random example from an issue of Byte:

https://archive.org/details/byte-magazine-1986-05/page/n432/...


Searching HN for "paper backup" gives a lot of existing solutions; in fact, so many that I don't know which one you saw.


So you invented QR codes?


Overly complicated, color QR codes.


Seems like a great way to get your account closed for abuse!


You'd be surprised how much YouTube lets you upload.

I've been uploading 2-3 hours of content a day every day for the past few years. On the same account too.

I have fewer than 10 subscribers lol.


Lucky you. I just posted my first two videos from a conference that were banned within a day for violating "Community Guidelines" without appeal.


They let you sometimes get away with a lot more[0] ;)

[0]: https://www.youtube.com/watch?v=Olkb7fYSyiI


How MUCH - yes - as long as it's videos, and it's not violating copyright, you're probably not violating any Terms of Service.

But I guarantee there is some clause in the ToS that this project violates.


What kind of content do you upload? (Should "content" be in air quotes? :P)


Lol yeah.

It's just recordings of myself when I'm doing deep work. I use OBS to stream my computer screen and a video recording of myself (mostly me muttering to myself).

It helps me avoid getting distracted (I feel like I'm being watched lol) and it's also interesting to check back if I want to see what I was working on 3 months ago.

All the videos are unlisted or private.


Are you screensharing while recording? What tooling do you use to do this if so?

Also, any potential issues with Google having access to proprietary code? I know the chance of any human at Google interpreting your videos is near-zero but still


Isn't that what Twitch is for?


AFAIK twitch deletes your saved streams after some time.


wow, curious, are you keeping these videos there, or will delete them after several months?


You could make it much harder to detect by synthesizing a unique video with a DNN and hiding the data using traditional steganography techniques.


I think that video compression might make this not a viable technique. Artifacts would destroy the hidden data, right?


Compression will limit the bandwidth of a given frame but you can work around it.

Some forms of DRM are already essentially this: DRM that survives compression (and even crappy camera recordings in a theater) while being essentially steganography (you can't visually tell it's there) exists.

EDIT: "compression resistant watermark" is a good search phrase if anyone is curious


Unless you tuned the NN on the files you get back from YouTube, so that it learns to encode the data in a way that is always recoverable despite the artifacts.


Couldn't you also embed data through sound? Upload a video of a monkey at the zoo but you insert ultrasound with encoded data.

something like this but far more mundane

https://www.youtube.com/watch?v=yLNpy62jIFk


> but you insert ultrasound with encoded data

Others in these comments have also suggested steganography in both the video and audio streams. The problem with that is that when you retrieve a video from YouTube, you never get the original version back. You only get a lossy re-encoded version, and the very definition of lossy encoding is to toss out details that humans can't (or wouldn't easily) perceive, including ultra-sonic audio.


It might be ridiculous, but how about uploading a computer-generated video of a human saying 0 and 1 very quickly, to encode binary file.

Or better yet, the file could be one third the size if the human says the numbers 0 to 7.


That is what redundancy and error correcting codes are for. It will reduce your data density, but I am sure you can find parameters that preserve the data.
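
As a concrete example of trading density for robustness, here is a textbook Hamming(7,4) code: 4 data bits become 7 transmitted bits, and any single flipped bit in a block can be located and corrected. This is a generic illustration, not what YouTubeDrive does.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits as 7 bits: positions 1..7 = p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c: list[int]) -> list[int]:
    """Correct up to one flipped bit, then return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_pos = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit, 0 if none
    if error_pos:
        c[error_pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]
```

The rate is 4/7, so density drops by about 43% in exchange for surviving one bit-flip per 7-bit block.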


Another thread posted today makes it seem like they don't really care

https://news.ycombinator.com/item?id=31488455


Then the whole HN crowd would have enough outrage materials for weeks. Seems like a win-win situation to me.


If it becomes prevalent, I think YouTube would do something like slightly randomize the compression in their videos to dissuade this kind of use.


isn't the point here that the sub-pixels being produced are so large that it would take a tremendous amount of artifacts to reduce them to an unreadable state?

in other words; if YTs compression was affecting it so badly that it prevented the data from being re-read, wouldn't that compression scheme render normal video-watching impossible?


Does YouTube store and stream all videos losslessly? How does this work otherwise?


No, YouTube is not lossless.

The video that is created in the example in the README is https://www.youtube.com/watch?v=Fmm1AeYmbNU

We can see that data is encoded as "pixels" that are quite large, being made up of many actual pixels in the video file. I see quite bad compression artifacts, yet I can clearly make out the pixels that would need to be clear to read the data. It looks like the video was uploaded at 720p (1280x720), but the data is encoded as a 64x36 "pixel" image of 8 distinct colors. So lots of room for lossy compression before it's unreadable.
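
To make that concrete, here is a rough sketch of the scheme in pure Python: 3 bits per logical pixel (R, G, B each 0 or 255), a 64x36 grid upscaled 20x to 1280x720, so each frame carries 64 x 36 x 3 = 6,912 bits (864 bytes). This is not YouTubeDrive's Mathematica code, and decoding by averaging each block and thresholding at 128 is an assumption about how one would read it back.

```python
FRAME_W, FRAME_H = 64, 36  # logical grid
SCALE = 20                 # each logical pixel becomes a 20x20 block -> 1280x720

def bits_to_frame(bits, w=FRAME_W, h=FRAME_H, scale=SCALE):
    """Map 3 bits per logical pixel to RGB values 0/255, then upscale by `scale`."""
    if len(bits) != 3 * w * h:
        raise ValueError(f"expected {3 * w * h} bits, got {len(bits)}")
    grid = [[tuple(255 * bits[3 * (y * w + x) + c] for c in range(3))
             for x in range(w)] for y in range(h)]
    return [[grid[y // scale][x // scale] for x in range(w * scale)]
            for y in range(h * scale)]

def frame_to_bits(frame, w=FRAME_W, h=FRAME_H, scale=SCALE):
    """Average each block per channel and threshold at the midpoint (128)."""
    bits = []
    for by in range(h):
        for bx in range(w):
            for c in range(3):
                total = sum(frame[y][x][c]
                            for y in range(by * scale, (by + 1) * scale)
                            for x in range(bx * scale, (bx + 1) * scale))
                bits.append(int(total / (scale * scale) >= 128))
    return bits
```

Because the decoder averages 400 pixels per channel, moderate compression noise still lands on the correct side of the threshold.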


Imagine a QR code that changes once every X milliseconds.


That's an excellent analogy, thank you.


The data is represented large enough on screen that compression doesn't destroy it.


e.g. similar to a QR code stored as a JPEG will still work fine.


things like redundancy and crc checks I assume


The code looks not too big (a single file), but it requires a paid symbolic language (Mathematica). Can anyone with better Mathematica knowledge explain whether it can be ported to another symbolic language (Sage, Maxima) or a non-symbolic one (R, Julia, Python)?


Yep! I'm the creator of YouTubeDrive, and there's absolutely nothing in the code that depends on the symbolic manipulation capabilities of Wolfram Mathematica -- you could easily port it to Python, C++, whatever. However, there are two non-technical reasons YouTubeDrive is written in Mathematica:

(1) I was a freshman in college at the time, and Mathematica is one of the first languages I learned. (My physics classes allowed us to use Mathematica to spare us from doing integrals by hand.)

(2) I intentionally chose a language that's a bit obtuse to use. I was afraid that I might attract unwanted attention from Google if YouTubeDrive were too easy for anybody to download and run.


Cool, thanks. This is an ingenious idea in the true hacker spirit. Well done.


I remember seeing a Python library years ago called BitGlitter which did the same thing. It would convert any file to an image or video. You could then upload the file yourself. https://pypi.org/project/BitGlitter/


Looks like the GitHub page is deleted. It's better not to use it anymore.


Seems like it may be a decent "harder drive". https://youtu.be/JcJSW7Rprio


Are there any services out there that combine all of these “Store files as XYZ” into some kind of raid config?

Would be interesting if you could treat each service (Youtube, Docs, Reddit, Messenger, etc) as a “disk” and stripe your data across them.
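
No such service is named in the thread, but the striping part is easy to sketch. Below, a RAID-5-style layout splits data into equal shards plus one XOR parity shard, so losing any single "disk" (say, one banned account) is survivable. Each shard would go to a different service; the upload side is left out.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def stripe_with_parity(data: bytes, n_data: int):
    """Split into n_data equal shards plus one XOR parity shard."""
    shard_len = -(-len(data) // n_data)  # ceil division; last shard is zero-padded
    padded = data.ljust(shard_len * n_data, b"\x00")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(n_data)]
    parity = reduce(xor_bytes, shards)
    return shards + [parity], len(data)

def recover(shards: list, original_len: int) -> bytes:
    """Rebuild the data when at most one shard is missing (given as None)."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can only survive one lost shard")
    if missing:
        # XOR of all shards (data + parity) is zero, so the survivors XOR to the lost one.
        shards = list(shards)
        shards[missing[0]] = reduce(xor_bytes, [s for s in shards if s is not None])
    return b"".join(shards[:-1])[:original_len]
```

Surviving two lost services, as RAID 6 does, needs a second, differently computed parity shard (typically Reed-Solomon), not just another XOR.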


Makes me wonder how many video and image upload sites are now used as easily accessible number stations these days


Probably not many. The advantage of plain old-fashioned radio is that the station doesn't keep track of the receivers. Whoever watches a YouTube numbers station is tracked six ways to Sunday.


Rename the project to VideoDrive or something. With the current name Google can get GitHub to take it down on the basis of trademark infringement.


Here I am trying my best to get my favorite videos OFF YouTube given that they could disappear at any second because of an account block, or just "reasons", and this link suggesting storing stuff with YouTube? By god, why? Sure, it's free, practically "limitless" slow file storage, but what a bad idea nonetheless....


Back in the 90’s I considered storing my backups as encrypted steganographic or binary Usenet postings, as a kind of decentralized backup, postings which would stick around long enough for the next weekly backup. (Usenet providers had at least a couple of weeks of retention time back then.)


Reminds me of the old Wrapster[1] days

[1] https://www.cnet.com/tech/services-and-software/napster-hack...


I'm a GOOGL investor and I find this offensive.


I can't wait until malware uses this as C2


Seems pretty fragile. Google taking down your channel would be enough to disarm your malware.


they worked around this years ago by generating the username (domain name) based on some property of the current time

(plus using more than one tld)


IPFS is decent enough, or better with free pinning services.


This gave me a flashback of VBS on the Amiga… Video Backup System: record composite video on a VCR, and a simple op-amp circuit would decode black and white blobs of video pixels; it could back up floppies at reading speed. Was really impressive until, well, VHS… ;)

Just did a google and saw it had evolved over the years, used only the 1.0 implementation back in the days. For those on another nostalgic trip : http://hugolyppens.com/VBS.html


I wonder if something similar could be useful for transmitting data optically, like an animated QR code. Maybe a good way to transmit data over an air gap for the paranoid?



What does the OP have against “Google Drive” when seeking file storage via a Google Service?

Horses for courses… this is how we end up with pictures clogging transaction ledgers


Reminds me of the movie Contact where the alien civilization encodes the whole design of a traveling machine inside Olympic telecast video.


Popularity of such projects is the reason of imposing more and more constraints on systems that are somewhat open (at least open to use). Maybe instead of figuring out how to abuse an easy-to-use system, people should figure out how to abuse hard-to-use systems, like e.g. creation of open protocols for closed systems. That would be an actual achievement.


Upload videos as data, then build an app that streams and decodes these files back into videos. Voila, popcorn time.


Reminds me of this similar tool that exploited GMail the same way: https://www.computerworld.com/article/2547891/google-hack--u...


Yes we have all done or used something similar when we were younger, but really, should this be on the front page of HN? This is abuse of a popular service and if it becomes popular it will only make YouTube worse and YouTube is getting worse without any additional help.


I remember a project that was doing this with photo files and unlimited picture storage.


This one's not the best, but it works. I would recommend zipping everything and then using that as a single file (file size limit is ~2GB, FYI). https://github.com/Quadmium/PEncode


BEWARE: When they clamp down and delete the files, you lose your data.

Good technical experiment though!


Since he's made a ready-to-use software, yeah Google will probably ban this quite quickly...


I suspect people in my office who send everything as a Word attachment with an image, PPT, Excel workbook, etc., embedded, are doing this unknowingly.

There are even Word files I've found that have complete file path notation to ZIP files.


I think my favorite part of this is that the example video linked to this has ads on it. It's a backup system that pays you. Well, until someone at Youtube sees it and decides to delete your whole account.


This reminds me of Blame! where humans are living like rats in the belly of the machine. Lol, also reminds me of the GeoCities days where we created 50 accounts to upload Dragon Ball Z videos.


I absolutely love this idea. I need to dig more into the code, but it's almost like using Twitter as a 'protocol', using YouTube as storage.

So many ideas are flying to mind. Really creative.


I love that this is like tape in that it's a sequential access medium. It's storing a tape-like data stream in a digital version of what used to be tape itself (VHS).


I believe YouTube supports random access, or otherwise you wouldn’t be able to jump around in a video. Youtube-dl also supports resuming downloads in the middle, I believe.


True... But I'm guessing the YouTubeDrive 'decoder' needs the whole video to get you back anything close to what you put in.

Otherwise each frame would have to have a ridiculous amount of encoded overhead.

Ahh, NM, can't even see that working.

edit: Maybe a file table built from a specified first N frames, which delivers a frameset/file map ...

Still nothing like skipping spots in a video. That relies on key frames and time signatures.

Cool stuff nonetheless...


Why would you need a map or overhead?

Each frame gets the same amount of the file, about a kilobyte. So each frame is basically a sector. You need to read in a few extra frames to undo the compression, but otherwise it's just like a normal filesystem. And reading in a batch of sectors at once is normal for real drives too.

Even if you did need the frames to be self-describing, you could just toss a counter/offset in the top left corner for less than 1% overhead.
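
A sketch of that framing, treating each frame as a fixed-size sector: the 864-byte payload follows from a 64x36 grid at 3 bits per cell, and the 4-byte big-endian counter is a hypothetical header layout. Reading a byte range then only touches the frames that cover it.

```python
import struct

FRAME_PAYLOAD = 864           # bytes per frame: 64 * 36 * 3 bits / 8
HEADER = struct.Struct(">I")  # hypothetical 4-byte frame counter in the "corner"
DATA_PER_FRAME = FRAME_PAYLOAD - HEADER.size

def to_frames(data: bytes) -> list[bytes]:
    """Split data into fixed-size, self-describing frame payloads (sectors)."""
    frames = []
    for index, start in enumerate(range(0, len(data), DATA_PER_FRAME)):
        chunk = data[start:start + DATA_PER_FRAME].ljust(DATA_PER_FRAME, b"\x00")
        frames.append(HEADER.pack(index) + chunk)
    return frames

def read_range(frames: list[bytes], offset: int, length: int) -> bytes:
    """Random access: read only the frames covering [offset, offset + length)."""
    first = offset // DATA_PER_FRAME
    last = (offset + length - 1) // DATA_PER_FRAME
    out = bytearray()
    for i in range(first, last + 1):
        (index,) = HEADER.unpack_from(frames[i])
        if index != i:
            raise ValueError(f"frame {i} carries counter {index}")
        out += frames[i][HEADER.size:]
    start = offset - first * DATA_PER_FRAME
    return bytes(out[start:start + length])
```

The counter costs 4 of 864 bytes, about 0.46%, in line with the "less than 1%" estimate.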


I like this. The last wave of Twitter users into the fediverse caused my AWS bill to go up 10 USD a month. Might have to start storing media files on youtube instead ;)


Reminds me of the other post that used Facebook Messenger as transport layer to get free internet in places that internet is free if you use Facebook apps.


This seems like something Cicada 3301 would use

I wonder how many random videos like this are floating around that are encoding some super secret data...


I’m thinking maybe we can divide files into pieces and turn each pieces into a QR code then turn each QR code into a single frame?


Wasn't there more or less recently on HN something like "Store Data for free in DNS-Records"? Reminds me of this.


Imagine a Raid6 of four youtube 11-digit IDs

Bet google isn't happy with this idea and will definitely try to break it asap


Very cool. I wonder how difficult it would be present a real watchable video to the viewer. Albeit low quality, but embed the file in a steganographic method. I think a risk of this tech is that if it takes off, YT might easily adjust the algorithms to remove unwatchable videos. Perhaps leaving a watchable video could grant it more persistence than an obvious data stream.


Sure, but the more structure your video has to have, the harder it becomes to hide information steganographically within it. Your information density will become very low, I think.


Are the premium files stored as 4K?


This would be a good way to backup your YouTube videos to YouTube while avoiding Content ID.


How will you prevent YouTube from re-encoding the video and the data getting trashed?


Make the boxes bigger.


I was literally thinking of something like this a couple days ago. Good timing!


Could be a good and sneaky way to obfuscate encrypted message transmissions?


It's all fun and games until your files start getting DMCA takedowns.


Are there any examples? I'd love to see such a YouTube video... :p


How many kilobytes would it be possible to store per minute of video?


Can't you upload lossless captions to youtube?


I believe this is the backend for AWS Glacier


there was a story on HN a while ago in which someone stored unlimited data in Google Sheets!


Another "Harder Drive"!


Evil genius.


I also “invented this idea” from scratch in a series that exists solely in my mind where I abuse a variety of free services for unintended purposes.

I could seemingly never explain the concept to other developers in a meaningful way, or care enough myself to code these out.

Anyway my quick summary in this is just think of a dialup modem. You connect to a phone line and you get like a 56k connection. That sucks today, sure, but actually it’s kind of mind blowing for how data transfer speeds worked at the time.

You know how else you can send data via a phone line without a modem? Just literally call someone and speak the data over the phone. You could even speak in binary or base64 to transfer data. It’s slow, but it still “works,” assuming the receiving party can accurately record the information and hear you.

That seems to be what the topic here is: using a fast medium (a video player) to slowly send data over the connection, like physically speaking the contents of other data. But there could be some problems with this approach.

Mainly, YouTube will always recompress your video. For this method, that means your colors or other literal video data could be off. This limits the range of values you can use in an already limited “speaking” medium.
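A simplified way to see that limit: if each color channel uses k evenly spaced levels, a decoder picks the nearest level, so a value survives recompression only while it drifts by less than half the gap between levels. The sketch below assumes independent per-channel errors, which real codecs don't guarantee; the function names are illustrative. Going from 2 levels (YouTubeDrive's 0/255) to more levels buys extra bits per channel but shrinks the tolerance fast.

```python
import math

def levels(k: int) -> list[int]:
    """k roughly evenly spaced intensity levels in 0..255."""
    return [round(i * 255 / (k - 1)) for i in range(k)]

def decode_channel(value: float, k: int) -> int:
    """Map a possibly shifted channel value to the index of the nearest level."""
    lv = levels(k)
    return min(range(k), key=lambda i: abs(lv[i] - value))

def error_margin(k: int) -> float:
    """Half the smallest gap between levels: shifts strictly below this decode correctly."""
    lv = levels(k)
    return min(b - a for a, b in zip(lv, lv[1:])) / 2

for k in (2, 4, 8, 16):
    print(f"{k:>2} levels: {math.log2(k):.0f} bits/channel, "
          f"tolerates shifts under {error_margin(k):.1f}")
```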

If this weren’t the case, we could treat it like a modem connection: just send the data and pretend it’s a video. However, where I left off on this idea, we appear to be hard-blocked by YouTube’s compression.

We can write data into any file and label it as any other file type. (As a side note, videos are also containers, like zip files, that could be abused to hold other files.)

But YouTube is an unknown wildcard whose recompression changes our data, which seems to invalidate all of this.

If we somehow convert an exe to an avi, YouTube’s compression seems to hard-block this from working the way we want. Without that barrier, I think we could just use essentially corrupted videos as other file types, provided we could download the raw file directly.

(steganography is a potential work around I haven’t explored yet)

Without these, we’re left to just speak the data over a phone which compresses our voice quality and in theory could make some sounds hard to tell apart. This leaves us in the battle of what language is best to speak to avoid compression limiting our communication. Is English best? Or is Japanese? What about German? Which language is least likely to cause confusion when speaking but also is fast and expressive?

This translates into: what’s the best encoding for text, or for pixels in a video, so that data doesn’t get lost to compression? Are literal English characters best? What about base64? Or binary? What if we zip it first and then base64 it? What if we convert binary code into hex colors? Does that use fewer frames in a video? Will the video be able to clearly preserve all the hex values after YouTube compression?
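Some of those questions can be answered by simply counting. Here is a sketch that compares how many symbols each textual encoding needs for the same bytes, and how many frames a payload takes at 3 bits per cell (0/255 RGB) versus 24 bits per cell (arbitrary hex colors), assuming a 64x36 grid. Compression only helps redundant input such as English text; random or encrypted data won't shrink. Also, 24-bit colors would need every channel to survive with an error below 0.5, which YouTube's recompression won't give you.

```python
import base64
import zlib

def symbol_counts(data: bytes) -> dict[str, int]:
    """Symbols needed to spell the same bytes in different textual encodings."""
    return {
        "binary digits": len(data) * 8,
        "hex digits": len(data.hex()),
        "base64 chars": len(base64.b64encode(data)),
        "zlib + base64 chars": len(base64.b64encode(zlib.compress(data, 9))),
    }

def frames_needed(n_bytes: int, bits_per_cell: int, cells_per_frame: int = 64 * 36) -> int:
    """Frames required when every grid cell carries bits_per_cell bits."""
    bits_per_frame = bits_per_cell * cells_per_frame
    return (n_bytes * 8 + bits_per_frame - 1) // bits_per_frame  # ceiling division

text = ("the quick brown fox jumps over the lazy dog " * 50).encode()
print(symbol_counts(text))
print(frames_needed(len(text), 3), "frames at 3 bits/cell vs",
      frames_needed(len(text), 24), "at 24 bits/cell")
```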


so cool


This works on the same principle as the video backup system (VBS) we used in the 1980s and early 1990s on our Commodore Amigas: if I remember correctly, one three-hour PAL/SECAM VHS tape had a capacity of 130 MB. The entire hardware fit into a DB-25 parallel port connector and could easily be built with a soldering iron and a few cheap parts.

https://www.youtube.com/watch?v=VcBY6PMH0Kg

SGI IRIX also had something conceptually similar to this "YouTubeDrive" called HFS, the hierarchical filesystem, whose storage was backed by tape rather than disk, but to the OS it was just a regular filesystem like any other: applications like ls(1), cp(1), rm(1) or any other saw no difference, but the latency was high, of course.


That's how digital audio was originally recorded to tape back in the 1970s and 80s: encode the data into a broadcast video signal and record it using a VCR.

In the age of $5000 10 MB hard drives, this was the only sensible way to work with the 600+ MB of data needed to master a compact disc.

That's also where the ubiquitous 44.1 kHz sample rate comes from. It was the fastest data rate that could be reliably encoded into both NTSC and PAL broadcast signals. (For NTSC: 3 samples per scan line, 245 usable scan lines per field, 60 fields per second = 44,100 samples per second.)
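Spelled out for both systems. Strictly, the 245 lines and the 60 per second are per field, since interlaced video has two fields per frame. The PAL figures (294 usable lines per field, 50 fields per second) are the commonly cited values; they don't come from the comment above.

```latex
f_s^{\text{NTSC}} = 3 \times 245 \times 60 = 44{,}100\ \text{Hz}
\qquad
f_s^{\text{PAL}}  = 3 \times 294 \times 50 = 44{,}100\ \text{Hz}
```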


130 MB for the whole tape is not a lot. It's roughly equal to floppy disk throughput, which is probably not a coincidence. However, the fact that basic soldering was enough implies that the rest of the system acts like a big software-defined DAC/ADC.
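A quick check of the "floppy throughput" remark, taking the 130 MB over three hours quoted above at face value. Both the decimal and the binary reading of "MB" come out at roughly 12 KB/s, in the same ballpark as the 11 KB/s Amiga floppy figure quoted in the reply.

```python
def tape_throughput(total_bytes: float, hours: float) -> float:
    """Average bytes per second needed to fill a tape of the given length."""
    return total_bytes / (hours * 3600)

for label, size in (("130 MB (10^6 bytes)", 130 * 10**6),
                    ("130 MiB (2^20 bytes)", 130 * 2**20)):
    print(f"{label}: {tape_throughput(size, 3) / 1024:.1f} KiB/s")
# → 130 MB (10^6 bytes): 11.8 KiB/s
# → 130 MiB (2^20 bytes): 12.3 KiB/s
```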

A dedicated controller could pack in a lot more data, as in the hobo tape storage system ArVid: https://en.wikipedia.org/wiki/ArVid


Dedicated controllers were absolutely out of the question because nobody could afford them, which is why Amigas were so popular: a fully multitasking, multimedia computer for 450 DM. That's about 230 EUR! Somebody that cost-sensitive won't even consider a dedicated controller; things back then weren't like they are today.

This was at a time when 3.5" floppy disks were expensive (and hard to come by), and hard drives held between 40 and 60 MB, so 130 MB was quite practical. The floppy drive in the Amiga read and wrote at 11 KB/s.

And yes, this was a DAC and an ADC in software, with added Reed-Solomon error correction encoding and CRC32. The goal was to be economical. The end price was everything; it had to be as cheap as possible.
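For the CRC32 half of that (the Reed-Solomon layer, which actually repairs errors, is left out), here is a minimal sketch of framing a block with a checksum so corrupted blocks are detected on readback. The 4-byte big-endian trailer is an assumed layout for illustration; the thread doesn't describe the real VBS block format.

```python
import struct
import zlib

def frame_block(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC32 of the payload."""
    return payload + struct.pack(">I", zlib.crc32(payload))

def check_block(block: bytes) -> bytes:
    """Verify the trailing CRC32 and return the payload, or raise on corruption."""
    if len(block) < 4:
        raise ValueError("block too short to contain a CRC32")
    payload = block[:-4]
    (stored,) = struct.unpack(">I", block[-4:])
    if zlib.crc32(payload) != stored:
        raise ValueError("CRC32 mismatch: block corrupted")
    return payload

# Usage: a clean block round-trips; flipping one bit is caught.
good = frame_block(b"amiga backup block")
bad = bytes([good[0] ^ 0x01]) + good[1:]
print(check_block(good))
```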


"one three hour PAL/SECAM VHS tape had a capacity of 130 MB"

This reminds me of the Danmere Backer.

"The entire hardware fit into a DB 25 parallel port connector and was easily made by oneself with a soldering iron and a few cheap parts."

This reminds me of the DIY versions of the Covox Speech Thing: https://hackaday.com/2014/09/29/the-lpt-dac/


Imagine a free cloud storage, but you need to watch an ad every time you download a file.


I read that you did not download shady files from the interwebs when that was a thing sane people actually did?


Wasn't that basically Megaupload and its ilk?


imagine not using an ad blocker


Fascinating


Not immediately obvious from the README, but does this rely on YT always saving and providing a download of the original, unaltered video file? If not, then it must be saving the data in a manner that is retrievable even after compression and re-encoding, which is very interesting.



