Itty Bitty: Sites contained within their own links (bitty.site)
420 points by Pulcinella on July 4, 2018 | 108 comments




It's like an image containing its own SHA2 hash. How is it possible?


Clever use of Javascript.

  <!-- Compress this <a> element's own markup (via outerHTML) and set the
       result as both the link's text and its href. -->
  <a><script src="https://itty.bitty.site/lzma/lzma_worker-min.js"></script>
  <script>a=document.querySelector('a');LZMA.compress(a.outerHTML,9,function(r){a.innerText=a.href='https://itty.bitty.site/#/?'+btoa(String.fromCharCode.apply(null,new Uint8Array(r)))})</script></a>


This makes the whole "sites contained within their own links" claim untrue: you need a server to host the JavaScript.


A quine is more similar to (but not quite the same as) an uncompressed image containing a QR code with a compressed copy of the same image, which has its own challenges. A cryptographic hash function (such as SHA2) is designed explicitly to make this hard; if you had chosen a simple checksum instead, it would be trivial to create an image containing the text of its own checksum, because a compression function (particularly a tiny custom one designed for one-time use, which removes barely any entropy at all) is predictable enough to be manipulated. (Edit: I hadn't looked at this one, though. Now that I see its code, it is apparently just cheating by accessing its own code via the DOM; it doesn't need to do that, and given this implementation it doesn't really feel like a "quine" to me.)
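
For contrast, here's a minimal genuine JavaScript quine (a sketch, not anything from the site) that reproduces its own source with no introspection at all: no DOM access and no reading of its own function text.

  // The line below prints itself exactly; this comment isn't part of the quine.
  s="s=%s;console.log(s.replace(/%s/,JSON.stringify(s)))";console.log(s.replace(/%s/,JSON.stringify(s)))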


You can create videos on YouTube, too, that contain their own URL.


For anyone wondering how this is possible: I did some quick research, and the method is in the description of this YouTube video: https://www.youtube.com/watch?v=ufq2Eb78kSU

In short, the YouTube API's resumable-upload function can be abused to achieve this.


An example of a functional quine: https://tiddlywiki.com/#Quine


nice! how did you do it?


As others have pointed out, it's not quite a quine, just some silly JavaScript that reads its source out of the DOM and feeds it back into a copy of Itty Bitty's URL generation algorithm.


I discovered data URIs in 2014, when trying to make an app that doesn't need to be distributed through an app store. They're fun.

Nihilogic made a 14KB Mario clone that fits into a data URI.

https://web.archive.org/web/20090310220414/http://blog.nihil...

I also know about the way Cemetech's jsTified TI-83 emulator parses pictures to load a calculator ROM file (steganography).

https://www.cemetech.net/projects/jstified/

I put those two together, and imagined a new form of app distribution, which I call "Fondant".

1. Click a data URI bookmarklet. That is a bootloader, with the parser code for a picture. It presents a button to upload a picture.

2. Select a picture from the camera roll, which is the app.

3. To save, generate a picture and save that to the camera roll.

This could be used to transfer other file types (e.g. a music player with a mixtape) over universally-supported photo sharing platforms.

Unfortunately, the iPhone recompresses JPEGs when you click Save to Camera Roll, and the lost quality means the photo isn't parseable the next time. I did some experiments and turned a blue/green checkerboard pattern grey after 50 repeated saves.

I still think there's potential in the idea; if someone wants to work together on it then please get in touch. I'll also write up my experiments if you comment and ask for it.
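
For the curious, here's a rough sketch of the kind of parser a Fondant bootloader might contain (my own illustration, not actual Fondant code; it assumes the payload sits in the low bit of each RGB channel of a losslessly saved image, which is exactly what iOS's recompression destroys):

  // Hypothetical decoder: recover bytes hidden in the least-significant
  // bit of each R, G, B channel of an <img> (lossless formats only).
  function decodeLSB(img) {
    const c = document.createElement('canvas');
    c.width = img.width; c.height = img.height;
    const ctx = c.getContext('2d');
    ctx.drawImage(img, 0, 0);
    const px = ctx.getImageData(0, 0, c.width, c.height).data; // RGBA bytes
    const bits = [];
    for (let i = 0; i < px.length; i += 4) {
      bits.push(px[i] & 1, px[i + 1] & 1, px[i + 2] & 1);      // skip alpha
    }
    const out = new Uint8Array(bits.length >> 3);
    for (let i = 0; i < out.length; i++) {
      let b = 0;
      for (let j = 0; j < 8; j++) b = (b << 1) | bits[i * 8 + j];
      out[i] = b;
    }
    return out; // payload bytes; length framing is up to the encoder
  }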


The PICO-8 "fantasy console" is a playful old-school-computer-style emulator that passes its programs through images. You might even look into its algorithms [1]. I think it'd be perfect.

[1]: http://pico-8.wikia.com/wiki/P8PNGFileFormat


Also the Spore Creature Creator from 2008 could export and import your creatures as PNGs. It used the least-significant bit in each color channel.



I just get a page with the URL shown.


your first clue


OK so you keep following the links (by copying and pasting into your address bar) and after about four links it says "oh hi ".

Explain yourself! How did you do this?!

Impressive!


Simple!

0. Write oh hi :D, get link0

1. Paste link0, get link1

2. Paste link1, get link2 ...

Encoding goes E(T), E(E(T)), etc. Let K be the per-character encoding overhead, so |E(T)| = K*|T|: the length is O(K^N * T), where N is the number of levels and T is the length of the text at level 0!


D'oh! You're right! That IS simple!


It seems that, due to the extra text from https://itty.bitty.site, each iteration increases the size of the next link. I wonder how the URLs are generated and whether this is a linear or exponential increase.


My guess is 1st-degree polynomial. There’s a constant addition with each iteration, plus the contents appear to be base64(ish)-encoded into the next URL, which multiplies the length by a constant.


It's original_size * growth_factor ^ iterations

I would say it's exponential in the number of iterations. The growth factor is probably <2.
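
A toy model makes the shape clear (my own sketch; it ignores the site's LZMA compression, which keeps the real factor well under bare base64's 4/3):

  // Each wrap base64-encodes the previous URL plus constant site chrome:
  // len(n+1) ~= (4/3) * (len(n) + C), i.e. exponential with factor ~4/3.
  const C = 30;   // assumed fixed overhead added per iteration
  let len = 8;    // length of "oh hi :D" at level 0
  for (let n = 1; n <= 10; n++) {
    len = Math.ceil((4 / 3) * (len + C));
    console.log(`level ${n}: ~${len} chars`);
  }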


Hi to you too :-)


Very clever implementation of this technique


oh hi

(I will say I'm disappointed that it wasn't a technique to make circular URLs.)


That's what it started out as. I was going to look at making a quine but quickly gave up, as I am tired.


I think a quine would be impossible since there's no possibility of indirection or data compression.


hoi m8 nicely done o7


This is neat!

I did something similar with https://tinysite.adamdrake.com where I have a website that fits into a single IPv4 datagram (at least it did when I was using my own HTTP server).

More details here: https://adamdrake.com/the-biggest-smallest-website.html


Someone has to combat our wasteful packet usage!


It's 9XX bytes; is the extra from TLS?


When I moved it to AWS Lambda there was some size introduced from TLS and also from some headers that AWS API Gateway adds which seem, unfortunately, impossible to easily drop (e.g., if you make the headers empty in the response, APIGW will copy them into another header and still add its own).

I might pull that service off of AWS APIGW/Lambda, in which case the size artifact of APIGW would disappear again.


This is neat!

My second reaction (after "awesome!") was that I'd love for this to support some kind of optional identifier+HMAC functionality. That's not necessary when you're fully in control of where people get the links, of course, but as soon as someone starts sharing links to "your" content they can by definition modify it as well. It'd be cool if I could share a page that says "by chias" and that can prove it.

Then again, that would require itty-bitty to know that I exist, store my key somewhere, have some kind of auth system for me to log in and set my key... and now we're talking usernames, passwords, recovery email addresses, etc. etc., and I love the concept of services offered to people that explicitly don't do any of that. So third reaction is that it's great as-is :)


You could just sign your content yourself using a publicly known key (e.g. keybase.io).

Support for keys as a separate content "field" could be added to itty bitty, I suppose?
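
A rough sketch of the client-side part using nothing but the Web Crypto API (illustrative only; distributing the public key, e.g. via keybase.io, is the actual hard part):

  // Sign page content with an ECDSA P-256 key pair; anyone holding the
  // matching public key can verify a shared link's content is yours.
  async function sign(text, privateKey) {
    const sig = await crypto.subtle.sign(
      { name: 'ECDSA', hash: 'SHA-256' }, privateKey,
      new TextEncoder().encode(text));
    return btoa(String.fromCharCode(...new Uint8Array(sig)));
  }
  async function verify(text, sigB64, publicKey) {
    return crypto.subtle.verify(
      { name: 'ECDSA', hash: 'SHA-256' }, publicKey,
      Uint8Array.from(atob(sigB64), ch => ch.charCodeAt(0)),
      new TextEncoder().encode(text));
  }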


So from Content Lengths [0]... You've tested on Chrome, and IE, but not Firefox?

[0] https://itty.bitty.site/#Content_Lengths/XQAAAAIXAwAAAAAAAAA...


Firefox is very useful for testing image data URIs: you can paste the whole thing into the address bar and it will render the image.


I strongly suggest not using this. Instead, create URIs that contain arbitrary content with the data URI scheme: https://en.wikipedia.org/wiki/Data_URI_scheme

The data URI scheme is standard and widely supported, does not rely on the host bitty.site being reachable and does not need JavaScript. One can even create data URIs with a small shell script that is given a filename argument:

  #!/bin/sh -eu
  # Print a data URI for the file named by $1: MIME type from file(1),
  # base64 payload without line wrapping (GNU base64 -w 0).
  printf 'data:%s;base64,%s' "$(file -bi "$1" | tr -d ' ')" "$(base64 -w 0 "$1")"


Try this: CTRL+A, CTRL+C on a simple-ish site, then paste into itty bitty. Works surprisingly well, with the exception of images not rendering.

Super quick static mirroring. I like.

Edit: tabular format is preserved as well - copy paste from Excel :ok_hand:

Edit: Bookmarked! I'm going to use this for sure.

Idea: A javascript bookmarklet of some sort may be nice here. Maybe "create itty.bitty site from current selection" or "... from entire page"?
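
A rough sketch of such a bookmarklet (my own, untested; shown expanded for readability where a real bookmarklet would be collapsed to one line, and it assumes the page allows loading the external LZMA script; the compress call mirrors the quine snippet at the top of the thread):

  javascript:(function(){
    var s = document.createElement('script');
    s.src = 'https://itty.bitty.site/lzma/lzma_worker-min.js';
    s.onload = function(){
      // Use the current selection if there is one, otherwise the whole page.
      var t = String(window.getSelection()) || document.body.innerHTML;
      LZMA.compress(t, 9, function(r){
        location.href = 'https://itty.bitty.site/#/?' +
          btoa(String.fromCharCode.apply(null, new Uint8Array(r)));
      });
    };
    document.body.appendChild(s);
  })();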





Ah, it's by Nicholas Jitkoff, of Mac Quicksilver fame. That explains the elegant design.

Edit: Oh, a bit more googling and he was lead designer for Material Design.



Why


If you're using this in your site as it stands, you are opening yourself up to XSS attacks, as it does not appear to sanitise user input.


Only if you do something silly like serve cookies on that domain


I think this is ignoring the content of his warning and is a tautology:

"it only opens up an attack if you allow the attack vector"


Another service that does the same thing: http://urlhosted.graphicore.de/

From http://urlhosted.graphicore.de/about.html: "urlHosted is an experimental web app that misuses the part after the "#" of a URL to store and read data."


Great work and congrats on making it to the top of HN.

I did something similar a while back... I called it inception host :)

https://github.com/jrhea/inception/blob/master/readme.md


Data URLs do this without relying on the server.

E.g. this one cribbed from StackOverflow:

data:text/plain;base64,SGVsbG8sIHdvcmxkISBJJ20gZG93bmxvYWRlZCB2aWEgImRhdGE6dGV4dC9wbGFpbjsuLi4iIFVSTCB1c2luZyA8YSBkb3dubG9hZD0iZmlsZV9uYW1lIi4uLj4uDQpNeSBiaXJ0aHBsYWNlOiBodHRwOi8vc3RhY2tvdmVyZmxvdy5jb20vcXVlc3Rpb25zLzY0Njg1MTcvDQoNCk1vcmUgYWJvdXQ6DQpodHRwOi8vd3d3LnczLm9yZy9UUi9odG1sL2xpbmtzLmh0bWwjYXR0ci1oeXBlcmxpbmstZG93bmxvYWQNCmh0dHA6Ly93d3cudzMub3JnL1RSL2h0bWwvbGlua3MuaHRtbCNkb3dubG9hZGluZy1yZXNvdXJjZXMNCg0KQnJvd3NlciBzdXBwb3J0OiBodHRwOi8vY2FuaXVzZS5jb20vZG93bmxvYWQ=


I wonder how large a website using this scheme can practically be to work on most browsers. I know of at least one browser that limited URLs to 100 characters, but it's almost 30 years old…


While most sites/apps support about 2000 bytes (2KB) in a URL, some can handle more... Twitter and Slack, for example, allow 4000 bytes in a link.

4000 bytes is apparently enough to encode the full text of Poe's The Raven.

https://twitter.com/edgar_the_poe/status/1003524516440563712


The HN frontpage bitty that I posted in another comment is around 8K. [1]

I also did a reddit one that works on my browsers[2] (copy/paste URL into new tab) at 26K (!). Which is apparently too much for HN. (Where's my free 5GB of storage for signing up?!)

[1] https://news.ycombinator.com/item?id=17460932 [2] latest chrome, firefox on ubuntu


It's now hosted on tinyurl: https://tinyurl.com/yafprdyo


Don't use this for anything where you don't want cross-site-scripting vulnerabilities...


What would such a cross-site scripting vuln do? There isn't anything to steal.

Moreover you wouldn't use this for anything where you aren't in control of where people get the links from, because as soon as someone else starts sharing it they can of course edit it too.


The trustworthiness of the domain name would effectively be stolen.


I disagree. It's more similar to how you can "inject" your scripts into fiddle.jshell.net (via JSFiddle), googleusercontent.com (via Google Translate), etc.

Have a look at https://fiddle.jshell.net/pvcL4mjh/1/show/light/

Would you call that XSS / did I just steal JSFiddle's trustworthiness?


That's a fair point.


Hypothetical:

Might a bad actor use something like this, combined with a homograph domain, to conceal malicious content in the URL and prevent a crawler from discovering the malicious content (ignoring the fact that the homograph might be detected/red-flagged on its own)?

(A use case might be a homograph phishing site, with a fake login and the target for the captured input obfuscated into the URL.)

---- Note: Homograph effectiveness depends on the browser, which you'd hope are all improving detection over time: https://dev.to/loganmeetsworld/homographs-attack--5a1p


I do something similar in a bookmarklet to give myself a nice looking notepad. It's essentially a padded content editable div, with a prepopulated H1. This lets me print it out or save it as a document when needed. Which is never.
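
Presumably something along these lines (my reconstruction of the idea, not the actual bookmarklet):

  data:text/html,<body contenteditable style="padding:2em;max-width:40em;margin:auto;font:18px/1.5 sans-serif"><h1>Notes</h1>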


Oh yeah!

data:text/html,<html contenteditable></html>

That's one of my favorite "stupid HTML tricks". I wonder who first figured it out? I can't find the original blog post (if there even was one), and now it's just a known thing:

https://news.ycombinator.com/item?id=6005295

https://www.simonewebdesign.it/how-to-make-browser-editor-wi...

etc...


It's a legit pattern: compress the binary data, encode it to a base64 string, and pack it into a data URI. Go has packages out of the box for handling the entire pipeline.

You take a small performance hit on page load from decoding, but it's still performant and bandwidth-saving. For network-constrained users this is a solution. Also consider the scenario where you are bootstrapping: serving static content from a single server instance or S3, but still wanting to target a global audience ;)



So if one must think about it in terms of client/server (my mind works like this): since the website data is contained in the URI itself, the server/site where you clicked the URL can perhaps be thought of as the "web server" per se. Just a way for me to think about it. This is pretty crazy though, neat stuff.


The link says

> nothing is sent to–or stored on–this server

but isn't it the case that the entire page is sent to bitty.site as part of the HTTP request?


No, the page content is never sent to the server. All the server does is serve a static HTML page with some JavaScript. The JavaScript reads the compressed, encoded data in the URL fragment (which browsers never send to the server during a request), then decodes it and writes it into the current DOM.

All content data is stored in the link itself and is never seen by the itty.bitty server.
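
In outline, the client side does something like this (a paraphrase, not itty.bitty's actual source; LZMA.decompress is the counterpart of the LZMA.compress call shown in the snippet at the top of the thread):

  // The fragment never travels over HTTP; everything happens locally.
  var b64 = location.hash.split('/').pop() || '';
  var bytes = Array.from(atob(b64), function (ch) { return ch.charCodeAt(0); });
  LZMA.decompress(bytes, function (html) {
    document.body.innerHTML = html;  // write the decoded content into the DOM
  });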


Most HTTP clients (browsers included) will omit URL data after the # character, e.g.:

`https://itty.bitty.site/#About/XQAAAAKrCQAAA...`

only sends

`https://itty.bitty.site/`

to the server.


The location.hash (#About...), which is what the content is rendered from, is never sent to the server.


The de-facto limit for the number of URL characters is around 2000, but it varies greatly per browser. See https://stackoverflow.com/a/417184/487771


Cool hack aside, can anybody think of an actual use case for this?


My mind immediately went to stuff like tickets. I can't say how many times I've gotten stuck having my ticket checked at a concert or on a train simply because there's no connectivity and the page is no longer cached on my phone. If all the data is in the URI, then it could work as an option for entirely static data. I'd say this is useful for almost anything that fits in a one-page PDF.


How does that solve the no-connectivity problem? You'd still need the site (with the rendering scripts) cached.


If you want the whole UI around rendering the content, sure, but you don't need that: it's just an iframe that sets a `data:` URI anyway. The link can just as well be the data URI itself. It'd be a very long link for any non-trivial content, but it works just fine without any connectivity.

Seeing as tickets are typically sent in HTML emails anyway, the scary-looking URI can just be hidden behind a nice anchor tag or something. When the recipient clicks the link, the page will open in a browser and render just fine without any connectivity. Graphics can be embedded in the same way, using `data:` URIs as image tag sources, or in embedded CSS.

Here's an example, turn off your connection and try this in your browser's address bar:

  data:text/html;charset=utf-8;base64,PG1ldGEgY2hhcnNldD0idXRmLTgiPjxtZXRhIG5hbWU9InZpZXdwb3J0IiBjb250ZW50PSJ3aWR0aD1kZXZpY2Utd2lkdGgiPjxiYXNlIHRhcmdldD0iX3RvcCI+PHN0eWxlIHR5cGU9InRleHQvY3NzIj5ib2R5e21hcmdpbjowIGF1dG87cGFkZGluZzoxMnZtaW4gMTB2bWluO21heC13aWR0aDozNWVtO2xpbmUtaGVpZ2h0OjEuNWVtO2ZvbnQtZmFtaWx5OiAtYXBwbGUtc3lzdGVtLEJsaW5rTWFjU3lzdGVtRm9udCxzYW5zLXNlcmlmO3dvcmQtd3JhcDogYnJlYWstd29yZDt9PC9zdHlsZT4gSGVsbG8sIEx4ciBmcm9tIEhOIQ==

EDIT: I created another, more complex example, but it's unfortunately too long for HN comments. You can see it here:

https://gist.github.com/mstade/597f7da82841b1c27c63c8b383538...

Just copy and paste the raw contents of that file into the address bar and it should render the page.


Browsers have offline capabilities that are currently massively underutilized. You could store the LZMA scripts in IndexedDB, perhaps?
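
A hedged sketch of that approach (my own; a Service Worker would arguably be the more natural tool, and the database/store names here are made up):

  // Cache the decoder script's source in IndexedDB on first visit,
  // then fall back to the cached copy when the network is gone.
  const open = indexedDB.open('itty-cache', 1);
  open.onupgradeneeded = () => open.result.createObjectStore('scripts');
  open.onsuccess = () => {
    const db = open.result;
    fetch('https://itty.bitty.site/lzma/lzma_worker-min.js')
      .then(r => r.text())
      .then(src => {
        db.transaction('scripts', 'readwrite')
          .objectStore('scripts').put(src, 'lzma');
        inject(src);
      })
      .catch(() => {                       // offline: use the cached copy
        const req = db.transaction('scripts')
          .objectStore('scripts').get('lzma');
        req.onsuccess = () => req.result && inject(req.result);
      });
  };
  function inject(src) {
    const s = document.createElement('script');
    s.textContent = src;                   // makes LZMA available in the page
    document.head.appendChild(s);
  }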


Unfortunately that comes with a bootstrapping problem: what if the first time you hit the link is when you aren't connected? If everything is embedded in the data URI and the link is in fact the data itself then it doesn't matter.


I think that's an acceptable requirement. After all, you wouldn't expect an app to work without downloading it. If gzip were used instead of LZMA, I bet you could include some code that unzips itself using the browser's built-in unzip capability. I'm not sure how much code that would take, though.


Right, for an app that makes sense, but the use case I mentioned was really just pure data. A ticket is rarely more than a QR code or some other identifier, whose veracity is then checked at the point of reading. In an arena full of people it's not uncommon for cell towers to get overloaded; or you may be roaming overseas and not want to waste data; or you're on a moving train with no on-board WiFi (or WiFi that doesn't work right). In those cases (and they seem to happen to me a lot) I would've preferred that the ticket I received in my email be viewable offline, even if I'd never seen it before. With a data URI, that's possible. It's not an entirely contrived use case, I don't think.


Could you not register a URI format with the mobile OS to open an IttyBitty reader app that contains the rendering scripts?

Sure, initially that means only users who've noticed it would be able to benefit, but the app would be generic, not vendor-specific, so over time, it could become a standard & make it into the OS, at which point it becomes properly useful.



Why? The `data:` schema is already there and is well supported across platforms. All itty bitty does is take the base64 encoded fragment part of the URL and put it in an iframe, wrapping it in the `data:` scheme. No need to register anything new.


Only really that itty bitty content is compressed in the URL, not just in transport (and it could be extended to add encryption).

It's absolutely true that that may be tackling the wrong area: if data: were extended with compression/encryption options (not that itty bitty has encryption, but it's an option with the JS layer there), then the itty bitty decoder would have nothing to do.

BUT, experiments like itty bitty might spur ideas and standards changes for things like data:, which to me makes it worthwhile.

(Self-contained, standardised, compressed [, encryptable?] content objects [in this case URLs] renderable via ubiquitous web technologies feel like a win, itty bitty or not.)

---

Also, if links are shared on HTML-based platforms, some block data: URLs in links. Not sure of the justification, but reddit doesn't seem to render markdown-formatted URLs with data: on them as links. Not sure how to tackle that one; itty bitty sidesteps the restriction.


It's potentially a more reliable way of saving out a web page...

Or in a 2-step process, do that, then feed it to a PDF converter for sites that don't play well with printing.


Copy/paste content to my coworkers, like Pastebin but without server-side storage on the paste site. It's amazing.


So rather than pasting the text... paste the link, which they then open, to fetch the rendering middleware from this site, to render the text in the browser? Rather than just letting the messaging app render the text inline or as an attachment?


Hmmm.

Could you not put an encryption layer into a derivative of this for some generic PGP-style end-to-end encryption, without involving/trusting a centralised provider..?
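
Something like this could work in principle (a sketch with the Web Crypto API, not PGP proper; the key would have to be shared out of band, and the function name is mine):

  // Encrypt content with AES-GCM; ship iv+ciphertext in the URL fragment
  // so the server never sees the plaintext (or anything after the '#').
  async function encryptForFragment(html, key) {
    const iv = crypto.getRandomValues(new Uint8Array(12));
    const ct = await crypto.subtle.encrypt(
      { name: 'AES-GCM', iv: iv }, key, new TextEncoder().encode(html));
    const packed = new Uint8Array(iv.length + ct.byteLength);
    packed.set(iv);
    packed.set(new Uint8Array(ct), iv.length);
    return btoa(String.fromCharCode.apply(null, packed)); // goes after '#'
  }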


You'd need to trust the (en/de)cryption code... so you're probably better off cutting and pasting with gpg --armor...


What is required to run this on your own web server? Is there something else to it, such as Node?


No, you just host a local copy of the JavaScript file that the page calls to decode the URL.


This seems like a website version of the Library of Babel


The GitHub page is missing instructions for inlining the whole HTML/JS/CSS package in a single file.

Is it common knowledge?


How does this compare with data URLs?


It is smaller than a data URL because the base HTML and JavaScript are provided by the itty.bitty server and are not part of the data URL; only the inner content is encoded. So it is much smaller than a data URL, but it requires a server to serve the initial decoding HTML page.


The base HTML/JavaScript coming from the itty.bitty server does not make it smaller. As a matter of fact, it appears that the JavaScript provided by itty.bitty simply decompresses the URL fragment, converts it into a data URI, and passes that to an iframe.

So it is smaller than a data URI, but only because it is compressed.


Finally, a use case for code golf!



This is old news from 2011: https://news.ycombinator.com/item?id=2464213 It was done better by David Chambers: https://github.com/hashify/hashify.me

And you can use it here: http://hashify.me/IyBUaXRsZQ==


That's cool, I hadn't come across hashify before, but I'm not sure about "better" overall; it depends how you're using it...

...itty bitty's demo page gives a couple of nice additions:

* I use markdown a lot, but the inclusion of codepen.io makes it simple to compose richer docs, including SVGs in a page etc. (I haven't read the hashify code; I'm assuming it does support full HTML, not just MD, but the demo doesn't make that clear)

* QR code link: it's just using a third-party ZXing tool, but a nice touch for the demo

* Compression: hashify URLs seem to be longer than the content (which might mean lower effective limits on content length), relying on third-party lookups for shortening, meaning accessing the content offline looks harder to achieve (afaict itty bitty just needs the JS to decode the URL)


Nice one.

https://boutell.com/newfaq/misc/urllength.html

JS-Fat-site shaming by link.


huh. I know a lot of early buffer overflows came via the URL back in the early 2000s and early IE days. I can understand why there shouldn't be a theoretical limit today, at least until your machine runs out of resources. 32GB URL, anyone?



It does require Javascript to render.


It does the decompressing and everything on the client side. How would you do that without JavaScript?


css


A decompression algorithm in CSS? What?


This seems like an exceptionally bad idea.

Even if I just used it for my blog, an attacker can craft a URL which renders obscene content ("Hitler did nothing wrong"), then scatter that link everywhere on the web and get it crawled. Eventually it shows up in searches, and it's definitely "on my domain."

Furthermore, it's like running a public anonymous FTP server: it'll just get used as a malware hosting tool as soon as it's found.


I don't follow. I have a Twitter profile. If someone else shares racist stuff on their Twitter profile, is it "on my domain?"

In any case, I'm skeptical that you would ever use this for a blog because the links are immutable. You'd have to either change your homepage every time you updated your blog, or you'd have to dynamically render your homepage in Javascript by querying another server - and if you're gonna do that, what's the point? Just host your own stuff then.

What this gives you is a way to share static content where your entire app/page can be distributed in the url with extremely minimal server interaction. There's no risk of malware or vandalism because there's nothing actually being hosted - the URL is the data. When you share it with someone you know exactly what they're going to see when they click on it.

I can think of a lot of uses I might have for this; even just minor things like sharing quick lists with family members. Updating the URL whenever I changed something would be annoying, but the benefits might outweigh the drawbacks.

Edit: Are you talking about domain forwarding? I could see how that would be problematic. It didn't occur to me because I'm not sure what the advantage would be. I don't think I've ever been in a situation where it was too cumbersome for me to host static content somewhere, but not too cumbersome for me to buy a domain and edit its config whenever I published content.


Will crawlers do the js DOM rendering before the page is indexed?


I think that Google does actually wait for Javascript to finish running. I don't think that it indexes URLs with their query params though. Maybe I'm wrong about that.

Not sure why it would be a problem though even if it did - if you've ever hosted content on Instagram, Reddit, Facebook, Twitter, or, heck, even HackerNews, it seems like you'd have the exact same threat model.



