Hacker News
MonaTweeta II: Trying to encode images in Twitter's 140 character limit (flickr.com)
25 points by joshwa on May 12, 2009 | 11 comments



I love the 140 UTF-8 char limit.

Of course, SMS, which is where this limit comes from, is 160 7-bit chars.
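
The arithmetic behind that: GSM SMS packs 7-bit characters into a 140-byte payload, so 160 septets fit exactly. A quick sketch (the function name is mine, not from any spec):

```python
def packed_bytes(n_chars: int) -> int:
    # Each GSM 7-bit character occupies 7 bits; round the total up to whole octets.
    return (n_chars * 7 + 7) // 8

print(packed_bytes(160))  # the 140-byte SMS payload
```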


bzip2/gzip-compressed tweets anyone?


Probably not: gzip output carries a fixed header (and a CRC trailer), so tiny inputs usually come out larger.

  $ cat tweet  
  "If you want to get laid, go to college,
   but if you want an education, go to the library." -Frank Zappa
  $ wc -c tweet
       105 tweet
  $ gzip tweet
  $ ls -l tweet.gz   
  -rw-r--r--  1 scott  scott  109 May 11 22:12 tweet.gz
  $
Compressing very small files (or large files that are already compressed) can make them larger. There's a crossover point where you can pack more textual data, but it's probably a bit over 140 chars.
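
You can see the fixed overhead directly in Python: at the same compression level, the deflate body is identical, so gzip's 10-byte header plus 8-byte trailer costs exactly 12 bytes more than zlib's 2-byte header plus 4-byte checksum. A small sketch using the same Zappa quote:

```python
import gzip
import zlib

quote = ('"If you want to get laid, go to college, but if you want '
         'an education, go to the library." -Frank Zappa')
raw = quote.encode()

gz = gzip.compress(raw)            # 10-byte header + deflate body + 8-byte trailer
zl = zlib.compress(raw, 9)         # 2-byte header + deflate body + 4-byte checksum

print(len(raw), len(gz), len(zl))  # the wrappers alone can push past the original size
```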


One could always view tinyurl as a type of lossless compression for essentially limitless amounts of data. In that sense you can store as much as you like in a single tweet.


No, you can link to limitless amounts of data in a single tweet. Storing is very different.


I've always thought the best solution to texting arbitrary data would be sending the MD5sum (or some other hash, ideally one rendered in base 36 rather than base 16 for link brevity) via a link-forwarding service that sent you to the document on some server, somewhere. That is, this redirector would give you a URL like http://md5url.com/d41d8cd98f00b204e9800998ecf8427e, based on a hash of the content it retrieved from the link you fed it. If the link ever broke, you could just search for that hash and find the file another way, assuming there are document/image/torrent hosting sites that let you search by the kind of hashes it uses.
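
For what it's worth, the base-36 trick does shave a few characters: an MD5 digest is 32 hex characters but at most 25 base-36 digits. A sketch of the URL scheme (md5url.com and the function name are the commenter's hypothetical, not a real service):

```python
import hashlib

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"

def short_hash_url(content: bytes) -> str:
    # Render the 128-bit MD5 digest in base 36 for a shorter link.
    n = int.from_bytes(hashlib.md5(content).digest(), "big")
    digits = []
    while n:
        n, r = divmod(n, 36)
        digits.append(ALPHABET[r])
    return "http://md5url.com/" + "".join(reversed(digits or [ALPHABET[0]]))

print(short_hash_url(b"hello"))
```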


Interesting idea. It's probably worth thinking of it first as a distributed hash table rather than a link forwarder, though.

Incidentally, using google as a rainbow table sometimes works, though only for people who have really weak passwords and no salting.


Also, look at Venti (http://en.wikipedia.org/wiki/Venti) from plan9.


All that does is reference it by name. The data is stored externally.

There's a pretty approachable section on Huffman encoding in SICP, btw. Data compression algorithms can be really interesting.
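
The SICP version is in Scheme, but the idea fits in a few lines of Python too: repeatedly merge the two least frequent subtrees, prefixing "0" and "1" to the codes on each side, so frequent symbols end up with short prefix-free codes. A minimal sketch:

```python
import heapq
from collections import Counter

def huffman_codes(text: str) -> dict:
    # Each heap entry: (frequency, tiebreaker, {symbol: code-so-far}).
    heap = [(freq, i, {sym: ""})
            for i, (sym, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Merging prepends a bit: 0 for the left subtree, 1 for the right.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, i, merged))
        i += 1
    return heap[0][2]

codes = huffman_codes("go to the library")
encoded = "".join(codes[ch] for ch in "go to the library")
```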


IIRC gzip doesn't make sense under 500 bytes


Better to use raw deflate then (basically gzip minus the header and trailer, IIRC).
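
Python's zlib can emit that headerless stream: a negative wbits asks for bare deflate, dropping zlib's 6 bytes of wrapper (gzip's is 18). A sketch, assuming both sides agree on the format:

```python
import zlib

def raw_deflate(data: bytes) -> bytes:
    # wbits=-15 requests a bare deflate stream: no header, no checksum.
    c = zlib.compressobj(9, zlib.DEFLATED, -15)
    return c.compress(data) + c.flush()

def raw_inflate(blob: bytes) -> bytes:
    return zlib.decompress(blob, -15)
```

The saved bytes matter at tweet scale, but without the checksum any corruption goes undetected.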



