Hong Kong Team Stores 90GB of Data In 1g of Bacteria (igem.org)
83 points by jmg on Nov 26, 2010 | 28 comments



I don't know enough about biology to comment on their storage-density claims, but as to the encryption, I'm getting a strong whiff of snake oil from "only the client would know the function to derive the checksum". If you want to convince me that bioinformatics has something to offer cryptology, then you need to explain to me what property wetware has that silicon doesn't which causes it to be unlike a classical Turing machine.


The mass of a single base pair is about 1.08E-21 grams. That's 1.85E21 bits[1] of information in a single gram of purified DNA, about a fourth of a zettabyte. So, if they're using DNA as the storage mechanism (the slideshow linked in the article indicates that they are), then 90GB is pretty insignificant. Sure, the bacteria will all be pretty filled with DNA, but it's not especially outlandish. Throwing compression at the information (as the slideshow discusses) makes it even less outlandish.

[1] Each base pair encodes two bits, as DNA and RNA are basically base-four sequences (when we're thinking about them as data storage).
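
The arithmetic above can be checked with a short script. This is only a back-of-the-envelope sketch: it takes the quoted 1.08E-21 g per base pair and 2 bits per base pair at face value, assumes pure DNA with no cell mass or redundancy, and uses decimal units (1 ZB = 1e21 bytes). The function and constant names are illustrative.

```python
# Back-of-the-envelope check of the storage-density figures quoted above.
BASE_PAIR_MASS_G = 1.08e-21   # grams per base pair, as quoted in the comment
BITS_PER_BASE_PAIR = 2        # four possible bases -> log2(4) = 2 bits


def dna_bits_per_gram(bp_mass_g=BASE_PAIR_MASS_G, bits_per_bp=BITS_PER_BASE_PAIR):
    """Raw information capacity of one gram of pure DNA, in bits."""
    return bits_per_bp / bp_mass_g


bits = dna_bits_per_gram()
zettabytes = bits / 8 / 1e21            # decimal zettabytes per gram
claimed_fraction = 90e9 / (bits / 8)    # 90 GB as a share of that raw capacity

print(f"{bits:.3g} bits per gram")
print(f"{zettabytes:.2f} ZB per gram")
print(f"90 GB is a fraction {claimed_fraction:.1e} of the raw capacity")
```

Under these assumptions 90GB is well under a billionth of what a gram of pure DNA could hold, which is why the claim does not look outlandish.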


If we were to use yeast, we could additionally include methylation. We don't have to stop there: we could encode information in histone acetylation states, transcription levels, etc. to increase this density even further. Granted, how practical would that be?

Using the regulation machinery might be an interesting way to decrypt messages, if the signalling pathway were sufficiently complex...


Yeast don't methylate.

Histone acetylation sites? Epigenetic information is too transient and is not necessarily passed down in a 1:1 fashion, which is really bad if you're trying to store data reliably.


Actually, knowing a 'good' (cryptographically) checksum function is equivalent to having a good encryption scheme. I believe it was Rivest who showed this, sometime in the late 1990s.

He suggested, for instance, blasting out a sequence of bits; if a block checksums to a certain number or matches a function, it has 'your' bits. An observer of the stream would see a bunch of random data. You would see: garbage-garbage-bits-garbage-garbage-garbage-garbage-bits-bits, etc.

This principle could work well in the system they describe.
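
A minimal sketch of the idea, in the spirit of what Rivest called "chaffing and winnowing". It is an illustration under stated assumptions, not the iGEM team's scheme: the keyed "checksum" is taken to be HMAC-SHA256, each position in the stream carries both possible bit values, and only the real one has a valid tag. The function names and packet layout are hypothetical.

```python
import hashlib
import hmac
import secrets


def mac(key: bytes, serial: int, bit: int) -> bytes:
    """Keyed checksum over (serial number, bit value)."""
    msg = serial.to_bytes(4, "big") + bytes([bit])
    return hmac.new(key, msg, hashlib.sha256).digest()


def send(key: bytes, bits):
    """Emit two packets per position: the real bit with a valid tag
    ("wheat") and the opposite bit with a random tag ("chaff")."""
    stream = []
    for serial, bit in enumerate(bits):
        wheat = (serial, bit, mac(key, serial, bit))
        chaff = (serial, 1 - bit, secrets.token_bytes(32))
        pair = [wheat, chaff]
        if secrets.randbits(1):   # randomize order so position leaks nothing
            pair.reverse()
        stream.extend(pair)
    return stream


def winnow(key: bytes, stream):
    """Keep only the packets whose tag verifies under the shared key."""
    kept = {}
    for serial, bit, tag in stream:
        if hmac.compare_digest(tag, mac(key, serial, bit)):
            kept[serial] = bit
    return [kept[i] for i in sorted(kept)]


key = secrets.token_bytes(32)
message = [1, 0, 1, 1, 0, 0, 1, 0]
stream = send(key, message)
print(winnow(key, stream) == message)
```

An observer without the key sees both bit values at every position and cannot tell the valid tags from the random ones, so the security rests entirely on the keyed checksum.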


You can do better than that. Run the hash in HMAC mode and hash successive counter values to get a pseudorandom stream of bits. Xor your plaintext with the stream to get the ciphertext.
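
A rough sketch of that construction, assuming HMAC-SHA256 as the hash and an 8-byte big-endian counter. The per-message nonce is an addition not mentioned above; it is there so that a key is never used to produce the same stream twice. The function names are illustrative, and the sketch provides no integrity protection.

```python
import hashlib
import hmac


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Pseudorandom bytes from HMAC over successive counter values."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"),
                         hashlib.sha256).digest()
        out.extend(block)
        counter += 1
    return bytes(out[:length])


def xor_crypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    stream = keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


key = b"k" * 32                  # placeholder key, for illustration only
nonce = b"unique-per-message"    # must never repeat under the same key
ciphertext = xor_crypt(key, nonce, b"attack at dawn")
print(xor_crypt(key, nonce, ciphertext))
```

Because XOR is its own inverse, the same function decrypts. Reusing a key and nonce pair would leak the XOR of the two plaintexts.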

But how does biology contribute to any of this? At best, they've taken a known cryptographic algorithm and figured out how to implement it with the computation done in wetware. At worst, and I suspect the worst, they've simply observed that some parameters of their encoding scheme are tunable, and are claiming that you have a secure cryptosystem if you keep those parameters secret.


Don't bacteria die, reproduce, and mutate? How is it possible to store anything long term in room temperature conditions?


It's possible; they have pretty spectacular error-correction mechanisms. Just don't blast it with UV.

There are two problems with bacterial data storage. The first is information retrieval. Running a sequencer is no fun.

The second problem is genetic recombination, which is what they are using to 'achieve encryption'.

http://en.wikipedia.org/wiki/Genetic_recombination

Of course, there's going to be some toxic DNA sitting in the sequences they are making, and it will be to the bacteria's advantage to spit it out. They will find a way to do it with recombinases, even the strains with the most pernicious recombinase (recA) knocked out.


Everything degrades but I am particularly concerned re. the rate of decay of anything organic. What's the MTBF?


You sure MTBF is still the appropriate term? Perhaps it's time to coin a new one. How about MTBE- MTB Evolution.


Or at least Mean Time Before Mutation (MTBM). I'm curious to see the checksum protein they deploy.


Also interesting: they contributed their technique to the BioBricks Foundation: http://bbf.openwetware.org

Very exciting things coming as we get the protein expression laws down well. I can't wait to use http://mrgene.com to sequence DNA and a BioBrick Assembly kit to put it all together.

http://www.neb.com/nebecomm/products/productE0546.asp


Biobricks are going to die. Restriction digest/ligation is too much of a pain in the butt. I've been in professional labs where it takes upwards of 8 tries to get ligation products in. People are going to move to Gibson Assembly. If you're interested in DIY bio, Gibson Assembly is the way to go.

Obligatory awful youtube video that does a bad job of explaining Gibson Assembly:

http://www.youtube.com/watch?v=WCWjJFU1be8


Digest & ligate isn't even the worst problem. Back in the day (2006) the quality control and standardization of bricks was dreadful, and I haven't seen evidence that it's improved substantially since then. Polymerases per second (POPs), the measure of brick efficiency, seems like a bit of a pipe dream -- I'm yet to be convinced that it can survive brick composition.

Did you do iGEM? Back in the day I started the iGEM team at Brown. We couldn't scum a PCR machine off anyone so we spent the summer doing minipreps. A few weeks before the jamboree, we refined our models a bit and showed that the parameter space in which our project would function was so narrow that it would probably never work. Oops!


Never did iGEM. I think it started when I was in grad school. I'm building a sub-$500 PCR machine now, and might start a company to sell them, bio kits, etc. I'm also thinking about developing an ethidium-free paradigm (using fluorescent primer adapters instead) and eliminating the use of E. coli so that DIY bio can really pick up.


For those who would like to watch a great talk, Learning to Program DNA, by Dr. Drew Endy (one of the main guys behind BioBricks)[1] at UW: http://uwtv.org/programs/displayevent.aspx?rid=25856

[1] http://en.wikipedia.org/wiki/Drew_Endy


My anti-virus software just deleted all my data!


You mean your antibacterial salve did!

Don't let people with a bandaid on their hand work on your computer; they might just have Neosporin in there!


Hehe, sorry, that was silly of me.


I have to admit that, in my view, the grand prize winner, Slovenia, brought a really simple but ingenious idea to the competition, which might be a lot more worth discussing :)


Wonder if they can replicate data with binary fission (barring mutation).


Finally, storage that increases in size faster than your data.


Cloning data takes on a whole new meaning.


Hopefully no one leaves cheese lying around to infect the hard drives!


Don't you mean that you hope your hard drive doesn't infect your cheese?


So, sick people will have more storage space?


The human body has more bacterial cells than somatic cells by about 10 to 1.


Especially your guts.



