Man places his genome in public domain, on Github

sandipc · on Feb 13, 2011

Technically this isn't his entire genome - just SNPs. (http://en.wikipedia.org/wiki/Single-nucleotide_polymorphism)

One major problem with developing a "Google for the human genome" is that we don't understand how most of the genes (coding regions) and noncoding regions in our DNA work or interact with each other, except at a very basic level for a very limited set of genes.

There are genome browsers out there already that came out of the human genome project and work in that direction. One example: http://huref.jcvi.org/

cjbprime · on Feb 13, 2011

Yeah, I was disappointed that it's not a full genome too.

It's by no means all of his SNPs, either -- each person has around 3 million actual SNPs (variations from the reference genome), and 23andme just chooses a million sites that could be the location of a SNP to look at, most of which won't actually be points of variation for most people.

So, 23andme is only looking for common SNPs you might have. If you have a rare SNP you're interested in, or if you're a researcher trying to analyze the effects of an uncommon SNP, you're out of luck with 23andme data.
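To make the "out of luck" point concrete, here is a minimal Python sketch of looking up one SNP in a 23andMe-style raw data export. It assumes the export is tab-separated with the columns rsid, chromosome, position, genotype, that comment lines start with `#`, and that `--` marks a no-call. The rsIDs and genotypes in `SAMPLE` are made up for illustration, and `load_genotypes` and `lookup` are hypothetical helper names.

```python
def load_genotypes(lines):
    """Parse 23andMe-style raw data lines into a {rsid: genotype} dict.

    Assumed layout: rsid <TAB> chromosome <TAB> position <TAB> genotype,
    with '#' starting a comment line.
    """
    calls = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        rsid, _chromosome, _position, genotype = line.split("\t")
        calls[rsid] = genotype
    return calls


def lookup(calls, rsid):
    """Report the genotype at rsid, or why there isn't one."""
    if rsid not in calls:
        return "not assayed"  # the chip never looked at this site
    if calls[rsid] == "--":
        return "no call"      # assayed, but no genotype was reported
    return calls[rsid]


# Made-up example data, not taken from any real export.
SAMPLE = """# rsid\tchromosome\tposition\tgenotype
rs0000001\t1\t1000\tAA
rs0000002\t1\t2000\tCT
rs0000003\t2\t3000\t--
"""

calls = load_genotypes(SAMPLE.splitlines())
for rsid in ("rs0000002", "rs0000003", "rs9999999"):
    print(rsid, lookup(calls, rsid))
```

A SNP that is absent from the file is indistinguishable from one the chip simply never tested, which is why a rare variant cannot be studied from this kind of data.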

ryanlower · on Feb 13, 2011

I agree, but still applaud Manu's release of something many consider so private.

Even though this isn't a genome sequence, there is potential for interesting analyses if lots of people release their 23andme data. (I believe 23andme use the same SNPs for every user).

jsarch · on Feb 13, 2011

I'll go one step further and emphatically state "this isn't his entire genome."

Additionally, this data could be radically improved if his phenotype were also included. Knowing that a marker reads "AA", without the correlating information "blond hair", doesn't tell us whether "AA" matters for hair color.

What you can mine with this type of information is the correlation between the markers themselves: if rs1001 = AA, then rs2002 = {GG(85%), TT(10%), GT(5%)}. This is where community software could definitely benefit from more data.
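As a rough sketch of that marker-to-marker mining, the Python below estimates the distribution of genotypes at one marker among people who share a given genotype at another. It assumes each person's data has already been loaded into a dict of rsid to genotype. `rs1001` and `rs2002` are the placeholder names from the example above, the five "people" are invented, and `conditional_frequencies` is a hypothetical helper, not anyone's actual pipeline.

```python
from collections import Counter


def normalize(genotype):
    """Treat 'GT' and 'TG' as the same unordered genotype."""
    return "".join(sorted(genotype))


def conditional_frequencies(people, given_rsid, given_genotype, target_rsid):
    """Estimate P(target genotype | given genotype) from a list of people.

    Each person is a dict mapping rsid -> genotype string. People missing
    either marker are skipped.
    """
    given = normalize(given_genotype)
    counts = Counter(
        normalize(person[target_rsid])
        for person in people
        if given_rsid in person
        and target_rsid in person
        and normalize(person[given_rsid]) == given
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {genotype: n / total for genotype, n in counts.items()}


# Invented data: four people with rs1001 = AA, one with AG.
people = [
    {"rs1001": "AA", "rs2002": "GG"},
    {"rs1001": "AA", "rs2002": "GG"},
    {"rs1001": "AA", "rs2002": "GG"},
    {"rs1001": "AA", "rs2002": "TG"},
    {"rs1001": "AG", "rs2002": "TT"},
]

freqs = conditional_frequencies(people, "rs1001", "AA", "rs2002")
print(freqs)
```

With only a handful of people the estimates are meaningless, which is the point about needing more data: the frequencies only become useful once many genotype files are pooled.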

If you are interested in helping create a "Google for DNA", drop us a line at SeqCentral.com

sskates · on Feb 13, 2011

A lot of the "aren't you afraid that somebody is going to use that against you?" remarks are reminiscent of the early days of the internet, when people were afraid to put pictures of themselves or their contact info online. There's now a $50 billion company dedicated to doing just that.

bbgm · on Feb 13, 2011

This thread is probably a good place to point to perhaps the best resource on the web for personal genomics today, the Genomes Unzipped blog http://www.genomesunzipped.org/

The authors have not only released their genetic information into the public domain (http://www.genomesunzipped.org/data), but also developed a custom genome browser (http://www.genomesunzipped.org/jbrowse), have an API, and a github repo for code they will release (https://github.com/genomesunzipped/genomesunzipped).

These are early days in personal genomics, so it's great to see others jumping in. Hopefully they all do so with some awareness. Folks like Genomes Unzipped do a great job of creating that awareness while never forgetting that there is difficult, evolving science behind our understanding.

bennylope · on Feb 13, 2011

This is a question asked out of ignorant curiosity: what, if any, are the intellectual property implications of releasing genomic information into the public domain? Does doing so preclude patenting as-yet-unpatented genetic sequences published in that genome?

bbgm · on Feb 13, 2011

A number of human genomes are public domain via the human genome project and 1000 genomes project, etc. The part that needs to be resolved is the bit about genes and disease implications. A recent case overturned Myriad's patents on BRCA1 and BRCA2 [1]. On the other hand, I believe it's still OK to patent signatures corresponding to a diagnostic etc (not 100% sure).

1. http://www.genomicslawreport.com/index.php/2010/03/30/pigs-f...

paradoja · on Feb 13, 2011

The curious thing is that he uses Github... does he expect forks? Or patches?

nixme · on Feb 13, 2011

TeMPOraL already forked and made improvements: https://github.com/TeMPOraL/dna/commits/master/

"Eyelids now close in proper way. Fixes issue #42."

TeMPOraL · on Feb 13, 2011

Actually, more action is going on in the scientist-mode-__experimental__ branch. The code is a little rusty but workable, and the possibilities for optimization are great.

EDIT:

BTW, any ideas how to get continuous integration working with it?

nixme · on Feb 13, 2011

Are you picking random SNPs or using something like SNPedia?

TeMPOraL · on Feb 13, 2011

Programmer's intuition ;).

I really like your question. Personally I value stories, creations, etc. in which authors make multiple layers of "jokes" or references. Unfortunately, I'm not good enough to know what letters I'm actually changing :(.

cariaso · on Feb 13, 2011

I am. https://github.com/cariaso/dna is a fork for real commits based on SNPedia.

joshu · on Feb 14, 2011

Where are the unit tests?

arfrank · on Feb 13, 2011

People have already forked it, obviously as a joke: https://github.com/TeMPOraL/dna/network

aphyr · on Feb 13, 2011

This brings a whole new meaning to "fork me on github".

wybo · on Feb 13, 2011

All we need now is a free compiler, to turn it into life-code that runs on the Universal machine...

Until that time agent-based modeling is the best we have :)

marxidad · on Feb 13, 2011

You can buy a DNA synthesizer on eBay: http://shop.ebay.com/i.html?_nkw=dna+synthesizer

pjscott · on Feb 13, 2011

Good luck finding an embryogenesis machine on eBay.

borism · on Feb 13, 2011

there are plenty on match.com etc, though getting any of those will be out of reach for most guys with DNA synthesizers :)

sliverstorm · on Feb 13, 2011

If you don't think about it too hard, the idea of a life-compiler sure does sound kind of godlike.

abhikshah · on Feb 13, 2011

The Personal Genome Project [personalgenomes.org] is aiming to recruit 100,000 people to publicly release their DNA sequence and medical data. The website currently has phenotype, medical history, and genotyping data for the first ten participants, who are all well-known scientists.

pella · on Feb 13, 2011

more genomes:

"These are the 57 public genomes. They are from real people who've chosen to share their data to help all of us learn more about our genomes."

http://www.snpedia.com/index.php/Genomes