Computational feat speeds finding of genes to milliseconds instead of years (stanford.edu)
90 points by dailo10 on April 8, 2010 | 15 comments



[deleted]


It seems to me that the number of possible connections between genes should be (16000 genes * 16000 possible gene-gene interactions per gene) = 2.6 * 10^8. Why do you say 3 * 10^5?
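For the back-of-envelope version (using the 16,000-gene count assumed above, which isn't necessarily the paper's figure):

    # Counting possible gene-gene pairs among ~16,000 genes
    n = 16_000
    ordered_pairs = n * n               # 256,000,000  ~ 2.6 * 10^8
    unordered_pairs = n * (n - 1) // 2  # 127,992,000  ~ 1.3 * 10^8
    print(ordered_pairs, unordered_pairs)

Either way it's several orders of magnitude more than 3 * 10^5.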

Also, while a 14% FDR seems pretty good, it's just from one sample. I would like to have seen them repeat this procedure on (random) different pathways so we could see how variable the FDR might be.

Finally, I don't know how much this is truly enriched for B-cell genes as compared to a randomly seeded query. I'd like to see them compare their results to what they'd get if they had seeded their query with 2 random genes. Perhaps you could build a story about the association between a large fraction of all genes and B-cell development...

In short, this seems inadequately controlled.



So in terms of curing cancer... where does this get us? (I'm not being sarcastic. It sounds very promising, but I don't know enough about the topic to judge how significant it is.)


First of all, don't ever trust a popular science writeup from the school of the corresponding author. It is almost certainly meaningless hype.

Secondly, PNAS is a good journal, but not a great one. He was almost certainly rejected from the top tier (Nature, Science).

As someone who works in this field, I can say that finding genes that are similar in some way to two disease-related genes is not at all novel. This is the goal of literally hundreds of computational methods. It sounds like what he did was to build a decision tree from a set of training data - hardly an earth-shattering application.
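If that reading is right (it's my guess at the method, not something confirmed by the writeup), the core of it is a few lines of scikit-learn on an expression matrix. The data below is purely synthetic:

    # Sketch of the "decision tree on training data" reading of the paper.
    # X is a toy samples-x-genes expression matrix; y is a toy pathway label.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))            # 200 samples, 50 "genes"
    y = (X[:, 0] + X[:, 3] > 0).astype(int)   # label driven by two of them

    clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
    top = np.argsort(clf.feature_importances_)[::-1][:5]
    print(top)                                # genes the tree ranks as most informative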

Edit: Wow, after fully reading the paper I am stunned how commonplace this analysis is. This exact approach has been taken for analyzing microarray data for the last decade. This does not warrant in any way the breathless writeup it receives in the original post.

Under what hypothesis would one expect nature to follow boolean rules? This approach ignores any subtle relationships or multifactorial causes of gene expression changes. The more I read the more I am convinced that this is utter garbage.

What really makes me mad about this is that increasingly the way to get ahead in science is to overstate your results and then have friends of the corresponding author "review" the manuscript. If you'll notice, this was submitted by Irving L. Weissman who, according to his website, is "Director, Institute of Stem Cell Biology and Regenerative Medicine, Stanford University School of Medicine". It is very odd that it wasn't submitted by either the corresponding author or the lead author. It is very clear that this article did not receive the scientific scrutiny that it should have.


I agree that this paper is a run-of-the-mill systems biology paper. "Give me some genes, and I'll give you a network of genes that are associated via (insert functional method here)." Everyone does this. Unlike other methods, which look at more than just expression, this one just looks at expression.

A key question here is how many other pathways did they try this on before they got it to "work" on 10/14 (though really 10/60) B-cell-associated genes? I'm not asserting that they mined for the most favorable pathway, but I will say that this one example comes across as more of an anecdote than a proof of concept.

People are going to be using this tool perhaps hundreds of thousands of times. Don't show me one example where 10/14 genes "validated." Show me 20 examples where X out of N genes validate - if that's sufficiently high, I'll be much more interested.

Even the developmental focus of this new tool is fairly common practice by those in the field. Look at the Seidman lab or the Walsh lab at Boston Children's to see examples of other people thinking deeply about how developmental biology ties back into adult pathology.


Regarding who submitted it, PNAS works differently from other journals. Weissman is a member of the National Academy of Sciences, and as such he gets to contribute a paper X times a year. It still gets peer reviewed. The question of how well things are reviewed applies to other journals as well.


There's also the process of "communicating" papers. In those, a NAS member can essentially vouch for other authors twice a year, and provide outside reviewers of their choice. This track is ending as an option this July, however.


Could be fairly significant. One of the problems with biology is that we have a lot of data in terms of experimental tests, but we don't know what all the data mean.

Only a relatively small fraction of the 30,000 human genes are clearly understood. And even when you THINK you understand what a gene does, it can sometimes surprise you by having other unexpected effects.

What his method seems to do is help map out which genes are related to each other. There are cases in cancer, for example, where we may know that ONE gene gets hyper-activated when a type of cancer is around. If you can correlate the activity of this active gene with other previously unknown genes, you get a better understanding of what causes the disease. And if you know what causes the disease, you can use a variety of techniques (drugs, designed proteins, or RNAi) to inhibit the gene's effects and stop the disease.
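As a rough sketch of that kind of guilt-by-association screen (the expression matrix here is made up; a real analysis would use microarray data):

    # Rank genes by how well their expression tracks a known cancer-activated gene.
    import numpy as np

    rng = np.random.default_rng(1)
    expr = rng.normal(size=(100, 1000))    # 100 samples x 1000 genes (toy data)
    known = 0                              # column index of the known gene
    corr = np.array([np.corrcoef(expr[:, known], expr[:, g])[0, 1]
                     for g in range(expr.shape[1])])
    candidates = np.argsort(-np.abs(corr))[1:11]
    print(candidates)                      # the ten genes most correlated with it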


A small quibble: there are more like ~18,000 genes in the human genome.


True, we need a Library of Congress unit for computational biology.


With this kind of thing, the devil is always in the details. He's looking for genes whose expression levels correlate with those of genes known to be involved in the process of interest. This is not a new idea. The trouble is that spurious correlations can arise in all kinds of surprising ways, and swamp the interesting ones.


What's the computational technique used for this?


Call the oracle function.


I started off thinking it was another GPGPU application, but the theory of Boolean implications is something new to learn. Thanks for the article.
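As far as I can tell, the basic test is: binarize each gene's expression into low/high, then call "A high implies B high" when the (A high, B low) quadrant of the 2x2 table is nearly empty. Something like the sketch below; the thresholds and sparsity cutoff are my own stand-ins, not the paper's:

    # Rough sketch of a Boolean implication check between two genes.
    import numpy as np

    def implies_high_high(a, b, a_thresh, b_thresh, max_violation_frac=0.05):
        # "A high => B high" holds if the (A high, B low) quadrant is sparse.
        a_hi, b_lo = a > a_thresh, b <= b_thresh
        violations = np.sum(a_hi & b_lo)
        expected = np.sum(a_hi) * np.sum(b_lo) / len(a)   # under independence
        return violations <= max_violation_frac * len(a) and violations < 0.5 * expected

    # Toy data: gene A is only "on" in a subset of the samples where gene B is "on".
    rng = np.random.default_rng(0)
    n = 500
    b_on = rng.random(n) < 0.6
    a_on = b_on & (rng.random(n) < 0.5)
    b = np.where(b_on, 3.0, 0.0) + rng.normal(scale=0.5, size=n)
    a = np.where(a_on, 3.0, 0.0) + rng.normal(scale=0.5, size=n)

    print(implies_high_high(a, b, 1.5, 1.5))   # True: A high => B high
    print(implies_high_high(b, a, 1.5, 1.5))   # False in reverse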


Before clicking on this, I jokingly thought to myself "let me guess, this is yet another NoSQL is better than ___ article"



