
This sounds interesting. I am in a field where much more of the focus is on visualizing samples under different distance metrics using PCoA rather than regular PCA.

If I just scroll through the Zheng et al. arXiv paper, it all seems a little arbitrary to me: selecting 1,000 features, then 50 components. They argue it is for computation-time reasons, but is there any kind of benchmark suggesting this is a better strategy than just plotting the first two components, or than using MDS, which in this scenario also has the advantage of being convex?
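
For concreteness, here is a minimal Python sketch of the kind of pipeline I mean (not the paper's actual code; the matrix shape, random data, and both cutoffs are placeholders):

    import numpy as np
    from sklearn.decomposition import PCA

    # X: cells-by-genes expression matrix (random placeholder data)
    rng = np.random.default_rng(0)
    X = rng.poisson(1.0, size=(500, 20000)).astype(float)

    # Step 1: keep the 1,000 most variable genes (the arbitrary cutoff in question)
    top = np.argsort(X.var(axis=0))[-1000:]
    X_hv = X[:, top]

    # Step 2: reduce to 50 principal components, typically fed to t-SNE/clustering
    pcs = PCA(n_components=50).fit_transform(X_hv)
    print(pcs.shape)  # (500, 50)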




There are so many arbitrary choices made when analyzing single-cell RNA-seq data: coverage cutoffs to decide when a gene is expressed, arbitrary QC thresholds to decide when a cell is "good quality", the PCs chosen for t-SNE, the genes identified as more variable than the estimated level of technical noise, and so on. It is very frustrating. This leads to huge reproducibility problems: almost every paper uses its own in-house custom analysis pipeline, and they rarely make them open source.
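
To make the arbitrariness concrete, a toy Python sketch of typical QC filtering (the data and both thresholds are made up, which is exactly the point):

    import numpy as np

    # counts: cells-by-genes matrix of raw read counts (toy data)
    rng = np.random.default_rng(1)
    counts = rng.poisson(0.5, size=(1000, 5000))

    # Two of the arbitrary cutoffs in question; neither number is principled
    MIN_COUNTS_PER_CELL = 2000  # below this, a cell is "low quality"
    MIN_CELLS_PER_GENE = 3      # a gene is "expressed" if seen in >= 3 cells

    good_cells = counts.sum(axis=1) >= MIN_COUNTS_PER_CELL
    expressed_genes = (counts > 0).sum(axis=0) >= MIN_CELLS_PER_GENE
    filtered = counts[good_cells][:, expressed_genes]
    print(filtered.shape)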


The main reason is that no one wants to publish technical or "methods"-type papers that assess the technology. There are usually one or two initial big papers introducing the technology that make a big splash. No one subsequently wants to assess it or improve much on it, because it won't publish well and you likely will not get cited for it anyway.


Yes, I know the feeling.

Our field has some very arbitrary thresholds for noise on single features; it sounds like single-cell genomics has a slightly more principled strategy?


The Seurat R package (http://satijalab.org/seurat/) tries to give you more information on how to choose your PCs. But as with anything in biology, it becomes subjective at some point.
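
Seurat itself is R, but the underlying idea is easy to sketch in Python: plot explained variance per PC and look for an elbow (toy data below; where you draw the cutoff is still a judgment call):

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA

    # X: normalized cells-by-genes matrix (random placeholder data)
    rng = np.random.default_rng(2)
    X = rng.normal(size=(300, 2000))

    pca = PCA(n_components=50).fit(X)

    # "Elbow" plot: keep PCs up to where the curve flattens out
    plt.plot(np.arange(1, 51), pca.explained_variance_ratio_, marker="o")
    plt.xlabel("principal component")
    plt.ylabel("explained variance ratio")
    plt.show()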


Thanks!

I happen to have some super high-dimensional data (~100k-1M dimensions), which takes a huge amount of time to work with because I have to write everything myself, and I notice they claim all their underlying functions use sparse matrix representations. Have you tried it in a very high-dimensional context?
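
In case it helps, here is a minimal sketch of the sparse route I would expect such a package to take under the hood (generic scipy/scikit-learn, not their actual internals; shapes and density are placeholders):

    import scipy.sparse as sp
    from sklearn.decomposition import TruncatedSVD

    # Sparse cells-by-features matrix with ~1M columns (placeholder shape/density)
    X = sp.random(2000, 1_000_000, density=1e-4, format="csr", random_state=0)

    # TruncatedSVD accepts sparse input directly, so the matrix is never
    # densified (plain PCA mean-centers, which would destroy the sparsity)
    svd = TruncatedSVD(n_components=50, random_state=0)
    pcs = svd.fit_transform(X)
    print(pcs.shape)  # (2000, 50)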


With single-cell data, the expression of each gene is a dimension and each cell is a sample. So you end up with something in the range of 20-30k genes and possibly thousands of cells (a ~25k x thousands matrix). I don't think the technology is at the scale of hundreds of thousands of cells yet, so I am not sure whether this package will handle 100k-1M dimensions.
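
Back-of-the-envelope for that scale (the 10% nonzero fraction is a guess for illustration):

    # Dense vs. sparse storage for a 25,000-gene x 5,000-cell count matrix
    genes, cells = 25_000, 5_000
    dense_bytes = genes * cells * 8                  # float64 everywhere
    nnz = int(genes * cells * 0.10)                  # assume ~10% nonzeros
    sparse_bytes = nnz * (8 + 4) + (cells + 1) * 4   # CSC: data + indices + pointers
    print(f"{dense_bytes / 1e9:.2f} GB dense")       # 1.00 GB
    print(f"{sparse_bytes / 1e9:.2f} GB sparse")     # ~0.15 GB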



