lasso1's comments

lasso1 · on July 19, 2018

Along the same lines, convergence is mentioned as a candidate for interpretability but convexity is not mentioned even once.

danielmorozoff · on July 19, 2018

This is a very interesting notion. Could you elaborate on why convexity would be a candidate? That convexity defines a class is clear, but I have not heard of it as a proxy for interpretability before. I would assume this would need some measure of the degree of convexity, something finer than the strong > strict > convex hierarchy.
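For reference, the hierarchy mentioned above can be written out with the standard textbook definitions. This is only a sketch of those definitions, not something taken from the article under discussion. The strong-convexity parameter m is one conventional way to quantify a "degree" of convexity.

```latex
% Standard definitions for f : \mathbb{R}^n \to \mathbb{R},
% stated for all x \neq y and all t \in (0,1).
\begin{align*}
\text{convex:} \quad
  & f(tx + (1-t)y) \le t f(x) + (1-t) f(y) \\
\text{strictly convex:} \quad
  & f(tx + (1-t)y) < t f(x) + (1-t) f(y) \\
\text{$m$-strongly convex:} \quad
  & f(tx + (1-t)y) \le t f(x) + (1-t) f(y)
    - \tfrac{m}{2}\, t(1-t)\, \lVert x - y \rVert^2, \quad m > 0
\end{align*}
```

Every strongly convex function is strictly convex, and every strictly convex function is convex, which gives the ordering strong > strict > convex.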

lasso1 · on March 20, 2018

Interestingly, according to the NYT, Cruz campaign staffers did not think it worked:

"But Cambridge’s psychographic models proved unreliable in the Cruz presidential campaign, according to Rick Tyler, a former Cruz aide, and another consultant involved in the campaign. In one early test, more than half the Oklahoma voters whom Cambridge had identified as Cruz supporters actually favored other candidates. The campaign stopped using Cambridge’s data entirely after the South Carolina primary.

“When they were hired, from the outset it didn’t strike me that they had a wide breadth of experience in the American political landscape,” Mr. Tyler said."

https://www.nytimes.com/2017/03/06/us/politics/cambridge-ana...

lasso1 · on Feb 11, 2018

They actually do address this, in what I thought was the weakest part of the paper, the "non-key factors" section.

Fourth, vaccines are only active while pathogens are inside hosts, but drugs can remain active in environmental reservoirs [89], suggesting that the strength of selection for resistance may differ for drug and vaccine resistance. However, drug resistance readily evolves even in pathogens that lack environmental life stages such as HIV [8].

lasso1 · on Oct 16, 2016

This sounds interesting. I am in a field where much more of the focus is on visualizing samples using different metrics with PCoA rather than regular PCA.

If I just scroll through the Zheng et al. arXiv paper, it all seems a little arbitrary to me: selecting 1000 features, then 50 components. They argue that it is for computation time reasons, but is there any kind of benchmark suggesting this is a better strategy than just plotting the first two components, or using MDS, which also has the advantage in this scenario of being convex?
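To make the PCoA / classical MDS alternative concrete, here is a minimal sketch in plain Python. It is not the pipeline from the Zheng et al. paper or from any particular package. The function names (`double_center`, `top_eigenpair`, `pcoa`) are hypothetical, and simple power iteration with deflation stands in for a proper eigensolver. It assumes the input distances are Euclidean-like, so that the double-centered matrix has no dominant negative eigenvalue.

```python
import math
import random


def double_center(d2):
    """Return B = -1/2 * J * D2 * J for a symmetric matrix of squared distances."""
    n = len(d2)
    row_mean = [sum(row) / n for row in d2]
    total_mean = sum(row_mean) / n
    # D2 is symmetric, so column means equal row means.
    return [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + total_mean)
             for j in range(n)] for i in range(n)]


def top_eigenpair(b, iters=500, seed=0):
    """Approximate the dominant eigenvalue and eigenvector by power iteration."""
    rng = random.Random(seed)
    n = len(b)
    v = [rng.random() + 0.1 for _ in range(n)]
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v.
    lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v


def pcoa(dist, k=2):
    """Embed n points in k dimensions from an n x n distance matrix."""
    n = len(dist)
    b = double_center([[d * d for d in row] for row in dist])
    coords = [[0.0] * k for _ in range(n)]
    for c in range(k):
        lam, v = top_eigenpair(b)
        if lam <= 1e-12:
            break  # no positive variance left to explain
        scale = math.sqrt(lam)
        for i in range(n):
            coords[i][c] = scale * v[i]
        # Deflate: remove the component just extracted.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

Because the embedding comes from an eigendecomposition rather than an iterative non-convex optimization, repeated runs on the same distance matrix give the same configuration up to sign and rotation. Any distance metric can be plugged in by changing how `dist` is computed.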

gww · on Oct 16, 2016

There are so many arbitrary choices made when analyzing single cell RNA-seq data. There are coverage cutoffs to decide when a gene is expressed, arbitrary QA points to decide when a cell is "good quality", the PCs chosen for t-SNE, the genes identified as more variable than estimated levels of technical noise, etc. It is very frustrating. This leads to huge issues with reproducibility: almost every single paper uses its own in-house custom analysis pipeline, and the authors rarely make them open source.

daemonk · on Oct 16, 2016

The main reason is that no one wants to publish technical or "methods" type papers where they assess the technology. There are usually one or two initial big papers introducing the technology that make a big splash. No one will subsequently want to assess it or improve much on it, because it won't publish well and you likely will not get cited for it anyway.

lasso1 · on Oct 16, 2016

Yes, I know the feeling.

Our field has some very arbitrary thresholds for noise on single features. It sounds like there is a slightly more principled strategy in single cell genomics?

daemonk · on Oct 16, 2016

The Seurat R package (http://satijalab.org/seurat/) tries to give you more information on how to choose your PCs. But as with anything in biology, it becomes subjective at some point.

lasso1 · on Oct 16, 2016

Thanks!

I happen to have some super high-dimensional data (~100k-1m dimensions), which takes a huge amount of time to work with because I have to custom-write everything, and I notice they claim all their underlying functions use sparse matrix representations. Have you tried it in a very high-dimensional context?
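As a rough illustration of why sparse representations matter at ~100k-1m dimensions, here is a minimal sketch in plain Python. It is not how Seurat implements anything, and the helper names (`to_sparse_row`, `sparse_dot`, `sparse_sq_distance`) are hypothetical. Each sample is stored as a dict holding only its non-zero features, so the cost scales with the number of non-zeros rather than with the full dimensionality.

```python
def to_sparse_row(dense_row):
    """Keep only the non-zero entries of a dense row as {column: value}."""
    return {j: x for j, x in enumerate(dense_row) if x != 0}


def sparse_dot(a, b):
    """Dot product of two sparse rows; iterates over the smaller one."""
    if len(a) > len(b):
        a, b = b, a
    return sum(x * b[j] for j, x in a.items() if j in b)


def sparse_sq_distance(a, b):
    """Squared Euclidean distance, touching only columns present in a or b."""
    return sum((a.get(j, 0) - b.get(j, 0)) ** 2 for j in a.keys() | b.keys())


# Two samples in a nominally 1,000,000-dimensional space,
# each with only two non-zero features.
sample_a = {0: 1.0, 999_999: 2.0}
sample_b = {5: 1.0, 999_999: 3.0}
similarity = sparse_dot(sample_a, sample_b)
distance_sq = sparse_sq_distance(sample_a, sample_b)
```

A pairwise distance matrix built this way could be fed into a distance-based ordination, although for real workloads a dedicated sparse matrix library would be the more practical choice.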

daemonk · on Oct 16, 2016

With single cell data, gene expression data from each cell is considered a dimension. So you end up with something in the range of 20-30k genes and possibly thousands of cells (~25k x thousands matrix). I don't think the technology is at the scale of hundreds of thousands of cells yet. So I am not sure if this package will handle 100k-1m dimensions.

lasso1 · on Jan 28, 2016

Here are some critical comments about the universal nature of power laws in science (in Science): http://science.sciencemag.org/content/335/6069/665.full