Hacker News | lasso1's comments

Along the same lines, convergence is mentioned as a candidate for interpretability but convexity is not mentioned even once.


This is a very interesting notion. Could you elaborate on why convexity would be a candidate? That convexity defines a class is clear, but I have not heard of it as a proxy for interpretability before. I would assume this needs some measure of the degree of convexity, something finer than the ordering strong > strict > convex.
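For reference, the ordering strong > strict > convex can be written out as below. These are the standard textbook definitions. Reading the modulus m as a continuous "degree of convexity" is only one possible interpretation and an assumption on my part, not something the parent comment proposed.

```latex
% Convex: for all x, y and t in [0, 1]
f(tx + (1-t)y) \le t f(x) + (1-t) f(y)

% Strictly convex: for all x \ne y and t in (0, 1)
f(tx + (1-t)y) < t f(x) + (1-t) f(y)

% Strongly convex with modulus m > 0: for all x, y and t in [0, 1]
f(tx + (1-t)y) \le t f(x) + (1-t) f(y) - \frac{m}{2}\, t(1-t)\, \lVert x - y \rVert^2
```

Strong convexity implies strict convexity, which implies convexity, and the largest admissible m gives a number rather than just a class label.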


Interestingly, according to the NYT, Cruz campaign staffers did not think it worked:

"But Cambridge’s psychographic models proved unreliable in the Cruz presidential campaign, according to Rick Tyler, a former Cruz aide, and another consultant involved in the campaign. In one early test, more than half the Oklahoma voters whom Cambridge had identified as Cruz supporters actually favored other candidates. The campaign stopped using Cambridge’s data entirely after the South Carolina primary.

“When they were hired, from the outset it didn’t strike me that they had a wide breadth of experience in the American political landscape,” Mr. Tyler said."

https://www.nytimes.com/2017/03/06/us/politics/cambridge-ana...


They actually do address this, in what I thought was the weakest part of the paper, the "non-key factors" section:

Fourth, vaccines are only active while pathogens are inside hosts, but drugs can remain active in environmental reservoirs [89], suggesting that the strength of selection for resistance may differ for drug and vaccine resistance. However, drug resistance readily evolves even in pathogens that lack environmental life stages such as HIV [8].


This sounds interesting. I am in a field where much more of the focus is on visualizing samples using different metrics with PCoA instead of regular PCA.

If I just skim the Zheng et al. arXiv paper, it all seems a little arbitrary to me: selecting 1,000 features, then 50 components. They argue that it is for computation-time reasons, but is there any kind of benchmark suggesting this is a better strategy than just plotting the first two components, or using MDS, which in this scenario also has the advantage of being convex?
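To make the two-stage reduction concrete, here is a minimal pure-Python sketch of "keep the k most variable features, then take the leading principal components". It is not the pipeline from the paper: selecting by variance, the helper names (`top_variance_features`, `principal_components`, `reduce_dimensions`), the power-iteration PCA, and the toy data are all assumptions for illustration.

```python
import math

def column_variances(X):
    """Sample variance of every column of X (a list of rows)."""
    n = len(X)
    out = []
    for col in zip(*X):
        mean = sum(col) / n
        out.append(sum((v - mean) ** 2 for v in col) / (n - 1))
    return out

def top_variance_features(X, k):
    """Indices of the k columns with the largest variance (assumed criterion)."""
    var = column_variances(X)
    return sorted(range(len(var)), key=lambda j: var[j], reverse=True)[:k]

def center(X):
    """Subtract the column mean from every entry."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def principal_components(Xc, n_components, n_iter=200):
    """Leading eigenvectors of Xc^T Xc via power iteration with Gram-Schmidt."""
    d = len(Xc[0])
    cols = list(zip(*Xc))
    comps = []
    for k in range(n_components):
        v = [1.0 / (j + k + 1) for j in range(d)]  # deterministic start vector
        for _ in range(n_iter):
            scores = [dot(row, v) for row in Xc]   # Xc v
            w = [dot(scores, col) for col in cols] # Xc^T (Xc v)
            for c in comps:                        # remove components already found
                proj = dot(w, c)
                w = [a - proj * b for a, b in zip(w, c)]
            norm = math.sqrt(dot(w, w))
            v = [a / norm for a in w]
        comps.append(v)
    return comps

def reduce_dimensions(X, n_features, n_components):
    """Feature selection followed by PCA; returns (kept indices, embedding)."""
    idx = top_variance_features(X, n_features)
    Xc = center([[row[j] for j in idx] for row in X])
    comps = principal_components(Xc, n_components)
    return idx, [[dot(row, c) for c in comps] for row in Xc]

# Toy data: two high-variance features and two low-variance ones.
X = [[10 * math.sin(i), 3 * math.cos(2 * i), 0.1 * math.sin(3 * i), 0.1 * math.cos(5 * i)]
     for i in range(30)]
idx, emb = reduce_dimensions(X, n_features=2, n_components=2)
```

With real data the two numbers would be 1,000 and 50 as in the comment above. The point of the sketch is only that both cutoffs are free parameters, which is why a benchmark against simply plotting the first two components would be useful.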


There are so many arbitrary choices made when analyzing single cell RNA-seq data. There are coverage cutoffs to decide when a gene is expressed, arbitrary QA points to decide when a cell is "good quality", the PCs chosen for t-SNE, the genes identified as more variable than estimated levels of technical noise, etc. It is very frustrating. This leads to huge issues with reproducibility: almost every single paper uses its own in-house custom analysis pipeline, and the authors rarely make them open source.


The main reason is that no one wants to publish technical or "methods" type papers where they assess the technology. There are usually one or two big initial papers introducing the technology that make a big splash. No one will subsequently want to assess it or improve much on it, because it won't publish well and you likely will not get cited for it anyway.


Yes, I know the feeling.

Our field has some very arbitrary thresholds for noise on single features. It sounds like there is a slightly more principled strategy in single cell genomics?


The Seurat R package (http://satijalab.org/seurat/) tries to give you more information on how to choose your PCs. But as with anything in biology, it becomes subjective at some point.


Thanks!

I happen to have some super high-dimensional data (~100k-1m dimensions), which takes a huge amount of time to work with because I have to custom-write everything, and I notice they claim all their underlying functions use sparse matrix representations. Have you tried it in a very high-dimensional context?
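As a rough illustration of why sparse representations matter at that scale, here is a minimal Python sketch that stores only the non-zero entries of each sample. This is not how Seurat (an R package) is implemented; the dict-based layout and the helper names `to_sparse`, `sparse_dot` and `sparse_sq_dist` are hypothetical.

```python
def to_sparse(row):
    """Convert a dense row into {column index: value}, dropping zeros."""
    return {j: v for j, v in enumerate(row) if v != 0}

def sparse_dot(a, b):
    """Dot product that only visits indices present in the smaller vector."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b[j] for j, v in a.items() if j in b)

def sparse_sq_dist(a, b):
    """Squared Euclidean distance computed from three sparse dot products."""
    return sparse_dot(a, a) - 2 * sparse_dot(a, b) + sparse_dot(b, b)

# Two samples in a nominally 1,000,000-dimensional space, each with two non-zeros.
a = {3: 2.0, 500000: 1.0}
b = {3: 1.0, 999999: 4.0}
d2 = sparse_sq_dist(a, b)
```

The cost of each operation scales with the number of non-zero entries rather than with the nominal dimension, so whether 100k-1m dimensions is workable depends mostly on how sparse the data actually is.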


With single cell data, gene expression data from each cell is considered a dimension. So you end up with something in the range of 20-30k genes and possibly thousands of cells (~25k x thousands matrix). I don't think the technology is at the scale of hundreds of thousands of cells yet. So I am not sure if this package will handle 100k-1m dimensions.


Here are some critical comments about the universal nature of power laws in science (in Science): http://science.sciencemag.org/content/335/6069/665.full

