Let’s Try t-SNE (observablehq.com)
77 points by mbostock on Dec 22, 2018 | hide | past | favorite | 13 comments



I love the work of Mike Bostock and the team behind ObservableHQ.

This article seems to be both an explanation of how to massage the MNIST dataset for use with TensorFlow.js, and a showcase of the capabilities of Observable Notebooks. Of note are CORS requests, the Disposable API, and the implementation and use of async generators. Personally, I'm more interested in the latter: recently I found myself doing a lot more R&D than development, and ended up using Observable Notebooks to quickly build interactive prototypes of various feature concepts my team explored.

Looking at it from the angle of Bret Victor's "model-driven debate"[0], Observable seems to be nearly there as a tool. Almost all the components are built in; what's missing, IMO, is an easier way to create "twiddlable" inputs. But I'm sure someone's going to make a lib for it eventually.

--

[0] - http://worrydream.com/ClimateChange/#media-debate


To anyone using t-SNE: consider giving UMAP a try.

https://arxiv.org/abs/1802.03426

It's just as accessible as t-SNE in Python.


Thanks for the link. I came across UMAP as part of scikit-tda, but hadn't investigated it further. Looks promising! The TDA background is appealing, as is the runtime complexity.


In this tutorial the author reduces the size of the images from 28x28 to 10x10 to make the computation more tractable.

Would that be necessary with UMAP too?


t-SNE and UMAP are both techniques for dimensionality reduction, and both scale pretty well to high-dimensionality problems. I think the author downsized the data because he wanted to provide an interactive model. If I were doing this I would probably just leave it as is for either method. The paper I linked actually benchmarks on this same dataset, though: for the full 70,000 images at 28x28, it claims 87 seconds for UMAP and 1450 for t-SNE.
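To make the downsizing step concrete: going from 28x28 to 10x10 cuts each image from 784 features to 100. A nearest-neighbour version is a few lines of NumPy (an illustration, not the article's exact resampling method):

```python
import numpy as np

def downsample(img, size=10):
    """Nearest-neighbour downsample of a square image (illustrative)."""
    n = img.shape[0]
    idx = np.arange(size) * n // size  # evenly spaced source rows/cols
    return img[np.ix_(idx, idx)]       # pick those rows and columns

img = np.arange(28 * 28).reshape(28, 28)  # stand-in for one MNIST digit
small = downsample(img)
print(small.shape)  # (10, 10)
```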


Is there an R implementation?


Yes: https://www.kaggle.com/nulldata/tsne-alternate-umap-3d-viz-o... has links to an R implementation of UMAP.


> They’re not tiled, like you might expect. Instead, each sprite is sliced into rows per pixel, like paper through a shredder, and rearranged into a single row in the big image. To reconstruct a sprite, we must reverse the process.

For most imaging libraries this format is the natural order: you provide a pointer to the start of the buffer plus the width and height of the image, and the library simply reads the image straight from the byte stream.

Reading data from a tiled image would be more work (unless the image is one tile wide).
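In array terms, the "shredded" layout is just row-major order, so undoing it is a single reshape. A NumPy sketch (the sprite dimensions are MNIST's; the sprite count is made up):

```python
import numpy as np

H, W, N = 28, 28, 5  # sprite height/width, number of sprites (illustrative)

# The "big image": each sprite flattened into one row of H*W pixels.
big = np.arange(N * H * W).reshape(N, H * W)

# Reconstructing sprite i is just a reshape: the bytes are already in
# row-major order, exactly as most imaging libraries expect.
sprite0 = big[0].reshape(H, W)
print(sprite0.shape)  # (28, 28)
```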


This is a fantastic example of all the important steps you have to go through before you get to the final visualization. Many explanations jump over these instructive details.


I tried applying t-SNE to a neuroimaging dataset I had been analyzing with principal-coordinate ordination techniques. Results were just OK. The R implementation I used had a number of parameters to tune, but my interpretation of the visualization changed with those parameters... so I was unsure which settings to go with.

I wonder if one reason for my difficulty was the noise inherent in functional MRI data.
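That sensitivity is easy to reproduce: running scikit-learn's t-SNE on the same data with different perplexities yields different layouts, which is exactly the which-setting-to-trust problem. A sketch with illustrative values:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))  # stand-in for a small feature matrix

# Same data, three perplexities: each run produces a different layout,
# which is why the interpretation can shift with tuning.
embeddings = {
    p: TSNE(n_components=2, perplexity=p, init="random",
            random_state=0).fit_transform(X)
    for p in (5, 15, 25)
}
for p, emb in embeddings.items():
    print(p, emb.shape)
```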


I've found that true of t-SNE (and UMAP + TDA) in general. Huge sensitivity to feature encoding and parameter tuning, and in practice the libraries don't help much there, which seems like a barrier for the huge pool of potential users. Most of all, I've struggled with categorical dimensions (people, computers, things, and the other sorts of fields you see in most non-scientific settings). That comes back again to the encoding problem.

I'd be genuinely interested in solutions there. Every 3-6 months I play with these techniques, looking for something usable to add to Graphistry. While library devs talk about efforts here, it seems to be an ongoing challenge. In a sense, Quid has shown it's solvable in specific domains with focused effort, but I'm still looking for the intuition that would make these predictable and reliable techniques for the common case of structured data.
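On the categorical-encoding point: the usual (imperfect) starting move is one-hot encoding before handing the matrix to t-SNE or UMAP. A minimal NumPy sketch (the column and its values are hypothetical):

```python
import numpy as np

# Hypothetical categorical column, e.g. a "device type" field.
values = np.array(["laptop", "phone", "laptop", "server", "phone"])

# One-hot encode: one binary column per category.
cats = np.unique(values)  # sorted unique categories
onehot = (values[:, None] == cats[None, :]).astype(float)
print(cats)           # ['laptop' 'phone' 'server']
print(onehot.shape)   # (5, 3)
```

Part of why this is unsatisfying: every pair of distinct one-hot categories sits at the same Euclidean distance (sqrt(2)), so the embedding carries no notion of which categories are actually similar, which is the encoding problem described above.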


Great article/notebook! I just discovered Observable last week, and I plan on using it a lot more. I have been a Mathematica user for years because of the notebook interface and curated data. Then came Jupyter notebooks. I still like Mathematica/Wolfram Alpha, and its interface seems to have stood the test of time, but a collaborative notebook format online with a big community sounds awesome. Great explanation of t-SNE for me.

EDIT: I just saw the link to Stephen Wolfram's essay "What is a Computational Essay?" on the "Introduction to Notebooks" on Observable. No wonder!


Here is a link that shares my sentiments about this type of article: https://www.kdnuggets.com/2018/02/neural-network-ai-simple-g...



