
If what you're asking about is the math, the steps are (essentially) as follows:

1. A Riemannian manifold is constructed from the dataset.

2. The manifold is approximately mapped to an n-dimensional topological structure.

3. The reduced embedding is an (n - k)-dimensional projection equivalent to the initial topological structure, where k is the number of dimensions you'd like to reduce by.

I don't know how well that answers your question because it's difficult to simplify the math beyond that. But you can also check out the paper on arXiv. [1]
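To make step 1 a bit more concrete: in practice the "Riemannian manifold" is approximated from a k-nearest-neighbour graph whose edge distances are rescaled into fuzzy membership strengths. Here is a loose numpy sketch of that idea; the function name, the fixed k, and the simple `sigma` heuristic are mine (the actual algorithm solves for `sigma` per point via binary search), so treat this as an illustration, not the paper's implementation:

```python
import numpy as np

def knn_fuzzy_weights(X, k=5):
    """Rough sketch of UMAP's step 1: for each point, find its k nearest
    neighbours and turn their distances into fuzzy membership strengths
    (1 at the nearest neighbour, decaying with distance)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbour
    idx = np.argsort(d, axis=1)[:, :k]          # indices of k nearest neighbours
    dk = np.take_along_axis(d, idx, axis=1)     # their distances, sorted ascending
    rho = dk[:, :1]                             # distance to the very nearest one
    sigma = dk.mean(axis=1, keepdims=True) - rho + 1e-12  # crude local scale
    w = np.exp(-np.maximum(dk - rho, 0) / sigma)  # fuzzy weights in (0, 1]
    return idx, w
```

The local rescaling by `rho` and `sigma` is what makes the metric "Riemannian" in spirit: each point measures distance on its own local scale, and the resulting weighted graph is the topological representation that step 2 then embeds.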

The underlying idea is to transform the data into a topological representation, analyze its structure, then find a much smaller (dimensionally speaking) topological structure which is either the same thing ("equivalent") or very close to it. You get most of the way there by thinking about how two things which look very different can be topologically the same based on their properties. A pretty accessible demonstration of that idea is the classical donut <-> coffee mug example on the Wikipedia page for homeomorphisms. [2]
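The donut/mug map is hard to write down compactly, but the same idea shows up in the simplest possible case: a circle and a stretched ellipse look different yet are homeomorphic, because there is a continuous bijection between them whose inverse is also continuous. A tiny sketch (shapes and map chosen by me for illustration):

```python
import numpy as np

# Sample the unit circle.
theta = np.linspace(0, 2 * np.pi, 100, endpoint=False)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)

def f(p):        # circle -> ellipse: stretch the x-axis by 3
    return p * np.array([3.0, 1.0])

def f_inv(p):    # ellipse -> circle: the continuous inverse
    return p / np.array([3.0, 1.0])

ellipse = f(circle)
round_trip = f_inv(ellipse)
assert np.allclose(round_trip, circle)  # same topology, different geometry
```

Everything that depends only on topology (connectedness, the single "hole") is identical for both shapes, even though their geometry differs, which is exactly the sense in which a mug and a donut are "the same".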

__________________

1. https://arxiv.org/pdf/1802.03426.pdf

2. https://en.wikipedia.org/wiki/Homeomorphism




This is a good summary. For a video that explains some of this (but which still hand waves), see the author's presentation at SciPy 2018: https://www.youtube.com/watch?v=nq6iPZVUxZU


Is this actually capturing any properties of the original set, or is this a set of operations that will make any input look similar? (i.e. is this just a pretty picture with no real connection to the math?)


You're asking about two somewhat different things.

In the strict sense two things which are equivalent share the same properties, yes. A homeomorphism is a continuous bijection with a continuous inverse; it's the topological analogue of an algebraic isomorphism. See the example about coffee mugs and donuts both being topological tori.

That being said, I can't really comment on the potential artifacting of this specific algorithm. In theory the overarching idea makes sense: if you find structure-preserving maps between sets of different dimensions, you should expect relations within the set to be preserved (i.e. the relational information in the smaller set is equivalent, there's just less of it). But practically speaking, not all datasets can be usefully abstracted to a manifold in this way, which means that (efficiently) finding an equivalent lower-dimensional projection for the embedding might involve a fair amount of approximation.
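The idealized case of "relations preserved with less of them" is easy to demonstrate: if the data genuinely lies on a low-dimensional flat subspace, projecting onto that subspace is a structure-preserving map that keeps every pairwise distance exactly. A small numpy sketch (the setup is mine, chosen to be the best-case scenario):

```python
import numpy as np

rng = np.random.default_rng(0)

# Data that is secretly 2-dimensional, embedded in 5-D: every point is a
# combination of the same two orthonormal directions.
basis, _ = np.linalg.qr(rng.normal(size=(5, 2)))   # two orthonormal 5-D directions
coords = rng.normal(size=(30, 2))                   # intrinsic 2-D coordinates
X = coords @ basis.T                                # 30 points living in 5-D

# Projecting back onto the basis is a structure-preserving map:
# all pairwise distances (the "relational information") survive intact.
Y = X @ basis                                       # the same 30 points in 2-D
d_hi = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
d_lo = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
assert np.allclose(d_hi, d_lo)
```

Real datasets are not exactly flat subspaces, of course, so a real algorithm only gets an approximate version of this.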

With enough approximation you'll introduce spurious artifacts. But that's precisely where all the innovation comes in: finding efficient ways to construct equivalent structures that represent the data in fewer dimensions. This isn't the first proposal for topological dimension reduction (see information geometry); the devil is really in the details here.


It captures more of the spatial (metric/topological) arrangement of the set. The example they give in the paper is the MNIST dataset, where distinct-looking digits like 1 and 0 get separated farther apart and similar ones clump together, whereas t-SNE, while correctly delimiting individual clusters, clumps them all into one blob.


Cool. Thanks.


Sidenote: I only recently realized the X in arXiv is a Greek chi, so it's pronounced "archive"...


Yep, just like LaTeX :)


"latechi?" ive always heard it "lah-tek"


Latech is how I have heard it.


I'm assuming you mean a hard "a", like "lay-tech". I've heard that too.



