Hacker News
HyperTools: A Python toolbox for gaining insights into high-dimensional data (hypertools.readthedocs.io)
174 points by schuetze on April 28, 2018 | 23 comments



Quote of the day, from the HyperTools README (https://github.com/ContextLab/hypertools):

"To deal with hyper-planes in a 14 dimensional space, visualize a 3D space and say 'fourteen' very loudly. Everyone does it." -- Geoff Hinton


Don’t be silly. One can visualize the N-dimensional case and then it is just a matter of setting N=14.


Looks neat and wonderful. It’s never possible to compress all the structure of high-dimensional data into 2D or 3D with something like PCA or t-SNE, so the focus on “geometric” insight is encouraging. The name is cool and apt, and the well-supported dependencies and minimalist design are both appealing. This could become a go-to toolkit for early-stage exploration, and it’s also pretty and smart enough to wow a coworker or investor. Hope it keeps developing, both conceptually and practically!


Do you happen to know of a good graphing package for high dimensional data in R?


I find the best way for my feeble mind to understand more than 3 or 4 dimensions is a parallel coordinate graph where you can view the relationships between two of the dimensions at a time. You can do this interactively in R using plotly:

https://plot.ly/r/parallel-coordinates-plot/
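For the Python side of the thread, here is a minimal sketch of the data transform behind a parallel coordinates plot (not plotly's implementation): each dimension becomes a vertical axis at x = j, each column is min-max scaled to [0, 1] so different units share one plot, and each row becomes a polyline across the axes. `to_polylines` is a hypothetical helper; pandas ships its own `pandas.plotting.parallel_coordinates` if you just want the drawing.

```python
from typing import Sequence


def to_polylines(rows: Sequence[Sequence[float]]) -> list[list[tuple[int, float]]]:
    """Turn each row into (axis index, scaled value) vertices of a polyline.

    Hypothetical helper; assumes non-empty, rectangular numeric data.
    """
    n_dims = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n_dims)]
    highs = [max(r[j] for r in rows) for j in range(n_dims)]

    def scale(value: float, j: int) -> float:
        span = highs[j] - lows[j]
        # A constant column carries no information; park it mid-axis.
        return 0.5 if span == 0 else (value - lows[j]) / span

    return [[(j, scale(v, j)) for j, v in enumerate(row)] for row in rows]


if __name__ == "__main__":
    data = [
        [5.1, 3.5, 1.4, 0.2],
        [7.0, 3.2, 4.7, 1.4],
        [6.3, 3.3, 6.0, 2.5],
    ]
    for line in to_polylines(data):
        print(line)
```

Only adjacent axes are directly comparable, so the "two dimensions at a time" view depends on column order; reordering the columns changes which pairs you see side by side.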


As far as I can tell, HyperTools is a bunch of convenience functions for doing something along the lines of the following dummy R code:

    df %>% dim_reduction_function() %>%
       ggplot() + geom_*()

So in a sense, I think you should be looking for a dimensionality reduction library, rather than a plotting library.
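A rough Python analogue of that reduce-then-plot pipeline, assuming NumPy, with PCA (computed via SVD) standing in for the hypothetical `dim_reduction_function()`. This sketches the idea only; it is not HyperTools' own API, and `pca_2d` is a hypothetical name.

```python
import numpy as np


def pca_2d(data: np.ndarray) -> np.ndarray:
    """Project an (n_samples, n_features) array onto its first two principal components."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Made-up data: 200 points in 10 dimensions that mostly vary along 2 hidden directions.
    latent = rng.normal(size=(200, 2))
    mixing = rng.normal(size=(2, 10))
    data = latent @ mixing + 0.05 * rng.normal(size=(200, 10))
    coords = pca_2d(data)
    print(coords.shape)
    # Hand `coords` to any 2D scatter plot, e.g. plt.scatter(coords[:, 0], coords[:, 1]).
```

Swapping PCA for t-SNE or UMAP changes only the reduction step; the plotting step stays the same, which is the commenter's point.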


So it seems from their gallery, and I'd rather just write the lines calling PCA (or whatever) myself.


You should take a look at tourr, which implements a bunch of "grand tour" algorithms in R. These take you on a smooth tour of random projections of your data, visualised in various ways (including in 3D if you have some red-blue 3D glasses!)
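To give a feel for what a tour does, here is a simplified Python sketch, assuming NumPy; it is not tourr's algorithm. It picks random 2D planes through the data and moves smoothly from one to the next. A proper grand tour interpolates along geodesics between planes; this sketch swaps in a plain linear blend of the two bases followed by re-orthonormalization via QR, which is continuous but not geodesic. The names `orthonormalize`, `random_frame` and `simple_tour` are hypothetical.

```python
from collections.abc import Iterator

import numpy as np


def orthonormalize(frame: np.ndarray) -> np.ndarray:
    """Orthonormal basis for the column span of `frame` (QR with signs fixed)."""
    q, r = np.linalg.qr(frame)
    # QR is unique only up to column signs; force diag(r) > 0 so that
    # nearby inputs give nearby outputs (needed for a smooth animation).
    return q * np.sign(np.diag(r))


def random_frame(n_dims: int, rng: np.random.Generator) -> np.ndarray:
    """Random (n_dims, 2) orthonormal basis of a 2D plane."""
    return orthonormalize(rng.normal(size=(n_dims, 2)))


def simple_tour(data: np.ndarray, n_targets: int = 5, steps: int = 30,
                seed: int = 0) -> Iterator[tuple[np.ndarray, np.ndarray]]:
    """Yield (frame, 2D projection) pairs drifting between random planes.

    Simplification: linear blend of bases + QR, not geodesic interpolation.
    """
    rng = np.random.default_rng(seed)
    centered = data - data.mean(axis=0)
    current = random_frame(data.shape[1], rng)
    for _ in range(n_targets):
        target = random_frame(data.shape[1], rng)
        for t in np.linspace(0.0, 1.0, steps, endpoint=False):
            frame = orthonormalize((1.0 - t) * current + t * target)
            yield frame, centered @ frame
        current = target


if __name__ == "__main__":
    data = np.random.default_rng(42).normal(size=(100, 6))
    for i, (frame, projection) in enumerate(simple_tour(data, n_targets=2, steps=3)):
        # Each `projection` is a (100, 2) array to draw as one animation frame.
        print(i, projection.shape)
```

Drawing each projection as a scatter plot in sequence gives the "smooth tour" effect; structure that only shows up from certain angles appears as the plane passes through them.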


Interesting, thanks.


Good visualisation of high dimensional data is almost all about the mental model and mapping of dimensions, barely ever about the library. ggplot2 is excellent for displaying high dimensional data in R if you can reason well about your own data. If you can't, no library is going to save you.


Well, I already use ggplot. I'm not sure ggplot does much 3D... As for the reasoning, well, who knows.


If you present an example of high-dimensional data I'd be happy to help you reason about how to visualise it (although this is perhaps better suited to StackExchange).

edit: if you feel you need 3D, you've already failed to represent the data clearly


if you feel you need 3D, you've already failed to represent the data clearly

Why?


I took a visualization course, and to paraphrase the prof: you have a 3D object projected onto a 2D (display) plane, and reasoning about depth adds mental load, not to mention occlusion. You have to interact with the model a lot to get the view you want. The course was based on old papers about UX with a dash of cognitive science, so take it with a grain of salt.


Yup, this. There's almost always a way to represent the data that allows clearer interpretation.


He is, umm, very opinionated. It's kind of like asking why you'd need two dimensions when surely a 1D vector is good enough.


The first example

http://hypertools.readthedocs.io/en/latest/auto_examples/plo...

seems a little disappointing: it exemplifies, more than solves, the problems of representing 3D data on a 2D screen. (Is it interactively rotatable or something? I ran it locally in a notebook and it didn't appear to be.)


Indeed, all the examples appear to show datasets that would benefit from good data visualisation, and in some cases dimension reduction, but all of them are terribly represented. I don't think this is necessarily a problem with the library, but the examples certainly do not exemplify good visualisation of high-dimensional data.


When you say it doesn't exemplify good visualization of high-dimensional data do you have alternate examples in mind? I would like to be able to see examples of good visualization of high dimensional data.


I have many examples in mind. Here's an old article with good examples: https://cacm.acm.org/magazines/2010/6/92482-a-tour-through-t...

In general I'd say that for any dataset, the ideal visualisation depends on the features and meaning of the data, but almost always some combination of geometry, colour, shape and size can capture all the dimensions. Beyond a small number of categories, colour, shape and size are much less visually informative than geometry. Most often, creative use of geometry at different scales is the best first stop for high-dimensional data, for example using faceting (like with ggplot2's facet capability: http://ggplot2.tidyverse.org/reference/facet_grid.html). Simulating 3D is almost never required or optimal.
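As an illustration of the faceting idea (not ggplot2's implementation), here is a minimal Python sketch that splits records into a grid of panels keyed by two categorical variables. Four variables are then encoded by position alone: two within each panel and two by the panel's place in the grid. `facet_grid` here is a hypothetical helper named after the ggplot2 concept, and the sample records are made up.

```python
from typing import Any


def facet_grid(records: list[dict[str, Any]], x: str, y: str,
               facet_row: str, facet_col: str):
    """Group (x, y) points into panels keyed by (row level, column level).

    Hypothetical helper mirroring the facet_grid idea, not ggplot2's API.
    """
    row_levels = sorted({r[facet_row] for r in records})
    col_levels = sorted({r[facet_col] for r in records})
    # Every (row, col) cell gets a panel, even if empty, so the grid stays aligned.
    panels: dict[tuple[Any, Any], list[tuple[Any, Any]]] = {
        (rl, cl): [] for rl in row_levels for cl in col_levels
    }
    for r in records:
        panels[(r[facet_row], r[facet_col])].append((r[x], r[y]))
    return row_levels, col_levels, panels


if __name__ == "__main__":
    records = [  # made-up sample data
        {"dose": 1.0, "response": 0.2, "sex": "F", "site": "A"},
        {"dose": 2.0, "response": 0.5, "sex": "F", "site": "B"},
        {"dose": 1.5, "response": 0.4, "sex": "M", "site": "A"},
    ]
    rows, cols, panels = facet_grid(records, "dose", "response", "sex", "site")
    for r in rows:
        print(r, [panels[(r, c)] for c in cols])
```

Each panel would be drawn as an ordinary 2D scatter with shared axes; colour, shape or size can then carry a fifth variable if it has only a few levels.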


edX has a good course on this (though it's taught in R): https://www.edx.org/course/high-dimensional-data-analysis-ha...


I can't see anything in the tutorials or examples that makes this a good library for visualisation of high-dimensional data. It seems somewhat equivalent to base graphics in R, which is much less useful for visualising high-dimensional data than anything based on Wilkinson's grammar of graphics, like ggplot2 in R.


Would this be useful for word vectors/embeddings?



