A Tour of Python’s Data Visualization Landscape, Including Ggplot and Altair (dansaber.wordpress.com)
240 points by kawera on Oct 17, 2016 | 36 comments



Having started out with R and later moved more into Python, I absolutely hate the seemingly unnecessary complexity of matplotlib. An honorable mention also goes to bokeh (http://bokeh.pydata.org/en/latest/), which does a nice job.


As someone who has never used R, but uses matplotlib frequently, could you give some examples where matplotlib is comparatively more complex? I'm genuinely curious. Also, have you tried seaborn [1]? It's a matplotlib wrapper for making some common statistical plots less tiresome.

[1]: https://seaborn.github.io/


https://cran.rstudio.com/web/packages/dplyr/vignettes/introd...

In R we have a neat library called dplyr that really shows this simplicity. If you just look at this little vignette, I think you will see the simplicity we are moving toward as a language.


The article does a good job in Scene 5, where it shows the code for a bar chart in matplotlib in Python, ggplot2 in R, and ggplot in Python (with help from pandas).


Just stating this so people don't view this as a slight to R, because I don't think that was your purpose.

I started in Python and then Pandas and moved to R and haven't looked back.

R vs Python is really just a personal experience. They are both great choices. I just like the R world more and the way everything can be integrated with each other. For example the Hadley Wickham Universe of Libraries.


Matplotlib allows insane customization, including 3D plots with projections and plots with more than two y-axes.


The matplotlib API was designed for people moving to Python from Matlab.


This is interesting. I have to say that for all the power that Python has in the scientific community, and scientific computation in particular, visualization has a long way to go. There are charts I made 5 years ago in C++ using a framework called ROOT that were much prettier. ROOT was only maintained by a couple of people at the time. This was particle physics.

Example:

https://en.wikipedia.org/wiki/ROOT#/media/File:CMS_ROOT_plot...


I use both frequently. The real thing I like about ROOT is the ability to interactively change the style of the plot itself by opening it in the editor. That saves a lot of time, and it is better than having to change one line of code and recompile just because a label is off by a few pixels.


In the past I've used Inkscape to hand-massage some mpl plots exported to svg before including them in a document. Works well enough if needed.


ROOT is a parallel computing universe.

Fun fact: there is a Jupyter kernel for it!


matplotlib: The 800-pound gorilla — and like most 800-pound gorillas, this one should probably be avoided unless you genuinely need its power...

Hmm, I never even considered that matplotlib was some sort of high-end powertool. At least not when using the old-fashioned MATLABish API. If the advantage of the competition is that they are supposed to be easier to use, then I will stick with matplotlib.


Yeah, that's an odd comment to make. I can't really imagine using anything else but matplotlib. I'm a scientist and need fine control over most aspects of my plots, though I admit the interface for it is a bit cumbersome.


The comment is perfectly reasonable: you need an 800-pound gorilla, but many people don't.


Also, most of these libraries are implemented using Matplotlib. If you want full flexibility, or custom visualizations, Matplotlib is the way to go. I would recommend one of the other libraries for interactive plots, though.


Last time I needed a quick bar plot I spent several hours in matplotlib trying to get something worthwhile. And then I needed to add another column ...

How would you go about the bar-plot example in this post (first example in Scene 5)? And how would you go about expanding it to have 1 more bar in each group?

I find this to be something I want to look at quickly, to judge my data and judge if it is something I need to look at, and fiddling with hard-coded spacing and line widths is not the way to get anything done quickly.

So I am genuinely curious how you go about it in a way that is easy enough for "easier" not to be worth your while.
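One possible answer, as a minimal sketch rather than the article's own code: derive the bar width and offsets from the number of series, so adding one more bar per group means adding one more entry to the data and nothing else. The helper names `grouped_bar_positions` and `plot_grouped_bars` are made up for illustration, and the 0.8 group width is an arbitrary assumption.

```python
def grouped_bar_positions(n_groups, n_series, group_width=0.8):
    """Return (bar_width, positions); positions[s][g] is the x centre
    of the bar for series s in group g. Groups sit at x = 0, 1, 2, ..."""
    bar_width = group_width / n_series
    positions = []
    for s in range(n_series):
        # Centre the block of bars on the group's tick.
        offset = (s - (n_series - 1) / 2) * bar_width
        positions.append([g + offset for g in range(n_groups)])
    return bar_width, positions


def plot_grouped_bars(groups, series):
    """Draw a grouped bar chart.

    groups: list of group labels (hypothetical example data).
    series: dict mapping a series label to one value per group.
    """
    # Imported here so the layout helper stays usable without matplotlib.
    import matplotlib.pyplot as plt

    bar_width, positions = grouped_bar_positions(len(groups), len(series))
    fig, ax = plt.subplots()
    for (label, values), xs in zip(series.items(), positions):
        ax.bar(xs, values, width=bar_width, label=label)
    ax.set_xticks(list(range(len(groups))))
    ax.set_xticklabels(groups)
    ax.legend()
    return fig
```

Calling `plot_grouped_bars(["a", "b"], {"2014": [1, 2], "2015": [2, 3]})` draws two bars per group; adding a `"2016": [3, 4]` entry gives three bars per group with no spacing changes.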


By far the easiest way I've found to plot data from pandas dataframes is plotly with the cufflinks wrapper[0]. For anything that isn't easy using cufflinks, I have started learning R only because of ggplot2.

I see a lot of comments saying matplotlib is easy to use. It isn't. If you have to write a for loop to iterate over your data to make a bar chart where you define the x and y by hand, that's not as easy as it should be. If you have to write a library to use your library, it's overkill for most plotting. For that kind of power, I'd do it in JavaScript with D3.js so I get CSS for the styling and a much easier path to interactivity.

[0] https://plot.ly/ipython-notebooks/cufflinks/


You missed this: Custom visualizations for Jupyter

http://nbviewer.jupyter.org/github/vidalab/vida-notebook/blo...

Disclaimer: I work at vida.io.


My go-to library, not mentioned in this post, is plot.ly in offline mode for easy things. I use Matplotlib for custom plots where I want fine-grained control and extra stuff such as using my own shapefiles.


I don't know if it qualifies here, but there's the Orange Python package, which gives you a pretty advanced standalone dataviz solution.

http://orange.biolab.si/


If you are not yet strongly attached to a graphing solution I'd like to suggest gnuplot. It can be used with Python through a thin library, by commanding it through a pipe, or simply by writing your numbers into a file that is read by a gnuplot script.

It's better suited to scientific-style graphs than such things as bar- or pie-charts, but it can be cajoled into making those, too.

There is some advantage to decoupling your visualization task from the rest of your computation.

https://lwn.net/Articles/628537/
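The "write your numbers into a file" route can be sketched with only the standard library. This is an illustration under assumptions, not a recommended wrapper: the file names `data.dat`, `plot.gp`, and `plot.svg` and both function names are hypothetical, and it assumes a `gnuplot` binary on the PATH with the svg terminal available.

```python
import shutil
import subprocess
from pathlib import Path


def write_gnuplot_job(xs, ys, workdir):
    """Write a two-column data file and a gnuplot script that plots it."""
    workdir = Path(workdir)
    data_path = workdir / "data.dat"
    script_path = workdir / "plot.gp"
    # One "x y" pair per line, the layout gnuplot reads by default.
    data_path.write_text("".join(f"{x} {y}\n" for x, y in zip(xs, ys)))
    script_path.write_text(
        "set terminal svg\n"
        "set output 'plot.svg'\n"
        "plot 'data.dat' using 1:2 with lines title 'y'\n"
    )
    return data_path, script_path


def run_gnuplot(script_path):
    """Run gnuplot on the script; return False if gnuplot is not installed."""
    if shutil.which("gnuplot") is None:
        return False
    # Run inside the script's directory so the relative file names resolve.
    subprocess.run(
        ["gnuplot", script_path.name], cwd=script_path.parent, check=True
    )
    return True
```

Because the computation only produces `data.dat`, the script can be edited and rerun by hand without touching the Python side, which is the decoupling mentioned above.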


These are all nice for static visualizations. But the only thing I have found that can stream lots of data on the fly is PyQtGraph. It doesn't look the prettiest, but it is very powerful and fast. I do wish it were more intuitive to set up sometimes: it can be tricky to tell when to use Qt stuff or the plotting libraries, as they are pretty closely coupled.

http://www.pyqtgraph.org/


I believe bokeh can handle streaming data quite well. I remember at least one demo of various spectrogram and related plots updated live from the microphone on the presenter's laptop. It seemed impressive.


Nice article comparing how easy it is to plot. What I missed was a comparison of the quality of the generated plots. I don't have a lot of experience using the libraries, but I've used matplotlib and was struck by the low quality of the defaults. I thought we would all be living in Tufte's wonderland by now, but the graphic defaults aren't very good.

Do you have any suggestion on a lib that generates high quality images?


While obsolete, I still prefer working with PyQwt - http://pyqwt.sourceforge.net


There are a few things missing in this comparison:

How do you go about plotting things that aren't in a data frame? And how do you customise your plots?


Bad news: Seaborn's documentation appears to be gone. The author and/or Stanford appear to have removed public access.

http://stanford.edu/~mwaskom/software/seaborn/generated/seab...



Looks like it moved to https://seaborn.github.io/, though there's no redirect in place.


It is mentioned as an issue on the GitHub page.


What is the best Python tool to plot a million points in the same figure?

I'd love to see a quick comparison post like this for millions of points.

Would it be easier to plot with JavaScript? I have the two options.


The real question to ask is "Why am I plotting a million points?"

Unless you're making a 1000' x 1000' display representing our understanding of remote galaxies, chances are there are a few conceptual simplifications that can be made before rendering the visualization.
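One such simplification, sketched here with only the standard library, is to bin the points onto a grid and plot the per-cell counts as a heatmap instead of drawing every marker. The function name `bin_points` and the grid parameters are assumptions for illustration; for real workloads, numpy's `histogram2d` or matplotlib's `hexbin` should do the same job much faster.

```python
from collections import Counter


def bin_points(points, bounds, shape):
    """Count points per cell of a regular grid.

    points: iterable of (x, y) pairs.
    bounds: ((x_min, x_max), (y_min, y_max)); points outside are skipped.
    shape:  (n_cols, n_rows) of the grid.
    Returns a Counter mapping (col, row) to the number of points in it.
    """
    (x_min, x_max), (y_min, y_max) = bounds
    n_cols, n_rows = shape
    counts = Counter()
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue
        # Points exactly on the upper edge go into the last cell.
        col = min(int((x - x_min) / (x_max - x_min) * n_cols), n_cols - 1)
        row = min(int((y - y_min) / (y_max - y_min) * n_rows), n_rows - 1)
        counts[(col, row)] += 1
    return counts
```

A million points binned onto a 200 x 200 grid leave at most 40,000 values to render, which any of the libraries in the article can handle.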



It's always been difficult to connect Python visualization to the frontend. Altair looks like it supports Vega-Lite JSON output, which helps with frontend rendering.
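To make the idea concrete, here is a hand-written sketch of the kind of Vega-Lite JSON document a frontend would receive. It does not use Altair itself (as far as I know, a chart's `to_json()` method produces a spec of this shape); the function name `bar_chart_spec`, the sample field names, and the v5 schema URL are assumptions for illustration.

```python
import json


def bar_chart_spec(records, x_field, y_field):
    """Build a minimal Vega-Lite bar chart spec as a plain dict."""
    return {
        # Schema version is an assumption; use the one your renderer expects.
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": records},
        "mark": "bar",
        "encoding": {
            "x": {"field": x_field, "type": "nominal"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }


def spec_to_json(spec):
    """Serialize the spec to the JSON string handed to the frontend."""
    return json.dumps(spec, indent=2)
```

The Python side only ships data plus a declarative description, so the frontend can render it with a Vega-Lite runtime and no Python plotting library is needed in the browser.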


I know the article focused on 2D, but it's worth mentioning www.vtk.org for 3D. C++ native, with excellent Python bindings.


Better than any of these: R plotting with rpy2.



