I might have missed the memo where Gnuplot has a better api than matplot ;)
Seriously though, there have been many attempts to make a "better matplotlib" and yet it's still going strong - mostly because when you really get into scientific plotting and need print-quality plots or need to embed plots in a GUI with very specific parameters, it's hard to beat. Sometimes you just need to place a label in a specific spot and that's where a lot of alternatives fall down. That and the multiple very mature backends and library integrations.
P.S. I also highly recommend using the object-based API. There's a lot of learning material around the web that still uses the old MATLAB-inspired plotting API, which has a lot of gotchas; the object-based one is pretty clean.
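To make the difference concrete, here is a minimal sketch of the two styles side by side. The data and label text are made up for illustration, and the Agg backend is only chosen so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed only so the sketch runs anywhere
import matplotlib.pyplot as plt

xs = [0, 1, 2, 3]
ys = [0, 1, 4, 9]

# pyplot (MATLAB-style): every call acts on an implicit "current" axes
plt.figure()
plt.plot(xs, ys)
plt.title("implicit state")

# object-based: figure and axes are explicit handles
fig, ax = plt.subplots()
ax.plot(xs, ys, marker="o")
ax.set_title("explicit axes")
ax.set_xlabel("x")
# placing a label at an exact spot, in data coordinates
ax.annotate("last point", xy=(3, 9), xytext=(1.5, 8),
            arrowprops={"arrowstyle": "->"})
```

With the explicit handles it is always clear which axes a call touches, which matters as soon as a figure has more than one subplot.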
I think matplotlib's main strength is its breadth and power. It really lets you do exactly what you want if you spend enough time fiddling and digging through the documentation.
All this versatility comes at the expense of ease of use. It could certainly do a better job of making the simple common use cases more straightforward.
gnuplot arguably has similar power and versatility and it does make the simple stuff easier.
One thing that matplotlib is IMO bad at is interactive plots. They are very slow, and the controls are not intuitive. 99% of the time you just want to zoom and pan and those should be default actions.
gnuplotlib looks interesting and I will have a look, but these days most of the plots I do are in jupyter notebooks and I really want inline interactive plots so I don't think I will use it much. FWIW, what I use currently is plotly - the interactivity is very good (way better than matplotlib's) and plotly.express is very easy to use for the simple use cases.
I also have difficulties with Gnuplot and Matplotlib. I like Vega [1] that allows me to create visualisations in a declarative way. If I really need something special I go with d3.js, which had a really steep learning curve but with ChatGPT it should have become easier for beginners.
There is so much matplotlib and plotly code on the web, that nothing else comes close to the effortless plotting of matplotlib/plotly.
I almost never have to write the styling myself. LLMs understand matplotlib's complex but well-specified docs really, really well.
This points to a larger trend. If you want your language or hard-to-learn tool to get adoption, then you'd better have an LLM that does 90% of the work for newcomers.
It's the '*a language is only as good as its IDE*' phenomenon that every Java user is surely aware of, but the 2024 version.
Have there been many attempts to make a "better matplotlib"? Enlighten me, I have searched for them but never found something which seems to actually try to be "matplotlib but better".
You might be right that there aren't many examples of projects saying "matplotlib but better", but many python plotting libraries are sold with something like "you won't need matplotlib anymore".
Note that there may be times when titles are substituted, usually for reasons of length, de-clickbaiting or de-sensationalising, etc. In that case the preferred option is to omit nonessential words or find an alternative, clearer phrase within the article text itself.
But editorialising, even with relatively innocuous phrases such as "non-painful" is strongly frowned upon.
I don't mind matplotlib at all but I just find it annoying in one of my most important use cases which is to update the plot during a long calculation.
You have to call fig.canvas.draw but it doesn't always work, and sometimes plt.pause works but sometimes it only partially updates the display and you have to call it multiple times. (At least this is my experience with the TkAgg backend.)
Meanwhile you can't interact with it.
I would love a version of matplotlib which lives in a separate process and gets updated via nonblocking communication.
Absolutely yes, but you'd have to design that for each application you develop. I'd love an out of the box solution. Then maybe skip the file and use pipes or sockets.
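Not the out-of-the-box solution asked for, but here is a rough sketch of the nonblocking idea using a worker thread and a queue instead of a separate process with pipes or sockets. The `long_calculation` function is a hypothetical stand-in for the real computation, and the Agg backend is only there so the sketch runs headless; in real use an interactive backend such as TkAgg is assumed.

```python
import queue
import threading

import matplotlib
matplotlib.use("Agg")  # assumption: headless here; use an interactive backend in practice
import matplotlib.pyplot as plt


def long_calculation(out, steps=50):
    """Hypothetical worker: emits (step, value) pairs, then None as a sentinel."""
    total = 0.0
    for step in range(steps):
        total += step * 0.5
        out.put((step, total))
    out.put(None)


results = queue.Queue()
worker = threading.Thread(target=long_calculation, args=(results,), daemon=True)
worker.start()

fig, ax = plt.subplots()
(line,) = ax.plot([], [])
xs, ys = [], []

done = False
while not done:
    # Drain whatever has arrived so far without blocking the plotting thread.
    try:
        while True:
            item = results.get_nowait()
            if item is None:
                done = True
                break
            xs.append(item[0])
            ys.append(item[1])
    except queue.Empty:
        pass

    line.set_data(xs, ys)               # update the existing artist, no replotting
    ax.relim()
    ax.autoscale_view()
    fig.canvas.draw_idle()              # request a redraw
    fig.canvas.start_event_loop(0.01)   # give the backend time to handle GUI events

worker.join()
```

The plotting stays on the main thread, since GUI toolkits generally expect that, and the computation never waits on the display. Swapping the thread for a process and the queue for a pipe or socket keeps the same shape.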
I always find it easier to produce publication-quality figures using gnuplot (but not with its default settings, mind you) than with Matplotlib. Check out http://gnuplotting.org/
Also, it's hard to beat gnuplot's speed refreshing a live scatter plot with many thousands of points using the x11 terminal.
I'm using gnuplot for plotting too (the actual gnuplot application not a library that uses gnuplot as its backend).
And I usually keep computation and plotting separate. Computation produces data files, and a gnuplot script generates plots. This separation of computation and plotting allows updating charts later if needed, collected data can be reused in other plots, and additional data analysis can be performed and charts can be augmented.
So I personally don't see many advantages in integrating chart generation into the computational pipeline itself (except for monitoring a computation, or maybe when user response is needed to direct it). Because of that, libraries that encourage generating charts from a computed array instead of dumping that data into persisted files feel like an anti-pattern to me.
Completely agree. I keep computation steps (which create csv files) separate from charting steps. I use make to orchestrate pipelines. I also keep everything under source control, and insert git commit ids into every chart. This ensures that all the analysis and charts can be linked directly to the code used to produce them.
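For anyone who wants to try this workflow, a minimal sketch of the split is below. It uses Python and matplotlib for both steps rather than gnuplot and make, and the file names, the `compute`/`plot` functions and the toy data are all assumptions for illustration. The commit id comes from `git rev-parse --short HEAD` and falls back to "unknown" outside a repository.

```python
import csv
import subprocess
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # file output only, no display needed
import matplotlib.pyplot as plt


def compute(csv_path):
    """Computation step: writes results to disk and knows nothing about plotting."""
    with csv_path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["x", "y"])
        for x in range(10):
            writer.writerow([x, x * x])


def current_commit():
    """Short git commit id, or 'unknown' if git or a repository is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def plot(csv_path, png_path):
    """Charting step: reads the CSV and stamps the commit id onto the figure."""
    with csv_path.open(newline="") as fh:
        rows = list(csv.DictReader(fh))
    xs = [float(r["x"]) for r in rows]
    ys = [float(r["y"]) for r in rows]

    commit = current_commit()
    fig, ax = plt.subplots()
    ax.plot(xs, ys, marker="o")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    fig.text(0.99, 0.01, f"commit {commit}", ha="right", va="bottom", fontsize=6)
    fig.savefig(png_path, dpi=150)
    plt.close(fig)
    return commit


workdir = Path(tempfile.mkdtemp())
compute(workdir / "results.csv")
stamp = plot(workdir / "results.csv", workdir / "results.png")
```

Because the CSV is the only interface between the two steps, the chart can be restyled or regenerated later without rerunning the computation.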
Somewhat agree but sometimes there is need to change/filter the data that goes into making the chart which is only realized after plotting it. Combining data and the figures into one "pipeline" makes it easy to iterate especially with exploratory data analysis.
Regardless, this comment made me think about my general workflow which is usually combined. Appreciate this comment.
This is a minor source of confusion. Pylab and Pyplot are packages within Matplotlib. They are what most casual users experience when they say that they're using Matplotlib. I use them, they're convenient.
A minor headache is when you have to break out of Pyplot to use some of the more detailed behaviors of Matplotlib, and now you're interacting with both Pyplot and the lower level calls. For instance, plt.title('foo') and gca().set_title('foo') do the same thing.
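A small sketch of that overlap, assuming nothing beyond matplotlib itself (the Agg backend is only there so it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

plt.title("foo")                  # pyplot call, routed to the current axes
assert plt.gca() is ax            # the "current axes" is the one just created
title_via_pyplot = ax.get_title()

plt.gca().set_title("bar")        # the same axes, addressed explicitly
title_via_axes = ax.get_title()
```

Both calls end up setting the title of the same axes object; pyplot just looks it up for you.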
If you're a fluent programmer, you fly past those seeming inconsistencies with barely any notice. Explaining them to a novice programmer is harder.
Matplotlib has been working fine for me. Some caveats. I'm a physicist working in industry, and don't publish in academic journals. But I do a huge amount of data visualization, for my own use, and to produce graphs for internal reports.
The graphs look as good to my eye as what I see in papers, but I have no idea what extra steps are needed to satisfy each journal's style guide.
Before Matplotlib, I created graphs in Excel.
A possible question is whether Matplotlib deserves the status of being the default for teaching scientific programming in Python, or if a different tool would make it easier for beginners.
Nothing, really. I have been using matplotlib for years and it's... fine.
The only problem I have is that it has a number of minor annoyances that never get fixed, despite being well known and the project being actively maintained.
Off the top of my head: the Tk backend not supporting DPI scaling on GNU/Linux; aspect="equal" not working on 3D plots; covered parts of 3D objects appearing in front of the objects covering them; twin axes not having the origin aligned; etc.
Broadly, the problem is that its syntax was meant to reflect that of Matlab, which, I guess, makes it intuitive for Matlab users. For the rest of us, it's mostly unintuitive and inconsistent.
I would be genuinely interested in what the inconsistencies are from a Python perspective.
I am former Matlab/Octave user. To me the julia interface of matplotlib is actually quite nice to use, but unfortunately the installation is a bit brittle.
Off the top of my head, the one that annoys me regularly is the difference between setting a title or an x/y range on a plot and on a subplot: plt.title() versus ax.set_title().
By the way, who came up with the idea that an axis object is a great handle to handle subplot settings..?
People who are making plots in the terminal don't want to type out the fully qualified library name. The majority of plots are written and read only once, during data exploration and analysis.
I've found Lets-Plot [0] to be my least painful plotting option for Python so far. I'm so used to the grammar of graphics from ggplot, especially compared to plotting in matplotlib or seaborn.
Interesting. It seems the major difference with plotnine is that lets-plot is implemented in Kotlin and the Python package uses a Python-Kotlin bridge. Both lets-plot and plotnine are available under the MIT license.
It looks like lets-plot does not implement the ggplot2 theme interface yet, unlike plotnine which is 99.99% compatible with ggplot2. On the other hand lets-plot also includes additional geospatial functionality on top of basic ggplot2 compatibility.
Ok, thanks - not sure how I missed that. I guess the conclusion is that we now have two pretty much complete reimplementations of ggplot2 available in Python.
The landing page (readme, in the case of GitHub repos) is way too important. Even a one-click barrier to an examples page obscures the information unacceptably. UX principles also apply to documentation.
Does it implement the greedy matching/parsing of gnuplot? Doesn't look like it?
pl 'data.dat' u 1:3 w p pt 7 lc 3 tit 'Interesting measurements'
flows from my mind like breathing. I miss that brevity and speed so much when exploring datasets quickly in other languages....
It's not nearly as maintainable as more-verbose and well-specified languages, but it's far more readable than regex :).
Edit to add: simply plotting a two-column data file in Gnuplot is as simple as:
pl 'data.dat'
The longer command above plots the third column against the first column with blue circular dots and applies annotated text to the key/legend, as I prefer to do when exploring data for the first time...
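For readers who don't speak abbreviated gnuplot: in the longer command, pl is plot, u is using, w p is with points, pt is pointtype, lc is linecolor and tit is title. A rough matplotlib counterpart might look like the sketch below. The contents of data.dat are made up here (three whitespace-separated columns written to a temporary directory), and the marker and colour are only approximations of pt 7 and lc 3 as described above.

```python
import tempfile
from pathlib import Path

import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Stand-in for 'data.dat': three columns of invented values.
data_path = Path(tempfile.mkdtemp()) / "data.dat"
cols = np.column_stack([np.arange(5), np.arange(5) * 2.0, np.arange(5) ** 2])
np.savetxt(data_path, cols)

data = np.loadtxt(data_path)
fig, ax = plt.subplots()
# "u 1:3": column 1 on x, column 3 on y (gnuplot counts from 1, NumPy from 0)
(points,) = ax.plot(
    data[:, 0], data[:, 2],
    linestyle="none", marker="o", color="blue",   # roughly "w p pt 7 lc 3"
    label="Interesting measurements",             # "tit '...'"
)
ax.legend()
```

It does the same job, but it is clearly a lot more typing than the one-liner, which is the point being made about brevity.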
From https://github.com/has2k1/plotnine, their front page examples. I get that it's ripping off the ggplot api, but why would they not alias "aes" to something more meaningful? It stands for "aesthetic mapping".
My problem with all the popular plotting libraries in python is they seem designed for researchers slapping together a paper and produce code that's awful to maintain.
> [...] why would they not alias "aes" to something more meaningful?
This criticism doesn't do anything for me.
Is an alias for a 3 letter function name really going to improve readability / usability? It's the same API. Take it up with Hadley Wickham.
I think having one interface that is the same (ok, "mostly the same") for R and Python is a blessing for people who work with both. I don't graph with Python much, but every time I have to plot in Python I sigh and ask myself if it would be easier to just import the data into R.
As someone who hasn't seriously used any plotting library, some of my questions:
- Why isn't the first term `ggplot(mtcars, "wt", "mpg", color="factor(gear)")` instead? It seems that the second argument always has to be a `plotnine.mapping.aes(...)` call or a saved `aes` value. It is not really hard to distinguish `ggplot(data, saved_aes)` from `ggplot(data, "x-col", "y-col", ...)`. The only issue might be the third `environment` argument, but that can go elsewhere (see below).
- What's up with that operator overloading? Is `+=` even supported? It seems that the original R version had the same syntax, but Python's statement-based syntax makes it annoying to use. Maybe any "additions" should have been moved into `ggplot` arguments by default: `ggplot(mtcars, "wt", "mpg", geom_point(), stat_smooth(...), facet_wrap(...), color="...")`.
- Many "addable" values have common repeating prefixes (`geom_` etc). Doesn't sound like a good API design at all, especially in Python. Probably there should be a `geom` module and so on that are exposed via `from plotnine import *` instead, so that `geom.point()` would work for example.
- Formulae as strings are fine to have, but there are somehow multiple "stages" at which they can be evaluated (`after_stat` etc). Such a feature would be necessary from time to time, but the concept itself doesn't seem to be well polished.
As someone who makes plots every now and then with breaks of several months in between, I have found the ggplot syntax surprisingly intuitive and easy to remember even after a long break. Importantly, the basic syntax can be used to compose quite complicated graphs, although it does have its limits.
It's also quite nice to be able to use the same interface in both R and Python, although it is obvious that it was originally developed for R.
do you feel that about Seaborn even? it can be too "on the rails" for some applications, but if the rails meet your needs, the plotting calls look clean and maintainable to me.
Looks great, minor API usability note: I don't know if there is an official convention, but when naming an object that would otherwise clash with a builtin or keyword, rather than prepending an underscore, e.g. `_with`, one usually appends it: `with_`. The leading-underscore form is used and recognised for unused or private names.
> _with is a curve option that indicates how this dataset should be plotted. It’s _with and not with because the latter is a built-in keyword in Python. [1]
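On the convention question: PEP 8 does describe the single trailing underscore as the way to avoid clashes with Python keywords. A tiny sketch of what that looks like, where `plot_curve` is a hypothetical function and not gnuplotlib's actual API (which, per the quote above, uses `_with`):

```python
import keyword


def plot_curve(x, y, with_="lines"):
    """Hypothetical plotting function; `with_` sidesteps the reserved word `with`."""
    return {"x": list(x), "y": list(y), "with": with_}


spec = plot_curve([1, 2, 3], [4, 5, 6], with_="points")
is_reserved = keyword.iskeyword("with")   # `with` cannot be used as a parameter name
```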
I'd imagine an issue with most plotting APIs is that they're declarative, and specified in terms of a semi-mathematical domain -- rather than imperative and specified as graphics operations. Since they're graphics libraries, that would make their operation more obvious.
Having just compared this to a Common Lisp library that does the same thing [0], given Python's prominence in numerical computing, I'm actually surprised that the latter is better.