Gnuplotlib: A gnuplot-based plotting backend for NumPy (github.com/dkogan)
78 points by dima55 on Feb 1, 2024 | hide | past | favorite | 66 comments



I might have missed the memo where Gnuplot has a better api than matplot ;)

Seriously though, there have been many attempts to make a "better matplotlib" and yet it's still going strong, mostly because when you really get into scientific plotting and need print-quality plots, or need to embed plots in a GUI with very specific parameters, it's hard to beat. Sometimes you just need to place a label in a specific spot, and that's where a lot of alternatives fall down. That and the multiple very mature backends and library integrations.

P.S. I also highly recommend using the object-based API. There's a lot of learning material around the web that still uses the old MATLAB-inspired plotting API, which has a lot of gotchas; the object-based one is pretty clean.
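
For readers unfamiliar with the distinction, here is a minimal sketch of the object-based style. The data and labels are made up for illustration, and the Agg backend is an assumption chosen only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; an assumption so this runs headless
import matplotlib.pyplot as plt

# Illustrative data, not from the thread.
xs = [0, 1, 2, 3]
ys = [0, 1, 4, 9]

# Object-based style: hold explicit Figure and Axes handles
# instead of relying on pyplot's implicit "current axes".
fig, ax = plt.subplots()
ax.plot(xs, ys, marker="o", label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Object-based API")
ax.legend()

# Place a label at a specific spot, given in data coordinates.
ax.annotate("last point", xy=(3, 9), xytext=(1.5, 8),
            arrowprops={"arrowstyle": "->"})
```

Every call goes through `fig` or `ax`, so it stays clear which plot is being modified once a figure has several subplots.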


I think matplotlib's main strength is its breadth and power. It really lets you do exactly what you want if you spend enough time fiddling and digging through the documentation.

All this versatility comes at the expense of ease of use. It could certainly do a better job of making the simple common use cases more straightforward.

gnuplot arguably has similar power and versatility and it does make the simple stuff easier.

One thing that matplotlib is IMO bad at is interactive plots. They are very slow, and the controls are not intuitive. 99% of the time you just want to zoom and pan and those should be default actions.

gnuplotlib looks interesting and I will have a look, but these days most of the plots I do are in jupyter notebooks and I really want inline interactive plots so I don't think I will use it much. FWIW, what I use currently is plotly - the interactivity is very good (way better than matplotlib's) and plotly.express is very easy to use for the simple use cases.


I also have difficulties with Gnuplot and Matplotlib. I like Vega [1] that allows me to create visualisations in a declarative way. If I really need something special I go with d3.js, which had a really steep learning curve but with ChatGPT it should have become easier for beginners.

[1] https://vega.github.io/vega-lite/


And LLMs have made it even better.

There is so much matplotlib and plotly code on the web, that nothing else comes close to the effortless plotting of matplotlib/plotly.

I almost never have to write the styling myself. LLMs understand matplotlib's complex but well-specified docs really well.

This points to a larger trend: if you want your language or hard-to-learn tool to get adoption, you had better have an LLM that does 90% of the work for newcomers.

It's the '*a language is only as good as its IDE*' phenomenon that every Java user is surely aware of, but the 2024 version.


Have there been many attempts to make a "better matplotlib"? Enlighten me, I have searched for them but never found something which seems to actually try to be "matplotlib but better".


You might be right that there aren't many examples of projects saying "matplotlib but better", but many python plotting libraries are sold with something like "you won't need matplotlib anymore".


The HN post title should probably not include "non-painful" given that the README never uses that phrase or compares against other plotting libraries.


Email such suggestions to HN mods at <hn@ycombinator.com>.

I've done so in this case.


Why?


HN rules say: “… please use the original title, unless it is misleading or linkbait; don't editorialize.”

Therefore, the title should be “gnuplot for numpy.”


Right.

Note that there may be times when titles may be substituted, usually for reasons of length, de-clickbaiting or de-sensationalising, etc. In which case the preferred option is to omit nonessential words or find an alternative, clearer, phrase within the article text itself.

But editorialising, even with relatively innocuous phrases such as "non-painful" is strongly frowned upon.


I understand, thanks for the explanation.

It makes a whole lot of sense.


I don't mind matplotlib at all but I just find it annoying in one of my most important use cases which is to update the plot during a long calculation.

You have to call fig.canvas.draw, but it doesn't always work. Sometimes plt.pause works, but sometimes it only partially updates the display and you have to call it multiple times. (At least this is my experience with the TkAgg backend.)

Meanwhile you can't interact with it.

I would love a version of matplotlib which lives in a separate process and gets updated via nonblocking communication.


Can't that be solved by writing the output to a file and reading that file in parallel?

I'm a noob, it's an honest question.


Absolutely yes, but you'd have to design that for each application you develop. I'd love an out of the box solution. Then maybe skip the file and use pipes or sockets.
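
A rough sketch of the pipe-based idea, using only the standard library. The child process here merely counts the points it receives; a real version would redraw a figure instead. The function name `run_with_plot_process` and the "x y" line format are assumptions made for this illustration. Writes to the pipe return as soon as the data is buffered, so the calculation only stalls if the pipe buffer fills up.

```python
import subprocess
import sys

# Child process: stands in for the plotting process. A real version would
# redraw a figure for each point; this sketch only counts what it receives.
CHILD_CODE = r"""
import sys
count = 0
last = None
for line in sys.stdin:          # one "x y" pair per line
    x, y = map(float, line.split())
    count += 1
    last = (x, y)
print(f"received {count} points, last={last}")
"""


def run_with_plot_process(n_steps):
    """Run a placeholder calculation, streaming each result to the child."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    for step in range(n_steps):
        value = step * step        # placeholder for the real computation
        child.stdin.write(f"{step} {value}\n")
        child.stdin.flush()        # hand the point over without waiting for a redraw
    # communicate() closes stdin (signalling end of data) and waits for the child.
    summary, _ = child.communicate()
    return summary.strip()


print(run_with_plot_process(3))
```

Because the plotting side lives in its own process, it could keep its GUI event loop responsive while the calculation runs.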


What's wrong with matplotlib? I might be living under a rock...


I always find it easier to produce publication-quality figures using gnuplot (but not with its default settings, mind you) than with Matplotlib. Check out http://gnuplotting.org/

Also, it's hard to beat gnuplot's speed refreshing a live scatter plot with many thousands of points using the x11 terminal.


I'm using gnuplot for plotting too (the actual gnuplot application not a library that uses gnuplot as its backend).

And I usually keep computation and plotting separate. Computation produces data files, and a gnuplot script generates plots. This separation of computation and plotting allows updating charts later if needed, collected data can be reused in other plots, and additional data analysis can be performed and charts can be augmented.

So I personally don't see many advantages from integrating chart generation into the computational pipeline itself (except for computation monitoring, or maybe when a user response is needed to direct the computation). Because of that, libraries that encourage chart generation from a computed array instead of dumping that data into persisted files feel like an anti-pattern to me.


Completely agree. I keep computation steps (which create csv files) separate from charting steps. I use make to orchestrate pipelines. I also keep everything under source control, and insert git commit ids into every chart. This ensures that all the analysis and charts can be linked directly to the code used to produce them.
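
As a sketch of that workflow, under assumptions: the file name, the function names, and the squares "computation" are invented for illustration, and the charting step only builds the text of a gnuplot script rather than running gnuplot. The commit id falls back to "unknown" when the code is not run inside a git repository.

```python
import csv
import subprocess
import tempfile
from pathlib import Path


def current_commit_id():
    """Return the short git commit id, or "unknown" outside a repository."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def compute_step(csv_path, n=5):
    """Computation step: write x,y rows to a CSV file and nothing else."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        for x in range(n):
            writer.writerow([x, x * x])   # placeholder computation


def charting_step(csv_path, commit_id):
    """Charting step: build a gnuplot script that reads the saved CSV."""
    return "\n".join([
        "set datafile separator ','",
        f"set title 'squares (commit {commit_id})'",
        f"plot '{csv_path}' using 1:2 with linespoints title 'y'",
    ])


workdir = Path(tempfile.mkdtemp())
data_file = workdir / "squares.csv"
compute_step(data_file)
script = charting_step(data_file, current_commit_id())
print(script)
```

The two steps share nothing but the CSV file, so the chart can be regenerated or restyled later without rerunning the computation.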


Somewhat agree, but sometimes there is a need to change or filter the data that goes into the chart, which is only realized after plotting it. Combining data and figures into one "pipeline" makes it easy to iterate, especially with exploratory data analysis. Regardless, this comment made me think about my general workflow, which is usually combined. Appreciate this comment.


Matplotlib isn't very friendly to casual users.

For even the simplest possible plot, I have to create a subplot and axis.

Sometimes I'd like to just plot a function. I don't want to initialize arrays for that.

It's easy to forget that I have to `import matplotlib.pyplot`

I don't need to plot things often, but whenever I use matplotlib, I always have to spend a few minutes to look up how to use it.


>For even the simplest possible plot, I have to create a subplot and axis.

So? Can't that be abstracted away once in a custom lib, of the 3-4 plots you use 99% of the time, and be done with it?

In which case, you just need to pass in your data and labels, in a specific format, and that's it.
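
A sketch of what such a wrapper might look like. `plot_function` is a hypothetical helper, not part of matplotlib, and the Agg backend is an assumption so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np


def plot_function(func, lo, hi, *, n=200, title="", xlabel="x", ylabel="y"):
    """Hypothetical helper: plot func over [lo, hi] and return the Axes."""
    x = np.linspace(lo, hi, n)      # the helper builds the sample array
    fig, ax = plt.subplots()        # ...and the figure/axes boilerplate
    ax.plot(x, func(x))
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    return ax


ax = plot_function(np.sin, 0, 10, title="sin(x)")
```

Returning the Axes keeps the escape hatch open: anything the helper does not cover can still be adjusted on the returned object.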


> So? Can't that be abstracted away once in a custom lib, of the 3-4 plots you use 99% of the time, and be done with it?

This does not beat gnuplot's simplicity where you don't even need to define that.

The following line is a complete gnuplot program to plot the sine function:

    plot sin(x)
Every parameter of the plot has reasonable defaults, and you can redefine all of them as you wish.


"that can be abstracted away in a custom lib"

Yeah, and that's really not helpful for sharing code or doing exploratory charting. It's never, ever as simple as just "being done with it".

vega-lite-api is my charting library of choice these days. Much simpler than gnuplot, d3, matplotlib, etc.


    import numpy as np
    import pylab as pb

    x = np.arange(0, 10, 0.01)  # sample points from 0 to 10
    pb.plot(x, np.sin(x))
    pb.show()

what do you mean hard?


This is a minor source of confusion. Pylab and Pyplot are packages within Matplotlib. They are what most casual users experience when they say that they're using Matplotlib. I use them, they're convenient.

A minor headache is when you have to break out of Pyplot to use some of the more detailed behaviors of Matplotlib, and now you're interacting with both Pyplot and the lower level calls. For instance, plt.title('foo') and gca().set_title('foo') do the same thing.

If you're a fluent programmer, you fly past those seeming inconsistencies with barely any notice. Explaining them to a novice programmer is harder.
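
To make the equivalence concrete, a small sketch (the Agg backend is an assumption so it runs headless). Both calls end up setting the title on the same Axes object, because pyplot forwards to whatever the current Axes is.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for a runnable sketch
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# pyplot call: acts on whatever the "current" Axes happens to be.
plt.title("foo")
title_via_pyplot = ax.get_title()

# Axes method: acts on the Axes object you name explicitly.
plt.gca().set_title("bar")
title_via_axes = ax.get_title()

# With a single subplot, the current Axes and `ax` are the same object.
same_axes = plt.gca() is ax
```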


Matplotlib has been working fine for me. Some caveats. I'm a physicist working in industry, and don't publish in academic journals. But I do a huge amount of data visualization, for my own use, and to produce graphs for internal reports.

The graphs look as good to my eye as what I see in papers, but I have no idea what extra steps are needed to satisfy each journal's style guide.

Before Matplotlib, I created graphs in Excel.

A possible question is whether Matplotlib deserves the status of being the default for teaching scientific programming in Python, or if a different tool would make it easier for beginners.


Nothing, really. I have been using matplotlib for years and it's... fine. The only problem I have is that it has a number of minor annoyances that are never getting fixed, despite being well known and the project being actively maintained. Off the top of my head: the Tk backend not supporting DPI scaling on GNU/Linux; aspect="equal" not working on 3D plots; covered parts of 3D objects appearing in front of the objects covering them; twin axes not having the origin aligned; etc.


Matplotlib is what is wrong with matplotlib..

Broadly, the problem is that its syntax was meant to reflect that of Matlab, which, I guess, makes it intuitive for Matlab users. For the rest of us, it's mostly unintuitive and inconsistent.


Can you give an example of its inconsistency?

I would be genuinely interested in what the inconsistencies are from a Python perspective.

I am a former Matlab/Octave user. To me the Julia interface of matplotlib is actually quite nice to use, but unfortunately the installation is a bit brittle.


In some places you can use c="k" for colour, but in others you cannot and must use color="k".

Some settings are only exposed on the figure class but not on the ax class. And when you are working with the ax class, you must write

ax.set_ylim(0,3)

instead of

ax.ylim(0,3)

Matplotlib.pyplot is known for its nonsensical api.


Off the top of my head, the one that annoys me regularly is the difference between setting a title or x range on a plot, and on a subplot. So plt.title(), and ax.set_title().

By the way, who came up with the idea that an axis object is a great handle to handle subplot settings..?


I think matplotlib feels natural to people who use(d) Matlab, but not necessarily to others.


    from matplotlib import pyplot as plt
WTF? Broken from right there. What's a plt? "o" key broken? Why didn't they just call it pyplot? Why not just

    import pyplot
    pyplot.plot(lambda x: math.sin(x))
    pyplot.plot(x=[0,1,2],y=[0,2,4])


People who are making plots in the terminal don't want to type out the fully qualified library name. The majority of plots are written and read only once, during data exploration and analysis.


The library is not forcing you to use plt as a shorthand. Nothing is stopping you from calling

from matplotlib import pyplot

pyplot.plot([0,1,2], [0,2,4])


Yeah I know. It's just that they've created an ecosystem of "plt" and that makes me want to use the entire library less.


It's a convention of the math-heavy python libraries, numpy and pandas also often get imported as np and pd.


I've found Lets-Plot [0] to be my least painful plotting option for Python so far. I'm so used to the grammar of graphics from ggplot, especially compared to plotting in matplotlib or seaborn.

0: https://lets-plot.org/


Vega-Altair is pretty great as well. It uses a grammar of graphics that’s slightly different from ggplot, but has most of the same advantages.

https://altair-viz.github.io/


Interesting. It seems the major difference with plotnine is that lets-plot is implemented in Kotlin and the Python package uses a Python-Kotlin bridge. Both lets-plot and plotnine are available under the MIT license.

It looks like lets-plot does not implement the ggplot2 theme interface yet, unlike plotnine which is 99.99% compatible with ggplot2. On the other hand lets-plot also includes additional geospatial functionality on top of basic ggplot2 compatibility.


Lets-plot does implement themes, and more: flavors (i.e. color profiles). Take a look: https://lets-plot.org/pages/charts.html#presentation-options...


Ok, thanks - not sure how I missed that. I guess the conclusion is that we now have two pretty much complete reimplementations of ggplot2 available in Python.


Just a comment for the repo owners: insert images for each code example!


The docs link to the "guide" right at the top. This has basic usage examples and images: https://github.com/dkogan/gnuplotlib/blob/master/guide/guide...


The landing page (readme, in the case of GitHub repos) is way too important. Even a one-click barrier to an examples page obscures the information unacceptably. UX principles also apply to documentation.


Yep. There's a reason for the way I did it, but clearly I need to change it. Thanks for looking.


Does it implement the greedy matching/parsing of gnuplot? Doesn't look like it?

  pl 'data.dat' u 1:3 w p pt 7 lc 3 tit 'Interesting measurements'
flows from my mind like breathing. I miss that brevity and speed so much when exploring datasets quickly in other languages....

It's not nearly as maintainable as more-verbose and well-specified languages, but it's far more readable than regex :).

Edit to add: simply plotting a two-column data file in Gnuplot is as simple as:

  pl 'data.dat'
The longer command above plots the third column against the first column with blue circular dots and applies annotated text to the key/legend, as I prefer to do when exploring data for the first time...


Just curious....can you tell me what that means? I was actually looking at gnuplot earlier, but it always intimidates me.

Edit: you responded fast with your edit, but can you explain the full command -verbose? :)


The command is equivalent to the more verbose:

  plot 'data.dat' using 1:3 with points pointtype 7 linecolor 3 title 'Interesting measurements'
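
To illustrate the mapping between the two forms, here is a toy expander in Python. It is only a sketch: the table covers just the abbreviations that appear in this thread, and gnuplot itself resolves abbreviations by context rather than with a flat lookup like this one.

```python
import re

# Only the abbreviations that appear in this thread; gnuplot itself
# accepts many more, and resolves them by context rather than by a flat table.
ABBREVIATIONS = {
    "pl": "plot",
    "u": "using",
    "w": "with",
    "p": "points",
    "pt": "pointtype",
    "lc": "linecolor",
    "tit": "title",
}


def expand(command):
    """Expand the short keywords in a gnuplot plot command (sketch only)."""
    # Keep single-quoted strings whole so file names and titles are untouched.
    tokens = re.findall(r"'[^']*'|\S+", command)
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)


short = "pl 'data.dat' u 1:3 w p pt 7 lc 3 tit 'Interesting measurements'"
print(expand(short))
```

Read left to right: the file to plot, which columns to use, the drawing style, the point shape, the colour, and the legend text.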


Thank you!


By far and away the best python plotting library is plotnine, a python clone of ggplot maintained by Hassan Kibirige from Uganda.

By the estimation of one esteemed colleague "it obviously pays back the time investment in less than a couple of days". I agree.


I see the potential long term but that api is unintuitive at best. It almost seems obstructively terse.

    (ggplot(mtcars, aes("wt", "mpg", color="factor(gear)"))
     + geom_point()
     + stat_smooth(method="lm")
     + facet_wrap("~gear"))
From https://github.com/has2k1/plotnine, their front page examples. I get that it's ripping off the ggplot api, but why would they not alias "aes" to something more meaningful? It stands for "aesthetic mapping".

My problem with all the popular plotting libraries in python is they seem designed for researchers slapping together a paper and produce code that's awful to maintain.


> [...] why would they not alias "aes" to something more meaningful?

This criticism doesn't do anything for me.

Is an alias for a 3 letter function name really going to improve readability / usability? It's the same API. Take it up with Hadley Wickham.

I think having one interface that is the same (ok, "mostly the same") for R and Python is a blessing for people who work with both. I don't graph with Python much, but every time I have to plot in Python I sigh and ask myself if it would be easier to just import the data into R.


>Is an alias for a 3 letter function name really going to improve readability / usability?

In this case, absolutely. Plus it shows the attention to detail, or lack thereof, across the whole API.


As someone who hasn't seriously used any plotting library, here are some of my questions:

- Why isn't the first term `ggplot(mtcars, "wt", "mpg", color="factor(gear)")` instead? It seems that the second argument always has to be a `plotnine.mapping.aes(...)` call or a saved `aes` value. It is not really hard to distinguish `ggplot(data, saved_aes)` from `ggplot(data, "x-col", "y-col", ...)`. The only issue might be the third `environment` argument, but that can go elsewhere (see below).

- What's up with that operator overloading? Is `+=` even supported? It seems that the original R version had the same syntax, but Python's statement-based syntax makes it annoying to use. Maybe any "additions" should have been moved into `ggplot` arguments by default: `ggplot(mtcars, "wt", "mpg", geom_point(), stat_smooth(...), facet_wrap(...), color="...")`.

- Many "addable" values have common repeating prefixes (`geom_` etc). Doesn't sound like a good API design at all, especially in Python. Probably there should be a `geom` module and so on that are exposed via `from plotnine import *` instead, so that `geom.point()` would work for example.

- Formulae as strings are fine to have, but they somehow have multiple "stages" where they can be evaluated (`after_stat` etc). Such a feature would be necessary from time to time, but the concept itself doesn't seem to be well polished.


best by far != perfect


As someone who makes plots every now and then with breaks of several months in between, I have found the ggplot syntax surprisingly intuitive and easy to remember even after a long break. Importantly, the basic syntax can be used to compose quite complicated graphs, although it does have its limits.

It's also quite nice to be able to use the same interface in both R and Python, although it is obvious that it was originally developed for R.


do you feel that about Seaborn even? it can be too "on the rails" for some applications, but if the rails meet your needs, the plotting calls look clean and maintainable to me.


Looks great. A minor API usability note: I don't know if there is an official convention, but when a name would clash with a builtin or keyword, one appends an underscore, e.g. `with_`, rather than prepending one, e.g. `_with`. The leading-underscore form is used and recognised for unused variables.

> _with is a curve option that indicates how this dataset should be plotted. It’s _with and not with because the latter is a built-in keyword in Python. [1]

- [1] https://github.com/dkogan/gnuplotlib/blob/master/guide/guide...
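
For context, PEP 8 does describe the trailing-underscore form as the convention for avoiding clashes with Python keywords. The sketch below shows why a bare `with` cannot be an argument name while both underscore variants parse; `plot_curve` is a hypothetical function, not gnuplotlib's API.

```python
import keyword

# `with` is a reserved keyword, so it cannot be used as an argument name.
assert keyword.iskeyword("with")


def is_valid_call(source):
    """Return True if `source` parses as a Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


def plot_curve(x, y, with_="lines"):
    """Hypothetical plotting function using the trailing-underscore name."""
    return f"plot {len(x)} points with {with_}"


print(is_valid_call("plot_curve(x, y, with='points')"))    # keyword as a name
print(is_valid_call("plot_curve(x, y, with_='points')"))   # trailing underscore
print(is_valid_call("plot_curve(x, y, _with='points')"))   # leading underscore
print(plot_curve([1, 2, 3], [4, 5, 6], with_="points"))
```

Both underscore spellings are legal Python, so the choice between them is purely a matter of convention.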


I'd imagine an issue with most plotting APIs is that they're declarative, and specified in terms of a semi-mathematical domain -- rather than imperative and specified as graphics operations. Since they're graphics libraries, that would make their operation more obvious.

eg., pseudocode,

    def plotLinear(data, options):
        drawCanvas(options);
        drawAxes(options);
        drawGrid(options);
        drawPoints(data);
        drawLine(calcRegressionLine(data));
        drawLabel(options);


I find it weird that the README for a fancy plotting library doesn't have a single plot in it, other than some ASCII art


Having just compared this to a Common Lisp library that does the same thing [0], given Python's prominence in numerical computing, I'm actually surprised that the latter is better.

[0] http://guicho271828.github.io/eazy-gnuplot/


Pandas has the best plotting for NumPy... and matplotlib for that matter.


It’s good, but it’s even better with plotnine (formerly ggplot, the python port of ggplot2).


This seems way more painful than plotting with matplotlib.



