Lets-Plot: An open-source plotting library for statistical data (lets-plot.org)
313 points by pkkm on July 15, 2023 | 86 comments



Wow nice, it's based on ggplot2-like grammar-of-graphics language. See examples here: https://lets-plot.org/pages/charts.html

The data plots look pretty nice: https://lets-plot.org/pages/charts.html#discrete-icon-discre...

And so do the distribution plots: https://lets-plot.org/pages/charts.html#visualization-of-dis...

How does this project compare to `plotnine`, which is the ggplot2-like plotting library in Python?


Is ggplot2 generic enough that it allows one to create any graph one wants, or is it opinionated in how or what graphs can be made?

However the best thing about this library may be that it is available for Kotlin.


Ggplot is opinionated, but mostly in its api.

From memory the only thing you can’t do in base ggplot is have two y-axes, but there are plenty of libraries available that expand the types of plots available (e.g. ggraph for plotting network graphs)


You can’t have two y axes? I’d call that a pretty strong opinion.


This isn’t the case


Well, the thing is you can have two y axes, but they must be simple transformations of one another. The reason (iirc, don't want to put words into Hadley Wickham's mouth) is that if you cannot provide a transformation from one y axis to the other, then the data shouldn't be in one plot to begin with (as it then shows clearly distinct data).


>However the best thing about this library may be that it is available for Kotlin.

What? How's that? It's written in python. Does kotlin have python interop?


It's linked in the description

https://github.com/JetBrains/lets-plot-kotlin


Downside of all the ggplot-alikes in languages other than R is that they lose the enormous number of amazing add-on libraries[1] available for the original. Personally, where it's better I do a lot of data crunching in Python, then export to R and do all my graphics there. I feel like the statistics crowd just appreciates graphics more and has spent more time getting it perfect. Another plus is that Copilot really helps with R-based ggplot semantics and options because it's got so much to learn from. Not sure that would be true for the subtle differences in the ggplot clones.

[1] https://youtu.be/7UjA_5gNvdw


Nice video. It definitely tells the story that R provides extra leverage in exposition. Might as well learn some R rather than yet another python plotting wrapper.


As a novice, R is really fun too because it operates differently than general purpose languages (it's all about matrices).


>Copilot really helps with R-based ggplot semantics and options because it's got so much to learn from. Not sure that would be true for the subtle differences in the ggplot clones

I would suspect this wouldn't be much of a hurdle for an LLM. If you've ever tried converting scripts from one language to another you can see how well LLMs generalise (not perfect ofc). So, as long as you give it enough context it will probably provide viable results.


Except if the range of the function given the same domain is much, much bigger. Extrapolation of a [massively multidimensional, non-linear] function is a lot harder than interpolation.


This seems quite similar to plotnine [0], which also provides a grammar of graphics interface for Python. That said, I love ggplot and I can't wait to use this in my research! I hope we can port/re-implement ggthemes, SciencePlots [1], and other ggplot libraries for lets-plot.

0: https://plotnine.readthedocs.io/en/stable/

1: https://github.com/garrettj403/SciencePlots


Why though? Who was desperately looking for one more plotting library??


It does statistical plots and isn’t matplotlib based which is a great start.

I find Plotly finicky, Altair doesn’t have great ergonomics, and bokeh is the same imperative style as matplotlib (plus is kinda heavy weight). Seaborn is good but you’re still playing with a leaky abstraction on matplotlib which can make things hard to compose and you can’t get interactivity.

So me, I’m asking for this. I’ve tried building my own because I want good interactive charts with fast native ergonomics. Ggplot just lets you focus on what you want to plot and throw a data frame into it, which is what this appears to do.


> It does statistical plots and isn’t matplotlib based which is a great start.

What? Matplotlib has the best, intuitive interface among the competition. Every other plotting library I have tried either leads to ridiculously verbose code or is too opinionated to be useful.


This comment is surprising. Can you remember how to do 3d scatter plots, for example, without looking at the manual? IIRC, that's an example of a shitty interface from matplotlib, while plotly is better in this regard.


I very rarely see a 3d scatter plot that's useful.


scatter3?

matlab provides a slightly esoteric yet incredibly consistent and lightning fast environment for analysis and visualization. it's kinda like unix in that regard.


> What? Matplotlib has the best, intuitive interface among the competition. Every other plotting library I have tried either leads to ridiculously verbose code or is too opinionated to be useful.

What??? Have you ever used ggplot?? I know we all have opinions, but yikes, just look at the tutorials!


The parent comment may find matplotlib approachable because they are a programmer already. A lot of science people are not programmers. So when they have to choose between matplotlib, where you will need to understand OOP, and ggplot, where you can get away with not knowing what a class is, it's pretty clear which one is more accessible.


It’s not the programming. Every default in matplotlib is poor. You have to write about ten times more lines of code to get something clean with legible fonts and minimal, understandable aesthetics. Very few people bother and then you get presentations full of ugly, time-wasting plots. Ggplot2 makes it harder to obfuscate your intent and very easy to experiment with different aesthetics and fix up the style of your plot. Yes it is opinionated, because, thankfully, there exist a good set of useful ideas for communicating data analyses.


The matplotlib defaults are poor, but you can import the style class from matplotlib and apply decent defaults

    from matplotlib import style
    style.use([…])

I’m on vacation away from a computer so hopefully that snippet is close enough that you can Google it.


Mmmm, ggplot has better defaults but I am still impressed by how bad the plots some people produce are. Absolutely agree on ggplot's ease of use, but that is partly a consequence of not needing to know how to program to use it.


Have you tried plotnine? It's a faithful port of ggplot to python. It's built on top of matplotlib, so no interactivity, but I have not noticed any problems with leaky abstraction.


matplotlib supports interactivity if you use the qt backends.


Do those work in a notebook interface?

Usually I don’t control the server my code is running on, I’m only exposed to a Google Colab style jupyter notebook.


no, unfortunately not. it's been a while since i've looked but i think mpld3 and plotly were the best bets when i last was trying to make use of notebooks.

i've personally never been a big fan of the horizontal breaks in the editor that notebook experiences provide. more recent versions of jupyter have added a side-by-side mode which is kind of an improvement but it still doesn't go where i'd fully want, which would be a full blown editor pane with a full blown document pane that sit side by side and are linked by user placed anchors with plots that can be popped out and floated.


I think it's for use in their new Kotlin Notebooks


Everyone. All the alternatives have some big disadvantages, so more competition is good.


Is there a Python-centric tutorial or guide on how to construct plots in this grammar of graphics way? Or do you need to read the ggplot2 book and translate R examples to Python?


Vega-Altair seems to have some introductory documentation and a tutorial for their flavor.

https://vega.github.io/vega/docs/


It's all Kotlin Multiplatform with a thin Python wrapper. That's pretty amazing.


It would be amazing if you could use it for all Compose Multiplatform targets. Right now I see it supports Kotlin/JS and JVM: https://github.com/JetBrains/lets-plot-kotlin#in-jvm-js


It has a Kotlin API as well, so Yes!


The biggest issue I had with plot libraries is that they don't work out of the box for millions of data points. Last time I was doing a data science project, I tried all of the major plotting libraries and none of them works well beyond a few million data points. I want a graph that I can visualize and zoom in/out in real-time and that became the hard part of the project. Only one product claims to be able to handle it using GPUs in the cloud and it needs a paid subscription and uploading your data into the cloud. I don't want yet another library, but some library that works really well and can utilize local GPU for plotting.


This one does! https://github.com/wwwtyro/candygraph

Scroll down into the examples for some plots with lots of points: https://wwwtyro.github.io/candygraph/examples/dist/


Or that interactivity, 3d plots and styling of plots are kind of half baked, if supported at all.

Or that they try to emulate the non-intuitive Matlab plotting interface.


Surprisingly, immediate mode graphing libraries work pretty well at this!

https://github.com/epezent/implot

Java: https://github.com/SpaiR/imgui-java

Also for rust: https://www.egui.rs/#Demo (Open Plot demo)

For web you'd want to compile for WASM. I imagine you could just make the graphs WASM and embed in existing DOM.


If you use Julia, Makie crushes this use case and comes with great Python interop.

https://github.com/holoviz/datashader is a good one in the Python ecosystem.



That doesn’t sound like it should be that bad. Are you only looking at Python libraries?


Plotly works pretty well (which I suspect you're alluding to) and it works completely offline, no need to upload any data to their cloud.


Not my experience at all. After a couple thousand data points it becomes completely useless to the point that it completely freezes a jupyter notebook on my 64 core threadripper. Just trying to zoom can take minutes. It's a total joke.


That's odd. Are you sure this is not related to Jupyter? I use plotly.js via a Rust wrapper (https://github.com/igiagkiozis/plotly) and the performance seems ok when generating a static, interactive html. The wrapper language itself should be irrelevant here. Is it the same if you generate a static html-file? (EDIT: I only view the html in a browser as is, no notebooks)

While I can't speak for millions of data points, generating a gyroscope plot with x, y, z, where each gyro axis is 400k+ samples is fine performance wise. This is generating a static, interactive html. Zooming etc is fine on my M1 MacbookPro 13" - delay when zooming in this specific case is maybe 0.5secs. The html-file is 60mb+.


I might have got a million samples in to KST before. It was always extremely fast and has great ergonomics for panning and zooming plots


at a certain point it would probably help to meaningfully downsample/summarize the data at the larger scales..."semantic zooming"...then you just aren't plotting as many points
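
A minimal sketch of that idea in plain Python, assuming evenly spaced samples. The helper name `minmax_downsample` is made up for illustration and is not part of any library mentioned in this thread. It splits the series into buckets and keeps only each bucket's minimum and maximum, so spikes stay visible while the point count drops.

```python
import math


def minmax_downsample(values, n_buckets):
    """Reduce `values` to at most 2 * n_buckets points, keeping each bucket's extremes.

    Hypothetical helper for illustration; returns (index, value) pairs in index order.
    """
    if n_buckets <= 0:
        raise ValueError("n_buckets must be positive")
    n = len(values)
    if n <= 2 * n_buckets:
        # Already small enough: keep every point.
        return list(enumerate(values))

    out = []
    for b in range(n_buckets):
        # Integer bucket boundaries that cover the whole series exactly once.
        start = b * n // n_buckets
        stop = (b + 1) * n // n_buckets
        idx = range(start, stop)
        lo = min(idx, key=values.__getitem__)
        hi = max(idx, key=values.__getitem__)
        # Emit the two extremes in their original order (once if they coincide).
        for i in sorted({lo, hi}):
            out.append((i, values[i]))
    return out


signal = [math.sin(i / 1000) for i in range(100_000)]
reduced = minmax_downsample(signal, 500)
print(len(signal), "->", len(reduced))
```

Re-running the reduction on just the visible range after each zoom would give the "semantic zooming" effect: the number of drawn points stays roughly constant at every scale.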


I too want the moon on a stick, for free


So no criticising open source code then?


ggplot2 is great for exploring data. Once it was a unique selling point for R.

For Dashboards I prefer Apache ECharts:

https://github.com/ecomfe/awesome-echarts


That's really cool! Does it reimplement ggplot2 in Python? pygg is a lightweight library that transpiles ggplot syntax in Python into R ggplot2 code. Downside is that it is not interactive and executes in R; upside is that it runs Hadley’s ggplot implementation in R.

https://github.com/sirrice/pygg


Kinda disappointing to me that, as far as I can tell, they directly copied ggplot. ggplot is not the last word in designing visualization libraries. For example, it has the concept of a scale which is exactly a function. This just adds useless conceptual cruft in the library. Getting rid of that is an easy improvement.


Whatever ggplot's flaws are, I haven't found matplotlib, base R, or any other plotting library to allow me to generate plots at essentially the speed I can type at. I can do that in ggplot without any help. I can only get close to that in matplotlib with recent exposure to it and github copilot.


I'm a big fan of the compositional style of ggplot vs the imperative style of Matplotlib. I just don't think ggplot is the last word in this style of library.


A scale isn’t exactly a function because it also needs the inverse in order to draw axes and legends. And it turns out axes and legends are where most of the complexity of scales lie.
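
To make that concrete, here is a rough sketch of a log10 position scale in plain Python. The class and method names (`Log10Scale`, `transform`, `inverse`, `breaks`) are invented for illustration and are not the ggplot2 or Lets-Plot API. The forward map places the data; the inverse is what lets an axis or legend label a position in data units.

```python
import math


class Log10Scale:
    """Toy log10 scale mapping a data range onto [0, 1] (hypothetical, not a real API)."""

    def __init__(self, lo, hi):
        if not (0 < lo < hi):
            raise ValueError("log scale needs 0 < lo < hi")
        self.lo, self.hi = math.log10(lo), math.log10(hi)

    def transform(self, x):
        # Forward map: data value -> position in [0, 1]. Used to draw the geoms.
        return (math.log10(x) - self.lo) / (self.hi - self.lo)

    def inverse(self, p):
        # Inverse map: position -> data value. Needed to label a point on the axis.
        return 10 ** (self.lo + p * (self.hi - self.lo))

    def breaks(self):
        # Ticks at whole powers of ten inside the range, as (position, label) pairs.
        first, last = math.ceil(self.lo), math.floor(self.hi)
        return [(self.transform(10.0 ** e), f"1e{e}") for e in range(first, last + 1)]


scale = Log10Scale(1, 1000)
print(scale.transform(10))   # position of the value 10 along the axis
print(scale.inverse(0.5))    # data value halfway along the axis
print(scale.breaks())
```

Even in this toy version, `breaks` is the part that needs the most care, which fits the point about axes and legends carrying most of the complexity.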


Since this is Kotlin there's not really a sane way to call this from Clojure, right?


Most of Kotlin should map quite well to Java primitives, AFAIK. But I’m sure some bridge/wrapper will soon surface if it is truly great - the Clojure ecosystem is a niche, but surprisingly it always has high quality wrappers for anything I might touch :D


In a similar vein, there is a cli tool that can draw plots on the terminal.

https://github.com/red-data-tools/YouPlot


I hope with every fibre of my being that this eventually replaces matplotlib and pyplot.


I can clearly see they've tried, and perhaps succeeded, in solving some of the frustrations of those tools.

However, what on earth kind of behind the scenes object oriented abstract bastardization have they done wherein you modify the instance of the ggplot class by ADDING stuff to it with a + sign!? Like, when you're trying to toggle one of the flags that would get passed to the class object, why not just do it the way every other class in almost every python library i've ever used does?

I find that horrifying. And I think it is just begging to get some bizarre errors, and I think that


It’s a port of the R ggplot2 api which works the same way. This wouldn’t be a surprise to most R and statistical computing users.


    def __add__(self, other):
      self += other
      return self
    
Simple
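
Taken literally, that snippet would recurse forever unless `__iadd__` is also defined, because `self += other` falls back to `__add__`. A slightly fuller sketch of how `+` composition might work is below. The names `Plot` and `Layer` are made up and are not the Lets-Plot or plotnine classes; the sketch returns a new object instead of mutating the left operand.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Layer:
    """One plot component, e.g. a geom or a scale (hypothetical stand-in)."""
    kind: str
    options: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Plot:
    """Immutable plot spec; `+` returns a new Plot with one more layer."""
    data: object = None
    layers: tuple = ()

    def __add__(self, other):
        if not isinstance(other, Layer):
            return NotImplemented
        # Build a new object so reusing a base plot never leaks layers between variants.
        return Plot(self.data, self.layers + (other,))


base = Plot(data=[1, 2, 3]) + Layer("geom_point")
log_variant = base + Layer("scale_y_log10")

print([layer.kind for layer in base.layers])
print([layer.kind for layer in log_variant.layers])
```

Because `base` is never modified, several variants can be derived from it without affecting each other, which is one reason the additive style is less error-prone than it first looks.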


what is the alternative? python doesn't have a pipe operator

do you want to do

plot(density(colorify(labelaxis(logscale .... )))))))


Python doesn't have a built-in pipe operator, but that doesn't mean you can't do it. The pipe [1] library is popular.

    is_even = where(lambda x: x % 2 == 0)
    sum(fib() | is_even | take_while(lambda x: x < 4000000))

The Apache Beam SDK for Python is another example. It has its own pipe expressions (|, >>, etc.).

[1] https://github.com/JulienPalard/Pipe
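
For the curious, the mechanism behind such libraries is small. In this rough sketch, `Pipe`, `where`, and `take_while` are reimplemented for illustration and are not the pipe package's actual source. An object that defines `__ror__` gets called when it appears on the right-hand side of `|`.

```python
import itertools


class Pipe:
    """Wrap a function so that `iterable | Pipe(f)` calls `f(iterable)`."""

    def __init__(self, func):
        self.func = func

    def __ror__(self, left):
        # Python calls this for `left | self` when `left` cannot handle `|` itself.
        return self.func(left)


def where(predicate):
    return Pipe(lambda items: (x for x in items if predicate(x)))


def take_while(predicate):
    return Pipe(lambda items: itertools.takewhile(predicate, items))


def fib():
    """Infinite Fibonacci generator: 1, 1, 2, 3, 5, ..."""
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b


is_even = where(lambda x: x % 2 == 0)
total = sum(fib() | is_even | take_while(lambda x: x < 4000000))
print(total)
```

Everything stays lazy here, so the infinite generator is only consumed until `take_while` stops it.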


I would rather have + than | if we introduce "custom" operators.

Like maybe bitshifting strings into streams wasn't the right syntax, Bjarne


datar effectively recreates a pipe operator and the tidyverse approach.

  df >> mutate(z=f.x)

https://pypi.org/project/datar/


No empirical CDF plots? This is the first thing I do when looking at the new data.


There is no such thing in Lets-Plot yet, but there is a Q-Q plot: https://lets-plot.org/pages/api/lets_plot.geom_qq.html

For a uniform distribution, a Q-Q plot would look almost the same as eCDF, only with switched axes, and with points instead of lines.
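
Until there is a dedicated geom, the eCDF itself is cheap to compute by hand and pass to a step or line layer. A minimal sketch in plain Python follows; the helper name `ecdf` is an assumption for illustration, not a Lets-Plot function.

```python
def ecdf(samples):
    """Return (xs, ps) for an empirical CDF.

    xs are the sorted samples and ps[i] = (i + 1) / n is the cumulative
    proportion after the i-th sorted sample. Drawn as a step plot, tied
    values simply collapse into one larger jump.
    """
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        return [], []
    ps = [(i + 1) / n for i in range(n)]
    return xs, ps


xs, ps = ecdf([3.1, 0.4, 2.2, 0.4, 5.0])
for x, p in zip(xs, ps):
    print(f"{x:4.1f}  {p:.2f}")
```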


Any good charting libraries for react? I've tried react-chart-js, Rechart and they all just... suck. Non-intuitive use, problems scaling in element containers.

Settled on nivo for now but it doesn't support multi-axis plots :/



Plotnine is a well-established implementation of the Grammar of Graphics in Python, and to a lesser extent Seaborn. Why should I use this instead?


Does this provide interaction? I.e. getting values at a user-picked point, moving labels around, or viewing a 3d plot from a different angle?


Wonder how this compares to vega plotting library


Vega, like ggplot (which this is very close to), builds on the grammar of graphics, but the two differ in the targets they support (Vega uses web technologies), language design details (e.g., Vega and ggplot use different terminology for the graphical elements and the encodings), and supported features (Vega has a reactive dataflow and support for building custom interactions). There is a lot more to be said but that’s a start for how they compare.


Or even ggplot2 for R. We use vegalite for our web based graphing tools and really like it.

https://vega.github.io/vega-lite/

A lot of our scientists use R and ggplot2. This looks more similar to that, but with a Python twist.

https://ggplot2.tidyverse.org/


I see wildcard imports. I'll pass. Also if you want to use R in python why not use the rpy2 library?


It's actually written in Kotlin with a Python wrapper. An alternative is Plotnine, which is a pure Python reimplementation of ggplot.


Is there a way to export them to a raster format? Or are they only usable as HTML?


I mean if it's vectorised then you can make a raster by screenshotting it? Looks like they support PNG at least: https://lets-plot.org/pages/api/lets_plot.ggsave.html?highli...


Yes.

  ggsave(plot, "plot.png")


Oh very cool!


pip not found on my windows machine.

Then i try python -m pip install

Python is not found, too.

Then i tried pip3 install.

It worked.

Should I try python3 -m pip3 or python3 -m pip install?

Confusing.


For a recent python install, python should be available under `python`, but you should also have the `py` utility installed - this is normally helpful in selecting from a set of versions. If neither of these resolve, the first step is probably to reinstall python

With regard to installing packages, I think the general rule is that `python -m pip` is best practice, because it assures that the python you install packages to is the same one you're planning to run them on
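
The same idea works from inside Python, for example in a notebook where it is unclear which interpreter is active. This is a small sketch: `pip_install_command` and `pip_install` are hypothetical helper names, and `lets-plot` is the package name as published on PyPI.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter that is running this code."""
    # sys.executable is the path of the current Python binary, so the package
    # lands in the same environment that will later import it.
    return [sys.executable, "-m", "pip", "install", package]


def pip_install(package):
    # check=True raises CalledProcessError if pip exits with a non-zero status.
    subprocess.run(pip_install_command(package), check=True)


print(pip_install_command("lets-plot"))
```

Printing the command first is a cheap way to see which Python a bare `pip` would or would not have matched.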


Tbh if pip isn’t found nothing much is going to work. I would fix that first, and although it’s an overhead I would learn about virtual environments too because they will save pain fairly quickly.


py -3 -m pip



