Dexplot: Python library for data visualization (dexplo.org)
104 points by illuminated on June 12, 2020 | 48 comments



Of the "we really don't like matplotlib syntax, really do want to live in dataframes all the time, so we wrapped mpl and pandas" plotting libraries, plotnine has impressed me the most by far:

https://plotnine.readthedocs.io/en/stable/

plotnine is an implementation of a grammar of graphics in Python; it is based on ggplot2. The grammar allows users to compose plots by explicitly mapping data to the visual objects that make up the plot.
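The grammar-of-graphics idea can be sketched in a few lines: a plot layer is a geometry plus a mapping from data columns to visual channels (aesthetics), and each data row resolves to one mark. This is a toy illustration with hypothetical names (`Layer`, `build_marks`) and made-up rows, not plotnine's implementation; in plotnine itself the same mapping would be written roughly as `ggplot(df, aes(x="displ", y="hwy", color="class")) + geom_point()`.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """Hypothetical layer: a geometry plus aesthetic -> column mapping."""
    geom: str  # e.g. "point" or "bar"
    mapping: dict = field(default_factory=dict)


def build_marks(rows, layer):
    """Resolve each data row into one mark by looking up the mapped columns."""
    marks = []
    for row in rows:
        mark = {"geom": layer.geom}
        for aesthetic, column in layer.mapping.items():
            mark[aesthetic] = row[column]
        marks.append(mark)
    return marks


rows = [
    {"displ": 1.8, "hwy": 29, "class": "compact"},
    {"displ": 5.7, "hwy": 17, "class": "suv"},
]
layer = Layer("point", {"x": "displ", "y": "hwy", "color": "class"})
marks = build_marks(rows, layer)
print(marks[0])  # {'geom': 'point', 'x': 1.8, 'y': 29, 'color': 'compact'}
```

Swapping `"point"` for `"bar"` or adding a `"size"` entry to the mapping changes the plot without touching the data, which is the composability the grammar is after.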


I'm really fond of https://altair-viz.github.io/ which wraps and translates into https://vega.github.io/vega-lite/.

The interactivity and cross-filtering possibilities are really good when you have many datapoints.

One thing I miss in vega-lite is object-constancy animation (https://bost.ocks.org/mike/constancy/), which they never developed even though they build on top of d3.js.
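Object constancy rests on a keyed data join: records are matched across updates by a stable key (d3 does this through the key function in `selection.data(data, key)`) rather than by array position, so each datum keeps its mark and can be animated from old value to new. A minimal pure-Python sketch of the enter/update/exit split; `data_join` is a hypothetical helper and the data is made up.

```python
def data_join(old, new, key):
    """Split records into enter / update / exit groups by a stable key."""
    old_by_key = {key(d): d for d in old}
    new_by_key = {key(d): d for d in new}
    # New keys: marks to create (e.g. fade in).
    enter = [d for k, d in new_by_key.items() if k not in old_by_key]
    # Shared keys: existing marks that transition from old to new values.
    update = [(old_by_key[k], d) for k, d in new_by_key.items() if k in old_by_key]
    # Vanished keys: marks to remove (e.g. fade out).
    exit_ = [d for k, d in old_by_key.items() if k not in new_by_key]
    return enter, update, exit_


old = [{"state": "CA", "v": 10}, {"state": "TX", "v": 8}, {"state": "NY", "v": 5}]
new = [{"state": "TX", "v": 9}, {"state": "CA", "v": 7}, {"state": "FL", "v": 4}]
enter, update, exit_ = data_join(old, new, key=lambda d: d["state"])
```

Joining by position instead would pair CA's old bar with TX's new value, which is exactly the visual confusion object constancy avoids.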


I'm a big fan of Altair and have used it for demos in the past. However, restrictions such as a 5,000-row limit on dataframes and usage only inside notebooks push me to other solutions like Plotly.


BTW, the 5000 restriction is more a "guide" for structuring your data - you can turn it off.

Also, Altair plots can certainly be used outside of notebooks! Happy to share some examples.


Please do; I couldn't find a straightforward configuration for that.


When I was in the business of making plots, ggplot2 was by far my favourite. It seemed odd at first, but eventually it clicked and it felt great. Once you grok the grammar of graphics you'll find you can really quickly express all kinds of plots from any given dataset. I would routinely impress people with how quickly I could visualise data.


It looks like this one is designed specifically to be easy to learn and remember. It looks very useful to me for people who make plots less frequently.


What makes you choose this over something like seaborn?


Not parent, obv. But the syntax (or grammar, if you like), although jarring at first, is starting to grow on me. I am not fully on board yet, but it is an interesting idea.


Hey everyone, I'm the author of Dexplot. I have written large sections of books on seaborn, have taught it in classes for years, and had many issues with it, some of which are outlined below:

• Not allowed to set figure size

• No wrapping of tick labels

• No strings for pandas aggregation functions

• No automatic ordering of x/y labels (dexplot provides several options)

• Having to use separate grid functions (catplot, lmplot) for multiple subplots

• Something like 5 different functions for scatterplots. Dexplot has one

• No relative frequency bar charts, which are a fantastic way to explore data. Dexplot provides normalization over any set of variables

• No stacked bar charts

• Seaborn docs have distribution plots (box, violin) in the "categorical" section. A major distinction needs to be made between plots that aggregate, those that show distributions, and those that plot raw data (like scatterplots)

• Returning of matplotlib axes or seaborn grid objects. Dexplot always returns the matplotlib figure

• Seaborn is essentially dead as far as I can tell with few changes in the last 2-3 years. There are even parameters that continue to be non-functional
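Several of these points (string aggregation names, relative frequencies normalized within a group, stacked bars) can at least be approximated in plain pandas and matplotlib. The sketch below uses made-up data and does not use Dexplot's API.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import pandas as pd

df = pd.DataFrame({
    "city": ["NYC", "NYC", "NYC", "LA", "LA", "SF", "SF", "SF"],
    "status": ["ok", "late", "ok", "ok", "late", "late", "late", "ok"],
    "minutes": [5, 30, 7, 4, 25, 40, 35, 6],
})

# pandas itself accepts aggregation names as strings.
mean_minutes = df.groupby("city")["minutes"].agg("mean")

# Relative frequency of each status within each city (each row sums to 1).
rel_freq = pd.crosstab(df["city"], df["status"], normalize="index")

# Stacked bars: one bar per city, segments show the status shares.
ax = rel_freq.plot.bar(stacked=True, figsize=(6, 3))
ax.set_ylabel("share of trips")
fig = ax.figure
```

`normalize="index"` normalizes within rows; `"columns"` or `"all"` give the other two normalizations `crosstab` supports.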

In the future, Dexplot will add:

• Many more plotting functions

• Several apps (built from ipywidgets) to explore data. Currently, there is one for viewing colors

• Better automatic figure sizing (it exists now, but will be improved)

• Automatic DPI detection so that matplotlib inches correspond to actual screen inches
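The arithmetic behind the DPI item: matplotlib sizes figures in inches, and the rendered pixel size is inches × dpi. When the screen's real pixel density differs from `fig.dpi`, a matplotlib "inch" is not a physical inch. The 220 pixels-per-inch panel below is a hypothetical value, not something matplotlib detects; how Dexplot would detect it is not shown in the thread.

```python
from matplotlib.figure import Figure

fig = Figure(figsize=(4, 3), dpi=100)
# Pixel size = size in inches * dots per inch.
width_px, height_px = fig.get_size_inches() * fig.dpi
print(width_px, height_px)  # 400.0 300.0

# Hypothetical: a laptop panel with 220 pixels per physical inch.
screen_ppi = 220
# The "4 inch" figure then spans only about 1.82 physical inches.
physical_width_in = width_px / screen_ppi
```

Setting `fig.dpi` to the measured screen density would make the two agree, which is presumably what the planned detection aims at.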

Dexplot aims to be very intuitive, easy to use, consistent, and allow easy exploration (the name is a smashing together of data exploration plotting).

Here is one example comparison between dexplot and seaborn. https://twitter.com/TedPetrou/status/1271436948721328129

Examples such as these are what drove me to create the library.

I'd love to get feedback and happy to take detailed criticism.


> Seaborn is essentially dead as far as I can tell with few changes in the last 2-3 years. There are even parameters that continue to be non-functional

https://seaborn.pydata.org/whatsnew.html

Seaborn has received a couple updates this year. Not sure what you mean by can't control figure size either. The ways to do so are inconsistent, but they're there.

> I'd love to get feedback and happy to take detailed criticism.

I like your syntax a lot. This page isn't a good way to show it. Seaborn's gallery page is excellent, even if redundant at times. I would dedicate more time to creating easy-to-use docs like that. Docs are almost everything when it comes to charting.

I'd also like to see docs on controlling aesthetics such as color, outlines, style, etc.


Technically, there was a new "major" release with version 0.10, but it was just some bug fixes and the same as 0.9.1. The last release with anything new was in July of 2018. Given the rate of the last several releases, I don't expect much to happen for a while, thus "essentially dead".

You cannot control the figure size of axes-level plots from seaborn directly. You have to access the figure from the axes (which most people don't know how to do) or create the figure first by importing matplotlib. Really annoying for those who just want to analyze data quickly. Grid plots have the ability to adjust figure size, but return a seaborn object and not a matplotlib figure.
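The two workarounds described here, sketched with pandas' own plotting, which (like seaborn's axes-level functions, e.g. `sns.barplot`) returns a matplotlib `Axes` rather than a `Figure`. The series is made up.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series([3, 7, 2], index=["a", "b", "c"])

# 1) Plot first, then reach the Figure through the returned Axes.
ax = s.plot.bar()
ax.figure.set_size_inches(8, 3)

# 2) Create the Figure first and pass its Axes in (seaborn functions accept ax= too).
fig2, ax2 = plt.subplots(figsize=(8, 3))
s.plot.bar(ax=ax2)
```

Seaborn's figure-level functions (`catplot`, `lmplot`, ...) take `height=` and `aspect=` instead and return a grid object, which is the inconsistency being described.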

Agreed, docs need to get better. Better datasets, a gallery, etc... I've only spent a week on this, so there will be a lot of improvements in the future.


Hey for what it's worth I think this is really impressive for only a week!


Good job, this looks extremely expressive and with fewer corner cases ("dark knowledge") than MPL/SNS, and unlike Altair/Plotly doesn't require a whole browser to display the output!

Still, I'd like to ask if you considered alternatives to MPL for the back-end. It's a venerable but ancient project with years of accumulated technical debt, and I'm sure you had to deal with lots of inconsistencies there.

For example, PyQtGraph is an alternative with a clear class hierarchy and can handle large-scale datasets without slowing down (while anything non-trivial in MPL has you wait seconds to render).

(I'd love to hear more suggestions that don't require a JS engine and don't build on MPL.)


Thanks! I'm focused on building the user-facing API, as this is where I believe I'm best suited to make improvements due to my experience teaching and writing.

I'm definitely open to looking at alternative backends in the future and will check out PyQtGraph, but am sticking to matplotlib for now.


Given this, would you say your goal with Dexplot is to be a better Seaborn (and replace it given its dev state), basically the same usage cases but with improvements as you describe?

Thanks for the effort. Looks like a great project.


Correct, I'd like dexplot to be a superset of seaborn first, making it much easier to use for those who don't want to dip into matplotlib for the minor adjustments most plots need (figsize, ticklabels, etc.).

There should be a library to do exploratory data analysis quickly, without having to touch matplotlib, numpy, or pandas, and without installing something like pandas-profiling to make reports.

This is where the apps will come in to allow users to quickly generate reports on things like missing values, duplicate rows/columns, outliers/bad data, view different colors, etc...
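The kinds of checks such an app would automate (missing values, duplicate rows, duplicate columns) are each short pandas expressions. A sketch on a made-up frame; this is plain pandas, not Dexplot code.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "score": [0.5, None, None, 0.9],
    "score_copy": [0.5, None, None, 0.9],
})

# NaN count per column.
missing_per_column = df.isna().sum()

# Rows identical to an earlier row (pandas treats NaNs as equal here).
duplicate_rows = int(df.duplicated().sum())

# Columns identical to an earlier column: transpose, then reuse duplicated().
dup_mask = df.T.duplicated()
duplicate_columns = dup_mask[dup_mask].index.tolist()
```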


To be honest, I thought the figure on the right was your proposal. You'd better improve the default. Also, if you're comparing the two, the data should be the same.


The data is the same. Dexplot automatically sorts the xtick labels alphabetically. Seaborn uses order of appearance. For the seaborn plot, the figure size and dpi have to be manually adjusted and there is no option to wrap the tick labels. They are a mess and overlap one another. The tick label wrapping is a huge win imo, otherwise you have to rotate them, which makes long labels look terrible.
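Both behaviours, alphabetical category order and wrapped tick labels, can be done by hand in matplotlib with the standard library's `textwrap`; seaborn's categorical functions also accept an explicit `order=` list. The counts and the 12-character wrap width are arbitrary choices for illustration.

```python
import textwrap

import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt

counts = {
    "Very long category name number two": 4,
    "Another fairly long category label": 7,
    "Short": 2,
}

labels = sorted(counts)  # alphabetical, not order of appearance
values = [counts[k] for k in labels]
# Break each label into lines of at most 12 characters instead of rotating it.
wrapped = [textwrap.fill(k, width=12) for k in labels]

fig, ax = plt.subplots(figsize=(6, 3))
positions = list(range(len(labels)))
ax.bar(positions, values)
ax.set_xticks(positions)
ax.set_xticklabels(wrapped)
```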


I personally find the Python plotting landscape a mess and confusing. It always seems to be a lot harder than is necessary to do anything and make it look good.


I agree! I found this presentation invaluable in getting a good overview of the landscape and the different segments within: https://www.youtube.com/watch?v=FytuB8nFHPQ

It helped me settle on https://altair-viz.github.io/ (coming from matplotlib) and I never looked back


I find the big benefit of Altair is that the API is so nice and composable, and that vega provides a ton out of the box to build really high-quality visualisations.

For instance, if you take this (sample) report using Altair to plot Default Alive / Default Dead: https://datapane.com/leo/reports/startup_finance_report/ - the interactive code is actually relatively small: https://gist.github.com/lanthias/5a41c1e4b21ae274ddb95cf5ad1...

It's also great being able to add Altair shapes to Folium for geoplotting (as the vega geoplotting is a bit more low-level).

What I really think is missing in the ecosystem is a "vega for tables", so you could build rich, interactive tables with a similar grammar. That would rock.


Yes ... I feel like everyone is stuck in a local minimum (or maximum, depending on your optimisation direction!) of using matplotlib, which is "good enough" but not very good. So there isn't great momentum to improve it, but nobody really likes it. My favorite plotting syntax is the one exposed in BeakerX [1], but that's less common than everything else put together.

[1] https://github.com/twosigma/beakerx


Yep, far too many libraries: Matplotlib, seaborn, bokeh, plotly, altair, ggplot2, vega, now this. Each library covers about 90% of your use cases on its own, but you're constantly having to switch to another one for the remaining 10%. Nightmare.


I am a bit confused why we need yet another package instead of working with seaborn and implementing the changes there.

In my business, we have a lot of test data in a database, and everyone uses their own Python-based solution for plotting, mostly in a Jupyter notebook.

Guess what: when it comes to comparing results and keeping one dedicated style guide for the project, this creates more complexity than needed.

-> I tried Orange3 for a while, which is really intuitive to use, but I miss a direct connection to a DB. Any advice warmly welcome :-)
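For the shared-style problem, one option is a single rcParams dict (or a `.mplstyle` file loaded with `plt.style.use`) applied via `plt.rc_context`, with data pulled straight from the database by `pandas.read_sql_query`. In this sketch an in-memory SQLite table stands in for the real test database, and `TEAM_STYLE` is a hypothetical style.

```python
import sqlite3

import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical team-wide style; in practice keep it in one shared module or .mplstyle file.
TEAM_STYLE = {
    "figure.figsize": (8, 4),
    "axes.grid": True,
    "axes.spines.top": False,
    "axes.spines.right": False,
}

# In-memory SQLite stands in for the real test database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE results (run INTEGER, value REAL);
    INSERT INTO results VALUES (1, 0.91), (2, 0.94), (3, 0.89);
""")
df = pd.read_sql_query("SELECT run, value FROM results ORDER BY run", conn)
conn.close()

# Every figure created inside the block picks up the shared settings.
with plt.rc_context(TEAM_STYLE):
    fig, ax = plt.subplots()
    ax.plot(df["run"], df["value"], marker="o")
```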



Great work. This is easily among the most elegant Python plotting APIs for exploratory data analysis. Seaborn is a fantastic tool but I sympathise with the author's frustrations with it.

Excellent documentation, but as others have said docs are everything with charting, so expanding that and adding a gallery are probably the best ways to get people onboard. Looking forward to giving this a try.


The facet options with the simple “split” arguments look quite nice.


How do people feel about Plotly?

I've been using it for years and by far it's been the easiest plotting library (while still being really flexible).

The company is very active in developing, for example recently adding plotly_express, which lets me get charts with one liners like: px.line(df, x='x_column', y='y_column')

I'm not affiliated with Plotly but just curious what people think since I find it to be an awesome library but I rarely hear about it or meet people who use it.


Last I checked I was only 99.9% sure that nothing was ever being sent to plotly servers. I needed it to be 100%, not even possible for that to happen.

edit: but due to this post I checked again and it seems this is no longer an issue. So +1 to plotly.


Yes, here's the official word on this, under "Can I use Plotly for Python offline, without being connected to the internet?" https://plotly.com/python/is-plotly-free/


In past versions, although free, its default settings would try to ping their servers. You had to manually switch it to offline mode.


I love it! What I always missed in, e.g., matplotlib, is the interactivity. I need to be able to zoom in and out without changing code. Hover information is another great feature coming to mind.


Again a static default plot library? I would love to see some libraries with higher-level visualizations like parallel coordinates, star coordinates and so on. I know, they require interaction, but hey ... why not spend time on this instead of yet another default plot library? A bar plot stays a bar plot, no matter whether you plot it with mpl or seaborn or whatever.
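For a static version of one such chart, pandas already ships `pandas.plotting.parallel_coordinates`, which draws one vertical axis per numeric column and one polyline per row; it is not interactive, which is exactly the gap this comment points at. A sketch on a handful of iris-style rows:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import pandas as pd
from pandas.plotting import parallel_coordinates

df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
})

# One axis per numeric column; each row becomes a line colored by species.
ax = parallel_coordinates(df, class_column="species")
```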


Plotly is pretty great for that sort of thing. It has those sorts of things built in.


And for those who missed the announcement, plotly is now an open source library (it used to be a web service where you needed an API key and had to upload data to their servers, etc.) You still get all the "polish" (nice docs, consistent API, examples, etc.) that went into it when it was a commercial product.


Thanks for mentioning this! If people are curious, here's the official announcement: https://medium.com/plotly/plotly-py-4-0-is-here-offline-only...


Thanks for this! I dropped plotly due to having projects on machines without an internet connection. Have to try it again.


The same creators have a few other interesting packages on the same site.

The dexplo package (a dataframe library) also sounds really interesting. They also have a bar chart race, which looks like an excellent way to get to the front page of Reddit's r/dataisbeautiful.


Thanks! The whole goal of dexplo and dexplot is to provide a much simpler and more consistent alternative to pandas/seaborn. I find pandas especially cumbersome to work with. I haven't had time to develop dexplo much, but am hoping for a first major release this year.


Thank you for sharing.

I'm curious, how does this compare to Chartify [1]? Note, I am not affiliated with Spotify or the Chartify team.

[1] https://github.com/spotify/chartify


I don't want to be too snarky, but "beautiful data visualizations" does not go well with typographic errors on the example pages:

- overlap of label

- HTML tables overlapping the right menu bar


Labels do not overlap and HTML tables are now fixed in the docs.


Without PDF output it's a chore to share visualizations over email, chat, etc. The uncharted aspect of data visualization is reporting and sharing IMHO.
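Since matplotlib figures can be written to PDF directly, one way to bundle several plots into a single shareable file is `matplotlib.backends.backend_pdf.PdfPages`; for a single figure, `fig.savefig("plot.pdf")` is enough. Titles and values below are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

with PdfPages("report.pdf") as pdf:
    for title, values in [("Signups", [3, 5, 8]), ("Churn", [2, 1, 1])]:
        fig, ax = plt.subplots(figsize=(6, 4))
        ax.plot(values, marker="o")
        ax.set_title(title)
        pdf.savefig(fig)  # append this figure as a new page
        plt.close(fig)    # free memory once the page is written
```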


Hey, that's what we do at Datapane! https://docs.datapane.com/tutorials/tut-creating-a-report - you can create a standalone HTML report which contains interactive tables and plots.

You can also publish this to the web (https://datapane.com/leo/reports/stock_report_5d2925b9/), embed it into social media (https://medium.com/@leo_26134/embedding-with-datapane-366e60...), or deploy your Jupyter Notebook or Python script so that other people can run them with parameters to generate reports dynamically.

It's still pretty new, so if there's anything you'd like to see, give me a shout at leo [at] datapane.com.


I love the simplicity


Might want to change your grid settings so site makes better use of large screen real estate. The plots are overflowing under the right side nav.


It's not clear how this API works from scanning the docs, but nested in there is a key point:

> A matplotlib figure object is returned [from each plot function]



