Not that I am a heavy DataFrame user, but I have felt more at home with the comparatively lightweight TypedTables [1]. My understanding is that the rather complicated DataFrame ecosystem in Julia [2] mostly stems from disagreement over whether tables should be immutable and/or typed. As far as I am aware there has not been any major push at the compiler level to speed up untyped code yet – although there should be plenty of room for improvements – which I suspect would benefit DataFrames greatly.
> As far as I am aware there has not been any major push at the compiler level to speed up untyped code yet – although there should be plenty of room for improvements – which I suspect would benefit DataFrames greatly.
That's not quite correct. The major `source => fun => dest` API as part of DataFrames.jl was designed specifically to get around the non-typed container problem. And it definitely works. That's not the cause of slow performance.
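To make that concrete, here is a minimal sketch of the `source => fun => dest` form, assuming DataFrames.jl is installed. The table and the column names `g`, `x`, `x_sum` and `x2` are made up for illustration. As far as I understand it, the point is that the function is handed the concrete column vector rather than the untyped table, so the inner work runs on typed data.

```julia
using DataFrames

# Toy table; the column names are illustrative assumptions.
df = DataFrame(g = [1, 1, 2], x = [1.0, 2.0, 3.0])

# source => fun => dest: take column :x, apply `sum` per group,
# and store the result in a new column :x_sum.
sums = combine(groupby(df, :g), :x => sum => :x_sum)

# The same form with an anonymous function: it receives the whole
# Vector{Float64} column and returns a new column named :x2.
doubled = transform(df, :x => (v -> v .* 2) => :x2)
```

`combine` collapses each group to one row, while `transform` keeps the original rows and appends the new column.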
I think the reason is that, as you mentioned, DataFrames has a big API and a lot of development effort is put towards finalizing the API in preparation for 1.0. After that there will be much more focus on performance.
In particular, some changes to optimize grouping may have been merged recently but had not made it into a release by the time this test suite was run. Multi-threaded operations, which haven't been finished yet, should also speed things up a lot.
That said, this new Polars library looks seriously impressive. Congrats to the developer.
[1]: https://github.com/JuliaData/TypedTables.jl
[2]: https://typedtables.juliadata.org/stable/man/table/#datafram...