Embedded Python/NumPy in MonetDB

RyanHamilton · on Jan 23, 2017

Pushing calculations to the database is a great idea, and it's good to see it gaining more traction. Too often people pull back "all" the data just to iterate over it locally.
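
A rough sketch of the difference, not MonetDB-specific: Python's built-in sqlite3 module is used purely as a stand-in so the snippet runs anywhere, and the `trades` table, its columns, and both function names are made up for illustration. The first function pulls every row back and aggregates on the client; the second lets the database do the GROUP BY and returns one row per group.

```python
import sqlite3

# sqlite3 is only a stand-in so this runs without a server;
# the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (sym TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?)",
    [("A", 10), ("B", 5), ("A", 7), ("B", 1), ("C", 3)],
)


def totals_local(conn):
    """Pull every row to the client, then aggregate in Python."""
    totals = {}
    for sym, qty in conn.execute("SELECT sym, qty FROM trades"):
        totals[sym] = totals.get(sym, 0) + qty
    return totals


def totals_pushed(conn):
    """Let the database aggregate; only one row per group comes back."""
    return dict(conn.execute("SELECT sym, SUM(qty) FROM trades GROUP BY sym"))
```

Both return the same totals. With five rows the difference is invisible, but the first version transfers the whole table while the second transfers one row per symbol.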

I looked at MonetDB as part of researching column databases in general. It was as fast as the commercial databases on some queries, and I was really impressed. The main thing that put me off was that you couldn't seem to nudge the data layout or the query optimizer to do exactly what you specify. They seemed to have the idea of an all-knowing system that would optimize for you. Perhaps that works as a research idea, but for commercial use it worried me too much.

For anyone interested in column databases in general, I put this list of comparisons together: http://www.timestored.com/time-series-data/column-oriented-d...

brynedwards · on Jan 23, 2017

The list seems out of date: InfiniDB has been open-sourced since Calpont went bankrupt, and Greenplum is open-sourced under the Apache 2 license[1]. There's also MariaDB ColumnStore, a fork of InfiniDB[2].

1. http://greenplum.org/

2. https://mariadb.com/products/mariadb-columnstore

maxpert · on Jan 23, 2017

Is anyone using MonetDB in production? How does it compare to other column stores?

trengrj · on Jan 23, 2017

You can do this in Greenplum too.

I've always liked the idea of moving general compute to the database, but usually in an organisation the databases are pretty tightly locked away and it becomes a pain to get the right versions of your libraries installed.

tingletech · on Jan 23, 2017

I've never heard of this database, but it looks sort of interesting.

espeed · on Jan 23, 2017

See this classic Google Talk by Peter Boncz, the creator of MonetDB (https://en.wikipedia.org/wiki/MonetDB)...

MonetDB/X100: a (very) fast column-store https://www.youtube.com/watch?v=yrLd-3lnZ58

dangoldin · on Jan 23, 2017

It's nice if you're looking for an alternative to a paid columnar database (e.g. Vertica, Redshift). I used it on a side project and it performed surprisingly well for large queries (think sum/group by across tens of millions of rows).

Not sure how it works on a production system.

assface · on Jan 23, 2017

It's over 15 years old.

Xorlev · on Jan 23, 2017

Still, it suffers from a lack of visibility. I'd only heard about it because one of our DB vendors in the past had mentioned integrating with it.

That said, it's had a lot of interesting development in the last few years (e.g. Ocelot, which tries to hardware-accelerate operations via OpenCL).

tingletech · on Jan 23, 2017

Yeah, I noticed that when I looked at the site. I've still never heard of it.