A huge amount of effort and innovation is going into BSD/MIT/Apache-licensed projects, including PostgreSQL. What is your evidence that the results are better with the GPL?
Usually means they are trying to sell to enterprises rather than small businesses. It's pretty much just a variant of "If you have to ask, you can't afford it". Pricing pages like this aren't about finding the price, but about initiating a potentially months- or years-long sales process.
On the other hand, I guess they could just still be working out their pricing ;)
I hear you, and we are working on fixing that, even though it might take some time. The challenge for us is that, for an enterprise, alternatives could literally cost millions (see Oracle pricing in the hundreds of thousands of dollars for just a single 8-core commodity machine). For start-ups, we have offered Citus for less than $5k per node in the past, and we provide an entirely free community version as well.
Essentially, our take is that pricing shouldn't be what stops you from using CitusDB. And if you are an enterprise, the value you get from using Citus should far exceed what you'd get from any other alternative out there.
Yes, it would suck not to capture the value you create, and I think you totally deserve to be paid very handsomely for this tech, of course. Oh, and please make sure to charge the energy, finance and healthcare sectors extra.
It's just that "please call for pricing" means a negotiation with a sales guy, which many people find uncomfortable, unless they work in corporate purchasing.
No, that's not really true; everything is sold for less than it is "truly" worth. Price discrimination works both ways, and there's no reason to assume the seller will capture all of the value.
Most of the time, when garage-based uber-hackers make a product, they lose proportionally more of that value. Not that this is bad, but higher tech doesn't mean correspondingly more profit.
This benchmark confuses CitusDB with PostgreSQL + cstore_fdw extension. CitusDB scales out PostgreSQL to multiple machines, and cstore is a columnar store for PostgreSQL. The author has a clarification posted at the end.
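To make the row-store vs. column-store distinction concrete, here is a toy Python sketch. It is only an illustration under simplifying assumptions: the table and function names are hypothetical, and it does not model cstore_fdw's real on-disk format (which adds compression and skipping metadata). It just counts how many stored values each layout has to read to aggregate a single column.

```python
# Hypothetical toy table of (order_id, quantity, price) rows.
ROWS = [
    (1, 5, 10.0),
    (2, 3, 20.0),
    (3, 7, 5.0),
]


def row_store_sum(rows, col):
    """Row layout: every full tuple is read just to get one field."""
    touched = 0
    total = 0
    for row in rows:
        touched += len(row)  # the whole tuple comes off storage
        total += row[col]
    return total, touched


def to_columns(rows):
    """Column layout: one contiguous list per attribute."""
    return [list(column) for column in zip(*rows)]


def column_store_sum(columns, col):
    """Column layout: only the requested column is read."""
    values = columns[col]
    return sum(values), len(values)


if __name__ == "__main__":
    print(row_store_sum(ROWS, 1))                # total and values read, row layout
    print(column_store_sum(to_columns(ROWS), 1))  # total and values read, column layout
```

Both layouts return the same total; the columnar one reads a third of the values here because the table has three columns. Scaling out across machines, which is what CitusDB adds, is a separate, orthogonal concern.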
For single-node Postgres + cstore numbers on TPC-H, we found that a few simple changes help notably: 1/ running ANALYZE on the foreign tables and increasing work_mem speeds up join queries by 2-4x, and 2/ using the double precision type instead of numeric speeds up aggregate functions by 6x.
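The numeric vs. double precision point comes down to arbitrary-precision arithmetic done in software versus IEEE 754 doubles done in hardware. As a rough analogy only (this is Python, not PostgreSQL, and the timings say nothing about the 6x figure above), here is a sketch that sums the same values as Decimal and as float, plus the precision cost of switching to doubles. The function name compare_sums is hypothetical.

```python
import timeit
from decimal import Decimal


def compare_sums(n=100_000, repeats=10):
    """Sum the same values as Decimal (exact, software arithmetic) and as
    float (IEEE 754 double, hardware arithmetic), timing each."""
    floats = [i * 0.01 for i in range(n)]
    decimals = [Decimal(i) / 100 for i in range(n)]  # exact two-decimal values
    t_float = timeit.timeit(lambda: sum(floats), number=repeats)
    t_decimal = timeit.timeit(lambda: sum(decimals), number=repeats)
    return sum(floats), sum(decimals), t_float, t_decimal


if __name__ == "__main__":
    f_total, d_total, t_f, t_d = compare_sums()
    print(f"float sum   {f_total!r} in {t_f:.4f}s")
    print(f"Decimal sum {d_total} in {t_d:.4f}s")
    # The other side of the tradeoff: doubles accumulate rounding error.
    print(sum([0.1] * 10), sum([Decimal("0.1")] * 10))
```

The last line prints 0.9999999999999999 for the float sum and 1.0 for the Decimal sum, which is why the switch is usually only safe for columns where small rounding errors are acceptable (not, say, money).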
Lastly, we agree that vectorized execution can result in notable performance wins! See https://github.com/citusdata/postgres_vectorization_test for some initial work. We hope to incorporate some of MonetDB's vectorized execution features in cstore_fdw in the future.
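For readers unfamiliar with the term, a minimal sketch of the two execution models follows. It is a generic illustration, not the design of postgres_vectorization_test or MonetDB, and all names are hypothetical. In a real engine the win comes from tight native loops over arrays, better cache use and far fewer per-tuple function calls; pure Python will not show that speedup, so this only shows the shape of each model.

```python
from typing import Iterable, Iterator, List


def tuple_at_a_time(values: Iterable[float], threshold: float) -> float:
    """Iterator (Volcano-style) model: filter and aggregate run once per tuple."""
    total = 0.0
    for v in values:          # one "next tuple" step per row
        if v > threshold:     # predicate evaluated per tuple
            total += v        # aggregate state updated per tuple
    return total


def batches(values: List[float], size: int) -> Iterator[List[float]]:
    """Split a column into fixed-size vectors."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def vectorized(values: List[float], threshold: float, batch_size: int = 1024) -> float:
    """Vectorized model: each operator call processes a whole batch."""
    total = 0.0
    for batch in batches(values, batch_size):
        selected = [v for v in batch if v > threshold]  # filter over the batch
        total += sum(selected)                           # aggregate over the batch
    return total
```

Both functions compute the same result; they differ only in how often control passes between operators.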
Instead of seeing this as a Hadoop alternative, it might be better seen as an alternative to the clunky data warehousing options like Vertica, Netezza, Greenplum, etc.
The Citus vs Hadoop comparison feels a little apples vs oranges as presented.
I worked a bit with Netezza appliances, which use an older version of Postgres that can spread queries across a BladeCenter... I wonder how this compares.
The downside of Netezza (besides the huge cost) is that it is not expandable at all: to get more Netezza capacity, you need to buy another multi-rack system.
There is also a bottleneck getting data in and out, as there are individual host servers that you launch jobs through (IBM x3650s, if I remember correctly).
Hadoop does a significantly better job than something like Netezza in those two areas.
I guess the head-to-head comparison would be Citus vs. Impala/HBase? That is probably where a 'massively parallel' Postgres setup that can scale horizontally would outperform its Hadoop counterpart.
I don't know much about the practical operation of this kind of software. What is it that makes Citus a better alternative to, say, Greenplum? Both of them are PostgreSQL-derived parallel column-store databases, right? What is Citus's USP?
Netezza is fast. It solves many problems very well. It is expandable. How can it be free and expandable at the same time while also being blazingly fast?
Neat. Postgres has always had a kick-ass I/O layer - particularly on ext4.
I think showing Q2 and Q11 numbers would've been great, because for something like Tez, this is how those plans look in Hive (before the cost-based optimizer work)
You've seen better performance on ext4 than on XFS? My experience has been the opposite (mainly with 1 TB of data across 100 million rows, at 20,000 queries/sec). btrfs + compression was 5x faster than XFS, but btrfs has nasty kernel deadlock bugs when the disk is almost full.
It is tricky. It is also hard to build a truly fault-tolerant (FT) PostgreSQL setup, as most tutorials have a single pgpool node doing the load balancing, which just shifts the single point of failure (SPOF) to the pgpool node. You can more or less do it with a virtual IP, à la http://www.pgpool.net/pgpool-web/contrib_docs/watchdog_maste...
Adding sharding on top of that follows a similar tutorial, but is even more complicated.
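As a rough idea of what the sharding layer has to do, here is a minimal sketch of hash-based routing of a distribution key to a shard. It is a generic technique, not pgpool's or Citus's actual scheme; the shard names and helper functions are hypothetical, and in practice each entry would map to a server or connection string.

```python
import zlib
from collections import defaultdict

# Hypothetical shard names; real entries would identify database servers.
SHARDS = ["shard0", "shard1", "shard2", "shard3"]


def shard_for(key: str, shards=SHARDS) -> str:
    """Route a distribution key to a shard using a stable hash.
    zlib.crc32 is used because Python's built-in hash() for str is
    randomized per process, so it would route differently after a restart."""
    return shards[zlib.crc32(key.encode("utf-8")) % len(shards)]


def distribute(keys, shards=SHARDS):
    """Group keys by the shard they route to."""
    placement = defaultdict(list)
    for key in keys:
        placement[shard_for(key, shards)].append(key)
    return dict(placement)


if __name__ == "__main__":
    customers = [f"customer-{i}" for i in range(10)]
    for shard, keys in sorted(distribute(customers).items()):
        print(shard, keys)
```

With plain modulo, changing the number of shards remaps most keys, which is part of why resharding is painful. A common mitigation is to create many more logical shards than machines up front, so that rebalancing moves whole shards instead of rehashing individual rows.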
I'm genuinely asking. I haven't used it yet, but when I was researching database design, that was what I assumed we would use if we needed to scale horizontally.
But lack of sharing is what we get when major open-source projects do not choose the GPL.