Check out Mondrian: http://mondrian.pentaho.com/ I used it a couple of years back at a startup. It's written in Java and IIRC works (only?) with MySQL. Mondrian took a bit of work to set up, mostly due to my lack of OLAP knowledge at the time, but once set up, it was pretty nice and fast. I think it uses materialized views for the cube data.
I'm actually working on a new project where OLAP would be nice. Thanks for posting Brewery, I'll definitely give it a spin. OLAP tools don't get much love, but they are very useful for certain types of problems.
* What kind of limits and performance does this implementation have?
* Will data be fetched from the database for each query?
* Is it possible to have dimensions with millions of values and expect reasonable query times?
* Looks like it supports advanced topologies and hierarchies. How will dimensions with a high cardinality affect performance?
See my post about the projects: they are very young, just a little over half a year old, and performance has not been a focus yet. I would definitely like them to handle more data more efficiently; however, the goal is more simplicity of use than the ability to process really huge amounts of data (like telco data, the background I come from).
Before I answer your questions (I assume that you are referring to Cubes, the OLAP framework), I think it is worth noting that Cubes has pluggable backends. Currently a simple denormalisation-based SQL backend and a MongoDB backend are implemented. I want to make them more advanced.
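To illustrate what "denormalisation-based" means in general, here is a minimal sketch using only Python's standard sqlite3 module. It is not the Cubes backend or its API; the table names (dim_date, fact_sales, sales_denorm) and the sample rows are made up for the example. The idea is that the fact table is joined with its dimension once, and aggregations then run against the single wide table.

```python
import sqlite3


def build_denormalised_table(conn):
    """Create a tiny star schema and flatten it into one wide table.

    All table and column names here are hypothetical, for illustration only.
    """
    conn.executescript("""
        CREATE TABLE dim_date (id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE fact_sales (date_id INTEGER, amount REAL);
        INSERT INTO dim_date VALUES (1, 2010, 1), (2, 2010, 2), (3, 2011, 1);
        INSERT INTO fact_sales VALUES (1, 10.0), (1, 5.0), (2, 7.5), (3, 20.0);
        -- Denormalise: join facts with the dimension once, up front.
        CREATE TABLE sales_denorm AS
            SELECT f.amount, d.year, d.month
            FROM fact_sales AS f
            JOIN dim_date AS d ON d.id = f.date_id;
    """)


def aggregate_by_year(conn):
    """Aggregate over the wide table; no join is needed at query time."""
    return conn.execute(
        "SELECT year, SUM(amount), COUNT(*) "
        "FROM sales_denorm GROUP BY year ORDER BY year"
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_denormalised_table(conn)
result = aggregate_by_year(conn)
print(result)
```

The trade-off in this kind of design is extra storage and a rebuild step in exchange for simpler and usually cheaper aggregation queries.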
* Will data be fetched from the database for each query?
- currently yes; however, we did some experiments with plain HTTP caching of the Cubes/Slicer server and it worked pretty nicely for our current needs
* Is it possible to have dimensions with millions of values and expect reasonable query times?
- not tested yet
* Looks like it supports advanced topologies and hierarchies. How will dimensions with a high cardinality affect performance?
- right, it supports hierarchies, however same as above: not tested yet for performance
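As a rough illustration of the caching idea from the first answer, here is a minimal in-process sketch: results are cached by request key with a time-to-live, so repeated identical queries do not hit the database. This is an analogue of HTTP caching, not the setup we actually used and not part of Cubes; the TTLCache class, the run_aggregate function and the request key string are all hypothetical.

```python
import time


class TTLCache:
    """Cache computed values per key for a limited time (hypothetical helper)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (timestamp, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: skip the expensive query
        value = compute()
        self._store[key] = (now, value)
        return value


calls = []


def run_aggregate():
    """Stand-in for an expensive aggregation query against the backend."""
    calls.append(1)
    return {"total": 42}


cache = TTLCache(ttl_seconds=60)
key = "/aggregate?cut=date:2010"  # made-up request key, for illustration only
first = cache.get_or_compute(key, run_aggregate)
second = cache.get_or_compute(key, run_aggregate)
print(first, second, len(calls))
```

A real deployment would more likely put a caching HTTP proxy in front of the server and also decide how to invalidate entries when the underlying data changes.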
I am open to any comments/suggestions regarding the framework(s).
Stefan Urbanek, @Stiivi on Twitter (author of Cubes)
As for Cubes: the goal is to create a light-weight framework with pluggable backends. Currently a simple SQL backend and a MongoDB backend are implemented.
Some public projects that are using Cubes for OLAP are:
Donations for sport and culture:
http://granty.transparency.sk/en/
Public procurements of Slovakia (still under development):
http://vestnik-test.democracyfarm.org/en/report/all?cut=date...
If you are asking about performance, my answer is: I do not know yet, I haven't stressed it too much. I would very much like to hear any feedback and/or recommendations. The focus so far has been on simplicity and ease of use; performance will come later.
For brewery, here are some blog notes:
http://blog.databrewery.org/
Presentation where data brewery was used in a project:
http://slidesha.re/i9O4kC
I hope to prepare more information soon, with examples. I want data brewery to be more distributed, with customisable nodes (for example, you would be able to use a remote server as a processing node or as part of a processing stream).
The goal of data brewery is to provide a "way of working with data streams", focusing more on data analysis than on data transformation. However, that does not mean you would not be able to use it for the latter.
Anyway, I would appreciate any feedback and will gladly answer any questions. I am also looking for cooperation; if you are interested, drop me a line.
Stefan - @Stiivi on Twitter, author of Data Brewery/Cubes