Hacker News

Looks really promising for smaller clusters. However, the pull/scraping model for stats could be problematic for larger scale.

I've been experimenting with metrics collection using heka (node) -> amqp -> heka (aggregator) -> influxdb -> grafana. It works extremely well and scales nicely, but anomaly detection and alerting require writing Lua code – good or bad depending on your preference.
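For a rough idea of what such a filter does, here is a minimal Python sketch of the kind of rolling-window anomaly check you might otherwise write as a Heka Lua filter (window size and threshold are illustrative, not Heka defaults):

```python
from collections import deque

class AnomalyDetector:
    """Flag samples more than `k` standard deviations from a rolling mean.

    Illustrative sketch only: a real Heka filter would be Lua running in
    Heka's sandbox, emitting alert messages instead of returning a bool.
    """
    def __init__(self, window=60, k=3.0):
        self.samples = deque(maxlen=window)
        self.k = k

    def observe(self, value):
        anomalous = False
        if len(self.samples) >= 10:  # need some history before judging
            mean = sum(self.samples) / len(self.samples)
            var = sum((s - mean) ** 2 for s in self.samples) / len(self.samples)
            std = var ** 0.5
            # guard against a zero std when the window is flat
            anomalous = abs(value - mean) > self.k * max(std, 1e-9)
        self.samples.append(value)
        return anomalous
```

The point is that the detection logic lives with you, not the pipeline – which is exactly the "good or bad depending on your preference" trade-off.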

I highly recommend considering Heka[1] for shipping logs to both Elasticsearch and InfluxDB if you need more scale and flexibility than Prometheus currently provides.

[1] https://github.com/mozilla-services/heka




> However, the pull/scraping model for stats could be problematic for larger scale.

From experience with similar systems at massive scale, I expect no scaling problems with pulling in and of itself. Indeed, pull gives you some tactical operational options that you don't get with push. See http://www.boxever.com/push-vs-pull-for-monitoring for my general thoughts on the issue.
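The pull model amounts to the target exposing its current state and the scraper deciding when (and whether) to collect it. A minimal sketch, assuming Prometheus's plain-text exposition format (metric names here are made up):

```python
def render_metrics(counters):
    """Render a dict of counter values in the Prometheus text exposition
    format. In a pull setup this is all the target has to do: the scrape
    schedule, retries, and retention are the server's problem, which is
    where the operational flexibility comes from.
    """
    lines = []
    for name, value in sorted(counters.items()):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```

Serving that string on a `/metrics` HTTP endpoint is the whole contract; you can point a second scraper (say, a dev instance) at the same target without touching it.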

> InfluxDB

InfluxDB seems best suited for event logging rather than systems monitoring. See also http://prometheus.io/docs/introduction/comparison/#prometheu...


Good point on push-vs-pull. I'm biased towards push because of microservices that behave like batch jobs. In effect, I'm using AMQP in a similar way as the Prometheus pushgateway.
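For batch-like jobs, the pushgateway-style flow looks roughly like this sketch (the `/metrics/job/<job>` URL layout follows the pushgateway's convention; the host, job name, and metric are hypothetical):

```python
def pushgateway_request(base_url, job, metrics):
    """Build the HTTP request a finishing batch job would send to a
    Prometheus pushgateway: a PUT of text-exposition-format metrics to
    /metrics/job/<job>. Publishing the same payload to an AMQP exchange
    instead gives the push flow described above.
    """
    url = f"{base_url}/metrics/job/{job}"
    body = "".join(f"{name} {value}\n" for name, value in sorted(metrics.items()))
    return "PUT", url, body
```

Either way, the job pushes once at exit and something long-lived holds the last value for the scraper or the aggregator to pick up.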

Agreed that InfluxDB is suited for event logging out of the box, but the March 2014 comparison of Influx is outdated IMO.

I'm using Heka to send numeric time series data to InfluxDB and full logs to Elasticsearch. It's possible to send full logs to non-clustered InfluxDB in 0.8, but it's useful to split those concerns across different backends.

I also like that Influx 0.9 dropped LevelDB in favor of BoltDB. That should open up more opportunities for performance enhancements.


Yeah, I would be really interested in hearing any arguments that would invalidate my research (because hey, if InfluxDB actually were a good fit for long-term storage of Prometheus metrics, that'd be awesome: it's written in Go and easy to operate).

However, if the data model hasn't changed fundamentally (the basic InfluxDB record being a row that carries its full key/value metadata, versus Prometheus appending only a single timestamp/value sample pair to an existing time series whose metadata is stored and indexed once), I wouldn't expect the outcome to be qualitatively different; only the exact storage blowup factor will vary.
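Back-of-the-envelope arithmetic for that blowup, assuming purely illustrative sizes (16 bytes per timestamp/value pair, uncompressed label strings):

```python
def per_row_bytes(labels, samples):
    """Rough size if every sample row repeats its full label set,
    as in the row-per-record model described above."""
    meta = sum(len(k) + len(v) for k, v in labels.items())
    return samples * (meta + 16)  # metadata repeated with each sample

def indexed_bytes(labels, samples):
    """Rough size if label metadata is stored and indexed once and each
    sample is just an appended (timestamp, value) pair, as in Prometheus."""
    meta = sum(len(k) + len(v) for k, v in labels.items())
    return meta + samples * 16
```

With a modest label set like `{"job": "api", "instance": "10.0.0.1:9100"}` (27 bytes of metadata), a million samples cost roughly 43 MB per-row versus 16 MB indexed; richer label sets widen the gap, which is why only the factor, not the direction, should change.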

Interesting to hear that InfluxDB is using BoltDB now. I benchmarked BoltDB against LevelDB and other local key-value stores around a year ago, and for a use case of inserting millions of small keys, BoltDB took 10 minutes where LevelDB took a couple of seconds (probably because LevelDB absorbs small writes cheaply via its write-ahead log and memtable). So BoltDB was a definite "no" for storing the Prometheus indexes. It also seems that the single file in which BoltDB stores its database never shrinks when data is removed (even if you delete all the keys). That would also be bad for the Prometheus time series indexing case.


InfluxDB CEO here. It's true that Bolt's performance is horrible if you're writing individual small data points. It gets orders of magnitude better if you batch up writes. The new architecture of InfluxDB allows us to safely batch writes without the threat of data loss if the server goes down before a flush (we have something like a write-ahead log).
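The general pattern being described (not InfluxDB's actual implementation) can be sketched like this: acknowledge each point only after it hits an append-only log, buffer it in memory, and flush to the main store in large batches.

```python
class BatchedStore:
    """Sketch of write batching behind a write-ahead log. The `wal` list
    stands in for an fsync'd append-only file; sizes and structures are
    illustrative, not InfluxDB's.
    """
    def __init__(self, batch_size=1000):
        self.wal = []      # durable log: written before acknowledging
        self.buffer = []   # in-memory batch awaiting flush
        self.store = []    # the slow-per-write but fast-per-batch store
        self.batch_size = batch_size

    def write(self, point):
        self.wal.append(point)    # crash-safe before we say "ok"
        self.buffer.append(point)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        self.store.extend(self.buffer)  # one big write, not many small ones
        self.buffer.clear()
        self.wal.clear()                # log entries now redundant

    def recover(self):
        # after a crash: replay whatever the log holds but the store lacks
        self.buffer = list(self.wal)
```

This is why the batching doesn't trade away durability: a crash before a flush loses nothing that `recover()` can't replay from the log.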

Basically, when the new version comes out, all new comparisons will need to be done because it's changing drastically.


> the March 2014 comparison of Influx is outdated IMO.

I think we expected that; feel free to add comments on the doc about things that have changed since.


Cool. Will do once InfluxDB 0.9 is released in March. It's not worth comparing against 0.8, since so much is changing under the hood.



