Prometheus is a very needy child in terms of data volume and hardware resources....

lima · on July 21, 2020

Please elaborate - how is it one engineer's full time job?

We run Prometheus in production and this hasn't been our experience at all.

A single machine can easily handle hundreds of thousands of time series, performance is good, and maintaining the alerting rules is a shared responsibility for the entire team (as it should be).

gen220 · on July 21, 2020

I think the parent's complaint is a function of how your engineering org "uses" Prometheus.

If you use it as a store for all time series data generated by your business, and you want indefinite or very long-term storage, managing Prometheus does become a challenge (hence M3, Chronosphere, and endless other companies and tech built to scale the backend of Prometheus).

IMO, this is a misuse of the technology, but a lot of unicorn startups have invested a lot of engineering resources into using it this way. And a lot of new companies are using it this way; hence the "one engineer's FT job".

dharmab · on July 21, 2020

I'd agree; I'm at a large corp that has a need to store our data for a very long term. If we were using Prometheus as an ephemeral/short term TSDB to drive alerting only, it would be really easy.

user5994461 · on July 21, 2020

For reference: a single machine can push 5k metrics, so you're saying a single Prometheus instance can easily serve tens of hosts. lol
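
To spell out that back-of-the-envelope estimate, here is a rough sketch. The 5k series per host is the figure above; the totals are only assumed readings of "hundreds of thousands of time series", and `hosts_supported` is a made-up helper name, not anything from Prometheus itself.

```python
# Rough capacity estimate: how many hosts fit on one Prometheus instance?
SERIES_PER_HOST = 5_000  # per-host figure quoted above


def hosts_supported(total_series: int, series_per_host: int = SERIES_PER_HOST) -> int:
    """Return how many whole hosts fit within a total series budget."""
    return total_series // series_per_host


# Assumed readings of "hundreds of thousands of time series".
for total in (100_000, 200_000, 400_000):
    print(f"{total} series -> {hosts_supported(total)} hosts")
```

At 5k series per host, the answer scales linearly with whatever series budget the instance can actually sustain.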

sevagh · on July 21, 2020

False in my experience. Full-time job? After the initial learning curve, a simple 2x redundant Prometheus poller setup can last for a long time. Ours handled 30,000,000 time series before running into performance issues.

After that, we needed some more effort to scale out horizontally with Thanos, but again, once it's set up, it maintains itself.

edoceo · on July 21, 2020

My company's Prometheus setup was super easy: one $10/mo box. About 1 week of fiddling with all the exporters and configs, but now it just runs, and has for months.

You can start small with Prometheus and grow into needing an FTE for it, without the migration hurdle of moving from an outsourced service to an in-house one.