Ask HN: Monitoring historical server resource usage on linux?

ezmobius · on Oct 4, 2009

Collectd, I swear by it, running it on thousands of hosts without issues.

jacquesm · on Oct 4, 2009

jbyers · on Oct 4, 2009

Collectd is great. High resolution data collection, reliable, low resource usage. We wrote our own rrd graphs against it.

employfive · on Oct 4, 2009

What do you use to generate graphs?

janitha · on Oct 4, 2009

collectd can spit out rrd files, and there are tons of rrd graphers out there.

I use drraw... it's really old and boring, but I like it because it's simple and I'm used to it. (It's just a Perl script you run on your web server, no DB needed.)

jbellis · on Oct 4, 2009

There's a ton of projects, none of which is truly easy to deploy, all of which kinda suck, and all of which store data as RRDs (which is the worst possible solution except for all the others, as the saying goes).

This is an area that could use some new blood, but apparently the existing solutions work barely well enough.

lg · on Oct 4, 2009

You probably know about sar, but a lot of people may not; it can get advanced with all the options, but for simple CPU spot checks:

  sar -u X Y

prints Y lines of utilization info, one every X seconds.
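If you want to drive that from a script, here is a rough Python sketch. The helper names (sar_cpu_command, run_sar_cpu) are made up for illustration; the only thing taken from above is the "sar -u X Y" invocation itself, and it assumes sar (from the sysstat package) is on the PATH.

```python
import shutil
import subprocess


def sar_cpu_command(interval_s, count):
    # Mirrors "sar -u X Y": X = seconds between samples, Y = number of samples.
    return ["sar", "-u", str(interval_s), str(count)]


def run_sar_cpu(interval_s, count):
    # Hypothetical helper: returns sar's raw text output,
    # or None if sar is not installed on this machine.
    if shutil.which("sar") is None:
        return None
    result = subprocess.run(
        sar_cpu_command(interval_s, count),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Calling run_sar_cpu(2, 5) would block for roughly ten seconds while sar collects five samples two seconds apart, then hand back the text for you to log or parse.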

rv77ax · on Oct 4, 2009

http://www.solarisinternals.com/wiki/index.php/Dim_STAT

<quote> All STAT data are collected from standard Solaris or Linux programs (vmstat, iostat, etc.) or some special (like psSTAT for users/processes activity) and saved in MySQL database. Collected data are accessed via Web interface and can be presented in several manner (interactive or static graphs, text, HTML tables).

dim_STAT can be used for On-Line monitoring one or several hosts on the same time. As well, data may be easily post loaded from output files of stat commands and analyzed in the same manner. At any time collecting from new stat commands may be added to the tool (via Add-On interface) and enlarge your view on application workload, RDBMS, your personal STAT program, etc. </quote>

neondiet · on Oct 5, 2009

Where I'm working at the moment we use Zabbix to monitor system availability and collect performance and trend data for 184 servers [that generate 8701 monitored items of data]. The Zabbix folk do all their development on Ubuntu and it's their preferred platform, but it works just as well on Red Hat, CentOS, etc., and has client agents for a wide variety of platforms.

jws · on Oct 4, 2009

I keep munin, http://munin.projects.linpro.no/ installed on all my servers. When things begin to look unwell, it lets you look back to see what resource, temperature, voltage, or activity might have changed.

It is alleged to be easy to add your own data points to it as well.

JustRick · on Oct 4, 2009

+1 for Munin. Very easy to install and deploy. There is a central collecting server and each host being monitored runs a fairly lightweight agent. You can write plugins in any language (bash, Perl, Ruby, PHP etc) to graph custom data points. I prefer Munin over Cacti because Munin's config is in simple text-based config files which can be scripted.

tlrobinson · on Oct 4, 2009

It's pretty easy to add your own graphs: you just write a simple command-line program that outputs the data you wish to graph in a simple textual format.
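For anyone who hasn't seen one, here is a minimal sketch of such a plugin in Python. It assumes the usual Munin convention: when called with the argument "config" the plugin prints graph metadata, and otherwise it prints one "<field>.value <number>" line per field. The field name "load" and the function names are my own picks for illustration, and the choice of the 1-minute load average as the data point is just an example.

```python
#!/usr/bin/env python3
"""Munin-style plugin sketch: graphs the 1-minute load average."""
import os
import sys


def config_lines():
    # Printed when the plugin is called with the "config" argument.
    return [
        "graph_title Load average (1 min)",
        "graph_vlabel load",
        "graph_category system",
        "load.label load",
    ]


def value_lines(load1):
    # Printed on a normal run: one "<field>.value <number>" line per field.
    return ["load.value %.2f" % load1]


def main(argv):
    if len(argv) > 1 and argv[1] == "config":
        lines = config_lines()
    else:
        # os.getloadavg() returns the 1, 5 and 15 minute load averages.
        lines = value_lines(os.getloadavg()[0])
    print("\n".join(lines))


if __name__ == "__main__":
    main(sys.argv)
```

Make it executable and run it by hand with and without "config" to check the output before pointing Munin at it.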

carl_ · on Oct 4, 2009

I use Zabbix over ~<1000 physical and virtual hosts for historical resource usage and alerting.

Cacti for network traffic monitoring (and alerting using the thold plugin).

Smokeping for network latency and availability monitoring and alerting.

duskwuff · on Oct 4, 2009

Ganglia is pretty good, although it's oriented more towards detecting problems than diagnosing them post-mortem.

josephruscio · on Oct 4, 2009

Are you looking only at free solutions? Or is proprietary software an option?

employfive · on Oct 4, 2009

Free, but feel free to recommend proprietary solutions; I'm curious about what's out there.

josephruscio · on Oct 4, 2009

The startup I work for sells a product called "Load Monitor" (I know, real dynamic name) that provides application-centric, comprehensive monitoring (including cross-host rollup) with both a real time dashboard and historical access to collected data. Probably overkill if you only have a few servers, but it's nice if you have a few 1000: http://www.librato.com/products/load_monitor

ApolloRising · on Oct 4, 2009

You can try Munin, MRTG, or Cacti.

brianr · on Oct 4, 2009

Cacti.

maelstrom · on Oct 4, 2009

rrdtool