Internet-monitoring – A Docker stack that monitors your home network

Noah-Huppert · on April 11, 2021

Cool! It's so funny my internet has been bad just recently so I took an old bash script of mine which did something similar and have been upgrading it into a proper application which exports metrics to Prometheus and Grafana: https://github.com/Noah-Huppert/net-test

I saw a comment below where some was rolling their eyes that you "complicated" stuff with Prometheus, Grafana, and Docker and how you could just use Bash scripts and crons. As I just upgraded my codebase from this more bare metal approach to this "more complex setup" I'd like to mention: there's no way you could do time series statistical analysis easily with "just a cron job and a bash script". Prometheus and Grafana are for more than just buzz words. Prometheus offers an advanced time series database which allows you to, at minimum, do more robust analysis using data techniques like Histograms. As for Grafana, it makes exploring data dead easy.

Providing users with a Docker Compose setup is also something I did with my tool and the benefits are huge. It lets me distribute a setup which relies on multiple moving parts working smoothly together. Sure I could write a whole wiki on how you should setup Prometheus Grafana and my tool, or I could distribute the setup with a configuration as code tool. Ensuring that even if someone doesn't want to use Docker Compose they can at least read my configuration as code and see exactly what I did to setup my tool.

midrus · on April 11, 2021

Recently I was looking to add some metrics to a small script and I didn't want to deal with all the complexity of setting up grafana, prometheus, the push-proxy-thing (sorry, don't remember the name).

Looking for alternatives, I found InfluxDB, which was just one apt-get install away.

It comes with a web interface to create dashboards, supports push based metrics, and overall I'm loving it.

The web interface doesn't seem as powerful as grafana, but it covers most basic needs. And I think you can use it with grafana too.

armoredkitten · on April 12, 2021

>And I think you can use it with grafana too.

You can indeed; I use InfluxDB and Grafana to provide some stats on my home server. Pretty easy to set up, and Grafana just uses the Flux query language to pull the data from Influx.

nitrogen · on April 11, 2021

The closest analogous tool people would use back in the classic cron and bash days was MRTG.

jcims · on April 11, 2021

I've been tangentially looking at this as a 'beta' customer of Starlink, as the service is presently extremely variable with many small outages throughout the day. This kind of a collection of internet monitoring is super useful for the individual, but what I would really love to see is a way to federate the data in a meaningful way so that you can start to see operating patterns in the provider(s) being used.

fy20 · on April 12, 2021

RIPE Atlas does exactly this:

https://atlas.ripe.net/

If you are on a ASN that they don't have many existing probes for, they'll send you one for free.

imup · on April 12, 2021

We do something similar at https://imup.io but we run ping & speed tests from your home PC rather than shipping you a device. We approach the situation wanting to save folks money on their internet services for degradation in addition to providing analytics.

Thanks for sharing that link for Ripe though, super cool project.

jcims · on April 12, 2021

This is awesome thank you!

programd · on April 11, 2021

Use node exporter on your machines and Prometheus to scrape the metrics and you will be drowning in pretty graphs. Super easy to set up and gives you system details across all your machines, all in one place.

https://github.com/prometheus/node_exporter

_joel · on April 11, 2021

There are also script exporters (a few implementations) for simpler Nagios style checks and more specialised ones for getting things like Cloudwatch data (shout out to YACE[1] - with dashboards from promcat). It's really quite vibrant the ecosystem.

[1] https://github.com/ivx/yet-another-cloudwatch-exporter

nitrogen · on April 11, 2021

Does anyone still use SNMP to announce metrics/monitors?

_joel · on April 12, 2021

Of course but it's still in the traditional domain of networking kit (generally). As there are many components virtualised or software defined then I think in those areas, not so much as there are alternative ways to get the metrics. SNMP is a bit long in the tooth and can be a bit unweildy sometimes, ymmv etc.

Avamander · on April 11, 2021

In addition to monitoring your own network, you might want to monitor your ISP's and share the results for research - a software RIPE Atlas probe is a great method for that: https://labs.ripe.net/Members/alun_davies/ripe-atlas-softwar...

corndoge · on April 11, 2021

See also smokeping, vaping

https://github.com/20c/vaping

https://oss.oetiker.ch/smokeping/

avh02 · on April 11, 2021

I recently set up smokeping to gather some data, I can't say it was a breeze - documentation appears complete but misses some key pieces of information you have to dig for or trial/error it out.

E.g: i struggled figuring out where to put a non-standard probe

Failure modes were also quite weird... adding a slave to the mix broke my mind a little. Was never sure it worked but then didn't need it so retired it before i figured it out

Changing what you think is something trivial (e.g: frequency of data collection) invalidates your previously collected data files.

Other than that - fairly solid, did what i needed, runs without a fuss.

linsomniac · on April 11, 2021

I've set up smokeping a number of times over decades. Always a minor pain until the most recent one...

I used a docker container, and it was a breeze! `sudo docker run -it -p 8000:80 -e WIPE=y -e TARGET="ISP;NextHop;$IP" -d dperson/smokeping`

https://hub.docker.com/r/dperson/smokeping

mjlee · on April 11, 2021

https://github.com/SuperQ/smokeping_prober

This project is quite good for a smokeping-a-like for prometheus.

windexh8er · on April 11, 2021

I did something similar for continuous speedtest and tracking of home Internet [0].

[0] https://gitlab.com/splatops/cntn-speedtest

neilv · on April 12, 2021

For simpler needs, like being able to see when my network dropped out, without needing another device... On a OpenWrt home router, I usually install `collectd-mod-ping` and `luci-app-statistics`, and have it periodically ping a couple of different hosts that will answer pings reliably. (This can double for monitoring your third-party hosting.)

Caveat: since you're probably running OpenWrt on a consumer SoHo router, I recommend sending this logging data to a cheap USB flash drive, not wearing out the router's onboard flash.

An even simpler alternative would be to keep a browser window open with the OpenWrt admin UI's "Status -> Realtime Graphs" displaying.

(If you're new to OpenWrt, a fairly recent router hardware model that's easy to re-flash with solid OpenWrt, and fairly inexpensive: https://openwrt.org/toh/netgear/r7800 )

2dvisio · on April 12, 2021

Interesting lots of people had the same idea. I have done something similar for my home network using an Ethernet connected RaspberryPI dedicated to this task using a cronjob, the speedtest-cli JSON format, MongoDB and just some JS graphs. - very rudimentary.

One thing I found very important was to plot the scatter of PING vs. Download speed (my Upload is super stable).

Then I created the probability density function of that (using KDE) and used it to get a better deal from my network provider - I could prove they were selling me a slower network most of the time - they now allocate 70Mb/s on a 50Mb/s deal so just that they hit the 50Mb/s most of the time.

It also allowed me to clearly identify some outlier situations where my network suddenly became 'bad' and I could show evidence of it being bad over time.

Certainly a project that should see more traction!

jtchang · on April 11, 2021

I've been toying around with the idea of building a small device or widget to display things like this. There is hardware like this for temperature and humidity and such but nothing for internet health.

If anyone is interested feel free to contact me. Would love to learn about what you would want in such a device.

Namidairo · on April 12, 2021

Sounds similar to the SamKnows Whitebox, which is a MT7621+MT7603+MT7612 router from which they run their speedtests and monitor various metrics. They don't actually run clients off those radios, they just scan and sniff.

I don't think regular consumer is their target audience however, they seem more geared to ISP's and government agencies who need performance data over a large amount of connections.

I suppose the problem with wanting to build a small device that does this, is finding the balance between cost-effective and actually being able to pull off 1gbit throughput on it.

Of course, this all assumes you were going the repurposed hardware route as opposed to design from scratch. Doing it from scratch would have it's own costs and pitfalls. (Board design, manufacture, licensing costs, etc)

syoc · on April 11, 2021

Cool. I have read a lot of approaches to home network monitoring and everybody seems to do things a little bit different. Telegraf has built-in modules for ping and dns latency. I recommend it as an alternative to prometheus.

linsomniac · on April 11, 2021

Telegraf is just going to do the "probe" component, right? You're still going to need Prometheus or Influxdb for storing the timeseries, and Grafana for visualizing I expect.

aequitas · on April 11, 2021

Grafana and prometheus are a very nice combo for all sorts of monitoring. For my current setup I have added victoriametrics which is a drop-in replacement for prometheus. I like it more for home setup as it is easier on disk-space usage and the collection client (vmagent) does disk buffering. So I run the collection on a battery backupped raspberry pi which uses remote_write to send it to victoriametrics on my home server (which has a less higher uptime than the raspberry pi) without the risk of losing precious metrics.

sdfhbdf · on April 11, 2021

I use telegraf with Ping and speedtest cli that both output to influxdb which then I browse with Grafana as well. Kind of similar but not quite. I also orchestrate with docker-compose on my RPi and I find this quite pleasent but in the end you should choose what you’re most familiar with. I didn’t want to spend my innovation tokens on promotheus hence I’ve gonna in what I viewed them as simpler, with telegraf.

I ping Google, CloudFlare, Quad9 and Amazon EC2 and plot it to see the any unusual spikes.

guillaumerose · on April 11, 2021

Remember the time the same job could be done with a simple cron, rrd and cacti for the storage and apache.

Now you need "complicated" things: prometheus, grafana, docker, etc.

I am a bit puzzled. Is it because sysadmin tutorials from 2000's are not showing up anymore in Google ? Why these tools failed to stay popular ?

Jiejeing · on April 11, 2021

You can still do it, but the "old school" way of doing it is much less flexible.

For this kind of setup, prometheus and grafana are not complicated (close to 0 configuration), and docker mostly works out of the box on linux hosts, barring nftables shenanigans. You end up with something that "just works", is pretty, and does not require fiddling.

Technically I do find this overkill for "home network monitoring", but at least it does not require k8s so I won’t complain.

toyg · on April 11, 2021

> at least it does not require k8s

One would think that should be a hell of a low bar. It’s like having to choose a car and going “Trabant will do, at least it does not require a crew of seamen”.

midrus · on April 11, 2021

For a home setup having to bring up 3 containers just to have some metrics is too much in my opinion.

I've been using InfluxDB at home just because of how easy it is to setup and use.

linsomniac · on April 11, 2021

As someone who has set up a number of "simple" visualizing systems using the exact toolpath you mention (cron, shell scripts, rrdtool), I can say with some authority that getting them right was never simple.

And the load of dozens to hundreds of data points in rrd is shockingly large. Our infrastructure utilization dropped by 30% when I switched from a Graphite/collectd setup to gathering MORE data more frequently with telegraf and influxdb.

enobrev · on April 11, 2021

rrd and cacti are considerably more complicated and far less useful than prometheus and grafana.

I generally agree with the rest of your point, but the new-fangled infrastructure tools do have some merit and I look forward to seeing where we go with them.

hoophoop · on April 11, 2021

Don't be fooled by the rain of downvotes that some comments are getting: the HN crowd is incredibly biased towards hyped stuff and CV-driven development.

I worked in FAANGS where simplicity really mattered over hype.

xfitm3 · on April 11, 2021

Docker is a pain in the butt. I much prefer "bare metal" and VMs. I have tried to get into it but I can't say I see the appeal.

The best I can make of it is to treat docker as a package manager. All the dependencies in one place.

angrais · on April 11, 2021

Could you expand on why Docker is a pain for you?

hacker_newz · on April 11, 2021

Prometheus as a monitoring tool is definitely a lot more complex and does not 'just work' straight out of the box. You need to configure Prometheus to know where to scrape using service discovery, set up different agents on your hosts to export metrics, build graphs in Grafana using PromQL to look for specific labels, and create alerts Alert Manager for every alert you want to receive.

_joel · on April 11, 2021

You don't even need cacti or rrd if it's just taking a value and putting it in a file. However these newer tools are much more flexible that cacti and RRD. I find using Grafana miles better than dealing with RRDs, beter granularity and more options for querying data/overlaying results from different metrics on the fly etc.

lbotos · on April 11, 2021

A lot of people are now building huge container based SaaS software in their $Day_job, so in their free-time they use those technologies because they are more familiar, or they want the practice.

I ran K3s for a few weeks on ODroid hosts at home to get more k8s experience.

gmuslera · on April 11, 2021

I would use Telegraf (a better/distributed collector far more efficient than cacti, even if you have to learn the OIDs for the snmp devices you want to monitor, because not everything is snmp anyway), InfluxDB (a timeseries database better than rrd, is not the only alternative, but at least it works for me), and yes, Grafana.

The cron gives you low granularity (and for some things, it may matter to measure things that changes in less than a minute), and flexibility (both in what you can measure and how you can arrange and correlate the different pieces of information).

rcarmo · on April 11, 2021

Well, FWIW, I’m running a simple smokeping Docker container.

Just works, sits neatly alongside a few extra services (managed via compose) in a little ARM dev board and has kept on ticking for around five years without any hassles using exactly those bits you mentioned.

The new things are shinier, I think, and on the other hand a lot of the old, simpler tools have become less relevant.

As an aside, I recall a younger self annoyed at the inability to get data out of RRDTool into a nice SVG chart. These days I just want the data in whatever format the best tool spits out...

midrus · on April 11, 2021

See my other comment in this thread. I've found InfluxDB a lot easier to setup and use (just apt-get install) and comes with push support, dashboards web ui, etc.

I'd say it is easier and more powerful than rrd and cacti (used them like 10 years ago!)

nine_zeros · on April 11, 2021

Reinventing the wheel is the new mantra.

_joel · on April 11, 2021

It's really not, Prometheus and Grafana can do a hell of a lot more than Cacti/RRDs

colemannugent · on April 11, 2021

They called to stay popular because they kinda suck. The setup you described works on one host and produces static graphs.

With Prometheus and Grafana you can monitor an entire data center and add visualizations for metrics as you need them with no change to the monitored hosts.

TLDR: the old software stack could be made to do this, modern monitoring stacks do it out of the box.

izacus · on April 11, 2021

So people regularly run datacenters at home that need monitoring? That's what the original article is about after all.

deadbunny · on April 11, 2021

You should check out r/homelab, the answer is yes.

orhmeh09 · on April 11, 2021

I think they meant to give an example to indicate the flexibility of the platform. You know nobody is claiming people regularly run data centers at home — why are you asking?

Avamander · on April 11, 2021

It's a great way to learn without breaking things for a few hundred thousand users.

atat7024 · on April 11, 2021

That is correct.

All of the SEO skiddies are the ones pimping Kubernetes for your personal homepage and a few links.

And it shows. Lots of ways to build things fast, but nobody's building cool, lasting shit.

mschuster91 · on April 11, 2021

> Lots of ways to build things fast, but nobody's building cool, lasting shit.

No way to make money with something that's well-built with the intention to last a life time. It's the "planned obsolescence" of the software world...

Havoc · on April 12, 2021

Reminder that Speedtest can chew through a fair bit of data for those on metered connections

dikaio · on April 12, 2021

Simon releases so many community based projects how can you not love this guy.

DavideNL · on April 11, 2021

Another similar alternative: https://github.com/henrywhitaker3/Speedtest-Tracker