Grafana v6.2

mikepurvis · on May 24, 2019

Tried Grafana briefly a year or two ago, and I wanted to like it, but similar to Kibana, it's laser-focused on the task of realtime monitoring current data. I wanted to use it for a high-level view of historical stuff (robot data recordings from ROS), and there was a lot of really basic functionality for that use case that just wasn't there at all.

Even stuff as basic as being able to pan a plot back and forth after you've zoomed in— here's the four year old ticket for that in their issue tracker: https://github.com/grafana/grafana/issues/1387

I ended up generating Bokeh plots and had a much better time. So Grafana is great for what it's great at, but I don't recommend it for uses other than current-moment data.

dTal · on May 24, 2019

It's solely focused on data analysis (none of this "dashboard" stuff), but I've found Kst[0] to be the lowest-friction way of going from plaintext time-series data to fancy, zoomable, draggable plots in whatever layout you want. It handles realtime data very handily as well, happily scaling with as much backlog as I could throw at it.

Bokeh is a nice kit as far as it goes, but I hit scaling problems quite early with it - as I understood it, it was supposed to send incremental updates to the client, but in fact it resent everything every time - and therefore fell over after half an hour when the time to update exceeded the update rate. Maybe I was holding it wrong.

[0] https://kst-plot.kde.org/

snowwindwaves · on May 24, 2019

kst is amazing! I write CSV files with 100 series about 10 samples per second for 8-12 hours a day and it doesn't break a sweat plotting the whole day, any number of series, it does transformations as required, and the next day when I have a new CSV file I can point it at the new file and keep all the plots I configured the previous day.

KST has been a total game changer for me commissioning plants and machines.

new4thaccount · on May 25, 2019

Can you go into more detail for how you use it?

snowwindwaves · on May 25, 2019

I don't have access to a proper data acquisition system that can sample at kHz rates and I'm not willing to buy one out of pocket.

SCADA systems and HMIs are generally not set up to poll data from PLCs faster than 1 Hz.

Using a driver for the PLC communications protocol I write all of the variables of interest to a CSV file at 10 Hz.

PLC scan times are usually 10-100Hz so while I can't capture everything the PLC sees or higher frequency components to the signals than the PLC can measure, it is happy middle ground between a proper DAQ and just using the HMI software.

In addition a DAQ wouldn't be connected to all of the PLC IO but with this system I can easily grab all the PLC tags I want, as well as internal PLC tags that are not IO points.

Most HMI software has pretty brutal plotting capabilities as well (Citect process analyst being the only one that is better than passable), so on top of getting at least 10x the resolution in the data I get to use KST which is great for zooming and panning on the plots, creating sets of plots, or different plots for specific tests.

Not all HMI software even allows plot configurations to be saved, so you can spend a lot of time just re-adding the time series to the plot and setting the scales.

The plots are used for commissioning reports and records.

new4thaccount · on May 25, 2019

Interesting and thanks! I use time historians for grid level operations, but the data coming from a single generator is somewhat limited to things like real and reactive power (among others) and the economic data is submitted in a different way.

snowwindwaves · on May 25, 2019

Although we sometimes supply historians they are not usually part of the up-front controls and commissioning contract so I am not as familiar with them. Usually part of the reason the HMI software plotting capabilities are such shit is so that the HMI vendor can try to up-sell a historian.

nopzor · on May 24, 2019

thanks for the feedback. there's definitely a lot of validity to what you're saying, for what you describe.

grafana has traditionally been used for 'real time dashboarding and analytics' in the IT/devops world. that's the original use case, and its sweet spot, as you allude to.

but, since the beginning, the mission of the open source grafana project has had nothing to do with IT per se. it was about democratizing metrics; helping teams understand their 'systems', by breaking down silos between databases and people.

over the last few years, interesting things are afoot in grafana community. we're seeing grafana used for more and more non-IT use cases. it's being deployed in the industrial and business worlds. about 10-20% of the grafana community now deal with things that have nothing to do with IT/devops.

the 'systems' are no longer limited to things like servers, switches, containers and clusters. these emerging users deal with things like temperature sensors, dollars, robots and ambulances. we are making progress in bringing grafana to these worlds, while also ideally improving it overall.

there's tangential threads in various stages of recent completeness (none of which solve your specific issue admittedly). things like sql support, general focus on ad-hoc analysis with ('explore'), the upcoming abstraction around being able to better use ui components within grafana ('grafana/ui'), improved support for tabular data, new panels, etc.

sorry about the four year old issue; i'd be lying if i said there weren't myriad things we'd like to do, that don't make the cut not due to desire but due to time and resources.

again, thanks for the feedback, please know that we're very interested in continuing to develop and improve grafana for use cases like yours!

-r

[disclosure, very biased and opinionated response. am co-founder/ceo at grafana labs. lucky enough to work with torkel and the team on making grafana better]

mikepurvis · on May 25, 2019

Thanks for the response and for a pretty cool open source project! Sorry my comment dumping on it ended up being the top of the thread here. FWIW, I definitely had a nicer time trying out Grafana than I ever have fighting with Kibana, and I definitely liked that I was able to use it with SQL based datasources rather than just Elastic.

bjelkeman-again · on May 25, 2019

We are at the beginning of trying out Grafana for real time monitoring of a combined indoor fish and vegetables farm (aquaponics).

rb808 · on May 24, 2019

I had the same problem. I also want simple binary metrics and batch related updates like are batch jobs running on time or over due. I really want to avoid writing my own dashboard but seems the only way.

xmichael999 · on May 24, 2019

I use it for my Pi based solar production and house monitoring and like it. It was the first time i've ever used it. I found it fit perfect for this project. I was surprised when I tried to use it for literally anything other than raw linear data input it was completely useless. Neat product, but not sure why it gets so much attention given how it really one does one very very specific task... I guess it looks neat...

jackfraser · on May 25, 2019

Actually, this greatly comes down to the datasource. Each exposes different analytical functionality at the database level. Combined with the templating and variable functionality, you can do a ton of excellent analysis on all kinds of metrics.

What kind of data were you wanting to analyse?

sgt · on May 24, 2019

I had a similar problem. I think Grafana or Chronograf still has its place, but based on what you've said I'll try replacing some of our dashboards with a Bokeh generated dashboard. The library looks very powerful. What is the best practice of updating the plots (e.g. once a minute)?

mikepurvis · on May 24, 2019

There are several strategies for live-updating bokeh plots; tbh I'm not sure which ones are best. The officially supported one is Bokeh Server: https://demo.bokeh.org/surface3d

In any case, for me it's "the robot had a problem sometime in this 30 minute window, please dig through 3GB of logs to figure out what went wrong", and doing a bunch of pandas crunching upfront and dumping out a bunch of time series plots makes that kind of task really straightforward.

lykr0n · on May 24, 2019

Sounds like you want to look at Redash or other like BI platforms.

markbnj · on May 25, 2019

A fun little thing to do if you want to play around with the new release: install prometheus, prometheus node_exporter, grafana, and grab the node_exporter full dashboard from the grafana site. In like 10 minutes you've got a pretty cool system info dashboard for your laptop :).

latchkey · on May 26, 2019

I did this for 1500 raspberry pi-class litecoin miners. It was awesome to be able to see all the data in realtime across so many devices.

tbarbugli · on May 24, 2019

Lazy loading is a feature I was waiting for long time, hopefully this time is here to stay!

retzkek · on May 24, 2019

If your dashboards have so many panels that lazy loading is important, you need to reconsider your dashboard design. Endlessly scrolling to find the panel you're interested in makes for a painful user experience, and it makes it harder to compare series across panels.

I aim to keep dashboards no larger than what can be displayed in a single window on a desktop, with perhaps some supplementary plots below, often in a collapsed row. I make heavy use of "drill-down" links, preferably from tables or single-stats (or more often the "status panel" for denser displays [1]), or in panel notes otherwise, to dive further into the data.

When designing a dashboard, I ask myself "what story is this dashboard going to tell?", and as with a good novel I try to keep from straying too far from that narrative, branching side-plots out into new dashboards as needed.

[1] https://grafana.com/plugins/vonage-status-panel

mtrpcic · on May 24, 2019

It depends on what your use case for the dashboard is. If the dashboard is meant for constant display, then yes, too many panels is a bad user experience. On the other hand, if you're trying to create a "Oh my god something is wrong in production right now, show me everything so I can see at a glance" dashboard, I would rather scroll than jump between 3 tabs because of "aesthetics".

retzkek · on May 24, 2019

Aesthetics? How about ergonomics? In the same vein as "alert fatigue," having too many panels on a dashboard (or too many lines on a graph, etc) can overload the user, and obscure the real issues.

> show me everything so I can see at a glance

Yes, exactly, at a glance, not after scrolling through five pages. With careful dashboard design, you should be able to see a problem area actually "at a glance," and then drill-down to pinpoint the actual cause, faster than you'll find it scrolling through a single large dashboard.

I admit this is something of an ideal to aim for, and it can take a lot of time and effort to achieve, which may not be available. However, it will pay off in the "Oh my god something is wrong in production right now" scenario if you can take that time.

zepolen · on May 24, 2019

Agree, designing a good dashboard is a skill just like anything else it comes with experience.

zepolen · on May 24, 2019

It's not about aesthetics it's about identification of problems in O(log n) instead of O(n) time.

Scrolling through 100 graphs is slower than scanning 10 main graphs and 10 subgraphs per.

mkching · on May 24, 2019

Same here, we just deployed a test instance running the upgraded code. No issues in our existing dashboards so far, and our initial testers are liking the speed improvements a lot.

NickBusey · on May 24, 2019

Those new gradient bar gauges look great, can't wait to use them on some environmental data.

colechristensen · on May 24, 2019

I have been having some fun recently with Grafana session storage. In the end it seemed like a database issue which was unavoidable because there aren't other options for session storage when you use a db (other than downgrading to 5.x)

After endless grinding with configuration options, debugging go code (new skill) and javascript running in the browser I tried switching out to a local mariadb and it worked instantly. Lesson learned, be wary of a Galera mysql cluster. My running but unproved theory that during the login process the creation of the user token in the db and reading it back happen so quickly that the item isn't available yet so Grafana can't find it and logs the user out.

BossingAround · on May 25, 2019

If I have a CSV of a number of values (say in the thousands), and what I need is basically a tool that will create a slick, good-looking graph that compares two or more of these CSVs, what's the best tool for that? Think JMeter if that rings any bells.

I honestly just used some basic graph-generation tools which would spit out PNGs, which is always less than satisfying. I looked at Grafana, but never had the time to actually try it out. My feeling also was that it was a bit different of a use-case as I had no real-time data, but I may be totally wrong here.

wielebny · on May 25, 2019

We have JMeter to pump the data to InfluxDB and then visualise them in Grafana.

Additionaly, you can see results while JMeter is still running.

dogtail · on May 24, 2019

Waiting for better Loki integration.

netingle · on May 24, 2019

Loki author here: got some ideas? We’re all ears!

physicles · on May 26, 2019

Allow me to view log lines nearby a specific line that showed up in a search, kind of like grep -C n. Use case is that I’m seeing HTTP 500s from my web service. I can isolate the log lines for those requests, but if I want to see the extra lines (perhaps for a crash) that led up to it, I have to remove my search filter and manually set a time range that’s small enough that it returns fewer than 1000 lines.

That, and retention policies. I just had a disk fill up because Loki keeps saving data.

That said, I’ve been using Loki for several months and I love it. Keep up the great work!

tmzt · on May 25, 2019

Reading up on Loki, it seems it lacks full-text search, prefering something called 'distributed grep'.

I'm also looking at sonic, which is a new full-text search engine in rust with less overhead than Elasticsearch. It's lacking a gui focused on log search.

Could they work together somehow, maybe as an alternative backend to you distributed grep?

Sonic does not store the original content, so it could store a reference into your compressed chunk.

mgbmtl · on May 24, 2019

Hi! Any plans to release binaries? If I am currently using 1% of ELK features (I use it for simple log aggregation of servers), but not k8s, would you still recommend loki?

netingle · on May 24, 2019

Yes we do! Plan on cutting v0.1 in the next week or so, kubecon kinda got in the way... I’ll work on adding some binaries to that - what platform you looking for?

nickserv · on May 24, 2019

Not the OP but in the same situation. I run Debian 9 on my servers, would be great to have some debs to try it out.

Thanks!

mgbmtl · on May 24, 2019

Great, thank you! I look forward to testing it. I also use Debian 9 (or stable, slowly moving towards Debian 10 which will soon be released).

Hansi · on May 24, 2019

+1 RedHat 7 and 8.

oeoe · on May 25, 2019

Alerting on certain log entries, please :)

Even a simple label match would be a good start.

Sytten · on May 25, 2019

Great job. Though I am still waiting for official CSP support, it seems like it should already be there. Unfortunately the legacy angular code prevents us from applying any real policy.