
Back when I worked at Google, the standard log processing tool was Dremel. You could get exactly the same thing by shipping your logs to BigQuery.
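
For anyone who hasn't tried it, a minimal sketch of what querying logs in BigQuery looks like with the Python client; the project, dataset, table, and column names here are all invented for illustration:

    # Minimal sketch of an ad-hoc log query in BigQuery via the Python client.
    # "my-project", "logs.app_logs", and the columns are invented names.
    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")
    query = """
        SELECT timestamp, severity, message
        FROM `my-project.logs.app_logs`
        WHERE severity = 'ERROR'
          AND timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
        ORDER BY timestamp DESC
        LIMIT 100
    """
    for row in client.query(query).result():
        print(row.timestamp, row.message)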

I haven't checked, but I bet it's cheaper than Elasticsearch for data that's mostly cold, like logs. You will need a separate monitoring solution though.


If you want streaming inserts to BQ, those become the biggest cost. Dataflow could be used to turn streaming inserts into batch loads and to gather interesting metrics that you don't want to hit BQ for, but I don't think anyone's open-sourced anything in this space. I've implemented streaming inserts to BQ for logs "at scale" and it was still at least an order of magnitude cheaper than Splunk. Happy to talk via email.
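
Not the commenter's pipeline, but a sketch of the batch-load side under assumed names: a collector flushes buffered log batches to GCS, and a load job (which is free, unlike the billed streaming-insert API) appends them to the table. Bucket, project, and table names are made up.

    # Hypothetical sketch: batch-loading buffered logs from GCS instead of
    # calling the billed streaming-insert API. Load jobs themselves are free,
    # which is where most of the cost difference comes from. Names invented.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        autodetect=True,
    )
    load_job = client.load_table_from_uri(
        "gs://my-log-bucket/batches/*.json",  # batches flushed by a collector
        "my-project.logs.app_logs",
        job_config=job_config,
    )
    load_job.result()  # block until the load completes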


What was the monitoring solution in the end? Where were the BQ results going?


It was generic log aggregation, used mostly for incident response and forensics, plus some offline metrics. A bunch of metrics were being created on-box with parsing (in something like three different ways), and we were looking at moving that into the log processing stream. We had a chat bot that people could use to run common queries, as well as standard SQL access via UI and API, with auth handled by Google IAM.
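
As an illustration of the kind of canned query such a bot might run on a user's behalf, parameterized so user input can't inject SQL; the table, column, and parameter names are invented:

    # Hypothetical canned query a chat bot could run for a user. Parameterized
    # via query parameters so input can't inject SQL; all names invented.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("host", "STRING", "web-42"),
        ]
    )
    query = """
        SELECT severity, COUNT(*) AS n
        FROM `my-project.logs.app_logs`
        WHERE host = @host
          AND timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
        GROUP BY severity
        ORDER BY n DESC
    """
    for row in client.query(query, job_config=job_config).result():
        print(row.severity, row.n)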


We use Spark on HDFS.