Example of an associated bug: https://community.splunk.com/t5/Archive/Splunk-Y2K...

raverbashing · on Sept 13, 2020

Oh my. Why are people matching an epoch with a regex?! Or you know, just think a couple of years ahead

Funny how the comment says "replace this before 2017" oh well.

> The app executes bash/Powershell at Splunk startup to check for the above regex and add a '6' if needed. It may not be the best fix, but it does kick-the-can on the problem for another 3 years, at least until epoch time reaches 1.7e10.
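To make the failure mode concrete, here is a minimal sketch. The two patterns are hypothetical stand-ins, not Splunk's actual regex, which this thread does not quote. They only illustrate what "add a '6'" means for a pattern that hard-codes the leading digits of a 10-digit epoch value.

```python
import re

# Hypothetical patterns (NOT Splunk's real ones): a 10-digit number whose
# second digit is restricted to a hard-coded range.
BEFORE_FIX = re.compile(r"1[0-5]\d{8}")
AFTER_FIX = re.compile(r"1[0-6]\d{8}")  # same pattern with a '6' added


def looks_like_epoch(pattern, value):
    """Return True if the whole decimal form of value matches pattern."""
    return pattern.fullmatch(str(value)) is not None


for ts in (1_599_999_999, 1_600_000_000, 1_699_999_999, 1_700_000_000):
    print(ts, looks_like_epoch(BEFORE_FIX, ts), looks_like_epoch(AFTER_FIX, ts))
```

Under these assumptions the first pattern stops matching at 1600000000, and the patched one stops at 1700000000, so the same cliff simply moves.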

masto · on Sept 13, 2020

I’ve never used this piece of software (and their web site is remarkably uninformative about what it does, besides transform my enterprise), but from the comments it looks like some kind of search engine. It wouldn’t be too surprising to have a bunch of heuristics that try to extract meaning from unlabeled strings. Indeed, while viewing that very page, my iPad turned several of the epoch timestamps into telephone numbers.

So I don’t know if this is what actually happened, but this is a plausible reason for having such a regex that does not depend on stupidity: a “guess what this number could mean” routine written around 2015 might include finding Unix timestamp-ish values not more than a few years into the future.

I’d still be pretty unhappy with the hard coded magic number approach and wish to see the specific requirements documented, along with some sort of tunable parameter for the range, as well as a test that verifies it works for the given requirements. Which gets into a fun and potentially philosophical exercise as well, since a test that passed yesterday should pass today, but this might be one case where “doesn’t fail with dates less than 5 years into the future” is a reasonable request.

pwg · on Sept 13, 2020

> I’ve never used this piece of software (and their web site is remarkably uninformative about what it does, besides transform my enterprise), but from the comments it looks like some kind of search engine.

It is an 'enterprise' log aggregator, storage system, and log search engine/alert generation engine.

One sets one's Java code (remember: "enterprise") to stream log output to Splunk, and Splunk handles receiving, storage, alert generation from programmed pattern matching, and archived log data search.

> It wouldn’t be too surprising to have a bunch of heuristics that try to extract meaning from unlabeled strings.

That is a very accurate description of just what it does.

np_tedious · on Sept 13, 2020

Left a few years buffer!

scrollaway · on Sept 13, 2020

How the hell do regexes like these find their way into production …

mannykannot · on Sept 13, 2020

There's an example of the sort of chronic (pun intended) patch-driven-development, YAGNI(Y) thinking that leads to this sort of thing, in the last paragraph:

"For instances that can't/won't get updated in time, this Splunk app can be deployed as a workaround. The app executes bash/Powershell at Splunk startup to check for the above regex and add a '6' if needed. It may not be the best fix, but it does kick-the-can on the problem for another 3 years, at least until epoch time reaches 1.7e10."

Someone, somewhere is saying "but it passed all the unit tests..."

a1369209993 · on Sept 13, 2020

> at least until epoch time reaches 1.7e10.

And to add insult to injury: that should actually be 1.7e9; 1.7e10 would be 17000M epoch seconds, not 1700M.
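A quick way to check the arithmetic, as a small sketch using only the standard library (the helper name `epoch_to_utc` is mine):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_to_utc(seconds):
    """Convert seconds since the Unix epoch to an aware UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


for seconds in (1.6e9, 1.7e9, 1.7e10):
    print(f"{int(seconds):>14,} s -> {epoch_to_utc(seconds).isoformat()}")
```

1.6e9 lands on 2020-09-13 and 1.7e9 on 2023-11-14, which fits the "another 3 years" in the quote. 1.7e10 lands in the year 2508.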

jcims · on Sept 13, 2020

Splunk also has field extractors where they automatically generate regexes from sample data that look a little bit like this.

foota · on Sept 13, 2020

Am I the only one that thinks this isn't completely unreasonable? Is it a hack? Definitely, but I don't see a much better way without more context of determining whether some number is likely to be a timestamp. Should it base it on a range of numbers determined at startup? Probably, but it's not fundamentally much different.

crehn · on Sept 13, 2020

A much saner heuristic could use a reasonable range (for some value of reasonable) around the current timestamp to check whether it's also one. The assumption being that logs will be streamed, hence any timestamp will likely refer to something that happened in the recent past.
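Roughly what I have in mind, as a minimal sketch. The function name, the 5-year lookback, and the 1-day allowance for clock skew are all assumptions picked for illustration, not anything Splunk does.

```python
import time

YEAR = 365 * 24 * 60 * 60  # seconds in a year, ignoring leap days
DAY = 24 * 60 * 60


def is_plausible_epoch(token, now=None, max_age=5 * YEAR, max_skew=DAY):
    """Return True if token parses as epoch seconds close to `now`.

    The accepted range is [now - max_age, now + max_skew]. Both bounds are
    tunable, and the defaults are illustrative assumptions.
    """
    if now is None:
        now = time.time()
    try:
        value = float(token)
    except (TypeError, ValueError):
        return False
    return now - max_age <= value <= now + max_skew


# Pinning `now` keeps the check deterministic, for example in a test.
print(is_plausible_epoch("1599999000", now=1_600_000_000))
print(is_plausible_epoch("5551234567", now=1_600_000_000))
```

Because the window follows the clock, nothing needs to be patched when the leading digits roll over. Passing `now` explicitly also means a test written today keeps passing later.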

0xbadcafebee · on Sept 13, 2020

Knowing Splunk, I'm surprised it was a regex and not a bunch of strchr()s

JdeBP · on Sept 13, 2020

See also https://news.ycombinator.com/item?id=24453575 .

fishywang · on Sept 13, 2020

first thought after seeing this: holy shit how did that pass code review?

on second thought, that's probably intentional to force their customers to upgrade at least once every 3 years.