MacroBase: Prioritizing Attention in Fast Data

wackspurt · on April 1, 2018

MacroBase's pipeline is broken up into the following operators: Transform, Classify, Explain.

I find the Explain operator very valuable and haven't seen anything like it in other monitoring/anomaly detection work (correct me if I'm wrong). It finds attribute-value combinations that are disproportionately concentrated in the outlier datapoints. Outliers are determined by the classifier, which looks at the metric value(s). The classifier can be based on static thresholds, percentiles, unsupervised/supervised learning algorithms, etc.

Example: "A mobile application manufacturer issues a MacroBase query to monitor power drain readings (i.e., metrics) across devices and application versions (i.e., attributes). MacroBase's default operator pipeline reports that devices of type B264 running application version 2.26.3 are sixty times more likely to experience abnormally high power drain than the rest of the stream, indicating a potential problem with the interaction between devices of type B264 and application version 2.26.3".
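To make the "N times more likely" idea concrete, here is a minimal sketch of how such a number could be computed as a relative risk ratio: the outlier rate among rows matching an attribute combination, divided by the outlier rate among all other rows. This is not MacroBase's code. The function names (`classify`, `risk_ratios`, `make_rows`), the static threshold, and the toy readings are all assumptions for illustration, and the brute-force enumeration of combinations stands in for whatever summarization the real system uses.

```python
from collections import Counter
from itertools import combinations


def classify(rows, metric, threshold):
    """Label a row as an outlier when its metric exceeds a static threshold."""
    return [row[metric] > threshold for row in rows]


def risk_ratios(rows, attributes, is_outlier, max_order=2):
    """Return {attribute-value combination: relative risk ratio}.

    Brute force: counts every combination of up to max_order attributes.
    Only combinations that appear in at least one outlier are scored.
    """
    total_out = sum(is_outlier)
    total_in = len(rows) - total_out
    out_counts, in_counts = Counter(), Counter()
    for row, flagged in zip(rows, is_outlier):
        items = [(name, row[name]) for name in attributes]
        for order in range(1, max_order + 1):
            for combo in combinations(items, order):
                (out_counts if flagged else in_counts)[combo] += 1

    ratios = {}
    for combo, a_o in out_counts.items():
        a_i = in_counts[combo]      # inliers matching the combination
        b_o = total_out - a_o       # outliers not matching it
        b_i = total_in - a_i        # inliers not matching it
        exposed = a_o / (a_o + a_i)
        rest = b_o + b_i
        unexposed = b_o / rest if rest else 0.0
        # No outliers outside the combination: the ratio is unbounded.
        ratios[combo] = exposed / unexposed if unexposed else float("inf")
    return ratios


def make_rows(device, version, drains):
    """Build toy telemetry rows (hypothetical data, not from the paper)."""
    return [{"device": device, "version": version, "drain": d} for d in drains]


rows = (
    make_rows("B264", "2.26.3", [9.0, 9.5, 8.7, 1.2])
    + make_rows("B264", "2.26.2", [1.0, 1.1, 0.9, 1.3, 1.2, 1.0])
    + make_rows("A100", "2.26.3", [1.0, 1.1, 0.8, 1.2, 1.0, 0.9])
    + make_rows("A100", "2.26.2", [8.9, 1.0, 1.1, 1.3])
)
labels = classify(rows, "drain", threshold=5.0)
ratios = risk_ratios(rows, ["device", "version"], labels)
top = max(ratios, key=ratios.get)
print(top, ratios[top])
```

On this made-up data the pair (device B264, version 2.26.3) ranks highest with a ratio of 12, while device B264 alone and version 2.26.3 alone each score about 3, which is the kind of interaction the quoted example describes. A real system would also need a minimum-support cutoff so that rare combinations with one or two outliers don't dominate the ranking.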

I think this sort of engine/system is valuable for detecting and isolating anomalies in telemetry data.

fouc · on April 1, 2018

> [a solution for the] relative scarcity of human attention and overabundance of data: return fewer results, prioritize iterative analysis, and filter fast to compute less.

> By combining streaming operators for feature transformation, classification, and data summarization, MacroBase provides users with interpretable explanations of key behaviors, acting as a search engine for fast data.

nycdatasci · on April 4, 2018

Is there anything to this platform beyond the two new SQL operators mentioned in the docs? https://macrobase.stanford.edu/docs/sql/docs/