Also, the article doesn't say WHY they didn't use Kafka to persist the events. Kafka is designed to do exactly that.
Once the events are persisted, consumers can just read the data from Kafka and ship it to Hadoop, with less latency. Or you can plug Storm or Spark in, as you said, and do the analysis there in real time. Or both.
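Roughly what I have in mind, as a minimal sketch. It uses the modern Kafka consumer API (post-0.9, not the 0.7-era API the article's system was built on), and the topic name, broker address, and HDFS path are made up for illustration:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class KafkaToHdfsSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: local broker
        props.put("group.id", "hdfs-sink");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false"); // commit offsets only after the HDFS flush

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("syslog-events")); // hypothetical topic

        FileSystem fs = FileSystem.get(new Configuration());
        try (FSDataOutputStream out = fs.create(new Path("/events/syslog.log"))) { // hypothetical path
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    out.writeBytes(record.value() + "\n");
                }
                out.hflush();          // durably flush the batch to HDFS first...
                consumer.commitSync(); // ...then mark it consumed, so events are never lost
            }
        }
    }
}
```

Committing offsets only after the HDFS flush gives you at-least-once delivery, which satisfies the same "persisted only when written to HDFS" criterion the article mentions.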
> When the system was built, one of the missing features from Kafka 0.7 was the ability of the Kafka Broker cluster to behave as a reliable persistent storage. This influenced a major design decision to not keep persistent state between the producer of data, Kafka Syslog Producer, and Hadoop. An event is considered reliably persisted only when it gets written to a file on HDFS.
I'm just intrigued as to why.