
I take the author's viewpoint as: when the data is actually big, big data operations are for all practical purposes going to run in batch mode. It's just hard to push enough new data through the pipeline in ten minutes to change the statistical analysis of many terabytes. And if the data is so unpredictable that a snapshot isn't meaningful, then slowing operations down to allow concurrent writes isn't going to produce better analysis either; what's really needed in that case isn't big data but some sort of near real-time stream processing.

All this comes with the assumption that much of what people consider big data isn't really big data.
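A minimal, hypothetical Python sketch of the contrast being drawn (none of these names come from the comment): a batch job summarizes an immutable snapshot and only picks up new data on the next run, while a stream consumer keeps a sliding-window aggregate that reflects each event as it arrives.

    # Hypothetical sketch, not any particular framework's API.
    from collections import deque
    from statistics import mean
    import random

    def batch_summary(snapshot):
        """Batch mode: analyze a fixed snapshot; new writes wait for the next run."""
        return {"count": len(snapshot), "mean": mean(snapshot)}

    class StreamAggregator:
        """Near real-time mode: update a bounded sliding window on every event."""
        def __init__(self, window_size=1000):
            self.window = deque(maxlen=window_size)

        def ingest(self, value):
            self.window.append(value)

        def summary(self):
            return {"count": len(self.window), "mean": mean(self.window)}

    if __name__ == "__main__":
        snapshot = [random.gauss(100, 15) for _ in range(100_000)]
        print("batch:", batch_summary(snapshot))

        agg = StreamAggregator(window_size=500)
        for event in (random.gauss(100, 15) for _ in range(10_000)):
            agg.ingest(event)  # each event updates the window immediately
        print("stream:", agg.summary())

The point of the contrast: if the snapshot summary stays statistically stable between batch runs, batch mode loses you nothing; if it doesn't, no amount of concurrency in the batch pipeline helps, and something shaped like the streaming version is what you actually want.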



