I wonder if companies built Hadoop clusters for large jobs and then also use them for small ones.
At work, they run big jobs on lots of data on big clusters. The processing pipeline also includes small jobs. It makes sense to write them in Spark and run them in the same way on the same cluster. The consistency is a big advantage and that cluster is going to be running anyway.
At work, they run big jobs on lots of data on big clusters. The processing pipeline also includes small jobs. It makes sense to write them in Spark and run them in the same way on the same cluster. The consistency is a big advantage and that cluster is going to be running anyway.