I've spent almost a year with Spark and I think I'm only now scratching the surface. There are so many knobs that just figuring out the optimal cluster configuration for production jobs took weeks of testing (so much of it depends on the data size and the specific use case). The docs are pretty good, but they don't really cover any of the 'gotchas' you'll hit in production (you have to google relentlessly for those), and as for the hacks you put in place to deal with them, well, you're on your own. Unless you have staff who have worked with the toolset for years (which we don't... I'm basically it), you will spend weeks in a try/hack loop. All that said, it's a great toolset for its intended purpose, Scala is a great language, etc., etc., but I've spent a long time 'figuring the system out'.