I had an intern over the summer working on a basic A/B testing framework for our application (a very simple industrial hand-scanner tool used inside warehouses by a few thousand employees).
When we came to the last stage, analysis, he was keen to use MapReduce, so we let him. In the end, though, his analysis didn't work well, took ages to run when it did work, and didn't provide the answers we needed. The code wasn't maintainable or reusable. *shrug* It happens. I've had worse internships.
I put together some command-line scripts to parse the files instead: grep, awk, sed, really basic stuff piped together and written out to intermediate files. They took ten minutes or so to run and provided reliable answers. The scripts went into an appendix of the report I wrote on the A/B test and, with formatting and explanations, took up a couple of pages.
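For the curious, the scripts were in this spirit (a rough sketch, not the actual code; the log format, field positions, and file name here are made up for illustration):

    # Assumed tab-separated log lines: timestamp, user_id, variant, event, duration_ms
    # Scan completions per variant:
    grep 'scan_complete' scanner.log \
      | awk -F'\t' '{ print $3 }' \
      | sort | uniq -c

    # Average scan duration (ms) per variant:
    grep 'scan_complete' scanner.log \
      | awk -F'\t' '{ sum[$3] += $5; n[$3]++ } END { for (v in sum) print v, sum[v] / n[v] }'

Nothing clever: filter the lines you care about, pull out the fields, and let sort, uniq, and awk do the counting.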
I used Hadoop a few times this semester for different classes, and the code seemed easy to write: because everything is either a Mapper or a Reducer, you just read enough of the docs to figure out what's expected and build on top of it. Can I ask how it wasn't maintainable?