
I had an intern over the summer, working on a basic A/B Testing framework for our application (a very simple industrial handscanner tool used inside warehouses by a few thousand employees).

When we came to the last stage, analysis, he was keen to use MapReduce, so we let him. In the end, though, his analysis didn't work well, took ages to process when it did, and didn't provide the answers we needed. The code wasn't maintainable or reusable. *shrug* It happens. I had worse internships.

I put together some command-line scripts to parse the files instead: grep, awk, sed, really basic stuff piped into each other and written to other files. They took 10 minutes or so to process and provided reliable answers. The scripts were added as an appendix to the report I provided on the A/B test and, after formatting and explanations, took up a couple of pages.
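For a sense of what such a pipeline could look like, here is a minimal sketch. The log format (`<timestamp> <variant> <event>`), the `scan_complete` event name, and the `count_completions` function are all hypothetical; these are not the scripts from the report.

```shell
#!/usr/bin/env bash

# Hypothetical input on stdin, one event per line:
#   <timestamp> <variant> <event>
# Prints "<variant> <count>" for events named scan_complete (assumed name).
count_completions() {
    grep ' scan_complete$' |   # keep only completed scans
        awk '{ n[$2]++ } END { for (v in n) print v, n[v] }' |   # tally per variant
        sort                   # awk's for-in order is unspecified, so sort it
}
```

Something like `count_completions < scans.log > completions.txt` (file names assumed) would then write the per-variant counts to another file.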




I used Hadoop a few times this semester for different classes, and the code seemed easy to write. Because everything is either a Mapper or a Reducer, you just read enough of the docs to figure out what is intended and then build on top of it. Can I ask how it wasn't maintainable?

Just curious


On a tangent, I'd be interested in how you format heavily piped bash code for documentation. Can comments be interspersed there?


Functions, mostly - the big `awk` command in the example goes into something like

    # @param $1 whatever
    chess_extract_scores() {
        awk blah blah blah
    }

and then your whole pipeline simplifies to

    cat foo | grep bar | chess_extract_scores

which is pretty readable. You can even do most of this in a live bash session with ^X ^E.
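On the question of interspersing comments: a pipeline can be broken after each `|`, and each of those lines can end with a comment. (A comment after a trailing backslash does not work, since the backslash no longer ends the line.) A minimal sketch, where `extract_scores`, `scores_for`, and the `<name> <score>` input format are made up for illustration:

```shell
#!/usr/bin/env bash

# Hypothetical helper standing in for the big awk command.
# Reads "<name> <score>" lines on stdin and prints the score column.
extract_scores() {
    awk '{ print $2 }'
}

# @param $1 pattern to select lines by
scores_for() {
    grep -- "$1" |          # keep only lines matching the pattern
        extract_scores |    # reduce each line to its score
        sort -n             # order the scores numerically
}

# Sample input, invented for this example.
printf '%s\n' 'bar 10' 'baz 7' 'bar 3' | scores_for bar
```

With the sample input above this prints 3 and then 10, one per line.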


You can actually do without cat:

    grep bar foo | chess_extract_scores

http://en.wikipedia.org/wiki/Cat_%28Unix%29#Useless_use_of_c...


Sure you can, but premature optimization is also a real thing http://en.wikipedia.org/wiki/Program_optimization#When_to_op...


bash has functions; functions are just like commands. Write your comments in the functions, and your final line will be the pipeline of awesomeness:

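    # A function body holding only a comment is a syntax error in bash,
    # so each stub below also contains the no-op `:`.
    generate_data() {
        # make it rain
        :
    }

    process() {
        # chunky
        :
    }

    gather() {
        # puree
        :
    }

    generate_data | process | gather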



