I had an intern over the summer, working on a basic A/B Testing framework for ou...

arthurcolle · on Jan 18, 2015

I used Hadoop a few times this semester for different classes, and the code seemed easy to write. Because everything is either a Mapper or a Reducer, you just read enough of the docs to figure out what each piece is meant to do and then build on top of it. Can I ask how it wasn't maintainable?

Just curious

m_mueller · on Jan 18, 2015

On a tangent, I'd be interested in how you format heavily piped bash code for documentation. Can comments be interspersed there?

mappu · on Jan 19, 2015

Functions, mostly - the big `awk` command in the example goes into something like

    # @param $1 whatever
    chess_extract_scores() {
         awk blah blah blah
    }

and then your whole pipeline simplifies to

    cat foo | grep bar | chess_extract_scores

which is pretty readable. You can even do most of this in a live bash session with ^X ^E.
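
To make that concrete, here is a minimal runnable sketch. The input format (one "player score" pair per line), the `sample_games` helper, and the awk body are all made up for illustration; they are not the command from the article.

```shell
# chess_extract_scores: read "player score" lines on stdin and
# print "player total", with scores summed per player.
# NOTE: hypothetical awk body, standing in for the real one.
chess_extract_scores() {
    awk '{ total[$1] += $2 } END { for (p in total) print p, total[p] }' | sort
}

# sample_games: hypothetical stand-in for the contents of "foo".
sample_games() {
    printf '%s\n' 'white 1' 'black 0' 'white 0.5' 'black 0.5' 'draw 0'
}

# The pipeline itself stays short; the detail lives in the documented function.
sample_games | grep -v '^draw' | chess_extract_scores
```

The comment block above the function is where the documentation goes, so the final pipeline line can stay free of noise.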

plaes · on Jan 19, 2015

You can actually do without cat:

    grep bar foo | chess_extract_scores

http://en.wikipedia.org/wiki/Cat_%28Unix%29#Useless_use_of_c...
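
A quick sanity check that the two forms give the same result. The temp file and its contents are placeholders I made up to stand in for `foo`.

```shell
# Hypothetical sample file standing in for "foo".
foo="$(mktemp)"
printf '%s\n' 'bar 1' 'baz 2' 'bar 3' > "$foo"

# Same filter, once with cat feeding the pipe and once with grep reading the file.
with_cat="$(cat "$foo" | grep bar)"
without_cat="$(grep bar "$foo")"

[ "$with_cat" = "$without_cat" ] && echo "same output"
rm -f "$foo"
```

If you like the left-to-right reading order that `cat` gives you, `< foo grep bar | chess_extract_scores` should also work, since the shell accepts a redirection before the command name.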

theepauk · on Jan 19, 2015

Sure you can, but premature optimization is also a real thing: http://en.wikipedia.org/wiki/Program_optimization#When_to_op...

dsr_ · on Jan 19, 2015

bash has functions; functions are just like commands. Write your comments in the functions, and your final line will be the pipeline of awesomeness:

    generate_data ()
    {
        # make it rain
        :   # no-op placeholder; bash rejects a body with no commands
    }

    process ()
    {
        # chunky
        :
    }

    gather ()
    {
        # puree
        :
    }

    generate_data | process | gather
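
On the original question of comments inside the pipeline itself: as far as I know, the shell lets a line end in `|` and continue on the next line, and a `#` comment after the pipe is fine. A rough sketch combining both ideas follows; the stage bodies are invented placeholders, not anything from the article.

```shell
# All three stage bodies are hypothetical, chosen only so the example runs.
generate_data() {
    # emit the numbers 1..5, one per line
    printf '%s\n' 1 2 3 4 5
}

process() {
    # square each input number
    awk '{ print $1 * $1 }'
}

gather() {
    # sum everything read from stdin
    awk '{ sum += $1 } END { print sum }'
}

# A trailing pipe continues the command, so each stage can carry its own comment.
generate_data |   # produce raw values
    process |     # transform each value
    gather        # reduce to a single number
```

Note that this only works with the pipe at the end of the line; a backslash continuation followed by a comment does not.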