
OP points to SPARK-12795 [0], and it's all open source. They generate Java source code. You can read more in the prototype pull request: https://github.com/apache/spark/pull/10735 (I couldn't find a spec doc). If I understand correctly, they insert a `WholeStageCodegen` [1] operator into the plan:

    /**
     * WholeStageCodegen compile a subtree of plans that support codegen together into single Java
     * function.
     *
     * Here is the call graph of to generate Java source (plan A support codegen, but plan B does not):
     *
     *   WholeStageCodegen       Plan A               FakeInput        Plan B
     * =========================================================================
     *
     * -> execute()
     *     |
     *  doExecute() --------->   inputRDDs() -------> inputRDDs() ------> execute()
     *     |
     *     +----------------->   produce()
     *                             |
     *                          doProduce()  -------> produce()
     *                                                   |
     *                                                doProduce()
     *                                                   |
     *                         doConsume() <--------- consume()
     *                             |
     *  doConsume()  <--------  consume()
     *
     * SparkPlan A should override doProduce() and doConsume().
     *
     * doCodeGen() will create a CodeGenContext, which will hold a list of variables for input,
     * used to generated code for BoundReference.
     */
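To make the produce()/consume() flow in that comment more concrete, here is a toy sketch of the same push-based idea. It is not Spark's code: the class names (`Operator`, `Scan`, `Filter`, `Project`, `Root`) and the string-template expressions are made up for illustration, and it emits Python source instead of Java so it can be run directly. `Root` plays roughly the role the comment gives to `WholeStageCodegen`: it asks its child to produce, and each operator pushes its row variable up to its parent via consume(), so the whole pipeline collapses into one function.

```python
def indent(lines):
    """Indent generated source lines by one level."""
    return ["    " + line for line in lines]


class Operator:
    """Hypothetical operator that takes part in produce/consume codegen."""

    def __init__(self, child=None):
        self.child = child
        self.parent = None
        if child is not None:
            child.parent = self

    def produce(self):
        return self.do_produce()

    def do_produce(self):
        # Default for non-leaf operators: delegate down to the child.
        return self.child.produce()

    def consume(self, var):
        # Hand the current row variable to the parent's generated code.
        return self.parent.do_consume(var)

    def do_consume(self, var):
        raise NotImplementedError


class Scan(Operator):
    """Leaf: emits the loop that drives the whole fused stage."""

    def do_produce(self):
        return ["for row in rows:"] + indent(self.consume("row"))


class Filter(Operator):
    def __init__(self, child, predicate):
        super().__init__(child)
        self.predicate = predicate  # source template, e.g. "{0} % 2 == 0"

    def do_consume(self, var):
        return ["if " + self.predicate.format(var) + ":"] + indent(self.consume(var))


class Project(Operator):
    def __init__(self, child, expr):
        super().__init__(child)
        self.expr = expr  # source template, e.g. "{0} * 10"

    def do_consume(self, var):
        out = var + "_proj"
        return [out + " = " + self.expr.format(var)] + self.consume(out)


class Root(Operator):
    """Top of the fused subtree: collects rows and wraps the body in a function."""

    def do_consume(self, var):
        return ["out.append(" + var + ")"]

    def generate(self):
        body = self.child.produce()
        lines = ["def stage(rows):", "    out = []"] + indent(body) + ["    return out"]
        return "\n".join(lines)

    def compile(self):
        namespace = {}
        exec(self.generate(), namespace)  # compile the generated source text
        return namespace["stage"]


plan = Root(Project(Filter(Scan(), "{0} % 2 == 0"), "{0} * 10"))
print(plan.generate())
print(plan.compile()(range(6)))  # prints [0, 20, 40]
```

The generated `stage` function is a single loop with the filter and projection inlined, with no per-row calls between operators, which as I understand it is the point of compiling a subtree "together into single Java function".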



[0] https://issues.apache.org/jira/browse/SPARK-12795

[1] https://github.com/apache/spark/blob/0e70fd61b4bc92bd744fc44...


