OP points to SPARK-12795 [0], which is all open source. They generate Java source code. You can read more in the prototype pull request, https://github.com/apache/spark/pull/10735 (I couldn't find a spec doc). If I understand correctly, they insert a `WholeStageCodegen` [1] operator into the physical plan:
  /**
   * WholeStageCodegen compile a subtree of plans that support codegen together into single Java
   * function.
   *
   * Here is the call graph of to generate Java source (plan A support codegen, but plan B does not):
   *
   *   WholeStageCodegen       Plan A               FakeInput        Plan B
   * =========================================================================
   *
   * -> execute()
   *     |
   *  doExecute() --------->   inputRDDs() -------> inputRDDs() ------> execute()
   *     |
   *     +----------------->   produce()
   *                             |
   *                          doProduce()  -------> produce()
   *                                                  |
   *                                               doProduce()
   *                                                  |
   *                          doConsume() <--------- consume()
   *                             |
   *  doConsume()  <--------  consume()
   *
   * SparkPlan A should override doProduce() and doConsume().
   *
   * doCodeGen() will create a CodeGenContext, which will hold a list of variables for input,
   * used to generated code for BoundReference.
   */
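To make the produce/consume dance concrete, here's a toy sketch in Java. Everything in it (Node, InputNode, FilterNode, WholeStage, and the `input`/`output` names inside the generated text) is hypothetical and is not Spark's actual API. As far as I can tell, the real operators thread a codegen context and per-column variables through these calls rather than a single row string. Here InputNode plays FakeInput, FilterNode plays Plan A, and WholeStage plays WholeStageCodegen. The sketch only builds and prints source text; nothing gets compiled.

```java
// Toy model of the produce()/consume() protocol from the comment above.
// All names are hypothetical; this is NOT Spark's API. It only builds and
// prints Java source text. Indentation is hard-coded per operator for brevity.
public class ProduceConsumeSketch {

    abstract static class Node {
        Node parent;

        // Called top-down by the parent: remember who consumes our rows,
        // then generate this node's code.
        String produce(Node parent) {
            this.parent = parent;
            return doProduce();
        }

        // Called by this node once a row is available in variable rowVar:
        // hand it to the parent, which splices its own code in at this spot.
        String consume(String rowVar) {
            return parent.doConsume(rowVar);
        }

        abstract String doProduce();

        abstract String doConsume(String rowVar);
    }

    // Stands in for FakeInput: loops over rows coming from a non-codegen
    // child (Plan B), assumed to be exposed as an iterator named "input".
    static class InputNode extends Node {
        String doProduce() {
            return "  while (input.hasNext()) {\n"
                 + "    long row = input.next();\n"
                 + consume("row")
                 + "  }\n";
        }

        String doConsume(String rowVar) {
            throw new UnsupportedOperationException("leaf has no child");
        }
    }

    // Stands in for Plan A: a filter that keeps even values.
    static class FilterNode extends Node {
        final Node child;

        FilterNode(Node child) { this.child = child; }

        String doProduce() {
            return child.produce(this); // delegate generation down to the input
        }

        String doConsume(String rowVar) {
            // Wrap whatever the parent generates in the filter condition.
            return "    if (" + rowVar + " % 2 == 0) {\n"
                 + consume(rowVar)
                 + "    }\n";
        }
    }

    // Stands in for WholeStageCodegen: the root that wraps everything in one
    // function and appends surviving rows to a hypothetical "output" list.
    static class WholeStage extends Node {
        final Node child;

        WholeStage(Node child) { this.child = child; }

        String doProduce() {
            return child.produce(this);
        }

        String doConsume(String rowVar) {
            return "      output.add(" + rowVar + ");\n";
        }

        String generate() {
            return "void processNext() {\n" + doProduce() + "}\n";
        }
    }

    public static void main(String[] args) {
        WholeStage plan = new WholeStage(new FilterNode(new InputNode()));
        // Prints:
        // void processNext() {
        //   while (input.hasNext()) {
        //     long row = input.next();
        //     if (row % 2 == 0) {
        //       output.add(row);
        //     }
        //   }
        // }
        System.out.print(plan.generate());
    }
}
```

The point is that the filter's check ends up inlined in the body of a single loop, instead of each operator pulling rows from its child through a virtual next() call as in the classic iterator model.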