Writing code that writes code – Hack Codegen

PretzelFisch · on Aug 20, 2015

In my experience, when codegen shows up as a solution, you took a wrong turn somewhere and are walking down the path of very brittle code that becomes a nightmare to maintain over time. Several times the code generation libraries don't get updated and you are left in a difficult spot when migrating your code to a newer version of your platform. But hey, maybe that's just a .net and java thing.

munificent · on Aug 20, 2015

I think the motivation is pretty clear here:

    * It was not typed, and with the advent of Hack, we wanted to have types.
    * The actual setter method was not defined. If you searched the codebase,
      you wouldn't find it. If you looked at the class, you couldn't see which
      methods you could call, which made it harder to discover other calls,
      such as ifNotNullSetName.
    * IDEs can't autocomplete the code.

What this boils down to is metaprogramming. If you want to create an array that contains twenty items, you aren't going to do:

    array.add(1);
    array.add(2);
    array.add(3);
    ...

You'll use a loop, some imperative code. That's because data structures like arrays are within the domain of things you can touch with code.

But what about class definitions? What if you want to add twenty methods to a class? Fans of dynamic languages will rightly say that class definitions, functions, etc. should also be in the domain of what your code can affect. Smalltalkers and Rubyists will happily write imperative code to build and modify classes.

This is really powerful. It's exciting stuff.
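
As a rough illustration of that kind of imperative class building, here is a minimal Python sketch. The `User` class, the `add_accessors` helper and the field names are all hypothetical, made up for the example rather than taken from the article.

```python
# Hypothetical example: the User class and its fields are illustrative only.
class User:
    def __init__(self):
        self._data = {}


def add_accessors(cls, field):
    """Attach get_<field> and set_<field> methods to cls at runtime."""
    def getter(self):
        return self._data.get(field)

    def setter(self, value):
        self._data[field] = value
        return self  # returning self allows chained calls

    setattr(cls, f"get_{field}", getter)
    setattr(cls, f"set_{field}", setter)


# A loop builds the methods, just as a loop would fill an array.
for name in ["name", "email", "age"]:
    add_accessors(User, name)

user = User().set_name("Ada").set_age(36)
print(user.get_name(), user.get_age())
```

Note that searching the source for `def set_name` finds nothing: the method only exists once the loop has run.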

At the same time, for better or worse, most programmers are not spending their day in a live editing environment. They aren't in a Smalltalk browser all day. They're in an IDE or text editor.

That means the window into the program that we have while editing doesn't see any of the results of that metaprogramming. All of the static analysis tools, auto-completers, go-to-definition-ers, etc. can't handle dynamic metaprogramming because they run before any of that metaprogramming is invoked.

Code generation is a way out of that problem. It lets you do imperative dynamic metaprogramming, but then bakes the result back out to the file system where the rest of your tool ecosystem can see it.
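
A minimal sketch of that idea in Python, not the Hack Codegen API: the same loop runs ahead of time and writes ordinary source text to disk. The `FIELDS` schema, the `generate_class` helper and the output file name are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical schema: field name -> Python type name.
FIELDS = {"name": "str", "email": "str", "age": "int"}


def generate_class(class_name, fields):
    """Return Python source for a class with typed getters and setters."""
    lines = [
        "# Generated code; edit the generator, not this file.",
        "from typing import Optional",
        "",
        "",
        f"class {class_name}:",
        "    def __init__(self) -> None:",
    ]
    for field, type_name in fields.items():
        lines.append(f"        self._{field}: Optional[{type_name}] = None")
    for field, type_name in fields.items():
        lines += [
            "",
            f"    def get_{field}(self) -> Optional[{type_name}]:",
            f"        return self._{field}",
            "",
            f"    def set_{field}(self, value: {type_name}) -> '{class_name}':",
            f"        self._{field} = value",
            "        return self",
        ]
    return "\n".join(lines) + "\n"


# Bake the result out to the file system, where grep and the IDE can see it.
out_dir = Path(tempfile.mkdtemp())
out_file = out_dir / "user_generated.py"
out_file.write_text(generate_class("User", FIELDS))
print(out_file.read_text())
```

Every setter is now a real, typed `def` in a real file, so search, autocomplete and type checkers can treat it like hand-written code.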

It's not the most elegant solution in the world, but I think it beats trying to reinvent the entire world of tooling that programmers use all day: grep, text editors, IDEs, etc.

nmrm2 · on Aug 21, 2015

> but I think it beats trying to reinvent the entire world of tooling that programmers use all day: grep, text editors, IDEs, etc.

Exactly, but only because we don't have any really good solutions for building tooling that supports metaprogramming for typed languages.

> the window into the program that we have while editing doesn't see any of the results of that metaprogramming

I wonder if there's a way to build IDEs that provide support for metaprogramming.

(Also, this was a really excellent comment.)

fizx · on Aug 20, 2015

There's sort of a survivorship bias in play here, where you only noticed the brittle codegen being brittle.

Ologn · on Aug 20, 2015

Also, codegen exists on a solution continuum. Codegen is a superior solution to manually pumping out lots of boilerplate code with slight variations.

A solution that does not need lots of boilerplate code would be even better, but codegen is still better than the makeshift code that it originally replaced.

ArkyBeagle · on Aug 21, 2015

There's an intermediate approach - use a generator to spew out tables of data that an engine-type thing uses to adapt to boilerplate style variations.

This is totally a 'C' hacker's approach but it's not bad for some things - especially serialization and the like. But it will need some generic reference mechanism - as void * is used in 'C' - to work.
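
A minimal Python sketch of that table-plus-engine idea, applied to serialization. The `POINT_TABLE` layout and both helper functions are hypothetical. In C the engine would reach fields through `void *` and offsets; here a plain dict stands in for that generic reference.

```python
import struct

# Imagine this table was emitted by a generator from a schema.
# Hypothetical layout: (field name, struct format code).
POINT_TABLE = [
    ("x", "i"),       # 32-bit signed int
    ("y", "i"),       # 32-bit signed int
    ("weight", "d"),  # 64-bit float
]


def table_format(table):
    """Build a little-endian, unpadded struct format from the table."""
    return "<" + "".join(code for _, code in table)


def pack_record(table, record):
    """Generic engine: serialize any record described by a table."""
    return struct.pack(table_format(table), *(record[name] for name, _ in table))


def unpack_record(table, payload):
    """Generic engine: rebuild a record dict from its bytes."""
    values = struct.unpack(table_format(table), payload)
    return {name: value for (name, _), value in zip(table, values)}


payload = pack_record(POINT_TABLE, {"x": 3, "y": -4, "weight": 0.5})
print(len(payload))  # 16 bytes: 4 + 4 + 8
print(unpack_record(POINT_TABLE, payload))
```

Only the table is generated; the engine is written once by hand, so a new record type means a new table rather than a new pile of generated functions.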

fapjacks · on Aug 20, 2015

At the risk of sounding obtuse, then, where is the robust codegen code?

nostrademons · on Aug 20, 2015

On a continuum from "so robust you don't even consider it codegen" to "well, some people use it": microcode, compilers, virtualization, visual interface builders like Xcode and Android Studio, dynamic HTML pages, ORMs, protobufs & other schema definition languages, autoconf & friends, compile-to-C languages like early versions of C++ and Go, ES6 transpilers like Traceur and Babel, compile-to-JS languages like CoffeeScript. Probably others I've forgotten or don't know about.

When it works, you forget that the only thing that computers understand is their native instruction set, and everything on top of that is built on some form of codegen. When it doesn't, you curse out the programmer who left you a pile of buggy scripts to maintain.

kazinator · on Aug 21, 2015

Nice easter egg: "autoconf & friends" buried in the middle of a paragraph that appears to be about robust code generation.

nostrademons · on Aug 21, 2015

"Well, some people use it." Particularly toward the end of the list, there'll be more and more examples where people hate the code generation aspect (for example, I've personally had bad experiences with any JS technology that requires a build step). That doesn't change the fact that they're widely used; autoconf, for example, was mandated for all Google-owned open-source C/C++ libraries, because despite its kludginess, it will build on virtually everything.

_asummers · on Aug 21, 2015

Java annotations have been used pretty successfully for several versions. Not all of them are code generating, but many (from e.g. Spring, or something smaller like Retrofit) generate tons of code for you.

bartonfink · on Aug 20, 2015

Compilers, for a start.

fapjacks · on Aug 23, 2015

Hah! Touché!

kazinator · on Aug 21, 2015

It's probably a .net and Java thing.

You can find some crufty Lisp macros from 1983 (code that writes code, par excellence) and they still work today.

k__ · on Aug 20, 2015

It isn't.

I used at least one PHP to JavaScript code-gen in the past and it was rather bad.

I also dislike this Rails/Ember-CLI scaffolding stuff. It simply feels like too much magic for me.

aikah · on Aug 20, 2015

> I also dislike this Rails/Ember-CLI scaffolding stuff. It simply feels like too much magic for me.

Good thing nobody forces you to use scaffolding in Rails. Code generation should always be a one-shot, to get started on something.

On the other hand, the situation in Go land with "go generate" is much, much worse, as code generators become dependencies, right in the code with pragmas...

The worst offender is PHP of course, with frameworks like Symfony or Doctrine (and its famous proxies, because yes, Doctrine doesn't actually use the code you write but the code it generates and puts in a proxy folder) that use code compilers and generators for everything from routing to dependency injection to every configuration step. Although PHP frameworks usually do it on the fly, so there is no manual step.

joshribakoff · on Aug 20, 2015

I agree. They're also generating code for a language they made up. Seems very risky. Whether using code generation or not, having lots of repetitive getters & setters is a code smell. I feel like people just do that to feel more "professional" since they heard OOP was good to use, when a simple associative array would have sufficed. The answer isn't to start generating the repetitive code, in my opinion; the answer is to switch to a data structure which doesn't involve repetitive code in the first place. That's just my opinion though & it's subjective.

justifier · on Aug 21, 2015

do you consider a compiler 'codegen'?

FractalNerve · on Aug 20, 2015

Cheers for the efforts, but that's a poor DSL or an attempt to create an Aspect-oriented version of Hack. You've got gorgeous tools at Facebook, like for example 'pfff' [0], to play with ASTs; these could be used to build wonderful things given enough time. I suppose the ultimate goal is to allow business people to design workflows and let the tool generate the required code automatically. (I guess that's what Flow Based Programming is trying to achieve.) Kinda possibly with high-level specifications defined in languages better suited to the task like Racket, Rebol/Red, Xtend/Scala etc.

There is an interesting presentation by ThoughtWorks – Neal Ford: Building DSLs in Static and Dynamic Languages [1]

However, I thought you might find the EPFL / ETH Summer School on DSL Design and Implementation very useful. It comes not only with presentations, but also with code in Scala, Racket, Haskell et al. I am currently working on a huge project, with more than 30 people, building an MDD (Model Driven Development) platform for visually/declaratively writing things for the systems we use internally. These will be able to create models out of code and even fix API incompatibilities automatically. [2]

Excuse the Unicode, I got bored

⬬══════━━━━━━──⋯𝅖܅܅܅܅﴿⌁⏧⌁﴾܅܅܅܅𝅖⋯──━━━━━━═══════⬬

[0] https://github.com/facebook/pfff

[1] http://www.intertech.com/resource/usergroup/Neal_Ford-Buildi...

[2] http://vjovanov.github.io/dsldi-summer-school/program.html

jeanetienne · on Aug 20, 2015

Am I the only one to be bothered by the fact that both examples of codegen/generated code show that you have to write more codegen to generate less actual code? In the first example there's more than twice the amount of code written in the codegen than in the generated code... I know they are simple examples but still, if I have to write more code for all the simple getters/setters, it defeats the purpose of the codegen, which is to "reduce boilerplate". Am I missing something?

zeroonetwothree · on Aug 21, 2015

Each codegen is run multiple times. Imagine you want to generate 100 similar classes with similar code except for a small difference.

epaga · on Aug 21, 2015

But that would be a code smell akin to copy-pasting code wouldn't it?

I'm having trouble seeing any real-world scenarios for this kind of codegen but since it seems to be in wide use at Facebook, I must just be missing something...

hnruss · on Aug 21, 2015

"Notice that the manual section uses an ID to match it with the corresponding section when regenerating the code so that is placed in the same location."

    public function getName(): string {
      /* BEGIN MANUAL SECTION User::geName */

Wonder how well that works if there's a typo in the section comment?

zeroonetwothree · on Aug 21, 2015

I believe the comment is generated by the codegen, so it won't have a typo.
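
To make the mechanism concrete, here is a minimal Python sketch of ID-keyed manual sections surviving a regeneration. It is not how Hack Codegen is implemented: the `END` marker text and the helpers `extract_manual_sections` and `render_getter` are assumptions, with only the `BEGIN` marker modelled on the comment quoted above.

```python
# Marker formats: BEGIN follows the quoted comment, END is an assumption.
BEGIN_PREFIX = "/* BEGIN MANUAL SECTION "
END_MARKER = "/* END MANUAL SECTION */"


def extract_manual_sections(old_source):
    """Map section ID -> hand-written lines found in a previous output."""
    sections, current_id = {}, None
    for line in old_source.splitlines():
        stripped = line.strip()
        if stripped.startswith(BEGIN_PREFIX):
            current_id = stripped[len(BEGIN_PREFIX):-len(" */")]
            sections[current_id] = []
        elif stripped == END_MARKER:
            current_id = None
        elif current_id is not None:
            sections[current_id].append(line)
    return sections


def render_getter(class_name, field, manual):
    """Emit a getter, reusing the manual body stored under the same ID."""
    method = f"get{field.capitalize()}"
    section_id = f"{class_name}::{method}"  # the generator writes the ID itself
    default_body = [f"      return $this->{field};"]
    body = manual.get(section_id, default_body)
    return "\n".join([
        f"    public function {method}(): string {{",
        f"      {BEGIN_PREFIX}{section_id} */",
        *body,
        f"      {END_MARKER}",
        "    }",
    ])


first = render_getter("User", "name", {})
# Simulate a developer editing the manual section by hand.
edited = first.replace("return $this->name;", "return strtoupper($this->name);")
# Regenerate: the hand-written body is matched by ID and carried over.
second = render_getter("User", "name", extract_manual_sections(edited))
print(second)
```

In this sketch, an ID that someone mistyped by hand would simply fail to match, and the default body would replace the edit without warning. A real tool may well handle that case differently.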

hnruss · on Aug 21, 2015

That makes more sense, thanks.

EGreg · on Aug 21, 2015

Real codegen would be like "Computer, make me a companion." "Here you go, would you like to customize this hologram?"