
Tangentially related: I am currently scoping out an idea for how language models could be used to augment decompilers like Ghidra.

At a surface level, this was intellectually interesting partly because it resembles a language translation project. However, instead of parallel sentence pairs, I will probably be creating a parallel corpus of "decompiled" C code, which will have to be aligned to the original C source that produced the binary/object file.
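As a rough illustration of what "aligned" could mean here, a minimal sketch that pairs decompiled and original functions by symbol name. Everything in it is an assumption for illustration: the helper names `align_by_name` and `to_jsonl` are hypothetical, the toy strings are hand-written rather than real Ghidra output, and matching by name only works when the binary is not stripped. Extracting the per-function text from the decompiler and from the source tree is left out entirely.

```python
import json

def align_by_name(decompiled, source):
    """Pair decompiled and original function bodies that share a symbol name.

    Both arguments map function name -> function text. Functions present on
    only one side (e.g. auto-named FUN_... entries, inlined statics) are dropped.
    """
    pairs = []
    for name in sorted(decompiled.keys() & source.keys()):
        pairs.append({
            "name": name,
            "decompiled": decompiled[name],
            "source": source[name],
        })
    return pairs

def to_jsonl(pairs):
    """Serialize pairs one JSON object per line, a common corpus format."""
    return "\n".join(json.dumps(p) for p in pairs)

# Hypothetical toy inputs, hand-written for illustration only.
decompiled = {
    "add": "int add(int param_1,int param_2)\n{\n  return param_2 + param_1;\n}",
    "FUN_00101139": "void FUN_00101139(void)\n{\n  return;\n}",
}
source = {
    "add": "int add(int a, int b) { return a + b; }",
    "helper": "static void helper(void) {}",
}

pairs = align_by_name(decompiled, source)
print(to_jsonl(pairs))
```

Only `add` survives the pairing, which hints at the real difficulty: anything the compiler inlined, renamed, or stripped has no obvious counterpart.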

Then I realized that the only way I could reasonably build this corpus would be to have some sort of automated flow for building arbitrary open source C projects...

Perhaps I will attempt this project with a Go corpus instead.




An interesting project. Go binaries contain many source artifacts, which makes decompilation a bit more straightforward as well. I haven't seen anyone really attempt this for Go, but it would be notable research.


If it turns out that it's easier for a language model to translate "Ghidra C" into readable Go code than to deal with CMake/Bazel/GNU Autoconf/Ninja/Meson/etc., I wonder if that says more about the language model or the state of C/C++ toolchains...



