Show HN: Codemodder – A new codemod library for Java and Python (codemodder.io)
37 points by nahsra 5 months ago | hide | past | favorite | 8 comments
Hi HN, I’m here to show you a new codemod library. In case you’re not familiar with the term "codemod", here’s how it was originally defined AFAICT:

> Codemod is a tool/library to assist you with large-scale codebase refactors

Codemods are awesome, but I felt they were far from their potential, and so I’m very proud to show you all an early version of a codemod library we’ve built called Codemodder (https://codemodder.io) that we think moves the "field" forward. Codemodder supports both Python and Java (https://github.com/pixee/codemodder-python and https://github.com/pixee/codemodder-java). The license is AGPL, please don’t kill me.

Primarily, what makes Codemodder different is our design philosophy. Instead of trying to write a new library for both finding code and changing code, which is what traditional codemod libraries do, we aim to provide an easy-to-use orchestration library that helps connect idiomatic tools for querying source code and idiomatic tools for mutating source code.

So, if you love your current linter, Semgrep, Sonar, or PMD, CodeQL or whatever for querying source code – use them! If you love JavaParser or libCST for changing source code – use them! We’ll provide you with all the glue and make building, testing, packaging and orchestrating them easy.
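To make the query/mutate split concrete, here is a toy sketch in plain stdlib Python — not Codemodder's actual API — where the "query" step is a predicate over AST nodes and the "mutate" step is a separate transformer, mirroring how a Semgrep rule might be paired with a libCST transform:

```python
import ast

# Toy sketch (not Codemodder's real API): keep the "query" logic
# separate from the "mutate" logic, the way Codemodder pairs tools
# like Semgrep with tools like libCST.

def is_unsafe_yaml_load(node: ast.AST) -> bool:
    """Query step: does this node call yaml.load(...)?"""
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "load"
        and isinstance(node.func.value, ast.Name)
        and node.func.value.id == "yaml"
    )

class HardenYamlLoad(ast.NodeTransformer):
    """Mutate step: rewrite matched calls to yaml.safe_load(...)."""
    def visit_Call(self, node: ast.Call) -> ast.Call:
        self.generic_visit(node)
        if is_unsafe_yaml_load(node):
            node.func.attr = "safe_load"
        return node

source = "data = yaml.load(stream)"
fixed = ast.unparse(HardenYamlLoad().visit(ast.parse(source)))
print(fixed)  # data = yaml.safe_load(stream)
```

A real codemod would use a CST library like libCST in the mutate step instead of `ast`, since `ast.unparse` discards comments and formatting.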

Here are the problems with codemod libraries as they exist today, and how Codemodder solves them.

1. They’re not expressive enough. They tend to offer barebones APIs for querying code. There’s simply no way for these libraries to compete with purpose-built static analysis tools at querying code, so we should use those tools instead.

2. They produce changes without any context. Understanding why a code change is made is important. If the change was obvious to the developer receiving the code change, they probably wouldn’t have made the mistake in the first place! Storytelling is everything, and so we guide you towards making changes that are more likely to be merged.

3. They don’t handle injecting dependencies well. I have to say we’re not great at this yet either, but we have some of the basics and will invest more.

4. Most apps involve multiple languages, but all of today’s codemod libraries are for one language, so they are hard to orchestrate for a single project. We’ve put a lot of work into making sure these libraries are aligned with open source API contracts and formats (https://github.com/pixee/codemodder-specs) so they can be orchestrated similarly by downstream automation.

The idea is "don’t write another PR comment saying the same thing, write a codemod to just make the change automatically for you every time". We hope you like it, and are excited to get any feedback you might have!




How does libCST compare to e.g. pyCQA/redbaron? What about for EA Evolutionary Algorithms; does it preserve comments, or update docstrings and type annotations in mutating the code under test?

Is it necessary to run `black` (and `pre-commit run --all-files`) to format the code after mutating it?

Instagram/LibCST: https://github.com/Instagram/LibCST

PyCQA/redbaron: https://github.com/PyCQA/redbaron

E.g. PyCQA/bandit does static analysis for security issues in Python code: https://github.com/PyCQA/bandit

https://news.ycombinator.com/item?id=38677294

https://news.ycombinator.com/item?id=24511280 ... https://analysis-tools.dev/tools?languages=python


Hi! Great questions. I'm the lead maintainer of the Python version of the Codemodder framework so I'll do my best to answer.

> How does libCST compare to e.g. pyCQA/redbaron?

LibCST is similar to redbaron in the sense that it does preserve comments and whitespace. The "CST" in LibCST refers to "concrete syntax tree", which preserves comments and whitespace, as opposed to an "abstract syntax tree" or "AST", which does not. Our goal is to make the absolute minimal changes required to harden and improve code, and messing with whitespace would be counter to that goal. It's worth noting that redbaron no longer appears to be maintained, and the most recent version of Python it supported was 3.7, which is itself now EOL.
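A quick stdlib illustration of the AST/CST difference (assuming Python 3.9+ for `ast.unparse`):

```python
import ast

source = """\
# compute the answer
x = 40 + 2  # inline comment
"""

# Round-tripping through Python's stdlib AST drops comments and
# formatting entirely; a CST (as in LibCST) would preserve them.
regenerated = ast.unparse(ast.parse(source))
print(regenerated)  # x = 40 + 2
```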

> What about for EA Evolutionary Algorithms

Can you elaborate? I am familiar with the concept of evolutionary algorithms but I'm not sure I understand what you mean in this context.

> does it preserve comments, or update docstrings and type annotations in mutating the code under test?

Codemodder does preserve comments. Currently none of our codemods update docstrings; I'm not sure we currently have any cases where that would make sense. We do make an effort to update type annotations where appropriate.

> Is it necessary to run `black` (and `pre-commit run --all-files`) to format the code after mutating it?

Yes: if your project uses `black` (or `pre-commit`), you'll currently need to run it yourself after applying codemods. While `black` is incredibly popular, we can't assume it's in use on any given project, and running it ourselves on a project that doesn't already use it would completely reformat each updated file, leading to very noisy and difficult-to-review changes. I'd like to explore better solutions to this issue going forward.

I am familiar with `bandit`. It's a fairly simple security linter and is useful for finding some common issues. It's also pretty prone to false positives and noisy findings. Not every problem identified by `bandit` is something that can be automatically fixed; for example, I can't replace a hard-coded password without making a lot of (breaking) assumptions about how your application is structured and deployed.

I'd love to get your feedback on Python Codemods! Give us a star on GitHub and feel free to open an issue or PR: https://github.com/pixee/codemodder-python


Thanks for your reply!

I think they called it an FST, a "Full Syntax Tree", which is probably very similar to a CST ("Concrete Syntax Tree"). At the time MOSES was written, Python's built-in AST support couldn't mutate code sufficiently for MOSES's designs.

MOSES: Meta-Optimizing Semantic Evolutionary Search :

https://wiki.opencog.org/w/Meta-Optimizing_Semantic_Evolutio... :

> All program evolution algorithms tend to produce bloated, convoluted, redundant programs ("spaghetti code"). To avoid this, MOSES performs reduction at each stage, to bring the program into normal form. The specific normalization used is based on Holman's "elegant normal form", which mixes alternate layers of linear and non-linear operators. The resulting form is far more compact than, say, for example, boolean disjunctive normal form. Normalization eliminates redundant terms, and tends to make the resulting code both more human-readable, and faster to execute.

> The above two techniques, optimization and normalization, allow MOSES to outperform standard genetic programming systems.

opencog/asmoses: https://github.com/opencog/asmoses

MOSES outputs Combo (a LISP), Python as an output transform IIUC, and now Atomese with asmoses, which links to a demo notebook: https://robert-haas.github.io/mevis-docs/code/examples/moses...

Evolutionary algorithm > Convergence: https://en.wikipedia.org/wiki/Evolutionary_algorithm#Converg...

/? mujoco learning to walk [with evolutionary selection / RL Reinforcement Learning] https://www.google.com/search?q=mujoco+learning+to+walk&tbm=...

...

Semgrep: https://en.wikipedia.org/wiki/Semgrep links to OWASP Source Code Analysis Tools: https://owasp.org/www-community/Source_Code_Analysis_Tools

But what's static or dynamic source code analysis without formal verification?

"Nagini: A Static Verifier for Python": https://pm.inf.ethz.ch/publications/EilersMueller18.pdf https://github.com/marcoeilers/nagini :

> However, there is currently virtually no tool support for reasoning about Python programs beyond type safety.

> We present Nagini, a sound verifier for statically-typed, concurrent Python programs. Nagini can prove memory safety, data race freedom, and user-supplied assertions. Nagini performs modular verification, which is important for verification to scale and to be able to verify libraries, and automates the verification process for programs annotated with specifications.

Deal > Formal verification > Background; Hoare logic, DbC Design by Contract, Dafny, Z3: https://deal.readthedocs.io/basic/verification.html#backgrou... :

> 2021. deal-solver. We released a tool that converts Python code (including deal contracts) into Z3 theorems that can be formally verified.


Interesting approach: basically a meta-layer on top of existing tools.

Do you have an example of how you inject context into the codemods? The approach we've taken at Grit is two-fold:

1. When something must be addressed (ex. `todo`), we have functions that wrap messages into the source code to ensure anyone sees the info until it's fixed. We pick up these messages automatically on our SaaS platform.

2. For non-blocking comments, we have a `log` function that any query can call to surface info into the result stream on the CLI + pull requests without it ending up in the final PR.

> 4. all of today’s codemod libraries are for one language, so they are hard to orchestrate for a single project.

This isn't entirely true! Grit, my project, was built to be multi-language from the start: https://docs.grit.io/language/overview

[0] https://docs.grit.io/language/functions#todo


Grit looks cool! My apologies for the omission; I was unaware of it. I probably anchored too hard to the word "codemod" in my searches.

> Do you have an example of how you inject context into the codemods?

When you say "context", I want to make sure we're talking about the same thing, and the question makes me think we're not quite there yet. We're basically saying that storytelling about the changes is very important, so we bake that invariant into the APIs of the codemods themselves: codemod authors are forced to provide descriptions, reasons, and justifications at the key points.
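For a rough sense of what that looks like, here's a hypothetical sketch; the names and fields are illustrative, not Codemodder's actual API. The idea is that the metadata telling the story is a required part of registering a codemod, so an author can't ship a change without it:

```python
from dataclasses import dataclass

# Hypothetical sketch: illustrative names only, not Codemodder's API.
@dataclass(frozen=True)
class CodemodStory:
    name: str
    summary: str       # one-line "what changed"
    rationale: str     # why the change is safer/better
    references: tuple  # links backing the reasoning

def register(story: CodemodStory) -> CodemodStory:
    """Refuse to register a codemod that doesn't explain itself."""
    if not story.summary or not story.rationale:
        raise ValueError("codemods must describe and justify their changes")
    return story

story = register(CodemodStory(
    name="harden-yaml-load",
    summary="Replace yaml.load with yaml.safe_load",
    rationale="yaml.load can execute arbitrary code on untrusted input.",
    references=("https://cwe.mitre.org/data/definitions/502.html",),
))
print(story.name)  # harden-yaml-load
```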


Have you heard about Mixin? What advantages could Codemodder have over SpongePowered Mixins?


Codemodder and SpongePowered Mixin cater to different scenarios. Codemodder is ideal for transforming source code you own, based on specific patterns. Changing source code allows the changes to be tracked, reviewed, and analyzed using standard tools like compilers and static analysis. It's great for large-scale codebase refactoring.

By contrast, SpongePowered Mixin uses Java bytecode manipulation to transform the bytecode of a specific type. Bytecode manipulation comes with added risk and complexity, so it's typically reserved for cases where you need to change the behavior of an external library or framework type. For example, Mixin is useful in Minecraft modding because it allows modders to change the behavior of externally defined Minecraft types.

In essence, choose Codemodder for large-scale refactors to your source code, and Mixin to modify the bytecode of external Java types.


Ah, it seems like I greatly misunderstood the purpose of Codemodder. Thanks for the clarification.



