Show HN: Codemodder – A new codemod library for Java and Python (codemodder.io)
37 points by nahsra 5 months ago | hide | past | favorite | 8 comments
Hi HN, I’m here to show you a new codemod library. In case you’re not familiar with the term "codemod", here’s how it was originally defined AFAICT:

> Codemod is a tool/library to assist you with large-scale codebase refactors

Codemods are awesome, but I felt they were far from their potential, and so I’m very proud to show you all an early version of a codemod library we’ve built called Codemodder (https://codemodder.io) that we think moves the "field" forward. Codemodder supports both Python and Java (https://github.com/pixee/codemodder-python and https://github.com/pixee/codemodder-java). The license is AGPL, please don’t kill me.

Primarily, what makes Codemodder different is our design philosophy. Instead of trying to write a new library for both finding code and changing code, which is what traditional codemod libraries do, we aim to provide an easy-to-use orchestration library that helps connect idiomatic tools for querying source code and idiomatic tools for mutating source code.

So, if you love your current linter, Semgrep, Sonar, or PMD, CodeQL or whatever for querying source code – use them! If you love JavaParser or libCST for changing source code – use them! We’ll provide you with all the glue and make building, testing, packaging and orchestrating them easy.
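To make the query/mutate split concrete, here is a toy sketch in plain stdlib Python — not Codemodder's actual API — where the "query" step is a predicate over AST nodes and the "mutate" step is a separate transformer, mirroring how a Semgrep rule might be paired with a libCST transform:

```python
import ast

# Toy sketch (not Codemodder's real API): keep the "query" logic
# separate from the "mutate" logic, the way Codemodder pairs tools
# like Semgrep with tools like libCST.

def is_unsafe_yaml_load(node: ast.AST) -> bool:
    """Query step: does this node call yaml.load(...)?"""
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "load"
        and isinstance(node.func.value, ast.Name)
        and node.func.value.id == "yaml"
    )

class HardenYamlLoad(ast.NodeTransformer):
    """Mutate step: rewrite matched calls to yaml.safe_load(...)."""
    def visit_Call(self, node: ast.Call) -> ast.Call:
        self.generic_visit(node)
        if is_unsafe_yaml_load(node):
            node.func.attr = "safe_load"
        return node

source = "data = yaml.load(stream)"
fixed = ast.unparse(HardenYamlLoad().visit(ast.parse(source)))
print(fixed)  # data = yaml.safe_load(stream)
```

A real codemod would use a CST library like libCST in the mutate step instead of `ast`, since `ast.unparse` discards comments and formatting.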

Here are the problems with codemod libraries as they exist today, and how Codemodder solves them.

1. They’re not expressive enough. They tend to offer barebones APIs for querying code. There’s simply no way for these libraries to compete with purpose-built static analysis tools at querying code, so we should use those tools instead.

2. They produce changes without any context. Understanding why a code change is made is important. If the change was obvious to the developer receiving the code change, they probably wouldn’t have made the mistake in the first place! Storytelling is everything, and so we guide you towards making changes that are more likely to be merged.

3. They don’t handle injecting dependencies well. I have to say we’re not great at this yet either, but we have some of the basics and will invest more.

4. Most apps involve multiple languages, but all of today’s codemod libraries are for one language, so they are hard to orchestrate for a single project. We’ve put a lot of work into making sure these libraries are aligned with open source API contracts and formats (https://github.com/pixee/codemodder-specs) so they can be orchestrated similarly by downstream automation.

The idea is "don’t write another PR comment saying the same thing, write a codemod to just make the change automatically for you every time". We hope you like it, and are excited to get any feedback you might have!




How does libCST compare to e.g. pyCQA/redbaron? What about for EA Evolutionary Algorithms; does it preserve comments, or update docstrings and type annotations in mutating the code under test?

Is it necessary to run `black` (and `pre-commit run --all-files`) to format the code after mutating it?

Instagram/LibCST: https://github.com/Instagram/LibCST

PyCQA/redbaron: https://github.com/PyCQA/redbaron

E.g. PyCQA/bandit does static analysis for security issues in Python code: https://github.com/PyCQA/bandit

https://news.ycombinator.com/item?id=38677294

https://news.ycombinator.com/item?id=24511280 ... https://analysis-tools.dev/tools?languages=python


Hi! Great questions. I'm the lead maintainer of the Python version of the Codemodder framework so I'll do my best to answer.

> How does libCST compare to e.g. pyCQA/redbaron?

LibCST is similar to redbaron in the sense that it does preserve comments and whitespace. The "CST" in LibCST refers to "concrete syntax tree", which preserves comments and whitespace, as opposed to an "abstract syntax tree" or "AST", which does not. Our goal is to make the absolute minimal changes required to harden and improve code, and messing with whitespace would be counter to that goal. It's worth noting that redbaron no longer appears to be maintained, and the most recent version of Python it supported was 3.7, which is itself now EOL.
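A quick stdlib illustration of the AST/CST difference (assuming Python 3.9+ for `ast.unparse`):

```python
import ast

source = """\
# compute the answer
x = 40 + 2  # inline comment
"""

# Round-tripping through Python's stdlib AST drops comments and
# formatting entirely; a CST (as in LibCST) would preserve them.
regenerated = ast.unparse(ast.parse(source))
print(regenerated)  # x = 40 + 2
```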

> What about for EA Evolutionary Algorithms

Can you elaborate? I am familiar with the concept of evolutionary algorithms but I'm not sure I understand what you mean in this context.

> does it preserve comments, or update docstrings and type annotations in mutating the code under test?

Codemodder does preserve comments. Currently none of our codemods update docstrings; I'm not sure we currently have any cases where that would make sense. We do make an effort to update type annotations where appropriate.

> Is it necessary to run `black` (and `pre-commit run --all-files`) to format the code after mutating it?

Yes: if your project uses `black` (or `pre-commit`), you'll currently need to run it yourself after applying codemods. While `black` is incredibly popular, we can't assume it's in use on any given project, and running it ourselves on a project that doesn't already use it would completely reformat each updated file, leading to very noisy and difficult-to-review changes. I'd like to explore better solutions to this issue going forward.

I am familiar with `bandit`. It's a fairly simple security linter and is useful for finding some common issues. It's also pretty prone to false positives and noisy findings. Not every problem identified by `bandit` is something that can be automatically fixed; for example, I can't replace a hard-coded password without making a lot of (breaking) assumptions about how your application is structured and deployed.

I'd love to get your feedback on Python Codemods! Give us a star on GitHub and feel free to open an issue or PR: https://github.com/pixee/codemodder-python


Thanks for your reply!

I think they called it an FST, a "Full Syntax Tree", which is probably very similar to a CST ("Concrete Syntax Tree"). At the time MOSES was written, Python's built-in AST support couldn't mutate code sufficiently for MOSES's designs.

MOSES: Meta-Optimizing Semantic Evolutionary Search :

https://wiki.opencog.org/w/Meta-Optimizing_Semantic_Evolutio... :

> All program evolution algorithms tend to produce bloated, convoluted, redundant programs ("spaghetti code"). To avoid this, MOSES performs reduction at each stage, to bring the program into normal form. The specific normalization used is based on Holman's "elegant normal form", which mixes alternate layers of linear and non-linear operators. The resulting form is far more compact than, say, for example, boolean disjunctive normal form. Normalization eliminates redundant terms, and tends to make the resulting code both more human-readable, and faster to execute.

> The above two techniques, optimization and normalization, allow MOSES to outperform standard genetic programming systems.

opencog/asmoses: https://github.com/opencog/asmoses

MOSES outputs Combo (a LISP), Python as an output transform IIUC, and now Atomese with asmoses, which links to a demo notebook: https://robert-haas.github.io/mevis-docs/code/examples/moses...

Evolutionary algorithm > Convergence: https://en.wikipedia.org/wiki/Evolutionary_algorithm#Converg...

/? mujoco learning to walk [with evolutionary selection / RL Reinforcement Learning] https://www.google.com/search?q=mujoco+learning+to+walk&tbm=...

...

Semgrep: https://en.wikipedia.org/wiki/Semgrep links to OWASP Source Code Analysis Tools: https://owasp.org/www-community/Source_Code_Analysis_Tools

But what's static or dynamic source code analysis without formal verification?

"Nagini: A Static Verifier for Python": https://pm.inf.ethz.ch/publications/EilersMueller18.pdf https://github.com/marcoeilers/nagini :

> However, there is currently virtually no tool support for reasoning about Python programs beyond type safety.

> We present Nagini, a sound verifier for statically-typed, concurrent Python programs. Nagini can prove memory safety, data race freedom, and user-supplied assertions. Nagini performs modular verification, which is important for verification to scale and to be able to verify libraries, and automates the verification process for programs annotated with specifications.

Deal > Formal verification > Background; Hoare logic, DbC Design by Contract, Dafny, Z3: https://deal.readthedocs.io/basic/verification.html#backgrou... :

> 2021. deal-solver. We released a tool that converts Python code (including deal contracts) into Z3 theorems that can be formally verified.


Interesting approach: basically a meta-layer on top of existing tools.

Do you have an example of how you inject context into the codemods? The approach we've taken at Grit is two-fold:

1. When something must be addressed (ex. `todo`), we have functions that wrap messages into the source code to ensure anyone sees the info until it's fixed. We pick up these messages automatically on our SaaS platform.

2. For non-blocking comments, we have a `log` function that any query can call to surface info into the result stream on the CLI + pull requests without it ending up in the final PR.

> 4. all of today’s codemod libraries are for one language, so they are hard to orchestrate for a single project.

This isn't entirely true! Grit, my project, was built to be multi-language from the start: https://docs.grit.io/language/overview

[0] https://docs.grit.io/language/functions#todo


Grit looks cool! My apologies for the omission; I was unaware of it. I probably anchored too hard to the word "codemod" in my searches.

> Do you have an example of how you inject context into the codemods?

When you say "context", I want to make sure we're talking about the same thing, and the question makes me think we're not quite there yet. We're basically saying that storytelling about the changes is very important, so we bake that invariant into the APIs of the codemods themselves: codemod authors are forced to provide descriptions, reasons, and justifications at the key points.
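For a rough sense of what that looks like, here's a hypothetical sketch; the names and fields are illustrative, not Codemodder's actual API. The idea is that the metadata telling the story is a required part of registering a codemod, so an author can't ship a change without it:

```python
from dataclasses import dataclass

# Hypothetical sketch: illustrative names only, not Codemodder's API.
@dataclass(frozen=True)
class CodemodStory:
    name: str
    summary: str       # one-line "what changed"
    rationale: str     # why the change is safer/better
    references: tuple  # links backing the reasoning

def register(story: CodemodStory) -> CodemodStory:
    """Refuse to register a codemod that doesn't explain itself."""
    if not story.summary or not story.rationale:
        raise ValueError("codemods must describe and justify their changes")
    return story

story = register(CodemodStory(
    name="harden-yaml-load",
    summary="Replace yaml.load with yaml.safe_load",
    rationale="yaml.load can execute arbitrary code on untrusted input.",
    references=("https://cwe.mitre.org/data/definitions/502.html",),
))
print(story.name)  # harden-yaml-load
```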


Have you heard about Mixin? What advantages could Codemodder have over SpongePowered Mixins?


Codemodder and SpongePowered Mixin cater to different scenarios. Codemodder is ideal for transforming source code you own, based on specific patterns. Changing source code allows the changes to be tracked, reviewed, and analyzed using standard tools like compilers and static analysis. It's great for large-scale codebase refactoring.

By contrast, SpongePowered Mixin uses Java bytecode manipulation to transform the bytecode of a specific type. Bytecode manipulation comes with added risk and complexity, so it's typically reserved for cases where you need to change the behavior of an external library or framework type. For example, Mixin is useful in Minecraft modding because it allows modders to change the behavior of externally defined Minecraft types.

In essence, choose Codemodder for large-scale refactors to your source code, and Mixin to modify the bytecode of external Java types.


Ah, it seems like I greatly misunderstood the purpose of Codemodder. Thanks for the clarification.



