
> Clang does not implicitly simplify code as it parses it like GCC does. Doing so causes many problems for source analysis tools: as one simple example, if you write "x-x" in your source code, the GCC AST will contain "0", with no mention of 'x'. This is extremely bad for a refactoring tool that wants to rename 'x'.

Can someone explain this Clang “pro”? If a refactoring tool wants to rename “x”, it does it to the source, not the AST, no? And if “x-x” is turned into 0 by the parser, why does it matter? Assuming “x” isn’t volatile, “x-x” is indeed 0!




Many refactoring tools need the AST (and more) from a frontend to do type-aware refactoring, among other things.

If we have, for example,

  struct a { int x; } sa;
  struct b { float x; } sb;
and we want to rename sa.x into sa.y, then the refactoring tool needs to understand types in order to leave sb.x alone. And if

  sa.x - sa.x
is not in the AST, it will not be changed into

  sa.y - sa.y
which will yield compile errors, because the field x no longer exists in struct a.
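To make that concrete, here is a minimal sketch in C. It assumes a front end hands the tool a list of member accesses, each tagged with the struct type of its base and a source offset. The `member_ref` record and the `rename_field` and `rename_demo` helpers are hypothetical names for illustration, not any real compiler's API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record a front end might emit for one member access. */
struct member_ref {
    const char *record;  /* struct tag of the base expression's type */
    size_t offset;       /* byte offset of the field name in the source */
};

/* Overwrite the one-character field name at every access whose base has
   type `record`. Returns the number of edits applied. */
size_t rename_field(char *src, const struct member_ref *refs, size_t n,
                    const char *record, char to)
{
    size_t edits = 0;
    for (size_t i = 0; i < n; i++) {
        assert(refs[i].offset < strlen(src));
        if (strcmp(refs[i].record, record) == 0) {
            src[refs[i].offset] = to;
            edits++;
        }
    }
    return edits;
}

/* Demo: rename struct a's field x to y in "sa.x - sa.x + sb.x".
   With `folded` nonzero, the two sa.x accesses are missing from the
   reference list, as if the parser had already reduced them to 0. */
size_t rename_demo(char *out, size_t cap, int folded)
{
    static const struct member_ref full[] = {
        { "a", 3 }, { "a", 10 }, { "b", 17 },
    };
    static const struct member_ref after_fold[] = {
        { "b", 17 },
    };
    const char *text = "sa.x - sa.x + sb.x";
    if (cap < strlen(text) + 1)
        return 0;
    strcpy(out, text);
    return folded ? rename_field(out, after_fold, 1, "a", 'y')
                  : rename_field(out, full, 3, "a", 'y');
}
```

With the full reference list, `rename_demo(buf, sizeof buf, 0)` makes two edits and leaves sb.x alone. With the folded list, it makes none, so the text still says sa.x after the declaration has been renamed.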


> If a refactoring tool wants to rename “x”, it does it to the source, not the AST, no?

A textual find and replace isn't sufficient because multiple variables might be named ‘x’ and you wouldn't want to rename all of them, just the one you're interested in. The only way to know which occurrences of ‘x’ are your ‘x’ is by parsing the source into an AST and then performing analysis on it. Once you know the file locations, then the source itself can be modified. So to summarize, yes you are correct that the source is modified, but an AST + analysis is needed to find the correct locations.


A refactoring tool might want to offer semantic renaming. If 'x' occurs in many places textually in the source, the only way to be sure that a rename operation acts only on the same variable (rather than just the same string) is to (at least partially) parse the source, act on the resulting AST, and use source information in the AST to apply the change to the source.

So, to avoid maintaining a separate parser, it would be nice to grab an AST from the compiler - but that AST has to match 1-1 with the source to be useful for refactoring.
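A toy model of why the 1-1 match matters, as a sketch only: the `node` layout, `collect_refs`, `fold`, and `x_minus_x_refs` below are made up for illustration and do not mirror the actual GCC or Clang data structures. It builds the tree for "x - x", optionally simplifies it the way an eager parser might, and then asks where 'x' is referenced.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum kind { VAR, CONST, SUB };

struct node {
    enum kind kind;
    const char *name;        /* VAR only */
    int value;               /* CONST only */
    size_t offset;           /* position of the token in the source text */
    struct node *lhs, *rhs;  /* SUB only */
};

/* Walk the tree, storing the source offset of each reference to `name`.
   Returns the running total of references found. */
size_t collect_refs(const struct node *n, const char *name,
                    size_t *out, size_t cap, size_t count)
{
    if (n == NULL)
        return count;
    if (n->kind == VAR && strcmp(n->name, name) == 0) {
        if (count < cap)
            out[count] = n->offset;
        return count + 1;
    }
    count = collect_refs(n->lhs, name, out, cap, count);
    return collect_refs(n->rhs, name, out, cap, count);
}

/* Eager simplification: rewrite `v - v` into the constant 0 in place,
   discarding both variable nodes. */
void fold(struct node *n)
{
    if (n == NULL || n->kind != SUB)
        return;
    assert(n->lhs != NULL && n->rhs != NULL);
    fold(n->lhs);
    fold(n->rhs);
    if (n->lhs->kind == VAR && n->rhs->kind == VAR &&
        strcmp(n->lhs->name, n->rhs->name) == 0) {
        n->kind = CONST;
        n->value = 0;
        n->lhs = n->rhs = NULL;
    }
}

/* Build the tree for "x - x" and report the references to x,
   optionally after folding. */
size_t x_minus_x_refs(int fold_first, size_t *out, size_t cap)
{
    struct node x1 = { .kind = VAR, .name = "x", .offset = 0 };
    struct node x2 = { .kind = VAR, .name = "x", .offset = 4 };
    struct node sub = { .kind = SUB, .offset = 2, .lhs = &x1, .rhs = &x2 };
    if (fold_first)
        fold(&sub);
    return collect_refs(&sub, "x", out, cap, 0);
}
```

Without folding, the walk finds two references at offsets 0 and 4, which is exactly what a rename needs. After folding, it finds none, even though the source text still contains both.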


Can't you resolve this by doing a bit of extra work in scopes that have e.g. 'x' in source but not in the corresponding AST? I wasn't sure if generated ASTs were 1-1 matched with source top-level scopes, but apparently they are:

http://icps.u-strasbg.fr/~pop/images/front-end.png

It's a bit more work than cases where AST matches source level expectations, but can't you just walk that sub-tree and figure out that '0' was 'x - x'?

http://icps.u-strasbg.fr/~pop/images/plus-expr.png


Having an AST that maps (nearly?) 1:1 to the source code is important for debug info and doing proper error messages, so doing this at the AST and writing it back out is going to be much easier to test (prove, even).

The "source"-level rewriting tool is going to build some kind of code and data structure that... parses and performs semantic analysis of the code.

Also, if x-x gets turned into zero by the parser, just scrap the project and bring out the firing squad.


No, normally refactoring is applied on the AST, which is then matched back to the source. The source is raw text, you can't refactor raw text directly - you need the AST to do any kind of meaningful refactoring (you often need more than the AST to do interesting refactoring).


That makes sense. But if gcc doesn’t expose the AST (like clang does), why does it matter what its parser does? This “comparison” even says as much:

> GCC is built as a monolithic static compiler, which makes it extremely difficult to use as an API and integrate into other tools.

Obviously, if gcc exposed the AST with an API, changing "x - x" into 0 would be bad for refactoring, but they don’t (at least when this was written a decade ago).



