For normal VCS's, you are absolutely right. And you're actually right for mine, ...

deadbeeves · on Nov 13, 2023

You're tracking things at the content level? How will you deal with files that are purposely broken, or which cause the parser to take an impractical (but finite) amount of time to complete? Also, tracking the history of a class makes sense to some extent, but you say you want to commit every time there's a save. How will you maintain a history when most commits are likely to contain unparseable code and so break the continuity of objects?

gavinhoward · on Nov 13, 2023

Good questions.

> How will you deal with files that are purposely broken, or which cause the parser to take impractical (but finite) times to complete?

I've never seen a language parser do that, but if I run into a language that does that, I'll probably have my VCS track it at the file level, based on tokens or lines.

Dumb languages don't get nice things. :)

> How will you maintain a history when most commits are likely to contain unparseable code and so break the continuity of objects?

This is less of a problem with binary files (assuming the source software does not have bugs in output), but with source files, you're right that that problem does exist.

As of right now, I would do a token-based approach. This approach removes the need for whitespace-only commits, and if I track the tokens right, I should be able to identify which right brace ended the function before the broken code was saved. Then I would just save the function as broken, still ending at that same right brace.

For example, say you have this:

    int main() {
        return 0;
    }

My VCS would know that the right brace corresponds to the end of the function.

Then you write this:

    int main() {
        if (global_bool) {
        return 0;
    }

Yes, a dumb system might think that the right brace is for the `if`.

However, if you break it down by tokens, the VCS will see that the tokens `if (global_bool) {` were added before the `return`, so it should be able to tell that the right brace still ends the function.

I hope that makes sense.
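
To make the token idea concrete, here is a minimal sketch in Python. It is not the actual VCS; `tokenize`, `shared_ends`, and `carry_over` are hypothetical names, the tokenizer is a rough regex rather than a real C lexer, and it assumes each save contains one contiguous edit. It trims the tokens shared at the start and end of the two versions, so whatever is left in the middle is the edit, and any old token outside the edit keeps its identity.

```python
import re

# Rough C-ish tokenizer (assumption): identifiers, integers, single punctuation.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")


def tokenize(source):
    return TOKEN_RE.findall(source)


def shared_ends(old_tokens, new_tokens):
    """Count the tokens both versions share at the start and at the end."""
    limit = min(len(old_tokens), len(new_tokens))
    prefix = 0
    while prefix < limit and old_tokens[prefix] == new_tokens[prefix]:
        prefix += 1
    suffix = 0
    # The suffix may not overlap tokens already claimed by the prefix.
    while (suffix < limit - prefix
           and old_tokens[-1 - suffix] == new_tokens[-1 - suffix]):
        suffix += 1
    return prefix, suffix


def carry_over(old_index, old_tokens, new_tokens):
    """Return the new index of an old token, or None if the edit replaced it."""
    prefix, suffix = shared_ends(old_tokens, new_tokens)
    if old_index < prefix:
        return old_index
    if old_index >= len(old_tokens) - suffix:
        return old_index + len(new_tokens) - len(old_tokens)
    return None


old_tokens = tokenize("int main() {\n    return 0;\n}\n")
new_tokens = tokenize(
    "int main() {\n    if (global_bool) {\n    return 0;\n}\n"
)

prefix, suffix = shared_ends(old_tokens, new_tokens)
inserted = new_tokens[prefix:len(new_tokens) - suffix]

# The brace that ended main() was the last token of the old version.
old_brace = len(old_tokens) - 1
new_brace = carry_over(old_brace, old_tokens, new_tokens)
```

Here `inserted` is exactly the `if (global_bool) {` tokens, and `new_brace` points at the final right brace of the broken version, so the function can still be recorded as ending there. A real implementation would need a proper diff to handle several edits per save.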

Another plausible way to do it (at least in C) would be to look for things that look like declarations. The series of tokens `<type> <name> <left_paren>` is probably a function declaration. Java would be easier; its declarations are more wordy.
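
A rough Python sketch of that heuristic, under loud assumptions: `find_function_names` is a hypothetical helper, the tokenizer is a simple regex, and the keyword list is deliberately incomplete. It only looks for two identifiers followed by a left paren, so it misses pointer return types such as `char *name(` and can be fooled by macros.

```python
import re

TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")

# Keywords that can precede an identifier or a paren without being a type
# (assumption: a partial list, enough for this illustration).
KEYWORDS = {"if", "else", "while", "for", "switch", "do", "case",
            "return", "sizeof", "goto"}


def is_plain_identifier(token):
    return IDENT_RE.fullmatch(token) is not None and token not in KEYWORDS


def find_function_names(source):
    """Return names that match the pattern <type> <name> <left_paren>."""
    tokens = TOKEN_RE.findall(source)
    names = []
    # Slide a three-token window over the token stream.
    for kind, name, paren in zip(tokens, tokens[1:], tokens[2:]):
        if (paren == "("
                and is_plain_identifier(kind)
                and is_plain_identifier(name)):
            names.append(name)
    return names


broken = "int main() {\n    if (global_bool) {\n    return 0;\n}\n"
other = "static int helper(int x) { return twice(x); }"

broken_names = find_function_names(broken)
other_names = find_function_names(other)
```

Because this never needs a full parse, it still finds `main` in the broken version above, where the braces are unbalanced.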

I still have to prove this is possible, but I think it is.

dmoy · on Nov 14, 2023

> As of right now, I would do a token-based approach

C++ is gonna get really funky there, with e.g. templates

gavinhoward · on Nov 14, 2023

Agreed. I'm starting with C.

nextaccountic · on Nov 13, 2023

In those cases you can just do error recovery in the parser (truncating an erroring function, for example) and then store the out-of-band information necessary to reconstruct the original file.

This is also necessary to deal with whitespace, for example (if you just reformat the code, you didn't change the AST, but you did change the file).
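
As a small illustration of the out-of-band idea for whitespace, here is a Python sketch. The names `split_trivia` and `rebuild` are hypothetical, and the regex tokenizer is a stand-in for a real lexer. It stores the whitespace before each token in a separate list, so two differently formatted files share the same token list while each can still be reconstructed byte for byte.

```python
import re

# Whitespace runs are captured in the named group; everything else is a token.
# The trailing "." makes sure no character is silently dropped.
PIECE_RE = re.compile(r"(?P<ws>\s+)|[A-Za-z_]\w*|\d+|.")


def split_trivia(source):
    """Split source into tokens plus the whitespace around them.

    trivia[i] is the whitespace before tokens[i]; the final entry is
    whatever follows the last token.
    """
    tokens, trivia = [], []
    pending = ""
    for match in PIECE_RE.finditer(source):
        if match.group("ws") is not None:
            pending += match.group()
        else:
            tokens.append(match.group())
            trivia.append(pending)
            pending = ""
    trivia.append(pending)
    return tokens, trivia


def rebuild(tokens, trivia):
    """Reassemble the exact original text from tokens and stored whitespace."""
    parts = [space + token for space, token in zip(trivia, tokens)]
    parts.append(trivia[-1])
    return "".join(parts)


compact = "int main(){return 0;}"
spaced = "int main() {\n    return 0;\n}\n"

compact_tokens, compact_trivia = split_trivia(compact)
spaced_tokens, spaced_trivia = split_trivia(spaced)
```

The two token lists come out identical, so a reformat shows up only as a change in the stored whitespace, and `rebuild(spaced_tokens, spaced_trivia)` returns the original `spaced` text unchanged.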