Good questions. > How will you deal with files that are purposely broken, or whi... | Hacker News

gavinhoward on Nov 13, 2023 | on: Reasons to Prefer Blake3 over Sha256

Good questions.

> How will you deal with files that are purposely broken, or which cause the parser to take impractical (but finite) times to complete?

I've never seen a language parser do that, but if I run into a language that does, I'll probably have my VCS track it at the file level, diffing it as plain tokens or lines.

Dumb languages don't get nice things. :)

> How will you maintain a history when most commits are likely to contain unparseable code and so break the continuity of objects?

This is less of a problem with binary files (assuming the software that writes them has no output bugs), but with source files, you're right that the problem does exist.

As of right now, I would take a token-based approach. Since whitespace never becomes a token, this approach removes the need for whitespace-only commits, and if I track the tokens correctly, I should be able to identify which right brace ended the function before the broken code was saved. Then I would just save the function as broken, still ending at that same right brace.

For example, say you have this:

    int main() {
        return 0;
    }

My VCS would know that the right brace corresponds to the end of the function.

Then you write this:

    int main() {
        if (global_bool) {
        return 0;
    }

Yes, a dumb system might think that the right brace is for the `if`.

However, if you break it down by tokens, the VCS will see that the tokens `if (global_bool) {` were inserted before the `return`, so it should be able to tell that the right brace still ends the function.
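One way to get that token-level view is a longest-common-subsequence (LCS) diff over the two token streams. The sketch below is an assumption about how this could work, not a description of an existing implementation; `map_tokens` is a hypothetical name, and the inputs are token arrays that have already been split. For each old token it records which new token that token survived as, so the old function-closing `}` can be followed into the new version.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKENS 128

/* Token-level diff via LCS. Fills old_to_new[i] with the index in new_toks
   that old token i survives as, or -1 if it was deleted. Uses a static
   table, so it is not reentrant or thread-safe. */
static void map_tokens(const char **old_toks, size_t n_old,
                       const char **new_toks, size_t n_new,
                       long *old_to_new) {
    static size_t lcs[MAX_TOKENS + 1][MAX_TOKENS + 1];
    assert(n_old <= MAX_TOKENS && n_new <= MAX_TOKENS);

    /* lcs[i][j] = LCS length of old_toks[i..] and new_toks[j..]. */
    for (size_t i = n_old + 1; i-- > 0;) {
        for (size_t j = n_new + 1; j-- > 0;) {
            if (i == n_old || j == n_new)
                lcs[i][j] = 0;
            else if (strcmp(old_toks[i], new_toks[j]) == 0)
                lcs[i][j] = lcs[i + 1][j + 1] + 1;
            else
                lcs[i][j] = lcs[i + 1][j] > lcs[i][j + 1] ? lcs[i + 1][j]
                                                          : lcs[i][j + 1];
        }
    }

    /* Walk the table forward; equal tokens are always safe to pair. */
    for (size_t k = 0; k < n_old; k++)
        old_to_new[k] = -1;
    size_t i = 0, j = 0;
    while (i < n_old && j < n_new) {
        if (strcmp(old_toks[i], new_toks[j]) == 0)
            old_to_new[i++] = (long)j++;
        else if (lcs[i + 1][j] >= lcs[i][j + 1])
            i++; /* old token i was deleted */
        else
            j++; /* new token j was inserted */
    }
}
```

With the two versions of `main` split into tokens, old token 8 (the function's `}`) maps to new token 13, the last `}`, while the inserted `if ( global_bool ) {` tokens at new indices 5 through 9 have no old counterpart. The `{` at index 9 is therefore the one left open, which is the "saved as broken" state described above.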

I hope that makes sense.

Another plausible way to do it (at least in C) would be to look for things that look like declarations. The token sequence `<type> <name> <left_paren>` probably starts a function declaration. Java would be easier, since its declarations are wordier.
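A rough sketch of that declaration heuristic over a pre-split C token array follows. The function names and the short keyword list are illustrative assumptions. It misses pointer return types such as `char *f(` and anything hidden behind macros.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if t looks like a C identifier (keywords included). */
static bool is_ident(const char *t) {
    if (!(isalpha((unsigned char)t[0]) || t[0] == '_'))
        return false;
    for (size_t i = 1; t[i] != '\0'; i++)
        if (!(isalnum((unsigned char)t[i]) || t[i] == '_'))
            return false;
    return true;
}

/* Identifier-shaped keywords that can precede '(' in ordinary statements,
   e.g. "return f(" or "else if (" (illustrative, not exhaustive). */
static bool is_stmt_keyword(const char *t) {
    static const char *const kw[] = {"return", "if",     "else",  "while",
                                     "for",    "switch", "sizeof"};
    for (size_t i = 0; i < sizeof kw / sizeof kw[0]; i++)
        if (strcmp(t, kw[i]) == 0)
            return true;
    return false;
}

/* Heuristic: <type> <name> '(' probably starts a function declaration or
   definition. Stores the index of each <name> token in names_out and
   returns how many were found. */
static size_t find_function_decls(const char **toks, size_t n,
                                  size_t *names_out, size_t max_out) {
    size_t found = 0;
    for (size_t i = 0; i + 2 < n; i++) {
        if (is_ident(toks[i]) && is_ident(toks[i + 1]) &&
            strcmp(toks[i + 2], "(") == 0 &&
            !is_stmt_keyword(toks[i]) && !is_stmt_keyword(toks[i + 1])) {
            assert(found < max_out);
            names_out[found++] = i + 1;
        }
    }
    return found;
}
```

The keyword check is what keeps sequences like `return helper(` and `else if (` from being reported as declarations; without it, both would match the three-token pattern.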

I still have to prove this is possible, but I think it is.

dmoy on Nov 14, 2023

> As of right now, I would do a token-based approach

C++ is gonna get really funky there, with e.g. templates

gavinhoward on Nov 14, 2023

Agreed. I'm starting with C.
