> How will you deal with files that are purposely broken, or which cause the parser to take impractical (but finite) times to complete?
I've never seen a language parser do that, but if I run into a language whose parser does, I'll probably have my VCS track it at the file level, based on tokens or lines.
Dumb languages don't get nice things. :)
> How will you maintain a history when most commits are likely to contain unparseable code and so break the continuity of objects?
This is less of a problem with binary files (assuming the program that writes them has no bugs in its output), but with source files, you're right that the problem exists.
As of right now, I would take a token-based approach. It removes the need for whitespace-only commits, and if I track the tokens correctly, I should be able to identify which right brace ended the function before the broken code was saved. Then I would just save the function as broken, using that same right brace as its end.
For example, say you have this:
```c
int main() {
    return 0;
}
```
My VCS would know that the right brace corresponds to the end of the function.
Then you write this:
```c
int main() {
    if (global_bool) {
        return 0;
}
```
Yes, a dumb system might think that the right brace is for the `if`.
However, if you break it down by tokens, the VCS will see that the tokens `if (global_bool) {` were inserted before the `return`, so it should be able to tell that the right brace still ends the function.
I hope that makes sense.
Another plausible way to do it (at least in C) would be to look for things that look like declarations. The series of tokens `<type> <name> <left_paren>` is probably a function declaration. Java would be easier; its declarations are more verbose.
I still have to prove this is possible, but I think it is.