This is not absolutely correct, only conditionally so.
The only way in which
"Tarzan"|(Tarzan)
extracts a Tarzan that is not in quotes is when it is used to scan the input for successive non-overlapping matches.
(We know from lexical analysis with regexes, a form of non-overlapping extraction, that the "Tarzan" token is different from a Tarzan token. An identifier won't be recognized if it is in the middle of a string literal.)
The effect comes not from the regex itself but from a particular way of using it.
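As a rough illustration, here is a minimal sketch in Python (the choice of Python's `re` module and the helper name `unquoted_tarzans` are assumptions for illustration only). It scans the input for successive non-overlapping matches and keeps only those where capture group 1 participated. Because `finditer` resumes after each match, a quoted "Tarzan" is consumed whole and its inner Tarzan is never tried on its own.

```python
import re

# Quoted form in the first alternative, bare form captured in the second.
TARZAN = re.compile(r'"Tarzan"|(Tarzan)')

def unquoted_tarzans(text):
    """Hypothetical helper: offsets of unquoted Tarzan via non-overlapping scanning."""
    # finditer continues after the end of each match, so the Tarzan inside
    # a consumed "Tarzan" is never a candidate for the second alternative.
    return [m.start(1) for m in TARZAN.finditer(text) if m.group(1) is not None]

print(unquoted_tarzans('"Tarzan", said Jane; Tarzan turned.'))  # [21]
```

Only the bare Tarzan at offset 21 is reported; the quoted one at offset 0 matched the first alternative and was discarded.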
If the regex is used to find all maximally long matching substrings, it won't work: it will find "Tarzan", and it will also find the Tarzan inside those quotes.
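A small Python sketch of that failure, assuming we enumerate matches by trying the regex anchored at every start offset (the helper name `matches_at_every_position` is hypothetical). Python's alternation is leftmost-first rather than longest, but at any given offset in this input only one alternative can match, so the result is the same as taking the longest match there.

```python
import re

TARZAN = re.compile(r'"Tarzan"|(Tarzan)')

def matches_at_every_position(text):
    """Hypothetical helper: try an anchored match at each offset (overlapping search)."""
    found = []
    for i in range(len(text)):
        m = TARZAN.match(text, i)  # Pattern.match(string, pos) anchors at offset i
        if m:
            # Record the offset, the matched text, and whether group 1 (bare Tarzan) matched.
            found.append((m.start(), m.group(0), m.group(1) is not None))
    return found

print(matches_at_every_position('say "Tarzan" here'))
# [(4, '"Tarzan"', False), (5, 'Tarzan', True)]
```

The attempt at offset 5, just inside the opening quote, reports a bare Tarzan even though it is quoted.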
Notably, the regex will also fail if it is used to find a single match, such as the leftmost one. If the datum is a string like
"Tarzan", said Jane; Tarzan turned.
then the leftmost "Tarzan" will be found, and that's it. The regex will not find the leftmost Tarzan that is not wrapped in quotes.
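For instance, in Python (assumed here only for illustration), `re.search` returns just the leftmost match, which for this string is the quoted one, so group 1 is empty and the later bare Tarzan is never reported:

```python
import re

TARZAN = re.compile(r'"Tarzan"|(Tarzan)')
text = '"Tarzan", said Jane; Tarzan turned.'

m = TARZAN.search(text)  # leftmost match only
print(m.start(), m.group(0), m.group(1))  # 0 "Tarzan" None
```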
We cannot even use it to grep files for lines that contain a Tarzan that is not in quotes.
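The grep problem can be mimicked with a Python line filter (a sketch; the sample lines are made up). Selecting a line whenever the regex matches anywhere, as grep does, also selects a line whose only Tarzan is quoted. Getting the intended result requires scanning each line for non-overlapping matches and checking group 1, which plain grep does not do.

```python
import re

TARZAN = re.compile(r'"Tarzan"|(Tarzan)')
lines = ['He said "Tarzan" once.', 'Tarzan swung.', 'Nobody here.']

# grep-style: keep a line if the regex matches anywhere in it.
grep_hits = [ln for ln in lines if TARZAN.search(ln)]

# Non-overlapping scan of each line, keeping it only if group 1 ever matched.
scan_hits = [ln for ln in lines if any(m.group(1) for m in TARZAN.finditer(ln))]

print(grep_hits)  # ['He said "Tarzan" once.', 'Tarzan swung.']
print(scan_hits)  # ['Tarzan swung.']
```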