Not GP, but I'd go a very simple and verbose way, maybe that's what they meant to. Match:
(.)Tarzan(.)
Then in an additional line of code assert
(Group 1 == Group 2) ≠ "
This shifts the logic out of regex and into the surrounding programming language context. That's arguably better, but the resulting regex is extremely dull and unclever.
I think dumb, brute force, simple approaches like this are underrated. Writing elegant, pithy code that pleases you aesthetically is nice but writing code that's explicit and obvious and can be maintained by the new kid is often more pragmatic.
I mean, I guess if nobody on your team understands regexes.
But generally, once you decide to use a regex in the first place, you might as well put as much regular everyday logic as you can in it. Otherwise you might as well look for "Tarzan" with a dumb string search.
Lookbehinds and lookaheads aren't rocket science. And you can always leave a comment about what they're doing if you're worried other team members won't grok the syntax.
> Lookbehinds and lookaheads aren't rocket science.
Lookbehinds and lookaheads (especially negative lookbehinds) are rocket science.
What is "rocket science?" "Rocket science" is the feeling you get in math class where the instructor explains a proof to you in the clearest possible terms and you just don't get it. You have to listen to the explanation multiple times, preferably in a few different ways, and then you have to sleep on it, and then you get it, maybe.
But "rocket science" isn't just hard to understand. It's a hard problem where the consequences for failure are catastrophic. When you fail at rocket science, a multi-million dollar rocket explodes.
Anyone who's ever tried to teach lookbehinds to a newbie has seen it: you explain how lookbehinds work, and then ask the newbie to create a regex with negative lookbehind, to demonstrate mastery. I've done it a few times, and they never get it right, ever.
At best, they flub the syntax, but even once they get over that, they usually write the worst possible regex: a regex that works correctly on desired inputs but does the wrong thing on the input the regex is designed to reject.
This is a notorious problem with writing regexes, but it's way worse for negative lookbehind, because it's asserting that something isn't there, rather than querying for something that is there.
When I see a regex with negative lookbehind during code review, I ask for unit tests, not just comments. Reliably, regexes get even more complex when unit tests are added, because it's just so damn hard to write a correct regex with negative lookbeind.
I've never used the "trick" from TFA before, but it already sounds way easier to use than negative lookbehinds, and I'm curious to try it.
I agree on unit tests for non-trivial regexes as a general rule, but respectfully disagree on lookaheads and lookbehinds.
Things like greedy vs. non-greedy matching, matching newlines or not, handling Unicode correctly, inserting a capturing group when you actually needed a non-capturing group, making sure your regex works if it matches the start or end of a string, escaping characters -- those can be tricky.
On the other hand, lookaheads and lookbehinds are conceptually extremely straightforward, you just need a cheatsheet to remember the syntax is all.
Yeah, that's more or less what I meant. Write a regex (plus line of code) to make sure `Tarzan` appears. Then write another regex and line of code to make sure `"Tarzan"` doesn't appear.
Maybe at this point you aren't using regex even. Nice, you solved two problems.
(I do appreciate regex and even use them a lot. But, I use them enough to avoid them as much as possible.)