I'm not following?

diarrhea · on July 8, 2021

Not GP, but I'd go a very simple and verbose way; maybe that's what they meant. Match:

    (.)Tarzan(.)

Then in an additional line of code assert

    not (Group 1 == '"' and Group 2 == '"')

This shifts the logic out of regex and into the surrounding programming language context. That's arguably better, but the resulting regex is extremely dull and unclever.
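A minimal Python sketch of that idea, assuming the goal is to reject `Tarzan` only when a double quote sits on both sides. The function name `unquoted_tarzans` is made up for illustration. As written, the pattern needs a character on each side, so it cannot match at the very start or end of the string, and because each match consumes its neighbours it can skip a second `Tarzan` that is only one character away from the first.

```python
import re

# Capture one character on each side; the quote check happens in Python.
PATTERN = re.compile(r'(.)Tarzan(.)')

def unquoted_tarzans(text):
    """Return start offsets of Tarzan occurrences not quoted on both sides."""
    hits = []
    for m in PATTERN.finditer(text):
        before, after = m.group(1), m.group(2)
        # Reject only when both neighbours are double quotes.
        if not (before == '"' and after == '"'):
            hits.append(m.start() + 1)  # skip the one captured leading char
    return hits
```

The regex stays trivial; everything that encodes intent is one ordinary `if` that a reader can step through in a debugger.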

pimlottc · on July 8, 2021

Don’t forget to look out for matches at the boundaries of the original string. I think it should be something like:

    (^|.)Tarzan(.|$)

Though I’m not 100% sure offhand what the result in the capturing groups would be.
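For what it's worth, in Python's `re` module a group that matches via `^` or `$` captures the empty string rather than being unset. A small sketch (the helper name `neighbours` is hypothetical) that prints what each group holds for a few inputs:

```python
import re

PATTERN = re.compile(r'(^|.)Tarzan(.|$)')

def neighbours(text):
    """Return (group 1, group 2) for the first match, or None if no match."""
    m = PATTERN.search(text)
    return m.groups() if m else None

for sample in ['Tarzan', '"Tarzan"', 'Tarzan"', '"Tarzan', 'x Tarzan y']:
    print(repr(sample), '->', neighbours(sample))
```

So a follow-up check for "both groups are a double quote" still works at the boundaries, since an empty string is simply not a quote. Other regex engines may report boundary groups differently.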

taneq · on July 9, 2021

I think dumb, brute force, simple approaches like this are underrated. Writing elegant, pithy code that pleases you aesthetically is nice but writing code that's explicit and obvious and can be maintained by the new kid is often more pragmatic.

Save the clever stuff for where it's needed.

crazygringo · on July 8, 2021

I mean, I guess if nobody on your team understands regexes.

But generally, once you decide to use a regex in the first place, you might as well put as much regular everyday logic as you can in it. Otherwise you might as well look for "Tarzan" with a dumb string search.

Lookbehinds and lookaheads aren't rocket science. And you can always leave a comment about what they're doing if you're worried other team members won't grok the syntax.
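As a sketch of what a commented lookaround version might look like in Python: the comment above doesn't give a pattern, so the one below is only one possible spelling, and it assumes the intent is to skip `Tarzan` only when it is quoted on both sides. `re.VERBOSE` allows the comments to live inside the pattern itself.

```python
import re

TARZAN = re.compile(r'''
    Tarzan              # the word itself
    (?!                 # fail only if BOTH of the following hold here:
        (?<="Tarzan)    #   a double quote came right before the word, and
        "               #   a double quote comes right after it
    )
''', re.VERBOSE)

for sample in ['Tarzan', '"Tarzan"', '"Tarzan', 'Tarzan"']:
    print(repr(sample), '->', bool(TARZAN.search(sample)))
```

The lookbehind is fixed-width (seven characters), which is what Python's `re` requires.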

dfabulich · on July 8, 2021

> Lookbehinds and lookaheads aren't rocket science.

Lookbehinds and lookaheads (especially negative lookbehinds) are rocket science.

What is "rocket science?" "Rocket science" is the feeling you get in math class where the instructor explains a proof to you in the clearest possible terms and you just don't get it. You have to listen to the explanation multiple times, preferably in a few different ways, and then you have to sleep on it, and then you get it, maybe.

But "rocket science" isn't just hard to understand. It's a hard problem where the consequences for failure are catastrophic. When you fail at rocket science, a multi-million dollar rocket explodes.

Anyone who's ever tried to teach lookbehinds to a newbie has seen it: you explain how lookbehinds work, and then ask the newbie to create a regex with negative lookbehind, to demonstrate mastery. I've done it a few times, and they never get it right, ever.

At best, they flub the syntax, but even once they get over that, they usually write the worst possible regex: a regex that works correctly on desired inputs but does the wrong thing on the input the regex is designed to reject.

This is a notorious problem with writing regexes, but it's way worse for negative lookbehind, because it's asserting that something isn't there, rather than querying for something that is there.

When I see a regex with negative lookbehind during code review, I ask for unit tests, not just comments. Reliably, regexes get even more complex when unit tests are added, because it's just so damn hard to write a correct regex with negative lookbehind.

I've never used the "trick" from TFA before, but it already sounds way easier to use than negative lookbehinds, and I'm curious to try it.
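To make the unit-test point concrete, here is a hedged sketch of a small case table in Python. Both patterns and the expected results are assumptions for illustration: the spec taken here is "reject `Tarzan` only when quoted on both sides", and `NAIVE` stands in for a plausible first attempt that looks fine on the obvious inputs.

```python
import re

# Hypothetical first attempt: forbid a quote on either side.
NAIVE = re.compile(r'(?<!")Tarzan(?!")')
# Alternative: accept if the quote is missing on at least one side.
BOTH_SIDES = re.compile(r'(?<!")Tarzan|Tarzan(?!")')

# (input, should a match be found?) under the assumed spec.
CASES = [
    ('Tarzan', True),
    ('"Tarzan"', False),
    ('"Tarzan', True),
    ('Tarzan"', True),
    ('Me "Tarzan", you Tarzan', True),
]

def failures(pattern):
    """Return the inputs where the pattern disagrees with the expectation."""
    return [text for text, expected in CASES
            if (pattern.search(text) is not None) != expected]

print('NAIVE fails on:', failures(NAIVE))
print('BOTH_SIDES fails on:', failures(BOTH_SIDES))
```

`NAIVE` passes the two obvious cases and only trips on the one-sided quotes, which is the kind of gap a comment alone would not reveal.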

crazygringo · on July 9, 2021

I agree on unit tests for non-trivial regexes as a general rule, but respectfully disagree on lookaheads and lookbehinds.

Things like greedy vs. non-greedy matching, matching newlines or not, handling Unicode correctly, inserting a capturing group when you actually needed a non-capturing group, making sure your regex works if it matches the start or end of a string, escaping characters -- those can be tricky.

On the other hand, lookaheads and lookbehinds are conceptually extremely straightforward; you just need a cheatsheet to remember the syntax.

IgorPartola · on July 9, 2021

Ha. Of all the things I learned at university, rocket science was the easiest to get. Quantum mechanics on the other hand sucked.

RheingoldRiver · on July 8, 2021

> Otherwise you might as well look for "Tarzan" with a dumb string search.

Yes, this was sort of the idea as well (also see sibling response). I'd just as soon have 2 lines of code rather than a regex.

taneq · on July 9, 2021

> I mean, I guess if nobody on your team understands regexes.

If anybody on your team doesn't understand regexes, you mean.

RheingoldRiver · on July 8, 2021

Yeah, that's more or less what I meant. Write a regex (plus line of code) to make sure `Tarzan` appears. Then write another regex and line of code to make sure `"Tarzan"` doesn't appear.

Maybe at this point you aren't using regex even. Nice, you solved two problems.

(I do appreciate regex and even use them a lot. But, I use them enough to avoid them as much as possible.)
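A rough Python sketch of the two-check version with no regex at all. The function name is made up, and note the semantics are an assumption: this judges the whole string at once, so a single quoted `"Tarzan"` anywhere makes it return `False` even if an unquoted one also appears.

```python
def mentions_only_unquoted_tarzan(text):
    """True if Tarzan appears and the quoted form "Tarzan" never does."""
    has_word = 'Tarzan' in text        # check 1: the word is present
    has_quoted = '"Tarzan"' in text    # check 2: the quoted form is present
    return has_word and not has_quoted

for sample in ['Tarzan swings', 'say "Tarzan"', 'Me "Tarzan", you Tarzan']:
    print(repr(sample), '->', mentions_only_unquoted_tarzan(sample))
```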