Hacker News new | past | comments | ask | show | jobs | submit login
I found and fixed a bug in PHP's standard library (miguelxpn.com)
167 points by miguelxpn on March 1, 2020 | hide | past | favorite | 32 comments



Notably, the bugfix made PHP maintainers realize that the header handler had additional behaviors that they did not understand nor had a rationale for, so those behaviors were removed as well:

https://github.com/php/php-src/pull/5201


Feels like this could be a case where DRY could fix a bug in one place


DRY is a good principle, but sometimes I’d rather repeat myself a little bit than build more bug-prone scaffolding to avoid repetition—especially when the repetition isn’t line-for-line identical. Just remember to always cite the repetition in comments.


I’ve never come across a bug due to application of DRY, but I have seen many bugs because of “harmless” duplication resulting in inconsistent code changes several months later.

Even aside from DRY, extracting a block of code to a function or method gives you an opportunity to name the block of code, which often significantly clarifies the intent of the code so, I’ve always tried to error on the side of overly DRY


I haven't seen bugs due to DRY, but I've caused difficulty of adding new features and poor maintainability because of it.

When I first took over maintaining Red Moon, I was a very new dev and went a little DRY crazy. In particular, there's a state machine for the different filter states (running, paused, stopped, etc), that had a bunch of classes (one per state) that had a lot of overlap. I pulled some common behaviors out into a supertype that the classes with those behaviors now inherited from.

Except, it turns out some of those states ought to act differently (automatic pauses in secure apps should disable the overlay but not restore the backlight; manual pauses should do both), and now it's going to be extra work untangling the various states and pause logic. If I'd left the state machine (amongst other bits) as it was, this feature/bugfix would be implemented already. Also, the current state of things (pun intended) is a bit less readable, in my opinion.


I have see a dry bug. It happens when you deduplicate code which isn’t an exact duplicate, but only appears to be.

For instance, one piece of code is supposed to do the same thing as another, but in context it caused a side effect that some other part of the code was inadvertently relying on.


>> It happens when you deduplicate code which isn’t an exact duplicate, but only appears to be.

Embarrassing to admit, but until now I didn't think in this perspective - and on hindsight it should be obvious, sometimes code might look duplicate, but is not


Or it starts as duplicate but diverges over time. I've been bitten by the premature deduplication.


Or worse, the natural progression should be that they diverge, but because they now share some kind of "reused" code in a library or class, an abomination is added to one or both users, just to keep the original "shared" feature.

Great fun.


I’ve never really understood this, because undoing DRY is relatively easy: you either copy the new function/class and rename it or you inline it in the mistaken case and adjust the code to match.


Easy, but it's still work. As a thought experiment: if it's so easy, would you be interested in doing it for me? (edit: Obviously not, but compare your internal resistance to if I'd asked you to do something trivial like fix a typo in the readme)

Here's the file I was referencing (only 100 lines!): https://github.com/LibreShift/red-moon/blob/master/app/src/m...

Here's the issue I've been putting off fixing because of it: https://github.com/LibreShift/red-moon/issues/208

The problem is that right now there is one PAUSE state, which is triggered by both manual pauses (brightness should be restored) and automatic pauses (in secure apps; brightness should not be restored). The problem wasn't exactly DRY, it's the misapplication: I merged bits that had identical code, but turned out not to have identical purpose.

And, for posterity, here's what it used to look like way back (DRYing was not the only change, I also split out notification stuff, so it's not a 1:1 length comparison): https://github.com/LibreShift/red-moon/blob/11ae916955ff8c36...

edit: and here's the original file, back mostly before I touched it at all: https://github.com/LibreShift/red-moon/blob/ed2ec4fd1c68611d...


I'm not sure what you don't understand - that doing things unnecessarily and then having to undo them is a waste of time?

Nobody here is saying DRY is bad, just that it's not universally the right thing to do without consideration.


It can become much more convoluted than just having a function used in many places. It usually takes the form of multiple levels of inheritance. And it's a pain to deal with, and somewhat of a pain to refactor as well.


Thanks for maintaining the app btw! Makes it a lot easier to get to sleep after staring at HN for an hour before bed.


I've seen people over-engineering a piece of code in order to make it "flexible" enough to be able to reuse it in like 2-3 scenarios that are not really well aligned - all in the name of making their code super DRY.

Sometimes it's better to have just a few separate, differently shaped pegs, than one flexible peg that fits any hole but is so complex that it might introduce 10 new bugs along the way.


I’ve seen plenty of DRY bugs due to premature abstraction and an unwillingness to refactor later on.

In many cases, the opportunity for DRY was misleading as the code was only superficially similar, so the implementation became over-complicated to handle various corner cases that otherwise wouldn’t have existed.


Last time this caused memory corruption, I did a rewrite using eacaping-m4 to encode this difference in behavior. Gnu m4 has a changeword feature you can compile in for that.


I was thinking the same so I checked and they removed the duplication:

https://github.com/php/php-src/commit/3d9c02364db62a6d8e2794...

They also fixed the whitespace handling that let something like "RandomHeader: hello host:8080" mistakenly set the flag.

https://github.com/php/php-src/commit/56cdbe63c24b86c2f1d60b...


That looks so much better to me. The cognitive load is significantly reduced


Looking at lines 460-521 in the modified file (https://github.com/miguelxpn/php-src/blob/f4b2089b642d504be3...), is there not a benefit to `break` out of the while loop in the nested if statements?

Otherwise, it looks like it will call strstr() one extra time, even though you may have already determined that the specific header is present.


Good catch! That's indeed the case. Another commit was made where that piece of code was refactored into a function and it returns 1 in case the header is present so strstr isn't being called an extra time in the current code.

[1] https://github.com/php/php-src/commit/3d9c02364db62a6d8e2794...


Ah, much cleaner!


Interesting bug, but even better it's nice to see encouragement of people to get involved in open source. The walk through of the process was great.


Yeah! Before I started contributing I always thought that I wasn't good enough to even try. When I started I noticed how welcoming the communities usually are and that motivated me to contribute me even more. Nowadays I try to motivate people whenever I can.


Humble and didactic, totally agree that we should thank the author. I for one felt compelled to contribute to OS after reading the article.


  while ((s = strstr(s, "host:"))) {
    if (s == t || *(s-1) == '\r' || *(s-1) == '\n' ||
        *(s-1) == '\t' || *(s-1) == ' ') {
      have_header |= HTTP_HEADER_HOST;
    }
    s++;
  }
This s++ could be s + sizeof "host:" - 1.

Reason being: if you have just found "host:" at address s, you will not find another one at s+1, s+2, s+3, s+4 or s+4; "host:" supports no overlapped matches.

Actually since, we are really looking for "host:" preceded by whitespace, we can skip by just sizeof "host", (five bytes). Because if s points at "host:host:", the second "host:" is not of interest; it is not preceded by a whitespace character; we won't be setting the HTTP_HEADER_HOST bit for that one.

Also, once we set the HTTP_HEADER_HOST bit, we can break out of the loop; there is no value in continuing it. The point of the loop is not to have a false negative: not to stop on a false match on "host:" which doesn't meet the HTTP_HEADER_HOST conditions, and thereby fail to find the real one later in the string. If we find the real one, we are done.

By the way, this test shows how verbose of a language C is:

        *(s-1) == '\r' || *(s-1) == '\n' ||
        *(s-1) == '\t' || *(s-1) == ' '
If we could use Javascript, look how nice it would be:

        strchr("\r\n\t ", s[-1])


Thanks for sharing it! Really interesting, and well done!


I feel like this bug should be solved by using a proper parsing, instead of greedily looking for a field...


Headers are just text mixed with body and hard to figure out where they stop


Headers stop when you have 2 line endings consecutively[1], I don't see why it's hard.

[1] https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4....


Easy. They stop after \r\n\r\n.

There is no other RFC compliant way to end headers.


That ends suddenly.




Consider applying for YC's W25 batch! Applications are open till Nov 12.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: