Pretty funny how the entire HN rallied together to solve some github issue on someone's personal repo. I'll try posting some of my bugs here to see if it'll work for me as well.
There's a power-law dropoff when it comes to follow-up submissions, since the hivemind craves novelty. But it would be fun to try. Maybe it could become a thing: Bug HN!
The trick would be to make the bugs interesting: weird or hard, preferably both.
I think the biggest dragon to slay would be the legendary GTK issue #233 which has been open for 15 years and has become sort of a meme by now. So many people try to solve it but fail since it requires intricate knowledge of the framework.
My impression is the opposite: random users demanding that some devs spend their time on a tricky corner issue and getting angry about it. After reading their comments, I'd be very hard pressed to motivate myself to work on it.
It seemed unclear that the patch addressed all the issues raised in the ticket, and looking at random patches still requires dev time. It is a different story when someone else actively works on getting it upstreamed and working on resolving any issues. It's about who volunteers to do the hard work.
Given that no one stepped up to do it yet and random users chimed in with aggressive and/or unhelpful noise, locking the issue was adequate.
I find it concerning how much abuse GNOME devs get, and I can completely understand that they respond like this.
Uh, as far as I can see, all the hard work is done, and has been merged into other forks. The Gnome devs just ideologically disagree. The whole ticket is a perfect example of “don’t let perfect be the enemy of good”.
There are a lot of open source projects with decade+ old bugs. Could be a community sport to tackle one ever month or something and accidentally improve the world as a side effect.
Didn't one of the other gtk derived DM replaced the file picker with their own which had thumbnail view? I thought Gnome never changed it because it didn't fit their vision.
It's something else entirely. Almost all the posts on /r/legaladvice are shouting into a void. Hopefully it helps, maybe there's some interesting side discussion...
But an update? That means there's actual feedback and conclusions! Something that's desperately lacking on that particular board.
Part of the reason this got a good response is due to Ned himself. nedbat@ has a great reputation in several communities. Also he showed his homework. And then posted with a good, tight repro case.
Fun to see the problem and the solution. Thanks Ned and thanks HN.
To be fair, it was a great nerd snipe, as it was a mysterious bug that happened when things aligned perfectly, yet still had a very good repro making it easy to test.
Yeah that’s how many personal repos work. I assume you don’t use Node.js or Go or Python where most nontrivial dependencies often come from non-corporate third parties.
The first really bizarre bug I fixed was a file descriptor bug. The code was streaming data and in some cases the app would just stop processing network packets. He or I figured out that keyboard input would temporarily unblock things.
He had a compound conditional without enough parentheses in it, and one of the possible outcomes resulted in fd=1, which is stdin on many operating systems. My general advice now is to assume a 5th grade reading level for math (which consists of +-x/), and use parentheses for everything else. Especially if you think yourself to be clever (narrator: he did).
The very low precedence of & is the source of an unbelievable number of bugs, especially in embedded programming. For example:
if(PORTA & 3 != 0)
looks like it should be true if either of the low two bits is set. But actually, it only fires if the low bit is set because this resolves to PORTA & (3 != 0), which is PORTA & 1.
Some C compilers will warn you about this. Embedded C compilers are not usually the sort of compilers that are nice enough to have these warnings.
As a bonus, different languages handle precedence of &,|,^ differently: Python, for instance, binds & more tightly than ==.
That should give a warning at least because it (and this is probably a C thing) treats or implicitly casts a boolean true / false to a 0/1. You shouldn't be able to do PORTA & true.
Javascript and PHP deal with HTML, where input forms can only request text* but are often used for numbers, so the conversion rules are an attempt to automatically treat numbers as numbers instead of forcing programmers to always do the conversion themselves (as other frameworks rely on instead).
With this kept in mind, I've personally found almost all the implicit conversions in both languages to be intuitive.
* You can limit the values to numerics, and nowadays use attributes to handle javascript, but that wouldn't help PHP where it gets encoded to querystring anyway.
In most cases, up or down casting 32 and 64 bit integers would be a shitton of work that barely adds anything.
A similar case could be made for casting ints to floats (i.e. 2 + 3.1 should just work).
Beyond that though, most implicit casts are just bad.
> Especially if you think yourself to be clever (narrator: he did).
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."
— Brian W. Kernighan and P. J. Plauger in The Elements of Programming Style.
I don't contribute much to open source, mostly due to employment agreements, but a couple I was able to make:
1. Bulk copying null values with Python Sybase module (initialized variable used)
2. ACE high res timer bug under x64. Inline ASM was using the wrong registers in x64 mode (reading garbage upper 32 bits of 64 bit value).
Both of these were bizarre bugs, and neither was immediately evident.
#1 was presented by an error saying something to the effect of "received xXxX bytes, expecting yyyy".
#2 was harder. It became evident because of logging. The high res timer was used gor logging amd intermittently, I was seeing +200 years supposedly elapsed during a db operation. Obviously wrong. It was a bug due to how registers are handled between x86 and x64 code.
So the final answer is basically, there's nothing the author can do. The problem was half with the user's code mocking with internals, and half with downstream library not cleaning up properly.
Monkey patching is a key feature of the python mock library. Get used to it if you’re getting into python development because you’re going to come across it.
A code that looks simple and correct has been hiding a very subtle bug because no one understood the implications of an arbitrary exception being thrown at any point under the caller.
And of all the people who looked at the bug not a single one looked at this seemingly simple code and said "oh yeah, that's because this code is obviously wrong". It took someone skilled enough to script gdb to figure out the buggy interactions.
So the next time someone makes fun of `if err != nil`, send them the buggy implementation of tempfile.NamedTemporaryFile and ask them to spot the bug.
Also, KeyboardInterrupt should not be an exception. It's a signal, not an error or exceptional situation that code wants to report to the caller. But once you have a hammer, everything looks like like a nail. In addition to making the code impossible to reason about, no-one can agree what exactly is an exception so they randomly commingle expected errors with exceptional circumstances and, in case of Python, out-of-band signals.
("randomly" as in: why is "open file" throwing an exception when file doesn't exist but "find substring" returns -1 when substring doesn't exist?)
An extremely obscure and rare scenario that exposes a flaw with a convenient tool does not negate the value of the tool in all other cases.
Are you also going to throw out all use of automatic garbage collection in languages because of a bug caused by a GC pause.
Should we stop bothering with encryption because people discover weaknesses and attack vectors?
Come on, man.
> Also, KeyboardInterrupt should not be an exception. It's a signal, not an error or exceptional situation that code wants to report to the caller.
OP doesn't mention anything about KeyboardInterrupt, but I'll bite.
Some languages like C# make "Events" a 1st party citizen, making it really easy to write non-sequential code and handling such events.
Most don't, so when people want to GUARANTEE a caller handles a particular signal, they use the Exception tool because it posesses the necessary qualities.
"Using Exceptions as flow control" is a well-known anti-pattern but even anti-patterns have their use in exceptional circumstances.
> ("randomly" as in: why is "open file" throwing an exception when file doesn't exist but "find substring" returns -1 when substring doesn't exist?)
I'll concede this one point. In-memory utility methods shouldn't throw exceptions when a useful error code would be just as good - it keeps code cleaner without trycatch blocks. NumberFormatException is my bane in Java.
It actually is. Abusing exceptions as the normal control flow is a crude anti-pattern. Obviously the normal flow of control is not exceptional behavior. Therefore, abusing exceptions and exception-handling mechanisms to implement the happy path is simply wrong at many levels, from conceptual to practical, and in the end actually cause problems and hard to find buga such as the problems described in this bug report.
As a fan of Python, I agree that it’s a problem, and probably one of Python’s worst design decisions. It doesn’t achieve anything over sentinels, and results in lots of potential for hidden bugs (especially in the case of __getattr__, and before PEP 479). Comparison overloading even does use NotImplemented instead of an exception.
Especially on StopIteration this is a good way to keep the interface small when lacking static typing.
A generator in Java has the has_next() and next() method and this can be statically checked and you're basically only allowed to call next is hasnext is true.
In python the generator only has to provide the next-method and raises an exception when it's exhausted. I don't find this particularly unclean.
Every time there’s a subtle bug with 7 contributing factors, people pick the one which is caused by a design decision they happen to hate, and place all of the blame squarely on that one.
> ("randomly" as in: why is "open file" throwing an exception when file doesn't exist but "find substring" returns -1 when substring doesn't exist?)
Because there can be lots of ways for `open` to throw an exception (encoding issues, not found, permissions, etc.). The underlying open syscall has 39 error codes. Python adds a few more failure modes atop that. I'd much rather deal with named error conditions than deal with return values -1 through -50.
My question is: what is your alternative to exceptions, and how would they would have made the error easier to spot? Its not obvious that the error would have been easier to spot if it had been written using Go's `!= nil` pattern or Rust's `?` pattern.
My experience is that while its true that its easy to have bugs in implicit exception handling logic, its just as easy, maybe easier, to have bugs in the explicit boilerplate its replacing.
> Its not obvious that the error would have been easier to spot if it had been written using Go's `!= nil` pattern or Rust's `?` pattern.
I like Rust's `?` operator (and Swift's `guard` statement also), but it's worth noting that the bug wouldn't have existed in Rust in the first place because the file object wouldn't have outlived the scope it was defined in.
Also and really because it’s quite hard to go and replace implementation details with code which behaves completely differently and implements an unrelated contract entirely.
> how would they would have made the error easier to spot?
There was a try block with two function calls, and it seems like the author assumed only the first function call could throw.
With return-based errors, the error handling wouldn't implicitly be shared between the two functions. The author would write the handling for the first function, and then they would notice the second function could return an error state too. Then they might write a correct handler, or at least we should expect them to say "this can't happen, so panic". This bug would have been trivial to fix if it panicked instead of corrupting fds.
What makes you think that "the author assumed only the first function call could throw?"
Programmers in exception languages never think (or at least should never think) "X cannot throw" because it is never true. Any function has the potential of throwing an exception. This is even more true in Python than in other exception throwing languages.
No, the problem here is that the author has realized that they need special cleanup logic in the case that a file fails to be opened, but they applied that logic too broadly.
That's precisely the point I was making: there is no python code which is guaranteed not to throw. If the original author thought that the line of code in question wouldn't throw, they were were wrong and fundamentally misunderstood Python.
Well the code wouldn't have caused any errors if it only caught normal exceptions, at least.
> That's precisely the point I was making: there is no python code which is guaranteed not to throw.
Well the first thing you said is "what is your alternative to exceptions, and how would they would have made the error easier to spot? Its not obvious that the error would have been easier to spot if it had been written using Go's `!= nil` pattern or Rust's `?` pattern."
In those languages, you can't have an error state trigger between lines of code, or inside a line of code that does basic things like assigning to a local variable or returning.
It won't happen in C++ either (with certain assumptions).
Python is a language that's supposed to be straightforward, and very few coders are going to understand exactly how hostile exceptions can be. So while these assumptions shouldn't have been made, the blame goes toward the design of the language. An alternative system would be better.
> Well the code wouldn't have caused any errors if it only caught normal exceptions, at least.
Uhh... it was a normal exception that triggered the bug in question.
> Well the first thing you said is
Yes, that's the first thing I said. But the point I'm making here is that your initial response to that is probably wrong: "the author assumed only the first function call could throw" That's just not how you think in Python. (Or at least it is not how you should think.)
My point is this: the bug wasn't assuming that the second function wouldn't fail. The bug was assuming that the same cleanup code was appropriate in the case of either failure. That same bug could easily exist with return codes as it does in exceptions.
> Python is a language that's supposed to be straightforward
In fairness to Python, this is an atypical case. The problem arises because this is low level code dealing directly with fds. Most of the time, code will deal with higher level objects which will properly clean up after themselves regardless of what crazy stuff you do with them.
> Uhh... it was a normal exception that triggered the bug in question.
That doesn't count because it was an invasive monkeypatch that took a function that could not possibly cause an error and made it throw.
> My point is this: the bug wasn't assuming that the second function wouldn't fail. The bug was assuming that the same cleanup code was appropriate in the case of either failure.
Maybe. Hard to know for sure.
> In fairness to Python, this is an atypical case.
I dunno, pretty much any code that uses a catch block to undo things has to be very carefully written. Even if you're wrapping things in higher level objects, you still have ugly scenarios where you're mutating a data structure and have to unwind the changes halfway through. I bet tons of that code is written under the assumption that there will be no exceptions.
I guess if we consider "trying to catch KeyboardInterrupt at any level without immediate exit" as a weird barely-supported case, then things are fine.
> That doesn't count because it was an invasive monkeypatch that took a function that could not possibly cause an error and made it throw.
The original claim was based on that possibility, its pretty goofy to declare it doesn't count.
> I dunno, pretty much any code that uses a catch block to undo things has to be very carefully written. Even if you're wrapping things in higher level objects, you still have ugly scenarios where you're mutating a data structure and have to unwind the changes halfway through. I bet tons of that code is written under the assumption that there will be no exceptions.
Is it any different for return codes than for exceptions? I'd say that any code which attempts to undo change in the case of an error has to be carefully written. Every special case on the ordinary code path, has to have matching special cases on the error paths. In fact, I've never seen code in any language that does such a thing. And I wouldn't trust if it I did, since its extremely difficult to get right for all cases, and the errors paths almost certainly hasn't been well tested.
> I guess if we consider "trying to catch KeyboardInterrupt at any level without immediate exit" as a weird barely-supported case, then things are fine.
Well, trying to continue running after KeyboardInterrupt is dubious, the user just asked the program to terminate.
A non-sequitur me thinks, as the code doesn't look simple and correct. File needs to be closed on error if it exists, fd if not. i.e. The broken version looks broken.
While there is lots of meta-complexity around this, the bug itself is relatively simple. Make sure files are closed properly on error.
Also find -1 is a wrapper from C, probably kept for compatibility reasons.
I do not see how exceptions caused the bug, seems the cause is a bug in a GC language d that is wrapping a low level code/resource and not releasing it properly.
In a langage without exceptions the file wrapper being trivial it would not have returned any sort of error condition and thus would not have been checked for it.
It is an issue of exceptions that try clauses can easily be overly broad in the amount of code they cover.
I still do not understand, I did not see where the exception caused the issue in this case, you can have bugs with error codes that are not checked and that blow up much later so I am confused by your vague response (maybe you prefer checked excpetion so you don't ignore problems, I prefer those because they force correctness over developer comfort).
The except clause was built under the assumption that only opening the fd would fail, and thus the cleanup it performs is in those terms, especially the part about closing the file descriptor and ignoring the file object (since that's assumed not to possibly exist).
However due to the mock it's the wrapper creation which fails. This means the cleanup runs, closes the fd, but leaves the file object (created by io.open) around and assuming it owns an fd, the file object later on gets collected by the GC and closes the fd a second time, breaking anyone unlucky enough to have gotten its reuse.
This issue would not happen if the wrapper were created outside the `try` statement (though the file would be leaking to be eventually closed by the GC which is not great either)
You could get a very similar issue in C++ where a destructor closes an FD that was already erroneously closed by an earlier call to a member function due to an exception.
The issue here is a bug in resource management. The specific bug was caused by an exception that shouldn't be caught still being caught. Which erroneously messed with resource management.
I did not see that in the snippets I will try to find the section, was the bug in the code that was catching the exception or it was thrown by mistake. I agree that is a resource management issue and you often make this kind of bugs when you try to reuse objects (for performance reasons most of the time)
> why is "open file" throwing an exception when file doesn't exist but "find substring" returns -1 when substring doesn't exist?
Well, first, there's more to it than that. "Open file" will throw an exception when you open the file to read and it doesn't exist. If you open the file to write and it doesn't exist, you won't get an exception because that's expected behavior. Instead, the file will be automatically created behind the scenes. The complementary scenario, where you're opening a file to write and the file already exists, is more likely to be a bug.
Exceptions have semantic qualities and depend on what you're trying to do. Trying to read a file that doesn't exist is obviously an error state because it is inherently impossible to do. The same goes for detecting the location of a substring within a string that doesn't contain it. But examining a string to see whether it contains a particular substring has no such problem. You can always test a string for the inclusion of some other string. So the judgment of whether "find substring" should throw an exception or not depends on whether you think the question being asked when someone invokes "find substring" is "is substring in there?" (answer of "no" is not exceptional) or "I know substring is in there; where is it?" (answer of "it isn't in there, you doofus" is exceptional). Both questions are common; this isn't really a call the language should be making.
The same problem is more frequently discussed in the context of hash tables. If I look up a key in a hash table and I get a null value, is that because there is no value stored for that key, or is it because null is the value stored for that key? People would like accessing the table to be convenient while distinguishing these two cases, and python doesn't allow that. You can choose from some different behaviors:
table[key] # Absence of the key is exceptional.
table.get(key) # Absence of the key will yield None...
# but this is indistinguishable from
# a key with the stored value None.
key in table # This directly tests whether key is
# present, but won't tell you the stored value.
The true solution is for lookups to return multiple values, one to tell you whether the key was present and the other to tell you what its value was, if anything. But this is rare[1] because people don't want to handle multiple return values from every hash lookup.
But you can store any value, even null, in a hash table -- that's their thing. Finding substrings doesn't have this problem; a legal result must be a valid index into the string. We can solve the how-do-we-want-the-hash-table-to-behave problem perfectly by returning an invalid index when the substring isn't present, and... that's what we do.
[1] Common Lisp addresses this by allowing functions to return multiple values (distinct from a single value with multiple subvalues) -- it's possible to treat the return value of a multiple-valued function as a single value, in which case you get only the first value. This is great for the hash table example, but in general it makes this easier while making it much more annoying to handle the multiple values when you do want them all.
> The same problem is more frequently discussed in the context of hash tables. If I look up a key in a hash table and I get a null value, is that because there is no value stored for that key, or is it because null is the value stored for that key? People would like accessing the table to be convenient while distinguishing these two cases, and python doesn't allow that. You can choose from some different behaviors:
Or, you can create a second null-like type in the language, name it 'undefined' perhaps, and return that when the key doesn't exist.
Of course, that may be considered kicking the can down the road as now you will have to trust developers to be competent enough to never store 'undefined' in the hash table, which I suppose is the reason you didn't mention JavaScript's solution to the problem.
The actual fix is to have a completely separate wrapper type (usually an option type) which signals in a type-safe and unambiguous manner that the key was missing.
Then you don’t have the issues of in-band signalling.
This isn't really different from the multiple values approach. For example, python could have table accesses return (None, True) for a None value that was present, (None, False) for a value that wasn't present, (5, True) for a 5 value that was present...
Is that multiple values? (Yes.) Or a wrapper type? (Also yes, it's a tuple, a collection of values.) This works, but it's unpopular because it makes every table access more difficult.
> This isn't really different from the multiple values approach.
If you ignore most of what I said then it’s not.
> Is that multiple values? (Yes.) Or a wrapper type? (Also yes, it's a tuple, a collection of values.)
It’s neither type-safe nor unambiguous.
> This works
For low value of works. Which I guess is all you can ask for in a dynamically typed langage (eg Erlang uses this pattern to fairly good effect, but it has very good support for it).
But it should not be confused with an actual solution to the issue. It can make things less bad (again given good support for this pattern which Python does not have), not actually good.
> Or, you can create a second null-like type in the language, name it 'undefined' perhaps, and return that when the key doesn't exist.
No, this isn't a solution. This is like terminating C strings with 0 characters, or terminating TSV fields with tabs. It sounds like it makes sense, until you ask "what if the terminator occurs within the data?"
Eventually someone's going to need to store undefined. It's no different than null. If it were a competence issue (which it isn't), we'd be saying that storing null in the hash table was a mistake, not something to work around with null2.
> We can solve the how-do-we-want-the-hash-table-to-behave problem perfectly by returning an invalid index when the substring isn't present, and... that's what we do.
Negative indices are valid in python. Also str.index raises.
And product types are not a good way to signal exclusive conditions.
It's a followup to a story that got nearly 600 votes two days ago and spent a long time on the front page, so most of us are well aware of the context, but which project it is is beside the point. The bug was interesting and the debugging and conversation about it was interesting. https://news.ycombinator.com/item?id=22028581