OpenBSD: Malloc leak detection available in -current (undeadly.org)
147 points by peter_hansteen on April 17, 2023 | 60 comments



To quote the GNU libc manual:

> There is no point in freeing blocks at the end of a program, because all of the program’s space is given back to the system when the process terminates.

https://www.gnu.org/software/libc/manual/html_node/Freeing-a...

I think many GNU tools just never free any memory.

For example, GCC: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66339

-- edit: added GCC as example.


> There is no point in freeing blocks at the end of a program

...unless you are trying to find memory leaks in your program. In that case it would be very helpful if the program, and the libraries it uses, were written to actively free all allocated memory.

> I think many GNU tools just never free any memory

It is absolutely true that some GNU software (including libraries like glib) allocate memory that they never intend to free. This causes leak analysis of programs that link with these libraries to be more painful than necessary.


To work around leaks, just don’t run your GNU programs for very long (:


Many GNU programs will helpfully terminate periodically with SIGSEGV to help prevent memory leaks from becoming a problem


Maybe this is the intended use case of timeout(1).


The last time I used a leak finder, a memory allocation which still has a ptr to it is not a leak.

You are conflating unfreed memory with unreferencable memory.


Reachability is one strategy for detecting leaks but it's not the only one. Checking for unfreed memory is easier to implement.
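
A minimal sketch of the difference, using a toy heap model rather than any real allocator. `ToyHeap`, `roots`, and the block names are all hypothetical. An "unfreed" check flags every block still allocated at exit, while a reachability check flags only blocks that no root pointer can reach:

```python
# Toy model: compare "unfreed at exit" with "unreachable at exit".
# Nothing here reflects a real allocator; all names are hypothetical.

class ToyHeap:
    def __init__(self):
        self.live = {}        # block name -> set of block names it points to
        self.roots = set()    # blocks referenced from globals or the stack

    def malloc(self, name, points_to=()):
        self.live[name] = set(points_to)

    def free(self, name):
        del self.live[name]
        self.roots.discard(name)

    def unfreed(self):
        """Every block still allocated: the simple check."""
        return set(self.live)

    def unreachable(self):
        """Blocks that no root can reach: the reachability check."""
        seen = set()
        stack = [r for r in self.roots if r in self.live]
        while stack:
            block = stack.pop()
            if block in seen:
                continue
            seen.add(block)
            # Follow pointers, ignoring dangling ones to freed blocks.
            stack.extend(p for p in self.live[block] if p in self.live)
        return set(self.live) - seen


heap = ToyHeap()
heap.malloc("cache")        # kept in a global until exit, never freed
heap.roots.add("cache")
heap.malloc("scratch")      # pointer dropped without free: a true leak
heap.malloc("temp")
heap.free("temp")           # correctly freed

print(sorted(heap.unfreed()))      # ['cache', 'scratch']
print(sorted(heap.unreachable()))  # ['scratch']
```

In this model the still-referenced `cache` block shows up only in the unfreed report, which is the case the parent comment is objecting to.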


On GCC, the memory usage on compiling C++ vs Clang can be night and day.

You can compile large C++ projects in a GB of RAM based netbook with Clang and ZRAM.


Well yeah if your program does something and then terminates then memory leaks are not an issue.

If your process is a server or a daemon or something then restarting each instance every N hours is a nice backstop but memory leaks are still a frightening spanner in the works!


"I run gmake and gcc, And I ain't never called malloc without calling free."


One point would be to be able to detect memory leaks with tools.


This is great news to me, it is the one thing I was hoping for.

I use OpenBSD to test objects I create and testing there discovered issues that Linux and AIX happily ignored. But I used valgrind on Linux to look for leaks. With this I can now test for all "my issues" on OpenBSD :)


Could you share more about how and why you use AIX? How is it in production? I'm very interested in corporate Unixes but have never met anyone still using one, although new features keep getting added.


Did you use xlC tooling on AIX?

Just curious how they have changed since RS/6000 days.


Yes, but I doubt it changed that much. It did work better for me than gcc on AIX. But I expect the admins messed up gcc when they installed it.


> The null "f" values (call sites) are due to the sampling nature of small allocations. Recording all call sites of all potential leaks introduces too much overhead.

Hmmm.... "too much" feels like a trade-off or value judgement that won't apply to all cases, and some people would probably like to be able to take the performance hit in exchange for a complete trace. Seems a bit odd that that's not even available as an option, as well as the current behaviour.
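
To make the trade-off concrete, here is a toy model of sampled call-site recording. It is not OpenBSD's implementation: `SMALL_LIMIT`, `SAMPLE_EVERY`, the `SamplingTracker` class, and the rule that large allocations always keep their call site are all assumptions made up for illustration.

```python
# Toy model of sampled call-site recording. NOT OpenBSD's implementation:
# SMALL_LIMIT and SAMPLE_EVERY are made-up values for illustration only.
from collections import defaultdict

SMALL_LIMIT = 2048     # hypothetical cutoff between "small" and "large"
SAMPLE_EVERY = 4       # hypothetical: record 1 in 4 small call sites


class SamplingTracker:
    def __init__(self):
        self.live = {}          # allocation id -> (size, call site or None)
        self.small_count = 0
        self.next_id = 0

    def malloc(self, size, caller):
        site = caller
        if size <= SMALL_LIMIT:
            self.small_count += 1
            if self.small_count % SAMPLE_EVERY != 0:
                site = None     # call site skipped to keep overhead low
        self.next_id += 1
        self.live[self.next_id] = (size, site)
        return self.next_id

    def free(self, alloc_id):
        del self.live[alloc_id]

    def leak_report(self):
        """Sum bytes and count of unfreed allocations per call site."""
        report = defaultdict(lambda: [0, 0])
        for size, site in self.live.values():
            report[site][0] += size
            report[site][1] += 1
        return {site: tuple(totals) for site, totals in report.items()}


tracker = SamplingTracker()
for _ in range(8):
    tracker.malloc(100, "parse_line")   # small, never freed
tracker.malloc(65536, "load_file")      # large, never freed
report = tracker.leak_report()
print(report)
```

Here six of the eight small leaks end up under the `None` call site. In this model, the complete trace being asked for would correspond to `SAMPLE_EVERY = 1`, at the cost of doing the bookkeeping on every small allocation.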


OpenBSD had malloc.conf options to enable stricter checks. For instance, programs like detox might crash on complex input. I wonder how OpenBSD would work with both enabled.


Neat! Although, I'm curious why the tool doesn't just run addr2line for you?


Are you asking why doesn't it execv(2) addr2line deep within the libc malloc implementation? Because calling execv(2) within libraries is frowned upon.. ;-)

The leak report is being generated internally by malloc. It is then logged via utrace(2) when a process is traced through ktrace(1).

The kdump utility simply dumps the report, strvis(3) escaping any potentially unsafe characters. As this is untrusted user data, passing it as the input/args to another command is unwise. Also kdump(1) uses pledge(2) and cannot execute commands.


I guess I’m confused about what’s printing the output here. On other OSes, things like this are typically implemented by malloc recording the addresses, and whatever parses the report later does the symbolication when displaying it. I guess kdump just dumps the report, but is there no porcelain that takes the kdump data and cleans it up a bit more for human consumption?


Just link to the code instead of executing it as a separate binary


I'm pretty sure parsing ELF binaries is out of scope for kdump(1), sorry, but I don't think that's going to happen.

It's not that difficult to run addr2line yourself with the information provided, and that's really for the best.
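
As a rough sketch of that manual step, the snippet below only builds the `addr2line` command line. The binary path and offsets are hypothetical placeholders, and it assumes the report already gives offsets that `addr2line` can resolve against the binary. `-e` names the executable and `-f` asks for function names as well.

```python
# Sketch: build an addr2line invocation for a list of call-site offsets.
# The binary path and the offsets below are hypothetical placeholders.
import shlex


def addr2line_command(binary, offsets, tool="addr2line"):
    """Return an argv list: -e names the executable, -f adds function names."""
    return [tool, "-f", "-e", binary] + [hex(o) for o in offsets]


cmd = addr2line_command("./a.out", [0x1ef3, 0x2a10])
print(shlex.join(cmd))   # addr2line -f -e ./a.out 0x1ef3 0x2a10
```

Passing `cmd` to `subprocess.run` would execute it. The `tool` parameter is there only so that a compatible symbolizer could be substituted.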


You are arguing for a worse UX because of an arbitrary reason. Just link against the code in addr2line. Providing a good UX should always be in scope for a project.


How does one "just link against the code in addr2line" ?


You rename the main symbol to something else, then you just call it like any other library.


addr2line is unlikely to be a trivial self-contained program without dependencies.


Then link to those dependencies.


Unnecessary. Most people's hands will be able to type up the required shell one-liner to do this.


Why would openbsd want to taint kdump with GPL code?


LLVM provides llvm-addr2line and llvm-symbolizer. Of course I understand not wanting to link with LLVM code, even when the licence is ok. :)


Then find or write an alternative. Asking people to run a command to see the file and line number is a joke.


Why duplicate the efforts of valgrind and address sanitizer?


In addition to what others said, Valgrind is GPL-licensed. That conflicts with the OpenBSD copyright policy (https://www.openbsd.org/policy.html), which says:

“The GNU Public License and licenses modeled on it impose the restriction that source code must be distributed or made available for all works that are derivatives of the GNU copyrighted code.

While this may superficially look like a noble strategy, it is a condition that is typically unacceptable for commercial use of software. So in practice, it usually ends up hindering free sharing and reuse of code and ideas rather than encouraging it. As a consequence, no additional software bound by the GPL terms will be considered for inclusion into the OpenBSD base system.”

As to clang’s Address Sanitizer, that’s under the Apache License v2.0 with LLVM Exceptions (https://github.com/google/sanitizers/blob/master/LICENSE.TXT), of which the same page says:

“The original Apache license was similar to the Berkeley license, but source code published under version 2 of the Apache license is subject to additional restrictions and cannot be included into OpenBSD. In particular, if you use code under the Apache 2 license, some of your rights will terminate if you claim in court that the code violates a patent.”


OpenBSD begrudgingly made an exception for LLVM/Clang, after vocal opposition to the re-licencing. It currently uses LLVM/Clang 13 and has been making progress towards 15. Licensing is not the problem here. Most of the sanitizers are simply not enabled in the version shipped in base, and require runtime libraries that have not been ported to OpenBSD.

Valgrind exists in ports, but it is ancient and broken. It does not play well with various security mitigations.


> OpenBSD begrudgingly made an exception for LLVM/Clang […] Licensing is not the problem here

If it isn’t a problem, why do you say “begrudgingly”?

I think they are pragmatic but also do find it a problem. Why else would they say “source code published under version 2 of the Apache license is subject to additional restrictions and cannot be included into OpenBSD”?


I didn't say it wasn't a problem. I said it was not the problem here. Important distinction.

Licensing is not the reason for the sanitizers not being enabled in the default build, a lot of stuff isn't. If it were supported, it would probably be delegated to the ports version, along with the analyzer, additional llvm tools, cross-compiling, etc.


True, but I think having this check in the kernel may work better than valgrind on Linux.

With that said, valgrind works great and I like its output, time will tell if I will like the output from this OpenBSD change.


I had no idea that OpenBSD didn't accept Apache 2 code

https://www.openbsd.org/policy.html


They maintained Apache 1.x in base for a long time. For a small number of releases they switched to nginx, and then I think they wrote their own.



Parent is referring to the Apache 2 license, not version 2 of the Apache HTTPD server.


I think that’s what your parent is saying too. The last non-Apache-2-licensed version of the Apache HTTP server was version 1.3.x, and because the OpenBSD project did not accept the Apache 2 license, they forked the Apache HTTP server’s 1.3.x code base.


It's not coincidental. It was early in the Apache 2.0 lifecycle that the Apache License 2.0 came about (google says 2.0.49). OpenBSD's continued maintenance of a 1.x fork from 2004-2011 or so was mainly about licensing.


> it is a condition that is typically unacceptable for commercial use of software

This is entirely untrue, though. Linux is GPL and it gets way more use than any of the pushover-licensed BSDs do.


IMO, even if they hadn’t included ’typically’ to weaken their claim, one counterexample doesn’t make that _entirely_ untrue.

Also, why use ‘pushover’? OpenBSD has strong principles that they’re willing to give up things for, so implying they’re weak is derogatory and unfair.


I feel like Linux is so incredibly popular that it does make it untrue, similar to the punchline of <https://what-if.xkcd.com/49/>.

I use "pushover" because that's what the FSF uses: <https://www.gnu.org/licenses/license-compatibility.en.html>

> we call them “pushover licenses” because they can't say “no” when one user tries to deny freedom to others.


The point that Theo de Raadt is making is that this line of thinking is wrong*. Most of the FSF/GPL advocates never seem to look at the why's around people disliking their approach. See https://lkml.org/lkml/2007/9/1/102 as an example. OpenBSD helped the Linux project dual license the ath5k driver. They were thanked by GPL advocates illegally trying to strip the BSD license after the fact. Leaves a bad taste, and one that many of us won't forget. You can have your forced freedom.

* "GPL fans said the great problem we would face is that companies would take our BSD code, modify it, and not give back. Nope—the great problem we face is that people would wrap the GPL around our code, and lock us out in the same way that these supposed companies would lock us out. Just like the Linux community, we have many companies giving us code back, all the time.

But once the code is GPL'd, we cannot get it back."

EDIT: The pejoratives are why the FSF message gets lost on an awful lot of people. I admire what they are trying to achieve, the manner in which they go about it makes me not want to engage.


Theo de Raadt is the one who's wrong. When code goes closed source, you're locked out and can't have it at all anymore. When it goes GPL, you can still have it just by going GPL yourself.


> "When it goes GPL, you can still have it just by going GPL yourself."

Which in some people's opinion, is no better. Just a different set of handcuffs.


“This is madness! He has lost his mind! This defies the first law of free trade. Rule zero came before this rule one. Freedom means you cannot dictate to anyone.” (emphasis mine)

https://www.openbsd.org/lyrics.html#43


The story about RMS on the airplane is just an ad hominem and has nothing to do with his opinions on software freedom.

> Some of the software which is fetched and compiled is not as free as we would like, but what can we do.

They could choose to not put such software in their ports tree.

> Meanwhile, Richard has personally made sure that all the official GNU software — including Emacs — compiles and runs on Windows.

There's a big difference between making free software support a non-free system and distributing non-free software.

> Rule 1: You cannot sell your code! Rule 2: You must give it only to me

Straw man. The GPL doesn't have anything even resembling either of those rules.

> You cannot give your code away

This is clearly completely absurd and not required by any free license.


I provided a reference for a portion of lyrics that fairly elegantly summarised the BSD position on freedom – that is it. Besides, it is a song and obviously it will take artistic liberties compared to an essay.

If you want to argue minutiae related to things on that page other than the tiny lyric snippet, it is probably better that you send an e-mail to misc@.


Linux makes intentional exceptions in the application of the GPLv2 to accomplish this e.g. vDSOs. There’s a reason Linux refused to move to GPLv3.

Now apply these exceptions to user space, and you’ve basically reinvented the LGPL.


Minix uses a BSD-alike license and is built into every Intel processor sold these days.


They're describing libraries. By 'Linux', I'm assuming that you're referring to the OS distributions and not the kernel on its own. And please drop the pejoratives.


There is seemingly a movement or line of thought that blames the success of “big tech” on permissive licenses. You do not see it as much on Hacker News, but it certainly has spread across IRC and many other forums. What is even odder is that a subset of it uses alt-right terminology to refer to both permissive licenses and their proponents. It is all very weird to experience as someone that entered the FLOSS community in the early 00s. Not to mention having had a personal conversation with rms bemoaning the license schisms and him explicitly expressing gratitude for the BSDs contributing to the larger FLOSS movement.


I'm beginning to see and hear it more from younger folks, mainly free-speech absolutists. It is indeed a worrying development.


Way more visible use, maybe. BSD is otherwise everywhere (e.g. Sony).


Both of those are very useful, but slow, especially valgrind.

I don't know OpenBSD, but presumably this is analogous to glibc's relatively faster memory sanity checking.

For GCC and clang you'd also want LSAN for this, not ASAN. ASAN is more accurate for edge cases, but much slower.

"Slow" here means that even for development the runtime can be prohibitively expensive.

E.g. I regularly run git's test suite, optimized/LSAN/ASAN/valgrind runtime is on the order of 3m/15m/30m/24 hours. The basic glibc sanity checking only adds a minute or two to the optimized run.
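
Put as slowdown factors relative to the optimized run, and taking those rough "on the order of" figures at face value, the arithmetic works out as follows:

```python
# Slowdown relative to the optimized run, using the rough figures quoted above.
runtimes_min = {
    "optimized": 3,
    "LSAN": 15,
    "ASAN": 30,
    "valgrind": 24 * 60,   # 24 hours expressed in minutes
}

baseline = runtimes_min["optimized"]
slowdown = {name: minutes / baseline for name, minutes in runtimes_min.items()}

for name, factor in slowdown.items():
    print(f"{name:>9}: {factor:g}x")
```

That is roughly 5x for LSAN, 10x for ASAN, and 480x for valgrind.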

An advantage of any malloc based detection is also that you can run it on any existing binary. Whereas the likes of LSAN and ASAN require a custom debugging build (or to have that tracing overhead present in your production build).


This is available by default and integrates with existing tools (notably ktrace), making it easier to detect memory leaks on all platforms OpenBSD supports.



