Dissecting QNX [pdf] (blackhat.com)
120 points by adamnemecek on Sept 19, 2018 | 71 comments



I still have the QNX demo disk image (including GUI!) which fits on a floppy drive.

It's a shame there's no comparable open source OS :(



A word of caution: the first URL doesn't have TLS (HTTPS). None of the links to the demos have either. The links on Win World PC don't have any checksums.

I actually remember using this demo floppy though, back in those days (the other impressive Unix-like OS I used back then was BeOS). It was the first time I stumbled upon Towers of Hanoi. Sadly the main developer of QNX passed away.


I think the 32-bit MenuetOS still fits on a floppy (and is open source).


graphical os smaller than your average favicon


Related 34C3 talk [1]. Not sure if there's any news in this PDF. Skimming through, the CVEs it mentions are from 2017, with none from 2018 (34C3 was in December 2017).

[1] https://media.ccc.de/v/34c3-8730-taking_a_scalpel_to_qnx


The article seems to be a more academic rephrasing of said talk.


Anyone working on QNX based system here?


Yes. Working for a HW vendor catering to automotive and embedded customers. QNX is one of the five or so OSes we work with.


What are the other ones?


Linux (several flavors), Windows, QNX, GH Integrity, FreeRTOS and a handful of in-house and clients' proprietary special-purpose OSes.


Interesting. I would have expected these to be on the list as well:

- VxWorks (https://en.wikipedia.org/w/index.php?title=VxWorks&oldid=857...)

- Nucleus (https://en.wikipedia.org/w/index.php?title=Nucleus_RTOS&oldi...)

- perhaps ThreadX (the "spiritual successor of Nucleus") (https://en.wikipedia.org/w/index.php?title=ThreadX&oldid=852...)


Cisco's high-end router operating system, IOS-XR, has two flavors: an older one built on top of a QNX kernel and the current one built on top of Linux. The QNX version is still supported but is not in active development.


Do you know any details on the reason why Cisco switched from QNX to Linux?


The official line...

"The 64-bit Linux infrastructure is the de facto standard that is being used across the industry today...So it gives us more development tools, more tool chains and also more access into the third party development ecosystem."

http://mobile.enterprisenetworkingplanet.com/netos/cisco-evo...


That seems to agree with what I've heard internally. Granted, I'm new, so I didn't go through the migration phase.

XR itself provides abstractions on top of QNX and Linux, so devs working on platform independent code are not really affected.


Maybe also because things like DPDK (or similar) made it possible for Linux to play at that scale?


The routers I am talking about cannot do packet processing on the CPU. We are talking about 3.2 Tbps per linecard with 10 linecards per router.

All data plane activity is done on custom in-house ASICs called network processors (NPs). The CPU only handles control plane traffic and general administration.
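
To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. The 3.2 Tbps and 10 linecards come from the comment above. The worst-case assumption of minimum-size 64-byte Ethernet frames (84 bytes on the wire once preamble and inter-frame gap are counted) is an added assumption, and the function names are made up for illustration.

```python
LINECARD_BPS = 3.2e12         # 3.2 Tbps per linecard (from the comment)
LINECARDS_PER_ROUTER = 10     # from the comment

# Assumption: worst case of minimum-size Ethernet frames.
# 64-byte frame + 8-byte preamble/SFD + 12-byte inter-frame gap = 84 bytes on the wire.
WIRE_BYTES_PER_MIN_FRAME = 64 + 8 + 12


def router_capacity_bps(linecard_bps=LINECARD_BPS, linecards=LINECARDS_PER_ROUTER):
    """Aggregate data-plane capacity of a fully populated router."""
    return linecard_bps * linecards


def packets_per_second(bps, wire_bytes=WIRE_BYTES_PER_MIN_FRAME):
    """Packet rate needed to fill `bps` with frames of `wire_bytes` each."""
    return bps / (wire_bytes * 8)


def time_budget_ns(pps):
    """Average time available per packet, in nanoseconds."""
    return 1e9 / pps


if __name__ == "__main__":
    pps = packets_per_second(LINECARD_BPS)
    print(f"router capacity: {router_capacity_bps() / 1e12:.0f} Tbps")
    print(f"per-linecard packet rate: {pps / 1e9:.2f} Gpps")
    print(f"per-packet budget: {time_budget_ns(pps):.2f} ns")
```

Under these assumptions a single linecard would have to handle about 4.76 billion packets per second, or roughly 0.21 ns per packet, which is well under a single clock cycle of a typical CPU core.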


One of my former employers was developing on QNX; as of 2017 they were still using a compiler from 2005 for 32-bit.


What general space were they working in? Networking, for example?


Medical devices


I don't myself, but my brother, who designs and builds stages for U2 and other musicians, uses it for automation (lights etc.)


Does he use the AMX systems? That is Harman; Harman used to own QNX.


Yes, I maintain a few legacy systems that still operate partially on QNX 4.25.

It did a few really cool things for its time, considering that TCP/IP networking was an optional add-on when we originally deployed.


Yes. I work for QNX. I can answer any technical questions.


What's your go-to intro/reference to QNX's internals?


I have access to the code. There is no more reliable reference. The source code is remarkably well organized and clearly coded and has massive amounts of unit tests.

When the code is not enough, I go to http://www.qnx.com/developers/docs/7.0.0/#com.qnx.doc.qnxsdp...


How big is the difference between QNX 7 and 6.4 at the kernel source level? The 6.4 source is available on GitHub from the time QNX was made open source.


The biggest difference between 6.6 and 7.0 in the kernel is mostly that SMP is always enabled and 64-bit targets are first-class citizens. There are plenty of minor changes, but the kernel itself is very, very small.

There are plenty of changes to the userspace runtime, but the question was about the kernel.

Oh, also, 7.0 has an ISO 26262 certified variant. Not a technical difference, but an important one.


Hi! I have a couple questions I've been wondering about for years. Appreciate any insight/answers you can provide.

--

QNX was momentarily open for a little while until the license change with BlackBerry et al. My question: did this license change only apply to future QNX versions, or did it also retroactively apply to the "open"-sourced code as well?

My primary current interest in QNX stems from being fascinated with old and/or unusual operating systems. I'd love to be able to go fishing for 6.4 et al, maybe even compile what I find from source (or maybe not), and basically just play with the system to see how it works. If it's fine for me to go and find 6.4 and poke at it for noncommercial purposes - well, that'd be awesome to know. Obviously such a usage model would not incorporate any official agreement or warranty, and I understand that.

In a somewhat related vein, at some point I may find it useful to observe how QNX handles certain technical minutiae as part of my own (hopeful) OS development work. Obviously sourcing the latest versions of the QNX source for this purpose would offer support options, not to mention a more relevant codebase; but it would be great to know that I'd be able to safely make do with the older releases as long as I don't seek/expect any form of support.

--

I was unfortunately out of the loop with the QNX scene during the period it was open so I never got a chance to grab any of the repos. (And a quick search turned up what appears to be some QNX 6.4 bits and pieces on GitHub (as the previous comment hinted at) but it doesn't look very official, so I don't want to sift through it in case I waste my time.) Of course official repo access has since been closed, so I can't check that. So: I figure why not ask, what can it hurt.

When the repo was opened, did it include full commit history, or a large portion of it?

QNX 4.x is really cool. I managed to get an old copy working in a VM some time ago (took a bit of thinking; there's so little documentation out there). 6.x is nicer, but 4.x feels faster (which kind of makes sense).

--

I've been fascinated with QNX for years, and to be honest I want to say I find the "closed > yay open!! > closed" timeline incredibly frustrating; but besides wistfulness, this has also generated a fair bit of confusion regarding the current status quo.


I can tell you some of that. We ported all our applications from QNX6.5 to QNX6.6 and then QNX7.0.

They deprecated some POSIX calls like posix_spawn_file_actions_addopen (at least in QNX6.6 documentation they mention it's not fully implemented and it is indeed broken).

They made the QNX7.0 kernel instrumented-only (there are no longer non-instrumented versions).

They changed lots of low level stuff that is outside the kernel, like throwing out photon and replacing it with screen library, completely replacing the PCI server, making changes to the console, security patches, and so on. But since it's a microkernel, this is mostly in userland.
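
For readers who haven't met posix_spawn_file_actions_addopen: it asks for a file to be opened onto a given descriptor in the spawned child before the new program starts running. Below is a minimal sketch of that behaviour using Python's os.posix_spawn wrapper on a generic POSIX host. It says nothing about how QNX implements the call; the helper name and the child command are made up for illustration.

```python
import os
import sys
import tempfile


def spawn_with_stdout_to_file(argv, out_path):
    """Spawn argv with fd 1 opened onto out_path, then wait for it.

    The (os.POSIX_SPAWN_OPEN, fd, path, flags, mode) tuple is Python's
    counterpart of a posix_spawn_file_actions_addopen() call in C.
    """
    file_actions = [
        (os.POSIX_SPAWN_OPEN, 1, out_path,
         os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600),
    ]
    pid = os.posix_spawn(argv[0], argv, os.environ, file_actions=file_actions)
    _, status = os.waitpid(pid, 0)
    # Return the exit code, or -1 if the child did not exit normally.
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "child.out")
        child = [sys.executable, "-c", "print('hello from child')"]
        code = spawn_with_stdout_to_file(child, out)
        with open(out) as fh:
            print(code, fh.read().strip())
```

The parent never opens the file itself; the open happens in the child between process creation and exec, which is the part reported above as not fully implemented in QNX 6.6.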



Yup, my company's product is actually mentioned on the first page of the link.


I've used it on medical devices.


Yes. I work with QNX6.6 daily. I also worked with QNX6.5 and QNX7.0. It's a real blast to work with the microkernel. It's a really cool system, and the kernel stability is excellent. Feels good that when a low level driver crashes, your system just goes on as if nothing happened (except, of course, the applications that need this driver). Speaking of drivers, the driver situation on QNX is a bit sad, simply because there are few drivers and the ones that exist aren't as high quality or as well tested, because QNX is niche. Still, if there were a QNX environment as progressive as Ubuntu Linux (concerning GUI, drivers, etc.) and open source, I'd definitely use QNX as my main OS.
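
The "driver crashes, system goes on" point can be illustrated with ordinary processes. This is only an analogy, assuming a generic host with Python: each "driver" is a hypothetical one-line script run in its own process, and none of this is QNX's actual API or resource-manager mechanism.

```python
import subprocess
import sys

# Hypothetical stand-ins for drivers. The first aborts itself to simulate a fault.
DRIVERS = {
    "net": "import os; os.abort()",
    "disk": "print('disk driver ok')",
}


def run_driver(source):
    """Run driver code in its own process and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
    )
    return result.returncode


def supervise(drivers):
    """Run each driver in isolation and report which ones crashed."""
    report = {}
    for name, source in drivers.items():
        # A non-zero status means the driver died; the supervisor itself survives.
        report[name] = "ok" if run_driver(source) == 0 else "crashed"
    return report


if __name__ == "__main__":
    print(supervise(DRIVERS))
```

Because the faulting code lives in a separate address space, the supervisor only observes a failed exit status and carries on with the remaining drivers. Only clients of the crashed driver are affected.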


Cannot agree more. Once you know QNX, nothing matches up anymore. Linux feels like a dinosaur.


>QNX is niche

What is the most mainstream or perhaps "least niche" of the real time embedded OSes?


Hmm... BlackBerry 10, perhaps? https://help.blackberry.com/id/blackberry-security-overview/...

BlackBerry nowadays is... well let's say the mobile world is basically divided into 2 sides these days: Android and iOS :p


What? BlackBerry is realtime? Why?


So you have the option of running the baseband on the application processor to save costs if need be, among other reasons.


You need a hypervisor for that. QNX added that later.


You don't. Symbian EKA2 platforms (e.g. S60 3rd Edition and later) used to run the baseband as userspace threads on the same core as user applications. I would not be that surprised if that was the main enabler for the Nokia E51/52, i.e. a full-featured smartphone in an executive-phone form factor (a small and, above all, thin candy bar).


There is linux-rt, a fork of linux but I don't think it's competitive in the industry.


VxWorks/FreeRTOS

There are also more mainstream real time Linux distros e.g. Yocto Linux



Opening the link in Google Chrome 68 shows the PDF front page momentarily and then the browser shows a blank screen.

Is anybody else encountering this problem?



Interestingly when I relaunched Chrome and it auto-updated to Chrome 69 I am able to see the PDF without a problem. Not sure where the issue lies now.


I have this problem too with Chrome on macOS. Refreshing the page makes it appear, but this happens frequently enough to be annoying.


Very insightful. QNX 7 disables ASLR, maybe because of ASIL support.


Are you talking about Automotive Safety Integrity Level? If so, how does this have anything to do with ASLR?


I don't know why they disabled ASLR, but safety critical systems (and functional safety people) tend to avoid randomization...


Because such small embedded systems tend to avoid a stack and recursion, and often disable malloc altogether.

Variables and places are predefined. ASLR is a problem there, not a solution.


Having code that behaves differently if it's loaded at different addresses seems like a bug. So by not doing that, aren't you just masking it?


Presume that you are a software engineer. Your career and other people's lives depend on your producing systems that operate safely. You also have to make risk analyses and meet performance goals.

Your operating system executes different program images for every successive execution of your program, picked in an unpredictable manner.

How do you prove that every possibility passes the safety tests? How do you measure the risk of this random selection? How do you know when you have done enough simulation?

How do you match up software randomization with the ISO 26262 concept that all software faults are systematic and not random as (some) hardware faults are?

How do you prove that memory allocation and execution always meet performance goals? How do you construct and perform reproducible performance tests? How do you demonstrate that your measurements are meaningful?

Software engineering in this case involves thinking about all of these questions and more besides.

* https://hal.archives-ouvertes.fr/hal-01375451/document

* https://www.usenix.org/sites/default/files/conference/protec...

It appears (to me, at least) that the current state of the literature on ASLR is that it is treated as a succession of theoretical arms races, which new defence militates against which new attack, and almost no attention is paid to the concerns of actually deploying it in a larger system; and the current state of the literature on functional safety is simply "we will assume that there are no randomization processes in the software" (from an actual paper presented at ESREL 2016).


> It appears (to me, at least) that the current state of the literature on ASLR is that it is treated as a succession of theoretical arms races, which new defence militates against which new attack, and almost no attention is paid to the concerns of actually deploying it in a larger system; and the current state of the literature on functional safety is simply "we will assume that there are no randomization processes in the software" (from an actual paper presented at ESREL 2016).

Thanks for your explanation. To give a slightly different perspective on the quoted paragraph: mitigations such as ASLR do not protect against security bugs, they just make them more "inconvenient" to exploit. So the "average script kiddie" will probably not be able to write an exploit for them. On the other hand, for well-funded agencies (think 3-letter agencies), these are no serious hurdles. In this sense, mitigations do not improve security in the sense of "fewer security holes". Instead, their (probably unintended, though not undesired) consequence is that mostly well-funded agencies are able to exploit security holes. Whether this new situation is good or bad for software security is up to the reader to think about.


To be more specific about "more inconvenient": I believe part of the intended effect of ASLR is to make ROP exploit attempts typically crash the process instead of successfully gaining control. This (ideally) brings admin attention to the system, which attackers generally want to avoid.
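
A rough way to quantify "typically crash": the toy model below assumes the attacker must guess one base address uniformly at random, that the layout is re-randomized after every crash, and that there is no information leak. The entropy figures are illustrative assumptions, not measurements of QNX or any other OS.

```python
import math


def success_probability(entropy_bits, attempts):
    """Chance that at least one of `attempts` independent guesses hits.

    Each guess succeeds with probability 2**-entropy_bits, so the result is
    1 - (1 - p)**attempts, computed with log1p/expm1 to stay accurate for tiny p.
    """
    if attempts <= 0:
        return 0.0
    p = 2.0 ** -entropy_bits
    if p >= 1.0:
        return 1.0
    return -math.expm1(attempts * math.log1p(-p))


def expected_crashes_before_success(entropy_bits):
    """Mean number of failed (crashing) attempts before the first hit."""
    return 2.0 ** entropy_bits - 1.0


if __name__ == "__main__":
    for bits in (8, 16, 28):  # hypothetical entropy values
        print(bits,
              success_probability(bits, 1000),
              expected_crashes_before_success(bits))
```

In this model every failed guess is a crash someone could notice, so the defender's benefit scales with how many crashes it takes before anyone looks. A leaked pointer removes the guessing entirely, which is why the model is only a lower bound on attacker capability.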


> To be more specific about "more inconvenient": I believe part of the intended effect of ASLR is to make ROP exploit attempts typically crash the process instead of successfully gaining control.

Keep in mind that before ASLR came, there was (and still is) DEP and its claims that lots of classes of attack were now impossible. The end of this story was that ROP was invented and hardly anything has changed, except that ROP code is much more tedious to write (i.e. no problem for well-funded attackers).

Now we have ASLR and you are probably right that now ROP exploits lead to process crashes instead. But attackers have already invented new techniques for circumventing ASLR, such as return-to-plt, GOT overwrite or GOT dereferencing. Again making it more inconvenient for script kiddies to write exploits, but again no problem for an attacker who can throw lots of money and people at the problem.


> But attackers have already invented new techniques for circumventing ASLR, such as return-to-plt, GOT overwrite or GOT dereferencing. Again making it more inconvenient for script kiddies to write exploits, but again no problem for an attacker who can throw lots of money and people at the problem.

Helmets and bulletproof vests are no match for powerful rifles.

I'm a bit tired of this reasoning here on HN: If it isn't perfect it is worthless.

I think I can see reasons why a vendor might want to avoid ASLR in safety critical systems.

But we shouldn't talk down decent protection techniques that will often save us.


> I'm a bit tired of this reasoning here on HN: If it isn't perfect it is worthless.

This argument (that "If it isn't perfect it is worthless" does not hold) is suitable for many topics in life, but in my opinion not for IT security. I can conceive that this might be one reason, why so many people (explicitly including politicians) make such bad decisions about IT security.

I might be somewhat paranoid regarding this topic (which is not a bad trait if you want to work in this area), but let me give my arguments:

First: the fight for secure systems is deeply asymmetric. The attacker side just needs one working exploit, while the defender side has to ensure that there exists no security hole. This strong asymmetry really makes it necessary that the security is as perfect as possible.

Second: if the device is connected to the internet, everyone/every device that exists in the world can be an attacker. So what you are fighting against is the whole world. Or in other words: the security of the system that you use has to withstand the smartness of some of the smartest people in the world.

Let it be stated clearly that this fight is not as hopeless as it looks based on these arguments: for designing the security of your system, you can resort to the knowledge of many really, really smart people, too: this is what the various standards (e.g. for cryptography) are about. What you cannot afford is to tolerate the slightest bit of imperfection in the security architecture of the system.

TLDR: In security, at least "If it isn't at least nearly perfect, it is worthless" does indeed hold.


Cryptography isn't perfect; someone could always guess your private key. But that doesn't make it useless, since you're hoping that it's just sufficiently improbable that nobody in their right mind will even try doing it.


> Cryptography isn't perfect; someone could always guess your private key.

For the accepted standards, even the smartest people working in this area have not yet found a method to find the private key sufficiently fast (at least such a method has not been published). So to the best of our current knowledge, those methods are at least very near to the perfection that is possible with our current technology.


> Cryptography isn't perfect; someone could always guess your private key.

Cryptography is a branch of mathematics, and cryptographic systems can be formally proved to have certain properties, such as being unable to derive the private key from the content of the encrypted message. That the private key can be guessed is a trivial observation, and a bad argument for dismissing formal proofs. ASLR is a hack on a hack that does not tell you anything about the formal properties of the system.


> Cryptography is a branch of mathematics, and cryptographic systems can be formally proved to have certain properties, such as being unable to derive the private key from the content of the encrypted message.

A small correction: All those proofs (if they exist) are relative to complexity-theoretical conjectures that are (ideally) widely believed to be true, but open. The only system that I am aware of where an "absolute" security proof exists is OTP, but this is hardly suitable to use in practice.
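
For anyone unfamiliar with the OTP (one-time pad) mentioned here, a minimal sketch in Python. The function names are made up for illustration. The last helper shows the intuition behind the "absolute" proof: for any candidate plaintext of the same length there is a key that maps the ciphertext to it, so the ciphertext alone rules nothing out.

```python
import secrets


def otp_encrypt(plaintext):
    """Encrypt with a fresh random key as long as the message."""
    key = secrets.token_bytes(len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))
    return key, ciphertext


def otp_decrypt(key, ciphertext):
    """XOR the ciphertext with the same key to recover the plaintext."""
    if len(key) != len(ciphertext):
        raise ValueError("key and ciphertext must have the same length")
    return bytes(c ^ k for c, k in zip(ciphertext, key))


def key_explaining(ciphertext, candidate_plaintext):
    """Return the key under which `ciphertext` decrypts to `candidate_plaintext`."""
    if len(ciphertext) != len(candidate_plaintext):
        raise ValueError("candidate must have the same length as the ciphertext")
    return bytes(c ^ p for c, p in zip(ciphertext, candidate_plaintext))


if __name__ == "__main__":
    key, ct = otp_encrypt(b"attack at dawn")
    print(otp_decrypt(key, ct))
    fake_key = key_explaining(ct, b"retreat at six")
    print(otp_decrypt(fake_key, ct))
```

The impracticality is also visible: the key is as long as the message, must be truly random, and must never be reused.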


> Having code that behaves differently if it's loaded at different addresses seems like a bug.

Why? This only sounds like a bug to me if it is intended to be position-independent code (PIC).

A reason why in safety-critical code ASLR is avoided is that it introduces another source of non-determinacy and potential bugs, which you want to avoid.

UPDATE: So you really want to keep the system as simple and small as possible and avoid adding anything to it that can introduce new bugs.


At what point do you want general purpose code to be position dependent?


When you program for a platform where writing PIC involves ugly hacks with measurable performance impact, for example i386.


It's not i386 that's the issue, it's the ABI.


Would you rather die to expose a masked error or live and leave in the masked error? That's what rules for critical systems are about. The time to fail fast is before production.


What is ASIL? Automotive Safety Integrity Level? How does ASIL relate to ASLR?


Yes, with ASLR, QNX loses its predictability. BTW, QNX 7 is ASIL-D on SEooC.



