Hacker News
I Compiled NASA's Workmanship Standards into a Single PDF (archive.org)
283 points by muunbo on Aug 29, 2020 | 55 comments



Wow this gives me flashbacks. I interned there one summer, working on a project that was in absolute crunch mode with no budget left for the fiscal year.

Some other interns and I got press-ganged into helping assemble a prototype device ahead of a visit from some higher-ups at (I think) JSC, and I ended up doing a lot of soldering - something I had only a little experience with before that summer. As I was learning, I spent a lot of time with one of the electrical engineers scrutinizing my practice pieces under a microscope for tiny scratches in the conductor and little burn marks. I also distinctly remember burning my fingers over and over on the thermal wire stripper, since it tended to make the conductor extremely hot.

I ultimately ended up having very little confidence in my workmanship, and when I found out that the thing we were working on might end up flying on the ISS, I became terrified that I would read about a fire on the station someday and not know if it was because of something I had messed up. That was also the summer I got my first gray hair.


that's an incredible experience, thanks for sharing!

if you don't mind me asking, how long ago was this?


It was summer of 2014, so I'm not some old graybeard if that's what you're thinking.

Yeah, it was incredible - very stressful, but I learned an enormous amount in just a few months. And I was surrounded by a ton of extremely smart, motivated (though sometimes rather dysfunctional) people. I just wish I had the knowledge then that I have now. In retrospect, the project I worked on really touched on almost every single aspect of mechanical engineering, and it was really what steered me in that direction. I wish I could talk more about it. It's not classified or anything, but it's very specific, and I'd rather not dox myself by accident.


This goes well with https://llis.nasa.gov/

"The NASA Lessons Learned system provides access to official, reviewed lessons learned from NASA programs and projects. These lessons have been made available to the public by the NASA Office of the Chief Engineer and the NASA Engineering Network. Each lesson describes the original driving event and provides recommendations that feed into NASA’s continual improvement via training, best practices, policies, and procedures."


Unfortunately, this lessons-learned database is often met with a sigh and an eye roll in industry.


Why is that? Is NASA drawing questionable conclusions from the incidents, or what?


No, in general the searchability is considered lacking. The findings are often quite good; it’s just difficult to find relevant results. Finding a needle in a needle stack. Think of poorly managed orgs that just throw everything on SharePoint.

On top of that, there’s often so much schedule pressure that project managers may only do a cursory look (if any at all) and are unlikely to add anything of their own.


TIL that most of my electronics work is well represented in the "bad" sections of the NASA manual. Then again, I'm not launching it to Mars, so it's OK.


Haha! It’s helpful regardless for non-space projects. For example, I made “rave skates” (colour-shifting LEDs on ice skates), and the connections on them have to be robust to abrupt movements and forces.


I do intend to read the manual cover to cover. A lot of the advice in it isn't difficult or labor intensive to apply, and a lot of it makes sense outside aerospace. I do want an "excimer laser wire stripper" now though.


Interesting, thanks for sharing! It would be even nicer with some kind of table of contents: named chapters and the like, especially in the PDF structure so that PDF viewers can show and provide direct access.

How did you create this compilation?

Also, the pages seem to be out of order, with blank pages in between, but I guess that's so it can be printed as leaflets?


There have been a lot of requests for the table of contents. It’s on my todo list; I’ll post again when it’s ready :)


If anyone is interested, the IPC J-STD-001 standard is often followed for flight hardware. The advantage is that you can get certified in this standard.

https://www.ipc.org/ContentPage.aspx?pageid=J-STD-001


Most of the time you learn the IPC-A-610 material first... then the J-standard is an addition.

One thing to remember, though, is that Mil-spec or custom builds often require better quality than IPC or J-STD gives. IPC/J-STD are "minimum acceptability requirements"... meaning "How bad can it be before I am required to fix/scrap the part?" Every time you rework a board you damage it (boards are actually damaged every time they are heated), so you only want to rework if absolutely required.


I can only speak for my personal experience, but we started with the J-STD; that may just be because the nature of the work was almost exclusively spaceflight.

As far as “minimum requirements” go, I completely agree. As with all specs, they define the lowest level of acceptable characteristics. Especially when the work is done by contractors, this will be the target, since they rarely make additional money by going above and beyond the minimum standards but can often lose money doing so.


Quite impressive. Good job dude!


Not to sound too harsh, but ~300 MB seems a bit much. Maybe you forgot to enable image compression.

I did some poking around and discovered that the problem is that all those drawings are incredibly detailed vector art. The "proper solution" is to open each page with Inkscape and convert those vector art drawings into regular bitmap images, probably encoded in JPEG to save space.

Alternatively, you can get a reasonable quality DJVU file for under 50 MB with:

    pdf2djvu --dpi=900 --verbatim-metadata -j4 "NASA Workmanship Standards.pdf" -o "NASA Workmanship Standards.djvu"


The proper solution is to get a bigger hard drive.


Imagine carrying it around on your smartphone/phablet to have it available at all times. Given a microSD card with 512GB capacity, that is more than half of the total capacity for a single file.


You’ve slipped three orders of magnitude there.

300MB is 0.06% of 512GB.


Lol. Sorry. Just got up :)


If only these existed for blocks of code.



I was nodding along to this, but this gave me pause:

> Assertions must always be side-effect free and should be defined as Boolean tests. When an assertion fails, an explicit recovery action must be taken, e.g., by returning an error condition

> ... Because assertions are side-effect free, they can be selectively disabled after testing in performance-critical code.

Why would you put recovery logic in if you're just going to disable it?

For example, let's say that I've got a new algorithm for calculating the cosine. Just to be safe, I add an assertion that the return value is in [-1, 1], and return an error if it isn't.

Now the clients of my code (say, the guidance computer) deal with the error somehow: if cos(x) returns an error, show an error to the pilot and maintain course or something.

If that assertion were stripped out in production code, then there was no point in writing the recovery logic. I've never written safety-critical code - am I missing something?
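A minimal C sketch of the pattern in the quoted rule, assuming a hypothetical `CHECK` macro and `checked_cos` function (neither is taken from the NASA document), with a deliberately crude polynomial standing in for the new cosine algorithm. The check is a side-effect-free Boolean test; when it fails, the function returns an error code as its explicit recovery action, and building with `-DDISABLE_CHECKS` compiles the test away, which is exactly the trade-off the question is about:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical assertion macro: a side-effect-free Boolean test that logs
 * the failure and evaluates to false, so the caller can take an explicit
 * recovery action. Compiling with -DDISABLE_CHECKS turns every check into
 * the constant true, so the tested expression is never evaluated; that is
 * why it must not have side effects. */
#ifdef DISABLE_CHECKS
#define CHECK(e) (true)
#else
#define CHECK(e) ((e) ? true : \
    (fprintf(stderr, "%s:%d: check '%s' failed\n", __FILE__, __LINE__, #e), false))
#endif

typedef enum { STATUS_OK = 0, STATUS_RANGE_ERROR = 1 } status_t;

/* Crude stand-in for the "new cosine algorithm": a truncated Taylor series
 * that is only accurate near zero and leaves [-1, 1] for larger |x|. */
static double approx_cos(double x) {
    double x2 = x * x;
    return 1.0 - x2 / 2.0 + (x2 * x2) / 24.0;
}

/* On success stores the result in *out and returns STATUS_OK; if the range
 * check fails, returns STATUS_RANGE_ERROR and leaves *out untouched. */
status_t checked_cos(double x, double *out) {
    double c = approx_cos(x);
    if (!CHECK(c >= -1.0 && c <= 1.0)) {
        return STATUS_RANGE_ERROR;  /* the explicit recovery action */
    }
    *out = c;
    return STATUS_OK;
}
```

With checks enabled, `checked_cos(4.0, &out)` returns `STATUS_RANGE_ERROR`; built with `-DDISABLE_CHECKS`, the same call stores roughly 3.67 in `out`, which is the cost of stripping the check.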


You might be misled by the use of the word “recovery”. In many areas of performance-critical code, returning an error condition while running in production just results in loss of vehicle: there’s no coasting in a rocket launch, and bugs usually mean explosions.

There are also many time critical code paths where microseconds matter.

In lots of places, think of assertions more like unit tests: important for verifying your code, but unhelpful in production.

Failure recovery in production can involve entire systems of its own, which are very different from assertions.


> For example, let's say that I've got a new algorithm for calculating the cosine. Just to be safe, I add an assertion that the return value is in [-1, 1], and return an error if it isn't.

I believe the idea here is that, if you've tested your code correctly, the assertions are never triggered and therefore your constraints are met and you don't need the checks anymore.

Somewhat interestingly, the Chromium code base does something similar. There are `DCHECK` macros everywhere: assertions that crash if the condition is false (for instance, if a variable is null), but they're disabled in production builds.
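The general mechanism behind debug-only checks like these can be sketched in C with a hypothetical `DEBUG_CHECK` macro (this is not Chromium's implementation): it aborts with a message in debug builds and expands to a no-op, without evaluating the condition, when `NDEBUG` is defined, the same switch the standard `assert` uses. `name_length` is an illustrative caller:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical debug-only check: abort with a message in debug builds,
 * compile to a no-op (the condition is not evaluated) in release builds. */
#ifdef NDEBUG
#define DEBUG_CHECK(cond) ((void)0)
#else
#define DEBUG_CHECK(cond)                                      \
    do {                                                       \
        if (!(cond)) {                                         \
            fprintf(stderr, "%s:%d: DEBUG_CHECK(%s) failed\n", \
                    __FILE__, __LINE__, #cond);                \
            abort();                                           \
        }                                                      \
    } while (0)
#endif

/* A null pointer here is a programming error, not a condition the
 * function tries to recover from, so it is only checked in debug builds. */
size_t name_length(const char *name) {
    DEBUG_CHECK(name != NULL);
    size_t n = 0;
    while (name[n] != '\0') {
        n++;
    }
    return n;
}
```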


Are you sure they aren't turned off for performance reasons?


Also yes


If your code is swinging a multi-million-dollar robot arm near a multi-million-dollar mirror assembly, for example, it’s probably not seen as excessive to add recovery code even if some of those assertions will only be active during testing phases.


Seems odd also. My understanding of assertions is if they fail, there really can be no recovery because you've reached an impossible state (impossible = this should not ever have happened).


Not exactly. For example, during an automated download, if the HTTP status code is not 200, it's perfectly fine to yield, delete the incomplete data from the failed download, and try the download again. The same goes for checking data integrity: if the check fails, one can simply delete, redownload, rehash, and set a variable upper bound on the number of retries.
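A minimal C sketch of that bounded-retry idea. `fetch_once` and `checksum` are hypothetical stand-ins (a simulated transfer and a toy additive checksum, not real HTTP or a real hash); the point is that each attempt discards partial data, verifies integrity, and the number of attempts is capped by a caller-supplied bound:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 16

/* Toy additive checksum standing in for a real hash (assumption). */
static unsigned checksum(const unsigned char *data, size_t len) {
    unsigned sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += data[i];
    }
    return sum;
}

/* Hypothetical transfer: the first `failures` attempts fail (think of a
 * non-200 status); later attempts copy a fixed payload into buf. */
static bool fetch_once(int attempt, int failures, unsigned char *buf, size_t *len) {
    static const unsigned char payload[] = "telemetry";
    if (attempt < failures) {
        return false;
    }
    memcpy(buf, payload, sizeof payload - 1);
    *len = sizeof payload - 1;
    return true;
}

/* Tries at most max_attempts times. Returns the number of attempts used on
 * success, or -1 (with the buffer cleared) if every attempt failed. */
int download_with_retries(int max_attempts, int failures, unsigned expected_sum,
                          unsigned char buf[BUF_SIZE], size_t *len) {
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        memset(buf, 0, BUF_SIZE);   /* delete any incomplete data */
        *len = 0;
        if (!fetch_once(attempt, failures, buf, len)) {
            continue;               /* failed transfer: retry */
        }
        if (checksum(buf, *len) == expected_sum) {
            return attempt + 1;     /* integrity check passed */
        }
    }
    memset(buf, 0, BUF_SIZE);
    *len = 0;
    return -1;
}
```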

Of course nothing in space, keeping astronauts alive, should be using the web.


Assertions are not the errors you’re supposed to recover from. If you end up in a state that triggers an assertion, it means you’re down the rabbit hole and the Cheshire Cat is speaking to you. Also, you can divide by zero, the halting problem is solvable in finite time, and pi is not transcendental.


I would hope they might try to recover from it if I'm in orbit and it's the system I need to get back down to earth. At least give it that old college try instead of just saying "welp you're dead"


Assertions are usually things like “I have just added a key to a dictionary, and now I look it up again to get its owned-by-dictionary address; what do I do if it doesn’t exist now?”

In such scenarios you can only kill the process - there’s no way to recover from such an error and handling it in any way but by disabling the subsystem (or killing the process) is impractical.
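A C sketch of that kind of invariant, using a hypothetical fixed-capacity open-addressing table (not from any particular library). Failing to find a key immediately after storing it would mean the table's own logic is broken, so the check is an `assert` that stops the program, while a full table is an ordinary condition reported to the caller:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TABLE_SIZE 8  /* fixed capacity, must be a power of two */

typedef struct {
    int keys[TABLE_SIZE];
    int values[TABLE_SIZE];
    bool used[TABLE_SIZE];
} table_t;

/* Returns a pointer to the stored value for key, or NULL if absent. */
static int *table_find(table_t *t, int key) {
    size_t start = (size_t)(unsigned)key & (TABLE_SIZE - 1);
    for (size_t i = 0; i < TABLE_SIZE; i++) {      /* bounded linear probe */
        size_t slot = (start + i) & (TABLE_SIZE - 1);
        if (!t->used[slot]) {
            return NULL;
        }
        if (t->keys[slot] == key) {
            return &t->values[slot];
        }
    }
    return NULL;
}

/* Inserts or updates key; returns false only if the table is full. */
bool table_put(table_t *t, int key, int value) {
    size_t start = (size_t)(unsigned)key & (TABLE_SIZE - 1);
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        size_t slot = (start + i) & (TABLE_SIZE - 1);
        if (!t->used[slot] || t->keys[slot] == key) {
            t->used[slot] = true;
            t->keys[slot] = key;
            t->values[slot] = value;
            /* Invariant: a key we just stored must be findable at the slot
             * we wrote. If not, the table itself is broken and there is no
             * sensible state to recover to (compiled out under NDEBUG). */
            assert(table_find(t, key) == &t->values[slot]);
            return true;
        }
    }
    return false;  /* full: an ordinary, recoverable condition */
}
```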


You mean astronauts shouldn’t have iPads they need to put in “airplane” mode when the cached data gets corrupted? cough cough


I think the keyword here is

>performance-critical code.


Rule number one includes no recursion, which I get, but it is also really interesting to me. I feel like I could turn any recursive algorithm I'd use in production into a loop, but I feel like it would trip me up.
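As a small illustration of turning recursion into a loop, here is Euclid's GCD written both ways in C. The iterative version uses constant stack space and carries an explicit iteration bound. The bound of 64 is an assumption chosen to sit comfortably above the worst case: by Lamé's theorem, Euclid's algorithm needs at most five times the number of decimal digits of the smaller input, so at most 50 division steps for 32-bit values (plus one extra step when `a < b`):

```c
#include <assert.h>
#include <stdint.h>

/* Recursive form: stack depth grows with the number of steps. */
uint32_t gcd_recursive(uint32_t a, uint32_t b) {
    return (b == 0) ? a : gcd_recursive(b, a % b);
}

/* Iterative form: constant stack usage and a fixed loop bound.
 * 64 iterations exceed the Lamé bound for 32-bit inputs. */
#define GCD_MAX_ITERATIONS 64

uint32_t gcd_iterative(uint32_t a, uint32_t b) {
    for (int i = 0; i < GCD_MAX_ITERATIONS; i++) {
        if (b == 0) {
            return a;
        }
        uint32_t r = a % b;
        a = b;
        b = r;
    }
    assert(b == 0);  /* unreachable if the bound above is correct */
    return a;
}
```

Consecutive Fibonacci numbers are the worst case, so `gcd_iterative(2971215073u, 1836311903u)` exercises close to the maximum number of steps and still finishes well inside the bound.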


Add rule 2 (fixed upper bounds on loops), and you see that you're restricted to a certain type of algorithm. You can't build something that, say, walks an arbitrary-sized binary tree - not and be within the rules. But then, you can't create an arbitrary-sized binary tree, because you can't allocate memory after startup.

These rules are for guaranteed-response-time algorithms that take data from a fixed number of places, do computations on it using fixed-size buffers, and write the data to a fixed number of places. They aren't for writing general Turing-complete computations in all their variety.
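A minimal C sketch of that style: a moving-average filter whose storage is a statically sized ring buffer (no allocation at all) and whose loops are all bounded by the compile-time constant `WINDOW`. The window length and names are illustrative assumptions:

```c
#include <stddef.h>

#define WINDOW 4  /* fixed window length, chosen at compile time */

typedef struct {
    double samples[WINDOW];  /* fixed-size storage, no malloc */
    size_t next;             /* index of the slot to overwrite next */
    size_t count;            /* number of valid samples, <= WINDOW */
} moving_avg_t;

void moving_avg_init(moving_avg_t *f) {
    for (size_t i = 0; i < WINDOW; i++) {    /* bounded by WINDOW */
        f->samples[i] = 0.0;
    }
    f->next = 0;
    f->count = 0;
}

/* Adds one sample and returns the mean of the most recent samples.
 * Work per call is O(WINDOW) no matter how many samples came before. */
double moving_avg_update(moving_avg_t *f, double sample) {
    f->samples[f->next] = sample;
    f->next = (f->next + 1) % WINDOW;
    if (f->count < WINDOW) {
        f->count++;
    }
    double sum = 0.0;
    for (size_t i = 0; i < f->count; i++) {  /* at most WINDOW iterations */
        sum += f->samples[i];
    }
    return sum / (double)f->count;
}
```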


In addition to what AnimalMuppet said (fixed bounds on loops), you also have to be careful to avoid blowing the stack with limited memory and lots of running processes.


Often you have a limited stack, and recursion can easily blow it unless you're very, very careful, particularly in embedded situations. Loops are a bit better for that.



MISRA C may be the nearest thing.


MISRA C is an effort to tame the wild beast of the C programming language. I get the impression it's not a bad ruleset, but there's plenty it doesn't protect you from. MISRA C doesn't give solid assurances of the absence of undefined behaviour, for instance. [0]
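For a concrete example of undefined behaviour in C: signed integer overflow is undefined, so an overflow check has to happen before the addition rather than by inspecting the result afterwards. A minimal, portable sketch (the function name `checked_add` is hypothetical):

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *out and returns true, or returns false if the sum would
 * overflow int. The check is done before adding, because overflowing a
 * signed int is undefined behaviour in C. */
bool checked_add(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *out = a + b;
    return true;
}
```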

If you're in the business of developing ultra-low-defect software, and you aren't committed to C, SPARK Ada is another option.

[0] https://learn.adacore.com/courses/SPARK_for_the_MISRA_C_Deve...


It really is only a spec to adhere to in order to get rid of some of the more obvious errors. It certainly doesn't cover everything. Ada is definitely a better language. Also if rust ever settles down I think it would be great for aerospace 178b and other standards for aircraft. Also cars :) I'd be surprised if some car companies aren't using it or at least researching it. It's still changing so fast though. I've only ever met one engineer who actually liked coding in Ada.


> if rust ever settles down I think it would be great for aerospace 178b and other standards for aircraft. Also cars :)

I imagine the compiler situation would need to change before this could become a possibility. I doubt off-the-shelf Rust/LLVM is appropriate for compiling life-and-death code.

I imagine it would also be necessary to strictly control memory management, using pools rather than doing the equivalent of malloc/free. It seems Rust has a crate for that: https://docs.rs/heapless/0.5.5/heapless/
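The pool idea can be sketched in C as a fixed number of fixed-size blocks reserved up front, threaded onto a free list once at startup, with constant-time allocate and release. This is an illustrative assumption of how such a pool might look, not a description of how the `heapless` crate works; block size and count are arbitrary:

```c
#include <stddef.h>

#define BLOCK_SIZE 32   /* bytes per block (assumed) */
#define BLOCK_COUNT 4   /* number of blocks (assumed) */

typedef union block {
    union block *next;               /* link used while the block is free */
    unsigned char data[BLOCK_SIZE];  /* payload used while allocated */
} block_t;

typedef struct {
    block_t blocks[BLOCK_COUNT];  /* all storage reserved up front */
    block_t *free_list;
} pool_t;

/* Threads every block onto the free list; call once at startup. */
void pool_init(pool_t *p) {
    for (size_t i = 0; i + 1 < BLOCK_COUNT; i++) {
        p->blocks[i].next = &p->blocks[i + 1];
    }
    p->blocks[BLOCK_COUNT - 1].next = NULL;
    p->free_list = &p->blocks[0];
}

/* O(1); returns NULL when the pool is exhausted instead of growing. */
void *pool_alloc(pool_t *p) {
    block_t *b = p->free_list;
    if (b != NULL) {
        p->free_list = b->next;
    }
    return b;
}

/* O(1); ptr must have come from pool_alloc on the same pool. */
void pool_free(pool_t *p, void *ptr) {
    block_t *b = ptr;
    b->next = p->free_list;
    p->free_list = b;
}
```

Exhaustion shows up as a `NULL` return the caller has to handle, rather than as heap fragmentation or an unbounded `malloc` call at runtime.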

> I've only ever met one engineer who actually liked coding in Ada.

It certainly lacks many luxuries. For that matter, it also lacks basic examples. I tried to dabble with Ada recently, and pretty quickly ran into trouble (I was unable to figure out how to instantiate any of GNAT's 'bounded containers').


Would you mind sharing which bounded container you tried to instantiate and what the problem was?


I wasn't able to instantiate any of them; I couldn't figure out the proper syntax. There are no examples anywhere that show how to use them.


Is this the kind of example you were looking for?

    with Ada.Text_IO;                    use Ada.Text_IO;
    with Ada.Containers.Bounded_Vectors;

    procedure Main is
       package BV_Integer is new
         Ada.Containers.Bounded_Vectors (Index_Type   => Positive,
                                         Element_Type => Integer);
       use BV_Integer;

       Vec_Max : constant := 10;
       Vec     : Vector (Vec_Max);
    begin
       Put_Line ("Appending some numbers...");
       for I in 1 .. Vec_Max loop
          Vec.Append (Integer (I));
       end loop;
       
       Put_Line ("Appending another number...");
       Vec.Append (Vec_Max + 1);   --  raises Capacity_Error: the vector is already full.
    end Main;


Yes, thanks. Perhaps I was just missing the Vec : Vector (Vec_Max); declaration. I'll give this a go on GNAT some time.


How do these compare to the IPC standards? Redundant?


It depends on the program manager, but in many cases if you meet the IPC standards they are considered equivalent.

Kinda similar to meeting ISO or AS standards. When in doubt, you default to whatever spec the drawing states.


Hmm I have no idea! Are the IPC standards publicly available for me to check out?


Yes, but the book isn't free. IPC-A-610 is a good place to start. I only briefly browsed the page, but it seems to be very close to the IPC standards with the J-Standard addition (which is typically for mission-critical/space work).

Source: I was IPC trained for QA/technician at one point (as well as soldering and assembly)


Having worked on SMT stuff years ago, I (kinda) fondly remember the specs of what's acceptable/unacceptable in this guide (starting around page 125).


...and I didn't even know laser stripping was a thing.



