LLVM Programmer’s Manual

musha68k · on July 22, 2016

I was just looking into LLVM the other day as I was researching the possibilities of cross-compiling Rust to m68k on my Mac and now I simply can't stop reading up on LLVM!

Support for the Motorola 68000 has never been finished, but this guy's abandoned fork offers an interesting glimpse into how one might start going about it:

https://github.com/kwaters/llvm-m68k/commits/m68k

A bit over my head for now but I'm using this as a fun occasion to get back into "proper" computer science. Dabbling is my favourite learning mode so consulting the LLVM Programmer's Manual while hacking on a more isolated and really simple task is something I'm very much looking forward to at this point.

Still, if there is anybody with more knowledge who could chip in a bit on supporting m68k or LLVM in general I'd be super curious to read about it :)

webkike · on July 22, 2016

I believe that the heavy use of iterators in LLVM was a mistake. The ownership mechanics have never not been a source of extreme confusion and annoying bugs. But it's still a great backend.

Joky · on July 22, 2016

Can you elaborate? Especially I don't really see the relation between "iterators" and "ownership".

slezyr · on July 22, 2016

If you insert something into a vector, then all of its iterators can become invalid.

Joky · on July 24, 2016

Sure, but iterators have nothing to do with "ownership", so your point isn't clear...

DannyBee · on July 22, 2016

Errr? I'm aware of maybe one iterator invalidation bug in like the past year, and one or two ownership bugs, mostly in badly written code.

On the list of "stuff LLVM got wrong", I feel like this wouldn't even make the top hundred.

webkike · on July 22, 2016

A large part of writing optimization passes for LLVM consists of figuring out how methods/functions interact with the iterators in terms of invalidation and ownership. Unfortunately, a lot of LLVM is undocumented, and it's not always clear how a given call will interact with the iterators. LLVM itself does not suffer from these problems much, but writing opt passes is definitely a huge pain.

Joky · on July 24, 2016

My experience is that LLVM's use of iterators is very consistent with what you'd expect from C++, and I never had any issue with this aspect. Iterator invalidation is pervasive in the STL containers, and as a C++ programmer you are supposed to know what type of container you're iterating over (in LLVM, for instance, a basic block is a linked list of instructions, so iterators are not invalidated by insertions).
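To make the container point concrete, here is a minimal sketch using only standard containers (it does not touch LLVM's own list types, and the function names are made up for illustration). Growing a full `std::vector` forces a reallocation, which invalidates every iterator, while a `std::list` iterator keeps pointing at the same element across insertions.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Returns true if pushing onto a full vector grew its capacity, i.e. the
// elements moved to a new buffer and every old iterator became invalid.
bool vector_reallocates_when_full() {
    std::vector<int> v{1, 2, 3};
    while (v.size() < v.capacity()) v.push_back(0);  // fill to capacity
    const std::size_t before = v.capacity();
    v.push_back(4);  // no room left: the vector must reallocate
    return v.capacity() > before;
}

// Returns the value behind a list iterator after several insertions.
int list_iterator_survives_insert() {
    std::list<int> l{1, 2, 3};
    auto it = std::next(l.begin());  // points at the element 2
    assert(*it == 2);
    l.insert(it, 42);  // insert right before it
    l.push_front(0);
    l.push_back(7);
    return *it;  // still the element 2; list insertion never invalidates
}
```

Erasing is a different story: erasing an element invalidates iterators to that element in either container.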

Rarebox · on July 22, 2016

Notably, exceptions are not used. Instead errors are handled with assertions and error objects.

adamnemecek · on July 22, 2016

I feel like people are realizing that exceptions were a bad idea. They are kind of like gotos.

IshKebab · on July 22, 2016

Yeah finally. The problem with exceptions is that you lose context about where the error occurred, and you really need to know that to handle the error properly. That's why most exception handlers just print the error.

I think the only situation where exceptions help is in constructors, but there are better solutions to that problem, like explicit object construction methods that can return an error.

I think Go's approach to error handling is the best - explicit and in context. Rust's `Result` looks quite nice though I haven't tried it yet.
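For anyone wondering what an "explicit object construction method that can return an error" might look like in C++, here is a rough sketch. `Port` and `create` are hypothetical names invented for this example, and it assumes C++17 for `std::optional`; it is not how LLVM itself reports errors.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical type for illustration: a port number that must be in 1..65535.
class Port {
public:
    // Named constructor: all validation happens here, so the real
    // constructor never has to fail. On failure, `error` is filled in and
    // an empty optional is returned.
    static std::optional<Port> create(int value, std::string& error) {
        if (value < 1 || value > 65535) {
            error = "port out of range: " + std::to_string(value);
            return std::nullopt;
        }
        return Port(value);
    }

    int value() const { return value_; }

private:
    // Private, so callers are forced through create().
    explicit Port(int value) : value_(value) {
        assert(value >= 1 && value <= 65535);  // invariant, checked in debug builds
    }
    int value_;
};
```

The caller has to look at the returned optional before it can get at a `Port`, which is the "explicit and in context" part.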

barrkel · on July 22, 2016

Most errors shouldn't be handled; that's the reason most exception handlers simply log the error.

I've said it here before, but this is my classification of errors that exceptions are commonly used for:

1) Programming error: null pointer exception, library invariants broken, etc. These generally shouldn't be caught, and could arguably be replaced by fatal assertions, but that's typically not a polite thing to do in a library.

2) Non-deterministic failure condition: something outside the control of the calling code failed in a way that it cannot predict ahead of time. A network connection broke, a file was deleted unexpectedly, a device was detached, database server died, etc. There's something of an argument for annotating functions that can fail this way in the manner of Java's checked exceptions and it has a strong relationship with the IO monad in Haskell (and similar problems with encapsulation and hiding implementation details). These kinds of errors can sometimes be handled, and handling them with exceptions isn't much better or worse than error codes. An advantage of exceptions is that they make it easy to abort the whole task and go back to the request handler (server code) or event dispatch loop (UI code). Exceptions can carry more state than error codes, too.

3) Business logic problem: the user, for want of a better term, of the program tried to do something that is not allowed / possible, and the program needs to abort the current task and inform the user of the problem. Exceptions are as good a means of aborting as most others. These should only be caught at the top level, whereupon they can be converted into something for human / 422 response / whatever consumption.

In almost all cases, you cannot proceed and there is nothing you can do in reaction to the error. Aborting and unwinding the call stack is usually the correct thing to do.
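A small sketch of the "only catch at the top level" idea, in C++. The names (`NotFound`, `load_page`, `handle_request`) and the status codes chosen are assumptions made up for this example, not any real framework's API.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical exception type for a "business logic" failure (category 3).
struct NotFound : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical deep call: it fails without knowing who will report it.
std::string load_page(const std::string& path) {
    assert(!path.empty());  // category 1: a programming error, not a runtime failure
    if (path != "/index") throw NotFound("no such page: " + path);
    return "<h1>hello</h1>";
}

// Top-level handler: the single place where exceptions become responses.
int handle_request(const std::string& path, std::string& body) {
    try {
        body = load_page(path);
        return 200;
    } catch (const NotFound& e) {
        body = e.what();
        return 404;
    } catch (const std::exception&) {
        body = "internal error";
        return 500;
    }
}
```

Nothing between `load_page` and `handle_request` needs to know about the failure, which is the abort-and-unwind behaviour described above.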

clappski · on July 22, 2016

I happen to disagree. Exceptions are a powerful abstraction over errors that allow you to highlight the exact context of an error if handled properly, by avoiding catches that just throw and by actually giving the exception body useful information about the error that can be logged.

I haven't used Go (I'm about to for a small project, though) so I can't comment on the "if err != nil" idiom, but it seems like a step back by encouraging code duplication where it's not needed. I'm a fan of the pattern matching in Rust, but it's still rather orthogonal to exceptions as it allows you to deal with an error when you try to access state whereas exceptions are most useful when they appear mid operation.

co_dh · on July 22, 2016

I don't agree either :). Exceptions don't necessarily lose context. At least in Python. In my previous job, whenever a Python exception was raised, the code would catch it, format it along with the values of each variable into beautiful HTML, and send it as an email. You can never overestimate how much debugging time it has saved us. Go's approach to error handling reminds me of the old COM days, when you handled errors after each function call. I'm glad it's no longer popular.

tkinom · on July 22, 2016

Agree! I tried to move a program from Python to Go. Completely gave up on Go after comparing the two languages' approaches to exceptions. Python is just so much easier, cleaner.

Go's implementation on JSON parsing is also.....

pjmlp · on July 22, 2016

Even WinRT, the 2nd coming of COM, supports exceptions:

Platform::COMException

https://msdn.microsoft.com/en-us/library/windows/apps/hh7104...

IshKebab · on July 24, 2016

Please. Python doesn't even allow for specifying which exceptions a function throws.

evincarofautumn · on July 22, 2016

Exceptions are much worse than “goto”, in my experience. “throw” is basically a dynamically bound “return” statement—it’s hard to reason about where you’ll end up, because it depends on the call stack above you, and the type of object you decided to throw. The same throw site might jump to many different, unrelated parts of the program. Real exception safety is almost as hard to get right as concurrency, and for similar reasons.
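A tiny illustration of the "dynamically bound return" point, as a sketch with invented function names. There is exactly one throw site, yet control lands in a different handler depending on who called it and on the type that was thrown.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// One throw site...
int parse_digit(char c) {
    if (c < '0' || c > '9') throw std::invalid_argument("not a digit");
    return c - '0';
}

// ...caught right here when called through caller_a.
std::string caller_a(char c) {
    try { parse_digit(c); return "a: ok"; }
    catch (const std::invalid_argument&) { return "a: handled"; }
}

// caller_b only handles std::out_of_range, so the exception passes through.
std::string caller_b(char c) {
    try { parse_digit(c); return "b: ok"; }
    catch (const std::out_of_range&) { return "b: handled"; }
}

// std::invalid_argument derives from std::logic_error, so it lands here.
std::string outer(char c) {
    try { return caller_b(c); }
    catch (const std::logic_error&) { return "outer: handled"; }
}

// Usage: the same bad input ends up in two unrelated places.
void demo() {
    assert(caller_a('x') == "a: handled");
    assert(outer('x') == "outer: handled");
}
```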

Chris_Newton · on July 22, 2016

I can’t help noticing that much of your objection applies just as much to return as to throw. Isn’t the point, in both cases, that you shouldn’t need to reason about where you’ll end up?

Either your function can complete its task successfully, in which case it satisfies whatever invariants are required of it and returns accordingly, or it can’t, in which case it can throw an exception to indicate that inability to return successfully. In the success case, the calling code can then continue based on whatever returned value and/or side effects the function was supposed to provide. In the failure case, the calling code may well fail itself if it can also no longer complete its task successfully, and so on up the stack until you reach a level in your design that knows how to recover from the failure or stop the entire program gracefully, which is where you catch the exception.

I’ve never quite understood the idea that exceptions are similar to goto. Other than transferring control over potentially long “distances” within a program, they don’t seem to have much in common at all. To me, a much closer analogy would be returning early from a function on success, if you reach a point in your algorithm where you already know the required answer to whatever question your calling code was asking. Another similar situation would be using a break or continue statement to finish an iteration early within a loop, or adding extra guard cases to stop a recursion early, again if you already know the outcome and continuing with the full algorithm has no advantage.

evincarofautumn · on July 22, 2016

> I can’t help noticing that much of your objection applies just as much to return as to throw. Isn’t the point, in both cases, that you shouldn’t need to reason about where you’ll end up?

That is the point, but my specific objection is to the dynamic binding and dynamic typing, not the returning. With “return”, even early return, you don’t really need to reason about where you’ll end up at all—it’s always the end of the current function. And the same goes for functions that throw exceptions directly.

The problem is that when you have unchecked exceptions, any function can throw, so every function call in your code is now also a potential early return, and for correct code you need to account for that.

pjc50 · on July 22, 2016

The real failure is that return types must be declared but exceptions need not be and generally aren't. So an exception can come teleporting through any part of your code. You can no longer enforce single return style, because any expression can return.

This means that writing state-modifying code becomes unreasonably difficult, as you can no longer just write:

  // A and B must be kept in sync and updated together
  A = computeA()
  B = computeB()

.. because it's possible that computeB() throws an exception, leaving the two out of sync.

Yes, there are ways round this involving temporary variables. So you write:

  tmpA = computeA()
  tmpB = computeB()
  A = tmpA, B = tmpB;

and hope that neither an optimiser nor an intern removes the tmp variables. But wait! This still doesn't work, because your evil coworker has defined operator= on B to throw an exception!

Personally I don't think that exceptions work very well in the presence of mutable state. Without mutable state, they're exactly equivalent to

barrkel · on July 22, 2016

This is an irreducible complexity if any part of your computation can fail. You're simply ignoring the error cases if you pretend otherwise.

Exceptions with mutable state need to be coded transactionally, yes, and this is an excellent reason to try and avoid mutable state and use persistent data structures where possible.
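A minimal sketch of what "coded transactionally" can look like for the A/B case, assuming both values are `std::string` (my assumption; the thread never says what A and B are). Everything that can throw runs on temporaries first, and the commit step uses only move assignments that the type system confirms cannot throw. `State`, `update` and the `fail` flag are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>

struct State {
    std::string a;
    std::string b;
};

// The commit below relies on this; the build breaks if it ever stops holding.
static_assert(std::is_nothrow_move_assignable<std::string>::value,
              "commit step must not throw");

// Hypothetical computations; computeB fails on request.
std::string computeA() { return "new-a"; }
std::string computeB(bool fail) {
    if (fail) throw std::runtime_error("computeB failed");
    return "new-b";
}

void update(State& s, bool fail) {
    std::string tmpA = computeA();
    std::string tmpB = computeB(fail);  // may throw: s is still untouched
    s.a = std::move(tmpA);              // commit: noexcept from here on
    s.b = std::move(tmpB);
}

// Usage sketch: a failed update leaves the old pair intact.
void demo_failed_update() {
    State s{"old-a", "old-b"};
    try { update(s, true); } catch (const std::runtime_error&) {}
    assert(s.a == "old-a" && s.b == "old-b");
}
```

For a type whose assignment really can throw, this sketch does not help as written; you would need a non-throwing swap or an extra level of indirection.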

pjc50 · on July 22, 2016

> You're simply ignoring the error cases if you pretend otherwise.

.. which makes using the STL surprisingly hard, because you can get an out of memory exception all over the place.

Chris_Newton · on July 22, 2016

In theory, yes, but if system-level errors like running out of memory or other essential resources are realistic possibilities in your particular environment then those risks are what they are. Exceptions don’t do anything fundamental to change that situation, they just (sometimes) provide a more explicit and/or systematic way of dealing with those errors if they occur.

In practice, I think this issue gets far more attention than it deserves, particularly within the C++ community. If you’re running in a particularly resource-constrained environment where running out is a relevant problem, you probably already take precautions about how you allocate those resources and adjust how you structure your program accordingly. You’re trying to avoid any possibility of those errors occurring, which means you don’t then have to worry about dealing with them unexpectedly all over your code.

For me, the more interesting discussion is around how we deal with errors that can be properly recognised and handled within normal execution. That means the interesting exceptions are those we throw ourselves and those deliberately thrown by other modules that we depend on, which we hope will be documented with each module’s interface. These things don’t just go off arbitrarily in just about any line of code we write, and they can be systematically structured, and we can write code that controls where mutations and other effects happen and where exceptions happen and how they interact.

konstmonst · on July 22, 2016

You are just using the wrong pattern. Use Computation a; Computation b; use(a, b); Then if the computation aborts in b, computation a will be cleaned up as intended.

pjc50 · on July 22, 2016

My second point is that with exceptions you can't guarantee the assignment of two objects, because the assignment operator can throw.

plorkyeran · on July 22, 2016

This was somewhat of a problem in C++03 because writing a copy assignment operator that can't fail is sometimes impossible, but I have never encountered a scenario where I was unable to make move assignment noexcept.

slezyr · on July 22, 2016

> Real exception safety is almost as hard to get right as concurrency, and for similar reasons

It's even harder to get right concurrency with exceptions :D

quotemstr · on July 22, 2016

> It's even harder to get right concurrency with exceptions

No it isn't.

adamnemecek · on July 22, 2016

I was referring to old school gotos, not like those in C. But yeah, we are on the same page.

pjmlp · on July 22, 2016

It is very hard to make libraries that rely on features that can be turned off.

The bad idea was making language features optional.

Library writers are forced to either support all possible combinations, not use any of them, or use them and see their library rejected by those who won't turn on the features even at gunpoint.

One of the things that made me initially enjoy Java was that there weren't features to turn on or off.

adamnemecek · on July 22, 2016

You are correct but all those books on cpp exceptions are about a bit more than dealing with issues related to optionally disabling them :-).

pjmlp · on July 22, 2016

Agree, but that is because when compared with other languages exceptions in C++ were always bolted on.

They don't play well with manual memory management or the C semantics C++ was trying to be as compatible with as possible.

They don't play well with:

- constructors/destructors

- manual resource management

- low level code like UNIX signals

- allowing every possible type to be used as exception

So all of that made C++ exceptions poor cousins of how exceptions are used in other languages.

I have used exceptions in quite a few languages and would rather use them (the alternative being sum types), but I do understand that, given how C++ evolved, they are an unwelcome feature.

quotemstr · on July 22, 2016

It's only exceptions that allow constructors to work properly and that allow the language to support value type management. Without exceptions, you need two-phase initialization, and that's a nightmare.

pjmlp · on July 22, 2016

I hate two-phase initialization with a passion, especially the way it was done in Symbian C++.

cyphar · on July 22, 2016

Except gotos are legitimately useful for cleanup inside a function before you return an error (see: the Linux kernel). Exceptions have overhead, and you have no idea where it will end up because the exception will bubble up the current call-stack, while gotos are local to the current function (and while longjmp is a mess, it is useful if you use clone(2)).

vvanders · on July 22, 2016

You should be using RAII for any resources that you need to automatically clean up, then any return statement will catch it. No need for exceptions.

cyphar · on July 22, 2016

RAII is not a feature of C. I was commenting on how gotos are not like exceptions, because gotos are actually useful and don't cause your program to make less sense as a result.

SamReidHughes · on July 22, 2016

C++ is memory-unsafe and has a special relationship with the feature. Projects that start without pervasive exception safety in mind are usually stuck that way.

protomyth · on July 22, 2016

I can see it in C++, but they seemed to work pretty well in Ada. A lot of things seem to work better in Ada to be truthful.

pif · on July 22, 2016

When used for exceptional things, exceptions are wonderful. But they were never meant to completely replace status-related exit codes. Expected errors (ex: log file does not exist, so create a new one) are not to be treated via exceptions; exceptional errors (ex: no more memory available, and I'm just trying to allocate a few bytes) are.

zvrba · on July 22, 2016

You're funny. gotos are used to emulate cleanup, which you get in a more structured manner with RAII and exceptions. Remember Apple's infamous "goto fail" bug in its SSL/TLS implementation?

Peaker · on July 22, 2016

While RAII and exceptions are better than gotos, the "goto fail" bug is not an example of that. The problem there was a mismatch between the indentation and how the code was parsed, which was a result of brace-less style and whitespace-insensitivity.

adamnemecek · on July 22, 2016

I'm well aware. I was talking about old school gotos, I should have probably said that.

quotemstr · on July 22, 2016

C++ projects banning exceptions is a massive blunder. We should not disable important aspects of the language that provide useful semantic features to satisfy the anti-bloat superstitions of a few who remember bad C++ compilers of the 1990s. Nobody feels the need to create an exception-less variant of Java or Python, after all.

A ban on C++ exceptions is a huge red flag for me. A project has to be really special if I'm going to contribute to it after it's crippled the language.

gakada · on July 22, 2016

Exceptions in C++ are rotten.

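    #include <stdio.h>
    #include <stdlib.h>

    void read_file_by_lines(const char* path, void (*callback)(const char* line)) {
        FILE* file = fopen(path, "r");
        if (file == NULL) {
            return;  /* nothing to read */
        }

        size_t length = 0;
        char* line = NULL;

        /* getline() is POSIX; it grows `line` as needed */
        while (-1 != getline(&line, &length, file)) {
            callback(line);
        }

        free(line);
        fclose(file);
    }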

This C function should be compatible with C++. If exceptions are off, it is. If exceptions are on then it's dangerous because if an exception is thrown through callback, then the file handle and line buffer will leak.

C++ exceptions are predicated on the assumption that all resources are managed using RAII. In the real world, this isn't true, especially considering that C doesn't even have RAII.

Python programmers (including me) love their exceptions, because Python is a dynamic language, and exposes system resources as reference counted objects.

pif · on July 22, 2016

Your example is wrong. This function is compatible with C++ as soon as you document it as not accepting exception-throwing callbacks.

gakada · on July 22, 2016

I'm sorry, I can't capture such a complicated issue with a pithy example. You will have to settle for my illustrative one.

The most general definition of the restriction is that C++ code cannot throw exceptions through exception unsafe code. How do you know if your C++ function was called from exception unsafe code? You can't!

So it's only ever safe to throw an exception when you are expecting it to be caught immediately i.e. you can use exceptions as glorified error codes, but not much else.

In Django I can throw a Http404 exception from deep inside a view and the server will come up with a 404 response. I can assert and the client will get a 500 error but the server won't go down. This is the real power of exceptions, and God help anybody who tries to do it in C++.

pjmlp · on July 22, 2016

That is exactly one of the pet peeves I have with C++ and although I like the language I am happy to earn my money using other languages.

In the enterprise context you can hardly count on proper documentation, especially in projects with high attrition of consultants, where each one comes, implements something and departs for the next gig.

Not to count the commercial libraries distributed in binary only form.

So one never knows if the code is exception safe, doesn't work with exceptions or is exception happy.

wyldfire · on July 22, 2016

Your example strikes me as "exceptions in C++ are [yet another] high-discipline task to manage."

I think there are examples related to inter-library exceptions that might be much more subtle than this one, but I've never understood the subtlety.

coherentpony · on July 22, 2016

Is the fault the exceptions or is it that the person that wrote this C++ function didn't use objects to manage their resources?

pjmlp · on July 22, 2016

This is the real issue, so many C++ features are hampered due to the drop-in compatibility with C code.

perspectivep · on July 22, 2016

The solution is to just make RAII wrappers for that stuff.
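Roughly what such a wrapper could look like for the `FILE*` case, as a sketch. `FileCloser`, `FilePtr` and the `g_files_closed` counter are names invented here; the counter exists only so the cleanup can be observed. It reads with `fgetc` into a `std::string` instead of POSIX `getline`, which is a substitution on my part to keep the example portable.

```cpp
#include <cassert>
#include <cstdio>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>

// Counts closes so a demo can observe cleanup (illustration only).
static int g_files_closed = 0;

struct FileCloser {
    void operator()(std::FILE* f) const noexcept {
        std::fclose(f);
        ++g_files_closed;
    }
};
using FilePtr = std::unique_ptr<std::FILE, FileCloser>;

// Same job as the C function above, but the handle and the line buffer are
// released by destructors, so a throwing callback cannot leak them.
void read_file_by_lines(const char* path,
                        const std::function<void(const std::string&)>& callback) {
    assert(path != nullptr);
    FilePtr file(std::fopen(path, "r"));
    if (!file) throw std::runtime_error("cannot open file");

    std::string line;
    int c;
    while ((c = std::fgetc(file.get())) != EOF) {
        line.push_back(static_cast<char>(c));
        if (c == '\n') {  // like getline(), the newline stays in the line
            callback(line);
            line.clear();
        }
    }
    if (!line.empty()) callback(line);  // last line without trailing newline
}
```

This only fixes the function you control. It does nothing for third-party C code that an exception might still be thrown through.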

bigcheesegs · on July 22, 2016

The llvm-project is a compiler. Its developers are well aware of the true cost of exceptions.

barrkel · on July 22, 2016

Actually, compilers typically don't use exceptions because they don't usually abort on the first instance of a problem. Compilation has historically been a lengthy batch job, and thus reporting many errors to be fixed between iterations improves user productivity.

strcat · on July 23, 2016

LLVM doesn't take shortcuts like that. It's designed as a set of reusable libraries for use in all kinds of tooling. In many cases it even has to continue going after detecting errors and recording a representation of them, so that it can still produce useful information. That's an important feature for lots of tooling.

favorited · on July 22, 2016

I think the point was that the LLVM compiler infrastructure has a first-class C++ front-end. They know the ins and outs of language features because they are implementing standards-compliant compilers for said language.

barrkel · on July 22, 2016

Most self-hosted compilers are written in languages ill-suited to compiler writing. The choice of features used is often a compromise, rather than best practice for either good compiler design or good target language use.

C++ users in particular have strong concerns about things like RTTI and the overhead of exceptions not thrown. The cost of making unthrown exceptions low (near zero) is that throwing exceptions is much more expensive. This turns into a feedback loop, where C++ programs end up performing better if they never throw any exceptions because of the tradeoff.

This feeds into self-hosted compilers. Thing is, certain compiler design techniques can work well with cheap unwinding: constraint solving (more common in languages with better type systems than C++), forward-looking grammar assertions with speculative parsing, IDE-integrated code completion (implement the lexer to add the cursor position as a special token, and jump out of the parser when it's found, where all the context is available to pass along). But C++'s peculiar tradeoffs may make using things like exceptions less than ideal to use for this purpose, not because exceptions are the wrong tool, but because most C++ programs are not compilers and aren't tuned for it.
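The code-completion trick can be sketched in a few lines. This is a toy under heavy assumptions: the "grammar" is just nested parentheses, the lexer is assumed to have already injected a `<cursor>` token, and the only context carried out is the nesting depth. All names are invented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Thrown when the parser reaches the editor cursor; carries parser context.
struct CursorFound {
    int depth;
};

// Hypothetical recursive-descent parser for nested parentheses.
void parse_group(const std::vector<std::string>& toks, std::size_t& pos, int depth) {
    assert(pos <= toks.size());
    while (pos < toks.size()) {
        const std::string& t = toks[pos++];
        if (t == "<cursor>") throw CursorFound{depth};  // jump out with context
        if (t == "(") parse_group(toks, pos, depth + 1);
        else if (t == ")") return;
    }
}

// Returns the nesting depth at the cursor, or -1 if there is no cursor token.
int depth_at_cursor(const std::vector<std::string>& toks) {
    std::size_t pos = 0;
    try {
        parse_group(toks, pos, 0);
    } catch (const CursorFound& found) {
        return found.depth;
    }
    return -1;
}
```

In a real parser the thrown object would carry scopes, expected token kinds and so on, and in C++ the cost of the throw is exactly the tradeoff being discussed.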

And OTOH, compilers written in low-level languages like C++ often twist themselves into knots to simulate things like tree pattern matching, multiple dispatch, destructuring binds, etc. Sometimes they even give up and write code generating tools.

pjmlp · on July 22, 2016

I agree with you, however Bjarne wanted to sneak C++ into C code, given his experience being forced to use C (there are a few interviews where he mentions this).

So by having exceptions, with code that only knows about C semantics but was compiled with C++ mode, you open the door to all sorts of nasty surprises.

So exceptions can cause resource leaks and other issues like being thrown in the middle of a UNIX signal.

Which leads to the situation that sadly many library writers avoid them.

I still enjoy using C++, but the best place for it is as infrastructure language for the bottom layer when performance requires it, then using something else on the other layers.

hacker_9 · on July 22, 2016

This is actually really interesting to me. I use C# for side projects and often write Debug.Assert everywhere to check things, but can't quite bring myself to throw exceptions for full blown errors. I just find they add a lot of bloat, and don't really help the problem, they are just a lazy catch all. But I have to say returning Error<TResult> is a really elegant way of dealing with the problem, I think I will have to start doing it this way.

barrkel · on July 22, 2016

Returning an error type, and, in the caller, propagating that return type further up, is isomorphic to exceptions, except it's a lot more busywork, a lot more code bloat, and a lot easier to get wrong.

yoklov · on July 22, 2016

It's also encoded in the return type of the function, so you get static type checking -- a property not present for exceptions.
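Both sides of this show up in even a small sketch. `Result`, `Error`, `parse_int` and `parse_and_double` below are hypothetical names, with `std::variant` (C++17) standing in for a real sum type. The error is visible in every signature, and every caller has to forward it by hand.

```cpp
#include <cassert>
#include <string>
#include <variant>

struct Error {
    std::string message;
};

template <typename T>
using Result = std::variant<T, Error>;

// Parses a non-negative decimal integer (overflow is ignored in this sketch).
Result<int> parse_int(const std::string& s) {
    if (s.empty()) return Error{"empty input"};
    int value = 0;
    for (char c : s) {
        if (c < '0' || c > '9') return Error{"not a number: " + s};
        value = value * 10 + (c - '0');
    }
    return value;
}

// The caller must inspect the result and pass the error along explicitly;
// this is the propagation that exceptions would do implicitly.
Result<int> parse_and_double(const std::string& s) {
    Result<int> r = parse_int(s);
    if (const Error* err = std::get_if<Error>(&r)) return *err;
    assert(std::holds_alternative<int>(r));
    return std::get<int>(r) * 2;
}
```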

barrkel · on July 22, 2016

You say that like it's desirable. I think Java's experience of checked exceptions shows that it's of very dubious benefit.

perspectivep · on July 22, 2016

And you have to remember to log the error at every level of the stack, otherwise you're left with an error code that has no context.

jokoon · on July 22, 2016

I wonder how hard it is for a coder without a lot of knowledge about parsers and LR/LL stuff to quickly define a language and translate it to LLVM IR.

That could be a fun thing to have a language competition that way, to see which language "feels" good: easy to read, expressive, etc.

davorb · on July 22, 2016

Assuming that you are somewhat familiar with assembly programming and know how to produce a syntax tree, getting it to output compilable LLVM IR should take you at most a couple of days.
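To give a feel for the size of the job, here is a toy sketch that walks a syntax tree and prints textual LLVM IR for one `i32` function. It builds strings by hand rather than using LLVM's C++ builder API, and the tree type, helper names and the fixed function name `@f` are all assumptions made for this example.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>
#include <utility>

// Tiny expression tree: integer constants, the parameter x, and + / *.
struct Expr {
    char kind = 'c';  // 'c' constant, 'x' parameter, '+' or '*'
    int value = 0;    // used when kind == 'c'
    std::unique_ptr<Expr> lhs, rhs;
};
using ExprPtr = std::unique_ptr<Expr>;

ExprPtr constant(int v) { auto e = std::make_unique<Expr>(); e->value = v; return e; }
ExprPtr param() { auto e = std::make_unique<Expr>(); e->kind = 'x'; return e; }
ExprPtr binary(char op, ExprPtr l, ExprPtr r) {
    auto e = std::make_unique<Expr>();
    e->kind = op;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Emits instructions into `out` and returns the operand naming the result.
std::string emit(const Expr& e, std::ostringstream& out, int& next_tmp) {
    if (e.kind == 'c') return std::to_string(e.value);
    if (e.kind == 'x') return "%x";
    assert(e.lhs && e.rhs);  // binary nodes need both children
    std::string l = emit(*e.lhs, out, next_tmp);
    std::string r = emit(*e.rhs, out, next_tmp);
    std::string name = "%t" + std::to_string(next_tmp++);
    out << "  " << name << " = " << (e.kind == '+' ? "add" : "mul")
        << " i32 " << l << ", " << r << "\n";
    return name;
}

// Wraps the expression in a function taking one i32 parameter named x.
std::string emit_function(const Expr& body) {
    std::ostringstream out;
    int next_tmp = 0;
    out << "define i32 @f(i32 %x) {\nentry:\n";
    std::string result = emit(body, out, next_tmp);
    out << "  ret i32 " << result << "\n}\n";
    return out.str();
}
```

For `(x + 1) * 2` this prints an `add` into `%t0`, a `mul` into `%t1`, and `ret i32 %t1`. The work that takes the couple of days is everything this skips: control flow, types, and calls.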

vbit · on July 22, 2016

Does LLVM have any support for coroutines?

If I wanted to implement a language with Lua-style coroutines, could I target LLVM?

Others · on July 22, 2016

Unfortunately, I don't think it does currently. I think you'd have to reimplement them yourself on top of LLVM... That might change though, this co-routine proposal just came up on the mailing list: https://github.com/GorNishanov/llvm/blob/coro-rfc/docs/Corou...