An introduction to libuv (2016) (nikhilm.github.io)
125 points by eatonphil on Aug 16, 2018 | 38 comments



We merged this guide into http://docs.libuv.org/en/v1.x/guide.html which got some updates to match API changes.


I work with Nikhil. He's incredibly smart and a fantastic programmer. Didn't know he wrote a book on libuv though!


> This book and the code is based on libuv version v1.3.0.

libuv v1.3.0 was released on Jan 28, 2015. It is now at v1.22.0.


Original author here. Yes this book was written a long time ago when I was into the node ecosystem.

A lot of the book was later copied into the official docs: https://github.com/libuv/libuv/pull/1246

The book definitely does not capture new things. I do not intend to update it, but I still look at pull requests.


I merged it into the official docs, thanks a lot for all your work nsm! <3


libuv supports child processes via fork, such as in uv_spawn. It also uses multiple threads to support uv_fs_stat, etc.

How does it handle the well known incompatibility between fork and threads?


Which incompatibility do you mean? If you're forking to create a child process, usually you're just careful not to do anything that might acquire a lock (including allocating memory) in between the fork and the exec, and then I think you're good?

Forking for something else, though, god help you.


The tricky part of this approach is error handling. What if a system call fails in between fork and exec, or exec itself fails?

It looks like libuv creates a pipe for every new process, and uses that to send errno back on any failing system call. This has the disadvantage that you only get the error code, and none of the context. You get ENOENT, but you don't know which path was invalid.
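The errno-over-a-pipe technique described above can be sketched roughly as follows. This is a minimal illustration, not libuv's actual code: `spawn_report_errno` is a hypothetical name, and it assumes a POSIX system. The write end of the pipe is marked close-on-exec, so a successful exec closes it and the parent reads EOF; a failed exec writes errno instead.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of libuv): run `path` with `argv`.
 * Returns 0 if exec succeeded, otherwise the errno of the failing call. */
int spawn_report_errno(const char *path, char *const argv[], pid_t *pid_out) {
    int fds[2];
    if (pipe(fds) != 0)
        return errno;

    /* Close-on-exec: a successful exec closes the write end for us. */
    if (fcntl(fds[1], F_SETFD, FD_CLOEXEC) != 0) {
        int e = errno;
        close(fds[0]);
        close(fds[1]);
        return e;
    }

    pid_t pid = fork();
    if (pid < 0) {
        int e = errno;
        close(fds[0]);
        close(fds[1]);
        return e;
    }

    if (pid == 0) {
        /* Child: only async-signal-safe calls between fork and exec. */
        close(fds[0]);
        execv(path, argv);
        int e = errno; /* only reached if exec failed */
        if (write(fds[1], &e, sizeof e) < 0) {
            /* Nothing more we can do; the parent will see EOF. */
        }
        _exit(127);
    }

    /* Parent: EOF means exec succeeded, an int means it failed. */
    close(fds[1]);
    int child_errno = 0;
    ssize_t n;
    do {
        n = read(fds[0], &child_errno, sizeof child_errno);
    } while (n < 0 && errno == EINTR);
    close(fds[0]);

    if (n == (ssize_t)sizeof child_errno) {
        waitpid(pid, NULL, 0); /* reap the child that failed to exec */
        return child_errno;
    }
    if (pid_out != NULL)
        *pid_out = pid;
    return 0;
}
```

As the comment says, the parent learns only the number (for example ENOENT), not which call or path produced it.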


They could just write the entire error string to an fd, right? It's annoying in C to write a formatted string without doing dynamic allocations, but it's possible.

(Or, if you want to be weird, do a PTRACE_TRACEME after fork and have the parent trace the child process and only detach when it sees a successful return from execve. If ptrace is unavailable, fall back to less-useful errors)
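For the first suggestion, formatting a message without dynamic allocation might look something like the sketch below. All names and the message format are made up for illustration. It reports the errno number rather than its text, on the assumption that strerror is not safe to call between fork and exec.

```c
#include <stddef.h>

/* Append src to buf at *pos, truncating if needed. Never allocates.
 * Requires cap >= 1 and *pos < cap. */
static void append_str(char *buf, size_t cap, size_t *pos, const char *src) {
    while (*src != '\0' && *pos + 1 < cap)
        buf[(*pos)++] = *src++;
    buf[*pos] = '\0';
}

/* Append an unsigned integer in decimal, truncating if needed. */
static void append_uint(char *buf, size_t cap, size_t *pos, unsigned v) {
    char digits[16];
    size_t n = 0;
    do {
        digits[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    while (n > 0 && *pos + 1 < cap)
        buf[(*pos)++] = digits[--n];
    buf[*pos] = '\0';
}

/* Hypothetical format: "<call>(<path>): errno <n>".
 * Returns the number of characters written, excluding the terminator. */
size_t format_exec_error(char *buf, size_t cap, const char *call,
                         const char *path, int err) {
    size_t pos = 0;
    append_str(buf, cap, &pos, call);
    append_str(buf, cap, &pos, "(");
    append_str(buf, cap, &pos, path);
    append_str(buf, cap, &pos, "): errno ");
    append_uint(buf, cap, &pos, (unsigned)err);
    return pos;
}
```

The child would fill a stack buffer with this and write() it to the error pipe in place of the bare errno.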


Piping back the error string is probably the best one can do.


Do you need to know? If you're fork/exec and get ENOENT, there's only one path that could relate to, the path to the program you wanted to run.


Or the path to the interpreter (e.g. if you're trying to run an x86-32 binary on an x86-64 system without 32-bit libc, or if you're trying to run a binary linked against glibc on a system with musl).


Fair enough - but there's still no more info you could learn from the ENOENT error.


Is libuv used anywhere else outside of node?


There's a very good OCaml binding: https://github.com/fdopen/uwt. It works really well in the networking component of the Docker for Mac and Windows desktop apps (see https://github.com/moby/vpnkit).



There's an event loop implementation for Python based on libuv: https://github.com/MagicStack/uvloop

It's compatible with the native asyncio event loop and can be used as a drop-in replacement.


neovim uses libuv.


I'm pretty sure aspnetcore uses it.


The default for Kestrel has been moved off of libuv, but you can choose to use libuv if you'd like.

https://blogs.msdn.microsoft.com/webdev/2018/04/12/asp-net-c...


The Julia programming language uses it.


Why is the hello world example [0] unnecessarily allocating and freeing the uv_loop_t? When I see this kind of crap right out of the gate I immediately begin suspecting this is probably a pile of awful code written by a novice C programmer then documented and published as if it's the best thing since sliced bread.

Much of the value in making a struct (and hence its size) public and supplying a pointer to the initializer is the potential to embed or avoid heap allocation of the thing altogether, and here in an example which would benefit from both the simplified code in addition to demonstrating the advantage, it's completely missed.

Fixed form:

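  #include <stdio.h>
  #include <uv.h>

  int main(void) {
    uv_loop_t loop;  /* stack-allocated: no malloc/free needed */

    uv_loop_init(&loop);

    printf("Now quitting.\n");
    uv_run(&loop, UV_RUN_DEFAULT);  /* no handles registered, so this returns at once */

    uv_loop_close(&loop);

    return 0;
  }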
[0] https://nikhilm.github.io/uvbook/basics.html#hello-world


Generally, you only create one uv_loop_t, and it is vital it is never copied (which is easy to do accidentally if you stack allocate it).

mallocing is the entirely sensible thing to do here.


I just assumed it was because if the memory location ever changes it would invalidate all the pointers to it. It has to have a fixed memory location.
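A small sketch of that failure mode, using a made-up `fake_loop` struct rather than the real uv_loop_t: any struct that stores pointers into itself (here, an empty intrusive list head that points at itself) silently breaks when it is copied by assignment.

```c
#include <stdbool.h>

/* Intrusive doubly linked list node. */
struct node {
    struct node *next;
    struct node *prev;
};

/* Hypothetical stand-in for a loop-like object. */
struct fake_loop {
    struct node handles; /* list head; points at itself when empty */
};

void fake_loop_init(struct fake_loop *l) {
    l->handles.next = &l->handles;
    l->handles.prev = &l->handles;
}

/* Empty means the head points back at its own address. */
bool fake_loop_is_empty(const struct fake_loop *l) {
    return l->handles.next == &l->handles;
}
```

After `b = a;` the copy's head still points into `a`, so `fake_loop_is_empty(&b)` is false even though nothing was ever added.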


What do you mean by accidentally copy?

struct embed { ... uv_loop_t loop; ... } e1, e2; ... e1 = e2; ?


Presumably they're referring to copy via simple assignment vs. requiring a memcpy or pointer dereference. It's a b.s. argument.


Totally. Your comment was right.


Then uv_loop_t should be an opaque type so its size isn't known at all, and uv_loop_init() should instead be uv_loop_new() returning the heap-allocated uv_loop_t, and uv_loop_close() renamed to uv_loop_free().

As-is the API and the way its use is being demonstrated in that example appear amateur to say the least.
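For reference, the opaque-handle shape being proposed would look roughly like this. The `my_loop_*` names are hypothetical and this is not libuv's API; the point is only that callers see an incomplete type, so they can neither stack-allocate nor copy it.

```c
#include <stdlib.h>

/* --- What the public header would expose --- */
typedef struct my_loop my_loop_t; /* incomplete type: size unknown to callers */

my_loop_t *my_loop_new(void);
void my_loop_add(my_loop_t *loop);
unsigned my_loop_pending(const my_loop_t *loop);
void my_loop_free(my_loop_t *loop);

/* --- What stays private in the library's .c file --- */
struct my_loop {
    unsigned pending; /* fields can change without breaking the ABI */
};

my_loop_t *my_loop_new(void) {
    /* calloc zeroes the struct; returns NULL on allocation failure. */
    return calloc(1, sizeof(struct my_loop));
}

void my_loop_add(my_loop_t *loop) {
    loop->pending++;
}

unsigned my_loop_pending(const my_loop_t *loop) {
    return loop->pending;
}

void my_loop_free(my_loop_t *loop) {
    free(loop);
}
```

The cost is the one raised elsewhere in the thread: every instance needs a heap allocation, and the handle cannot be embedded in a larger struct.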


But then you wouldn't be able to stack allocate it. It's reasonable to make something recommended, but not forbid other approaches if someone really wants to do it that way. You don't have to go to either extreme.


Funny how most structs in the linux kernel are public, they must be all amateurs, too. Perhaps go read about container_of and struct embedding in C before calling anyone an amateur.

It's just a different programming approach in C.

Encapsulation is good only for making it harder for anyone to poke in the internals of your library, which has some benefits with binary compatibility, etc.

But it's strictly worse in all other metrics when coding in C. It limits you, it forces malloc calls where most of the time none would be necessary, it makes memory management more complicated, it forces explicit initialization, etc.
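For anyone unfamiliar with the pattern, here is a minimal sketch of container_of with struct embedding. The `timer` and `connection` structs are invented for illustration; the macro is the usual portable offsetof-based form, not the kernel's exact definition.

```c
#include <stddef.h>

/* Recover a pointer to the enclosing struct from a pointer to a member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical library type with a public definition. */
struct timer {
    unsigned ticks;
};

/* User type that embeds the library type by value: one object,
 * one allocation (or none, if it lives on the stack). */
struct connection {
    int id;
    struct timer idle_timer;
};

/* A callback that only receives the timer can still reach its owner. */
struct connection *connection_from_timer(struct timer *t) {
    return container_of(t, struct connection, idle_timer);
}
```

This only works because the definition of `struct timer` is visible to the user; with an opaque type, `connection` would have to hold a pointer and manage a second allocation.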


What exactly is funny here? The linux kernel is not a userspace library with a public API - and in the case of libuv, judging from `apt-cache search libuv` output, one which intends to be dynamically linked as well. By exposing this performance-insensitive struct publicly, they're also unnecessarily increasing their ABI surface area. It just reeks of novice library authorship.

What makes sense for a monolithic kernel's code is quite different from what makes sense in a userspace library, particularly on its public API boundaries.

It's also worth noting that the Linux kernel developers have historically made choices which are actively hostile to kernel module API/ABI stability as a way to discourage third parties from distributing out-of-tree proprietary binary modules. There is literally a policy of not having a stable interface for module writers, it's the anti-library except for the system call interface.


If one already accepts that the object should be dynamically allocated because otherwise it's too easy to accidentally copy (which seems specious, but that was the suggestion), then all of those arguments apply to that recommendation, too. And if it's "vital" that the object not be copied, then the API could enforce that with an opaque type.

I've only worked with libuv in a few contexts, and maybe there are compelling reasons for exposing the struct. (I'd like to hear them! I would not have expected a uv_loop to be allocated in perf-critical code paths or code paths where failure isn't an option.) But I think critical analysis of C API design is an important topic. C gets a bad rap for being unsafe, which it obviously is in many ways, but as C developers, simple choices like this (that a novice might not even think much about) can make an API much safer -- or much less safe.


Performance is orthogonal to this. To me the less you juggle with pointers and malloc/free the safer your code will be from memory leaks, misuse of unallocated memory, NULL checking issues, and easier to inspect/reason about.

Struct embedding helps with this quite a bit. And it's not possible without exposing the struct definition.

Performance gains are possible, but that's secondary.

> If one already accepts that the object should be dynamically allocated

It can still be allocated on heap, but in one contiguous chunk of memory as a part of the larger struct. Hiding the struct definition would prevent this.


> Performance is orthogonal to this.

That's largely true, but it's often cited as a reason to avoid malloc/free (however dubious that is).

> To me the less you juggle with pointers and malloc/free the safer your code will be from memory leaks, misuse of unallocated memory, NULL checking issues, and easier to inspect/reason about.

I don't quite agree. I've worked mostly in code bases using the pattern described earlier (an opaque pointer, a $type_create(), and a $type_destroy() function). With that pattern, I find it much easier to be certain by code inspection that a particular object or transformation is valid because as long as the pointer was allocated correctly, the object can only be modified by functions that know the type (aside from memory corruption, but that's always possible). That's usually a small set of functions that know the struct details. This fact is useful both as a library author and as the author of a library consumer. By contrast, if the struct is exposed, it's harder to identify all the places that can modify the structure's details and to be sure that invariants are maintained in all those places.

Besides that, several other failure modes are much less likely with opaque structures, including copying a structure you shouldn't, miscopying a structure that's okay to copy (e.g., off-by-one while copying), or operating on a correctly-sized block of memory that's never been initialized. You can still use unallocated memory, of course, but that's fairly easy to make safer by initializing pointers to NULL. (The analogous option for stack-initialized structs -- initializing them to zero -- is not necessarily any safer than leaving them uninitialized -- particularly if the struct contains file descriptors.)
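To make the file descriptor point concrete, here is a tiny sketch with a made-up `log_sink` struct: zero is a valid descriptor (normally stdin), so a zero-initialized struct looks open, whereas an explicit -1 sentinel does not.

```c
#include <stdbool.h>

/* Hypothetical public struct holding a file descriptor. */
struct log_sink {
    int fd;
};

/* Safer static initializer: -1 is never a valid descriptor. */
#define LOG_SINK_INIT { .fd = -1 }

/* Treats any non-negative fd as open, as most code does. */
bool log_sink_is_open(const struct log_sink *s) {
    return s->fd >= 0;
}
```

With `struct log_sink z = {0};` the check reports the sink as open on descriptor 0, while `struct log_sink s = LOG_SINK_INIT;` correctly reports it as closed.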

There are tradeoffs to both approaches. To me, the ability to modify the struct in new versions of the library without breaking the ABI is a pretty major point in favor of using an opaque structure for a library's primary handle. For very simple ancillary structures that are very unlikely to change, and where the convenience of stack allocation is worthwhile, exposing the structure makes a lot of sense.


The libuv API is not amateur; it is all about flexibility. The example usage, however, is not optimal; it is just confused.

The option to dynamically allocate uv_loop_t is not there to avoid "accidental copies" or any such thing. It is there for when you need to dynamically create loop contexts.

Linux does exactly the same thing. But you won't see unnecessary kmallocs of structs when static storage is enough. In such cases you will see the *_init functions used.


I wouldn't call a library that is the foundation of one of the most popular platforms in the world amateur.


Ever looked at the PHP implementation throughout its popular existence? Popularity is an awful metric for gauging implementation quality.


Agreed, however PHP was never praised for anything other than its ecosystem. Node is generally praised for its async IO, which is built on top of libuv.



