Hacker News
A Beginner's Guide to eBPF (github.com/lizrice)
183 points by mooreds on May 7, 2023 | 72 comments



This may prove useful:

> eBPF (often aliased BPF)[2][5] is a technology that can run sandboxed programs in a privileged context such as the operating system kernel.[6] It is used to safely and efficiently extend the capabilities of the kernel at runtime without requiring to change kernel source code or load kernel modules.[7] Safety is provided through an in-kernel verifier which performs static code analysis and rejects programs which crash, hang or otherwise interfere with the kernel negatively.[8][9] Examples of programs that are automatically rejected are programs without strong exit guarantees (i.e. for/while loops without exit conditions) and programs dereferencing pointers without safety-checks.[10] Loaded programs which passed the verifier are either interpreted or in-kernel JIT compiled for native execution performance. The execution model is event-driven and with few exceptions run-to-completion,[2] meaning, programs can be attached to various hook points in the operating system kernel and are run upon triggering of an event. eBPF use cases include (but are not limited to) networking such as XDP, tracing and security subsystems.[6] Given eBPF's efficiency and flexibility opened up new possibilities to solve production issues, Brendan Gregg famously coined eBPF as "superpowers for Linux".[11] Linus Torvalds expressed that "BPF has actually been really useful, and the real power of it is how it allows people to do specialized code that isn't enabled until asked for".[12] Due to its success in Linux, the eBPF runtime has been ported to other operating systems such as Windows.[4]

https://en.wikipedia.org/wiki/EBPF


A paragraph like this one should be the first thing in the Readme.


Weren't sandboxed programs in a privileged context the root cause of me seeing BSODs so often in the late '90s?


AFAIK one of the causes of BSODs was, among others, the lack of sandboxing.


This is a different kind of sandbox. BPF functions by drastically limiting the shape of the control flow graph that will be accepted for a program; most working C functions will not be accepted in a BPF program. The simplest example (complexified by more recent BPF verifier work that relaxes this... somewhat) is that BPF programs can't have loops, at least in the sense that a normal program can.

You can crash a kernel with a BPF program. But it's overwhelmingly likely that the crash will arise from buggy pre-existing kernel code that just hadn't been seriously exercised before eBPF gave people new tools to push that code with. What's much, much less likely to happen is a segfault or NPE in your own BPF code.


I'm only aware of drivers, which were not sandboxed due to the nature of drivers (at least in the late '90s there was probably not much abstraction available at that level, in terms of kernel or hardware features?)


There was -- Minix was developed in the 90s and ran drivers in user space. But it was (perceived to be?) slow, so mainstream OSes did not do it.


I’m new to Linux kernel programming & eBPF (just started last week) and I’m having major trouble with the eBPF verifier. I honestly feel like it would be easier for me to write a kernel module than eBPF code.

I do wonder if this is the case for many people. It seems the verifier is a bit unpredictable and makes eBPF programming quite painful.


Can confirm, it is quite painful.

The bpftool maintainers tell you to learn the bytecode format when you ask them what the errors mean, because they expect you to understand what the verifier means when it tells you "unknown scalar" on every single goddamn line of code.

Something like "ebpf coding rules, what to use and what not" would be very helpful.

All eBPF examples older than, say, 3 months already don't work anymore. Not even the official ones from the XDP tutorial project (apparently because the libxdp maintainers are splitting a lot of headers off into a separate XDP library).

Most userspace code still relies on the 5-year-old bpf_helpers.h, which is no longer supported because it doesn't use the __helper methods from the kernel (the kernel was also refactored in the meantime, and you are now forced to use e.g. __u128 instead of native data types).

Oh boi, did I underestimate what "bytecode vm" means when kernel developers talk about it.

Also, always use LLVM, remember to build two BPF files (one for each endianness), and use -g for debug symbols. Otherwise you will spend days trying to find out what the bpftool errors mean, because of shitty mailing list answers.


I mean, the flip side of this is: do you really expect people on a mailing list to debug your custom XDP code for you? You get what you pay for, and you can pay people to help you with this stuff, or not.


It would be much easier to write a kernel module than an eBPF program. But the eBPF program is unlikely to panic your machine, and the kernel module is almost certain to.


Also, in cloud environments, I would much rather trust an eBPF program than a kernel module.


Why in cloud environments?


You don’t want a kernel panic affecting other users.


Why would your cloud instance panicking affect other users of the cloud provider?

Or do you mean something else?


Anecdotally, I had a case on VMware 4 (that was in 2012 or 2013) where a Solaris 11 VM managed to reboot the entire ESX host it was running on. Very weird bug where ESX passed through some interrupt or something.

But in this case I think they mean on the same machine. "In production" would be more accurate than "in a cloud environment". And yeah I wouldn't load custom kernel modules in production just to do observability.


Cloud goes beyond rented VMs. Fully managed cloud services have thousands or millions of production customers on the same node. They have to be very careful about what they run as root.


I understand your point, but millions sounds like an exaggeration. I have a hard time believing a single node can handle millions of concurrent users.


I didn't mean to imply concurrent. A large fraction of the user base is very sporadic in its usage!


Thanks for clarifying :)


Reading more about that… How does the verifier detect infinite loops anyway? Halting problem and all. It must use some rather crude heuristics, no?


It doesn't. Flip things around and you get something tractable, though incomplete.

Unsolvable problem: reject any loop that is provably infinite.

Solvable problem: reject any loop that isn't provably finite.

The trade-off is that there will always be some loops that do in fact always terminate, but that the verifier can't prove terminate.
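To make the "reject anything that isn't provably finite" idea concrete, here is a toy sketch in C. It is not the real verifier: it models a program as an array of relative jump offsets (a made-up representation; `toy_insn` and `toy_verify` are hypothetical names) and accepts only programs whose jumps all go strictly forward, which is the rule classic BPF used to guarantee termination.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy instruction: only the jump offset matters here.
 * jump == 0 means "fall through to the next instruction";
 * otherwise the next instruction is at index i + 1 + jump. */
struct toy_insn {
    int jump;
};

/* Accept only programs that trivially terminate: every jump goes
 * strictly forward and lands inside the program (landing exactly
 * at index n counts as "exit"). */
bool toy_verify(const struct toy_insn *prog, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int j = prog[i].jump;
        if (j < 0)
            return false;           /* backward jump: might loop forever */
        if ((size_t)j > n - 1 - i)
            return false;           /* jumps past the end of the program */
    }
    return true;
}
```

A countdown loop that always stops after ten iterations still needs a backward jump, so this check rejects it even though it terminates. That is the incompleteness described above.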


You need to slightly modify your code. Rather than:

    while (condition) { … }

Do:

    #define MAX 1000
    for (int n = 0; n < MAX; n++) { if (!condition) break; … }

Unroll all loops, don’t allow any backward jumps and limit to (say) 1M instructions.
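A minimal runnable sketch of that rewrite, in plain userspace C rather than an actual eBPF program. The function `find_byte` and the cap `MAX_ITERS` are made up for illustration; the point is only that the loop bound is a compile-time constant, so a checker can see the upper limit without reasoning about the data.

```c
#include <stddef.h>

#define MAX_ITERS 1000  /* hard upper bound on loop iterations */

/* Return the index of the first occurrence of `needle` in buf[0..len),
 * or -1 if it is not found within the first MAX_ITERS bytes. */
int find_byte(const unsigned char *buf, size_t len, unsigned char needle)
{
    for (int n = 0; n < MAX_ITERS; n++) {
        if ((size_t)n >= len)       /* the original `while` condition */
            break;
        if (buf[n] == needle)
            return n;
    }
    return -1;
}
```

The cost of the cap is visible in the contract: a match beyond the first `MAX_ITERS` bytes is reported as "not found".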


Incidentally, iteration limits are a good idea for production code anyway. If you don't imagine any input needing more than 50 k iterations, throw a user-friendly exception after something like 10 M iterations. Prevents much more annoying problems than it causes.
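As a rough illustration of such a limit in ordinary C: the hypothetical `collatz_steps` below counts Collatz steps but gives up with an error value once a caller-supplied cap is exceeded. Both the function and the choice to pass the cap as a parameter are assumptions of this sketch.

```c
#include <stdint.h>

/* Count the Collatz steps needed for n to reach 1.
 * Returns the step count, or -1 if n is 0 or if more than
 * max_steps steps would be needed (the iteration limit tripping). */
long collatz_steps(uint64_t n, long max_steps)
{
    long steps = 0;

    if (n == 0)
        return -1;
    while (n != 1) {
        if (steps >= max_steps)
            return -1;              /* give up instead of spinning */
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        steps++;
    }
    return steps;
}
```

For example, `collatz_steps(27, 1000)` returns 111, while `collatz_steps(27, 50)` returns -1 because the cap is hit first.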


> If you don't imagine any input needing more than 50 k iterations

What could possibly go wrong.


You get an error is the worst that happens.

Way better than running a denial of service attack on your own systems or those of your customers.


> You get an error is the worst that happens.

That certainly depends on what eBPF is used for. If your load balancer errors out at [greatest number of connections envisioned] and an adversary manages to establish [greatest number of connections envisioned] then the result is a denial of service.

Not every operator is confident in making code changes in 3rd party software, or is even allowed to make such changes. Increasing resources, on the other hand, e.g. adding RAM, is rarely banned. I sure would want a system to make best use of available resources.


I still think a denial of service due to tripping some sort of circuit breaker is preferable to one due to resource exhaustion.

If the code is intended to be used as a library, or the binary is distributed to third parties, one will have to handle it differently. For libraries, taking a parameter that indicates the expected maximum is common, for example. See e.g. man 3 read.


Nice. Be good if the language had an easy way to handle it, eg.:

while(condition)[1000 label]{…}


Ever since Brendan Gregg started using eBPF for observability back in 2015 I've had the sense that eBPF is an extremely underrated tool of the future. I really would like to learn it, but beyond some improvised bpftrace scripting and the tools that come with bcc, I've not really had the need.

What custom usage do you have for it?


I’ve used it to write a layer 4 load balancer to replace old hardware appliances. Similar to Facebook’s Katran, but it switches traffic at layer 2 (rather than L3) for compatibility, and it's written in Go rather than C++, ’cos I’m not a great coder ;-)


The entire first page begs the question -- "what is eBPF?"


It's actually answered (indirectly) on that page:

"My report "What is eBPF?" and in-depth book "Learning eBPF" are both available for download [0] from Isovalent or with your subscription to O'Reilly's learning platform. You can buy "Learning eBPF" from any good bookstore (support your local bookshop by ordering it there!)"

IIUC, you need to give contact info on that page to get the PDF. So this [1] might be a better starting point.

[0] https://isovalent.com/ebpf/

[1] https://ebpf.io/


After looking at the link first, this comment cracked me up. I skimmed most of the GH content without figuring out the answer; I wasn't going to click an O'Reilly link. So I gave up. Then my OCD kicked in and I got annoyed with myself for giving up, since if I'm going to waste time on HN I should really get to the bottom of things. So searched it. I found this:

https://ebpf.io/what-is-ebpf/

It runs sandboxed kernel extensions? Or it's a VM? Something like that? The writing of the page itself doesn't inspire a lot of confidence, but then maybe it's just over my head.


> It runs sandboxed kernel extensions? Or it's a VM?

Yes to both of those.

The kernel has a bunch of extension points that can run eBPF code in a VM. That code can make decisions for the kernel and/or track events.

eBPF code can do basically any calculation you want, but it can't have infinite loops.

It's loaded as bytecode, with a spec for how it's formatted and what the instructions do and what data structures are built in to the VM.

The main benefit is that it runs in the kernel, so it can be triggered very very often with minimal performance impact.


Is a joke going over my head here, because the first sentence links to a document titled "what is eBPF?"


The joke is you have to give them all your information in a form to then download some B2B report... rather than add two sentences about what eBPF is


Seriously. I really don't know why people think that's acceptable. Who sees a website titled "A beginner's guide to X" that makes people hand over their data to even get "X" defined, and thinks "This looks good to me! Sign me up!"


This is a GitHub repository accompanying a book. Neither of those things was submitted here by its author. A real problem HN has sometimes is treating the entire Internet as if it was written for HN specifically.


You can just type "What is eBPF" into Google, and the first hit will decisively answer this question for you.


You can work around pretty much any issue like this, but you shouldn't have to. A beginner's guide should define the concept.


I find this same issue with many things posted here. Even businesses marketing their product. My favourite is when they give you an abstraction salad instead of explaining something.

I clicked the first link so I got the gist but how many other people just give up and disengage with their post? Or click the link then just close the tab without reading further.


Although the fact that Google puts the Wikipedia BPF article in the sidebar might confuse you a little.



Just read the book. I like the way she did it: it starts out easy with what eBPF is and how it works, using short examples both for the low-level, under-the-hood stuff and for how to actually use it. It doesn't deviate much from this pattern, which is a good thing for such a relatively short book, and she gets to cover a lot of ground that way. It is still a "beginners" eBPF book, but it sets the stage for further development through the references or alternative books.

I can highly recommend it if you're eBPF curious =D I guess my only gripe is that the latter parts of the book are a bit product-heavy, but done in the most tasteful manner it could probably have been done, i.e. using the products as examples of how to use eBPF.


In case anyone is interested in writing eBPF programs in Rust https://github.com/vishpat/oxidize-ebpf


I don’t have any experience with this project, but Aya seems more vibrant, https://aya-rs.dev/


Aya is one of the most under-rated eBPF projects.


If you’re looking for a slightly higher level look, Unzip does it well: http://unzip.dev/0x00c-ebpf/


Thanks for the shout-out! <3


As I understand it, eBPF is primarily an observation tool and thus is quite limited in the modifications it can make to kernel memory. Does it have any generic way to make arbitrary modifications to kernel memory? Obviously this would invalidate any verification guarantees, but I would expect this to be very minor modifications in practice. For example, if I wanted to hook a page fault handler to change the behaviour without having to pay the cost of signal handlers.


What you can do is quite limited. You're restricted to some preset eBPF program types, and each program type has a restricted set of operations it can perform (eBPF helper methods). So arbitrary modifications, absolutely not without adding a helper method and/or program type for this purpose. More program types and helper methods are being added all the time but overall it's pretty limited in use cases and operations.

If you want full control then a kernel module is the way to go, but this doesn't have the same security and stability guarantees.


Right, that's what I thought. So at a high level helper methods are equivalent in some sense to 'unsafe' code in rust and require manual validation for security (i.e. the verifier ignores them other than to check they are in some helper method whitelist for the program type)?


eBPF has pretty significant latitude with network traffic, but for everything else the idiom is to use it to do fast kernel telemetry to a userland process that does the actual acting.


The GitHub page linked here mentions the word “eBPF” 18 times and yet not once does it expand on what this abbreviation means.


while originally it's "extended Berkeley Packet Filter", at this point it does so much more than packet filtering that spelling out the words probably increases confusion rather than clarifying anything

it's like a reverse backronym


It's extremely berkeley packet fun.


If you are into podcasts, she was on "Screaming in the Cloud" recently, and talks about what eBPF enables, how to teach technologies and more.

https://www.lastweekinaws.com/podcast/screaming-in-the-cloud...


Her implementation of a rudimentary load balancer (in C) demonstrates eBPF in practice:

https://www.youtube.com/watch?v=L3_AOFSNKK8


Compared to making a module, and apart from the fact that a module can end in a kernel panic (that's my problem), is there any significant performance difference between BPF and kernel modules?

If my goal is to reduce the CPU part of my cloud bill, is BPF good enough compared to making a kernel module?


The parody here is so perfect, they even created the technology they don't bother to define.


How so?


This is not eBNF. That is something completely different.


Looks like a "buy my book" ad to me...


Hey. Please be more careful. The author of this repository isn't the story submitter. People write things that end up on Hacker News all the time with no idea that it's even happened, let alone any intention to promote things to you.


Yay!


Something’s off here. I’m reasonably well read and literate on computer topics, I’ve worked in cyber security for over 5 years now, and I'm extremely open-minded to new ideas, yet this reads at best like derivative marketing jargon with little in the way of technical content.


The game changing idea behind eBPF is XDP in my opinion.

Lots of network drivers and NICs support offloading XDP programs ("xdp_prog") to the network controller's chipset, which results in zero CPU i/o interrupts if you e.g. use an XDP_DROP to block traffic.

Being able to block network traffic _before_ it reaches even kernelspace is a game changer.
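For a feel of what such a filter looks like, here is a userspace C sketch of the decision logic only. It is not a loadable XDP program: `toy_filter` and the `TOY_*` verdicts are stand-ins invented for this example, whereas a real XDP program would take a `struct xdp_md *` context and return `XDP_DROP` or `XDP_PASS`. It drops IPv4/UDP packets and shows the bounds check before every read that the verifier insists on.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the kernel's XDP_DROP / XDP_PASS verdicts; these
 * values are local to this sketch, not taken from <linux/bpf.h>. */
enum toy_action { TOY_DROP, TOY_PASS };

#define ETH_HDR_LEN     14      /* dst MAC + src MAC + EtherType */
#define ETHERTYPE_IPV4  0x0800
#define IPV4_MIN_HDR    20
#define IPPROTO_UDP_NUM 17

/* Drop IPv4/UDP packets, pass everything else. Each read is preceded
 * by a length check against data_end. */
enum toy_action toy_filter(const uint8_t *data, const uint8_t *data_end)
{
    size_t len = (size_t)(data_end - data);

    if (len < ETH_HDR_LEN)
        return TOY_PASS;                    /* too short to parse */
    uint16_t ethertype = (uint16_t)((data[12] << 8) | data[13]);
    if (ethertype != ETHERTYPE_IPV4)
        return TOY_PASS;

    if (len - ETH_HDR_LEN < IPV4_MIN_HDR)
        return TOY_PASS;                    /* truncated IPv4 header */
    const uint8_t *ip = data + ETH_HDR_LEN;
    if (ip[9] == IPPROTO_UDP_NUM)           /* byte 9 = protocol field */
        return TOY_DROP;
    return TOY_PASS;
}
```

In real XDP code the same checks are usually written as pointer comparisons such as `data + sizeof(*eth) > data_end`; the length-based form is used here only to keep the userspace version free of out-of-range pointer arithmetic.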


eBPF is an extremely big deal in computer and network security, so, I assure you, this isn't "derivative marketing jargon", and it is very technical.


Well that is wild and kinda cool. I don’t often find things that are so foreign to me they appear fake! I’ll have to poke around a little more, thanks for the correction.


I've found kernel documentation under Documentation/bpf to be the best resource available. Clear, concise and no marketing-speak

As for this repository - the README triggered a false alarm in my bullshit sensors, but the code example is pretty nice.


welcome to cloud engineering.



