Let’s code a TCP/IP stack, 1: Ethernet and ARP (2016)

cihangir · 2025-03-04T10:52:02 1741085522

Years ago, I attempted to build a user-space network stack in C [0] that processes raw packets through the TUN interface and got it working to a certain point. It currently includes a simple shell that allows configuring IP addresses, routes, and such. A hybrid structure reminiscent of both mbuf and sk_buff is used to hold the network packets. However, after completing the UDP implementation, I didn't find the time or motivation to implement TCP. If you want to check it out, here's the link:

[0] https://github.com/cakturk/unet

VWWHFSfQ · 2025-03-04T13:48:45 1741096125

Many years ago, I wrote a pcap/tcpdump parser in pure bash, because it's all I knew how to write "programs" with. It was, of course, the slowest and most brittle thing of all time, but it did actually work. And was kinda fun. Wish I still had that code somewhere.

jpfr · 2025-03-06T20:28:04 1741292884

Many embedded devices run the lwip implementation of TCP/IP.

The "POSIX port" of lwip does the same. It takes the raw Ethernet bytes from a TUN/TAP device.

https://github.com/lwip-tcpip/lwip/blob/master/contrib/ports...

zoobab · 2025-03-04T09:49:52 1741081792

If you compile a minimal linux kernel without a tcp/ip stack -> 400KB. If you add a tcp/ip stack -> 800KB.

For a project where I only had to send the temperature, I just made a small C program in userspace that sent the value in a hand-crafted UDP message, which saved a lot of space (and complexity) :-).
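
For anyone wondering what "crafting" a UDP message by hand involves, here is a minimal sketch in Python rather than C. It is not the program described above: the function names, addresses, ports, and payload are all made up for illustration, and it only builds the IPv4 + UDP bytes (actually putting them on the wire, e.g. via a raw socket, is left out).

```python
import struct

def inet_checksum(data: bytes) -> int:
    """16-bit one's-complement Internet checksum over `data`."""
    if len(data) % 2:
        data += b"\x00"                      # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_udp_packet(src_ip: bytes, dst_ip: bytes,
                     sport: int, dport: int, payload: bytes) -> bytes:
    """Hypothetical helper: return IPv4 header + UDP header + payload."""
    udp_len = 8 + len(payload)
    # UDP checksum left at 0, which IPv4 permits ("no checksum computed").
    udp = struct.pack("!HHHH", sport, dport, udp_len, 0) + payload
    # Version/IHL 0x45, TOS 0, total length, id 0, flags/frag 0, TTL 64, proto 17 (UDP).
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + udp_len,
                     0, 0, 64, 17, 0, src_ip, dst_ip)
    ip = ip[:10] + struct.pack("!H", inet_checksum(ip)) + ip[12:]
    return ip + udp

# Made-up example values (192.0.2.0/24 is a documentation range).
packet = build_udp_packet(bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2]),
                          40000, 9999, b"temp=21.5")
```

A receiver can sanity-check the IP header by running the same checksum over the 20 header bytes; a valid header yields 0.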

chamomeal · 2025-03-04T14:33:43 1741098823

Wow that’s crazy!

As someone who knows nothing about anything: that doesn’t mean the tcp/ip stuff is half the source code of the whole kernel, does it?

tga_d · 2025-03-04T19:12:09 1741115529

The majority of the Linux kernel's source code is device drivers. The overwhelming majority of that is not included in the kernel image by default, but instead made available as kernel modules you can enable as needed. E.g., your thermostat probably doesn't need support for an obscure game controller, so doesn't have those drivers, but it could if you were so inclined.

miohtama · 2025-03-04T14:04:19 1741097059

Curiously, why is the IP stack so large? 400 KB of binary is a lot of code. Is it highly optimised for the large-server use case?

hylaride · 2025-03-04T14:37:38 1741099058

Modern TCP/IP stacks have a lot of extra code, including anti-spoofing, performance enhancements (e.g. zero-copy integration with hardware network cards), various attack prevention measures (SYN flood protection, randomization of sequence numbers, etc.), support for various hardware offloading (many network cards will do checksum offloading, etc.), IPv6 (which also originally mandated IPsec integration), and support for the lower-layer (layer 2) protocols (mostly just ARP for Ethernet, but there are still others around).

kbouck · 2025-03-04T08:15:26 1741076126

If you disable ARP, you can have a group of servers on the same network configured with the same IP! And if a server acting as a routing frontend can forward packets to a backend server's network interface by MAC address (you need a kernel extension for this trickery), that backend server will recognize itself as the destination, swap the source/dest IP, and respond directly back to the client (without going back through the routing frontend).

Alternatively, you can accomplish the same without disabling ARP by just adding the common IP address as an alias on the loopback interface, which allows the backend to recognize itself as the destination but avoids ARP conflicts.

This was a trick used by IBM's WebSphere software load balancer back in the 90's-00's
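
To make the "forward by MAC address" step concrete, here is a rough sketch of what the frontend does to each frame, assuming plain Ethernet II framing. The function name and the MAC values are hypothetical, and this is not how the IBM product was implemented, just the general idea: only the Ethernet addresses change, the IP packet inside is untouched.

```python
def forward_to_backend(frame: bytes, frontend_mac: bytes, backend_mac: bytes) -> bytes:
    """Hypothetical helper: re-address an Ethernet frame to a chosen backend.

    Bytes 0-5 are the destination MAC and bytes 6-11 the source MAC.
    Everything from byte 12 on (EtherType + IP packet) is copied as-is,
    so the shared destination IP still matches on the backend.
    """
    return backend_mac + frontend_mac + frame[12:]

# Made-up addresses for illustration.
client_mac = bytes.fromhex("020000000001")
frontend_mac = bytes.fromhex("0200000000fe")
backend_mac = bytes.fromhex("0200000000b1")

incoming = frontend_mac + client_mac + b"\x08\x00" + b"<ip packet bytes>"
outgoing = forward_to_backend(incoming, frontend_mac, backend_mac)
```

Because the IP header still carries the client's address as the source, the backend can reply straight to the client.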

citrin_ru · 2025-03-04T08:45:28 1741077928

> This was a trick used by IBM's WebSphere software load balancer back in the 90's-00's

Cisco IOS SLB can work in a similar way: a virtual IP is added as an alias to loopback on each server in a farm. An advantage over the more widely used L3 balancing is that there is no need to rewrite headers in IP packets.

Bluecobra · 2025-03-04T13:23:54 1741094634

>If you disable ARP, you can have a group of servers on the same network configured with the same IP!

The downside to this is that a switch/bridge will not learn the MAC address and will continue to flood/broadcast these packets to every port in that segment. So if you do decide to do this, make sure you set up a dedicated VLAN. :)

10000truths · 2025-03-04T14:29:46 1741098586

ARP is for the LAN devices. L2 switches don't rely on ARP to build up their forwarding tables, they can just inspect the source MAC of every Ethernet frame they receive, and correlate it with the port they receive it on. Frames with unknown destination MACs are broadcast, but that stops as soon as every device in the LAN has sent at least one frame.
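
A toy model of that learning behaviour, as a sketch only (the class name, port numbers, and string MACs are made up, and real switches also age entries out, handle VLANs, and so on):

```python
class LearningSwitch:
    """Minimal model of an L2 switch's MAC learning table."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.table = {}                      # source MAC -> port it was last seen on

    def receive(self, in_port, smac, dmac):
        """Learn from the source MAC, then return the list of output ports."""
        self.table[smac] = in_port           # learning never involves ARP
        out_port = self.table.get(dmac)
        if out_port is None:                 # unknown destination: flood
            return sorted(self.ports - {in_port})
        if out_port == in_port:              # destination is on the ingress port: drop
            return []
        return [out_port]

switch = LearningSwitch([1, 2, 3])
first = switch.receive(1, "aa", "bb")    # "bb" unknown yet, so flooded
reply = switch.receive(2, "bb", "aa")    # "aa" was learned on port 1
second = switch.receive(1, "aa", "bb")   # "bb" is now known on port 2
```

Note that an address only enters the table when it appears as a source, so a MAC that never sends anything keeps being flooded.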

lmz · 2025-03-09T02:45:21 1741488321

Also known as DSR (Direct Server Return) https://www.haproxy.com/blog/layer-4-load-balancing-direct-s...

mannyv · 2025-03-04T19:48:03 1741117683

F5s have an arp proxy setting so you don't have to do this. The downside is it tends to break dhcp.

KeplerBoy · 2025-03-04T08:20:10 1741076410

For such low-level shenanigans one can also fiddle around with DPDK. ARP disabled by default.

globular-toast · 2025-03-04T06:58:34 1741071514

I did a similar thing in Python[0]. Probably not as well written and, to be honest, I just made up the address resolution algorithm. I got as far as pinging an internet host with ICMP. I like that mine is completely contained in a (short) notebook, though (the OP article misses many details that are in the larger source code that is referenced).

I hadn't seen this article and did mine all from Wikipedia! There is a huge jump in complexity for TCP, though, and I lost interest a bit. Part 3 of this covers that so maybe one day I'll read that and finish mine.

I found it very rewarding and it's definitely something that is doable by any level of programmer if you're interested in networking.

[0] https://github.com/georgek/notebooks/blob/master/internet.ip...

intrasight · 2025-03-04T14:06:44 1741097204

Years ago I instrumented a nuclear power plant. I did the client-side development on Sun workstations. I actually got hired because of my TCP/IP experience - which I got from taking "Operating Systems" at CMU. The plant computer on the other hand was a mini computer that had no TCP/IP stack and so that team had to create one.

kasajian · 2025-03-04T15:06:51 1741100811

One minute into it, the article says, "The dmac and smac are pretty self-explanatory fields"

This immediately turns off anyone reading it who doesn't know what those things mean. The thought process will be, "Oh, this article is for those for whom these fields are self-explanatory. Since it's not for me, I'll stop reading."

howerj · 2025-03-04T15:24:02 1741101842

The full quote would be "The dmac and smac are pretty self-explanatory fields. They contain the MAC addresses of the communicating parties (destination and source, respectively).", so it does explain them. However, this is an article about how to make a network stack; it is safe to assume the reader knows something about networking beforehand.
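
For readers who want to see where those two fields sit, here is a small sketch that pulls them out of a raw Ethernet II frame. The `dmac`/`smac` names come from the quote; `ethertype`, the helper names, and the sample frame are my own additions for illustration (the article itself is in C).

```python
import struct

def parse_ethernet_header(frame: bytes):
    """Split an Ethernet II frame into (dmac, smac, ethertype, payload)."""
    # 6 bytes destination MAC, 6 bytes source MAC, 2 bytes EtherType (big-endian).
    dmac, smac, ethertype = struct.unpack("!6s6sH", frame[:14])
    return dmac, smac, ethertype, frame[14:]

def mac_to_str(mac: bytes) -> str:
    """Format 6 raw bytes as the usual colon-separated hex string."""
    return ":".join(f"{b:02x}" for b in mac)

# Made-up frame: broadcast destination, EtherType 0x0806 (ARP).
sample = bytes.fromhex("ffffffffffff" "020000000001" "0806") + b"payload"
dmac, smac, ethertype, payload = parse_ethernet_header(sample)
```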

petee · 2025-03-04T15:23:47 1741101827

Unless they just updated it, the next sentence explained it -

> They contain the MAC addresses of the communicating parties (destination and source, respectively).

dang · 2025-03-04T05:48:52 1741067332

Let's code a TCP/IP stack, 1: Ethernet & ARP (2016) - https://news.ycombinator.com/item?id=17316487 - June 2018 (47 comments)

Let's Code a TCP/IP Stack: TCP Retransmission - https://news.ycombinator.com/item?id=14701199 - July 2017 (30 comments)

Let's code a TCP/IP stack, 1: Ethernet and ARP - https://news.ycombinator.com/item?id=11234229 - March 2016 (49 comments)

p4bl0 · 2025-03-04T07:06:15 1741071975

I don't get where the author gets the 10.0.0.4 IP address from, the one used to test ARP resolution. What is it supposed to be the address of? A fake device accessible to the made-up Ethernet device programmed here? Or is it an actual device on the author's network? Can someone explain that?

globular-toast · 2025-03-04T07:28:06 1741073286

It isn't mentioned in the article, but the author hardcodes this when initialising an interface: https://github.com/saminiir/level-ip/blob/e9ceb08f01a5499b85...

A TAP device is like a software-emulated Ethernet link (or any layer 2?). So if you send packets into it, they get sent directly to your user-level program. It's then up to your program to decide what IP address(es) it wants to have, reply to ARPs, etc. Normally this kind of thing is handled by the OS, and adding IP addresses to the interface requires root permissions (as does opening the TAP device). Networking is largely cooperative, and a bad actor with root permissions on your network can do bad things.
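
As a rough illustration of "reply to ARPs", here is a sketch of the decision a user-level program makes for each ARP packet it reads from the TAP device. It is not the article's code: the function name and MAC addresses are hypothetical, 10.0.0.4 is simply reused from the question above, and the Ethernet header around the ARP body is left out.

```python
import struct

# htype, ptype, hlen, plen, oper, sender MAC, sender IP, target MAC, target IP
ARP_FMT = "!HHBBH6s4s6s4s"

def build_arp_reply(request: bytes, my_mac: bytes, my_ip: bytes):
    """Return an ARP reply body if `request` asks for `my_ip`, else None."""
    htype, ptype, hlen, plen, oper, sha, spa, tha, tpa = struct.unpack(
        ARP_FMT, request[:28])
    # Only answer Ethernet (1) / IPv4 (0x0800) requests (oper 1) for our own IP.
    if (htype, ptype, oper) != (1, 0x0800, 1) or tpa != my_ip:
        return None
    # oper 2 = reply; we become the sender, the requester becomes the target.
    return struct.pack(ARP_FMT, htype, ptype, hlen, plen, 2,
                       my_mac, my_ip, sha, spa)

my_mac = bytes.fromhex("020000000004")       # made-up MAC for the stack
my_ip = bytes([10, 0, 0, 4])                 # the address the program claims
asker_mac = bytes.fromhex("020000000005")
asker_ip = bytes([10, 0, 0, 5])

request = struct.pack(ARP_FMT, 1, 0x0800, 6, 4, 1,
                      asker_mac, asker_ip, b"\x00" * 6, my_ip)
reply = build_arp_reply(request, my_mac, my_ip)
```

Nothing stops the program from answering for an address it does not really own, which is the "largely cooperative" part.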

p4bl0 · 2025-03-04T07:33:57 1741073637

Ah, thanks a lot!

Forgetting to mention that explicitly in the article is a big miss, I think. It makes the ARP part feel like it's missing crucial information or is not actually entirely explained, while it's the previous part that misses something.

Thanks again :).

mannyv · 2025-03-04T19:50:55 1741117855

From what I remember ARP only works on your local segment. Your router will fill in its address and forward the packet along.
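
Put differently, a host only ever ARPs for addresses on its own subnet; for anything else it ARPs for the router and sends the frame there. A small sketch of that choice, with made-up addresses and a hypothetical function name:

```python
import ipaddress

def arp_target(dst: str, iface: str, gateway: str):
    """Return the IP address a host would ARP for in order to reach `dst`.

    `iface` is the host's own address with prefix length, e.g. "10.0.0.5/24".
    """
    network = ipaddress.ip_interface(iface).network
    dst_ip = ipaddress.ip_address(dst)
    if dst_ip in network:                    # on-link: resolve the destination itself
        return dst_ip
    return ipaddress.ip_address(gateway)     # off-link: resolve the router instead

local = arp_target("10.0.0.4", "10.0.0.5/24", "10.0.0.1")
remote = arp_target("192.0.2.7", "10.0.0.5/24", "10.0.0.1")
```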

There's also rarp, which is one way to ask 'the network' for your IP address. I have no idea if rarp still works irl.

revskill · 2025-03-04T05:18:11 1741065491

I appreciate that the article explains things without assuming prior knowledge. Well done.