Show HN: How did your computer reach my server? (how-did-i-get-here.net)
659 points by archmaster on Dec 5, 2023 | 137 comments



Hi! I'm Lexi. I'm 17, and one of the things I'm interested in right now is gaining a deeper understanding of how computers work and showing that in new ways.

A few months ago I published https://cpu.land (discussion: https://news.ycombinator.com/item?id=37062422). After cpu.land, I felt a lot of pressure to make another Big Giant Thing but didn't really have anything compelling. So I just hacked away on personal projects and, through some coincidental learning about how the Internet works, ended up hacking together, from scratch, a traceroute program that could live-stream to a website!

I realized I had never seen this sort of thing on the web before, and it was actually a kind of cool and novel way of visualizing the structure of the Internet, so I polished it up and built a pretty site around it. In the process, I learned some really interesting things about BGP and the structure of the Internet, so I melded the traceroute tool with an article sharing that knowledge.

I'm still hacking on this and I'm sure my code will manage to break somehow, so please let me know if you have any suggestions! :)

(Side note: why Rust? I don’t think programming language choice matters that much, but I wanted to quickly write a very dependable low-level program, and I really like Rust’s error handling primitives. Why do you care about this?)


> I realized I had never seen this sort of thing on the web before.

Search for: "looking glass bgp" and you'll find some[1]. One of the first CGI programs I wrote nearly three decades ago (ugh...) was a Perl script that wrapped traceroute and streamed the results via server push[2]. Everything old is new again. :-)

That said, your site has a very nice presentation.

BTW, IPv4 TTL is de jure seconds even though it's de facto hop count, since no router takes more than a second and the minimum decrement is 1 (except that middleboxes which wish to remain hidden won't decrement it at all). Also, Linux/Unix traceroute by default uses UDP to a high-numbered (and usually closed) port for probe packets instead, since UDP historically is less likely to be dropped/filtered than ICMP.

Aside: asking how traceroute works is one of my interview questions, most people don't know (if they do, the question is no good), and many are unable to figure it out from first principles no matter how many questions I answer about TCP/IP. I still think being able to figure it out is a reasonable problem-solving question.

1. e.g. https://www.bgplookingglass.com/

2. https://www.oreilly.com/openbook/cgi/ch06_06.html


> Aside: asking how traceroute works is one of my interview questions, most people don't know...

Curious what type of roles you interview for; are they networking-centric? IIRC this is CCNA-level material, and I'd expect anyone working in networking to be able to describe how traceroute works. I've used it more as a smoke-test question than as a question most people don't know.


Describing how it works and using it to troubleshoot real problems and drawing the correct conclusions are very different things. If you believe it is simple then you have a shallow understanding of traceroute, which is why it's a good interview question as you can go really deep on your answer.

Even seasoned network engineers get it wrong.

https://youtu.be/WL0ZTcfSvB4?si=dtihJG1pGJwiyj1u


Your linked 1+ hour video includes things like IATA airport codes for geolocation and such. None of that is relevant to the fact that all traceroute implementations make use of very basic networking fundamentals: in particular, by sending out a series of packets with artificially limited TTL, to [hopefully] receive ICMP type 11 code 0 "TTL expired" packets from the hops along the path to your target.
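To make the reply side of that mechanism concrete, here is a minimal Python sketch of how a traceroute might classify what comes back. It assumes `packet` is a raw IPv4 datagram with the IP header first (as a raw socket would hand it over); `classify_reply` and its labels are hypothetical, while the type/code numbers are the standard RFC 792 values: type 11 code 0 for an expiring hop, type 3 code 3 (port unreachable) when a UDP probe reaches the closed high port on the destination, and type 0 (echo reply) when an ICMP probe reaches it.

```python
import struct

# Standard ICMP type/code values (RFC 792) that matter to traceroute.
ICMP_ECHO_REPLY = 0
ICMP_DEST_UNREACH = 3
ICMP_PORT_UNREACH = 3             # code under type 3
ICMP_TIME_EXCEEDED = 11
ICMP_TTL_EXCEEDED_IN_TRANSIT = 0  # code under type 11

def classify_reply(packet: bytes) -> str:
    """Classify a raw IPv4 packet carrying an ICMP reply (hypothetical helper)."""
    ihl = (packet[0] & 0x0F) * 4   # IPv4 header length in bytes (IHL is in 32-bit words)
    icmp_type, icmp_code = struct.unpack_from("!BB", packet, ihl)
    if icmp_type == ICMP_TIME_EXCEEDED and icmp_code == ICMP_TTL_EXCEEDED_IN_TRANSIT:
        return "intermediate hop"          # a router discarded our probe
    if icmp_type == ICMP_DEST_UNREACH and icmp_code == ICMP_PORT_UNREACH:
        return "destination (UDP probe)"   # the closed high port was reached
    if icmp_type == ICMP_ECHO_REPLY:
        return "destination (ICMP probe)"  # the target answered our echo request
    return "other"
```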

Traceroute is simple. Sure, there's lots you can do to enrich the data you receive (e.g., reverse DNS and geolocation), or send multiple sequences to identify equal-cost multipath. But these are not inherent or necessary to perform a traceroute.

And understanding why different protocols exhibit different behavior / observe different metrics, or why some nodes don't send ICMP TTL expired, is important. But that's more in line with what you call "using it to troubleshoot", which is not "how it works."

But "how traceroute works" is simple: First you send a packet with TTL=1, then you send a packet with TTL=2, and so on. That's it, that's how it works.
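That loop is simple enough to simulate. The sketch below is a toy model, not a real network tool: a path is just a list of made-up addresses (the last one is the destination), each router decrements the TTL, and whichever router takes it to zero "answers". Routers in the hypothetical `silent` set never reply, which is how the `*` gaps people complain about further down the thread show up.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

@dataclass
class Reply:
    sender: str    # address that answered the probe
    reached: bool  # True if the destination itself answered

def send_probe(path: List[str], ttl: int) -> Optional[Reply]:
    """Simulate one probe along a fixed path (last element = destination)."""
    for i, node in enumerate(path):
        if i == len(path) - 1:
            return Reply(node, reached=True)   # delivered to the destination
        ttl -= 1                               # each router decrements the TTL
        if ttl == 0:
            return Reply(node, reached=False)  # "time exceeded" from this router
    return None                                # empty path: nobody answers

def traceroute(path: List[str], silent: FrozenSet[str] = frozenset(),
               max_ttl: int = 30) -> List[str]:
    """Send probes with TTL = 1, 2, 3, ... until the destination answers."""
    hops: List[str] = []
    for ttl in range(1, max_ttl + 1):
        reply = send_probe(path, ttl)
        if reply is None or reply.sender in silent:
            hops.append("*")                   # no answer, like a filtered hop
            continue
        hops.append(reply.sender)
        if reply.reached:
            break
    return hops

print(traceroute(["10.0.0.1", "192.0.2.1", "198.51.100.7"],
                 silent=frozenset({"192.0.2.1"})))
# ['10.0.0.1', '*', '198.51.100.7']
```

Note that a silent destination in this model produces stars all the way to `max_ttl`, which matches what a real traceroute prints when the final host filters probes.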


The typical candidate I might ask doesn't know that IP packets include a TTL field. So we start simpler: "What is a packet?" "What is a router?" "How does a packet get to the destination?" "What if there's a routing loop?"

Some candidates throw up their hands immediately, which makes it a short interview. Some candidates already know, which makes it a useless question and we move on to other things. For everyone else, I think it's a good interview question.


I asked ChatGPT to answer these questions, as I'm also in the process of learning these things. What do you think of these answers? Would ChatGPT pass the interview?

1. *"What is a packet?"* - A packet is a data unit sent over a network, encapsulated within protocols like IP and TCP/UDP. It contains both the payload (actual data) and control information such as source and destination IP addresses. Packets enable efficient routing and reassembly at the destination in network communication.

2. *"What is a router?"* - A router is a device that forwards data packets between networks, operating at the network layer. It uses IP addresses, routing tables, and algorithms to determine the best path for packet forwarding, connecting different network types and managing traffic between internal and external networks.

3. *"How does a packet get to the destination?"* - A packet reaches its destination through routing. It passes through routers that consult their routing tables to forward the packet. The packet traverses multiple networks, with IP protocols guiding it towards the destination, ensuring error checking and congestion handling.

4. *"What if there's a routing loop?"* - In a routing loop, a packet is passed continuously between routers. The Time-To-Live (TTL) field in IP packets prevents endless loops by decrementing each time a packet passes a router. If TTL hits zero before the destination is reached, the packet is discarded to prevent network clogging. Routing protocols also have mechanisms to detect and prevent loops.
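Answer 4 is the key to traceroute, so here is a toy Python model of it (hypothetical, not router code): routers A and B each have a route to the destination pointing at the other. Because every router decrements the TTL, the packet cannot circulate forever; it is dropped after at most `initial_ttl` router visits.

```python
from typing import Tuple

def forward_in_loop(initial_ttl: int) -> Tuple[int, str]:
    """Toy two-router loop: A routes the destination via B, and B via A.

    Returns how many times the packet was forwarded and which router
    finally dropped it."""
    if initial_ttl < 1:
        raise ValueError("TTL must be at least 1")
    routers = ("A", "B")
    ttl = initial_ttl
    forwarded = 0
    while True:
        router = routers[forwarded % 2]   # A, B, A, B, ...
        ttl -= 1                          # every router decrements the TTL
        if ttl == 0:
            # TTL exhausted: drop (a real router would also send ICMP time exceeded)
            return forwarded, router
        forwarded += 1                    # hand the packet to the other router

print(forward_in_loop(64))  # (63, 'B')
```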


Yup I saw your other reply and totally makes sense -- networking related or adjacent, but not quite networking-proper type roles. Thanks for your replies.


It's relevant to what I stated: interpreting the data that traceroute provides is not simple. The mechanism itself is indeed very simple.

Have you never been asked a question in an interview that starts a discussion or has follow-up questions? In isolation it's not a good question (true of most questions), but as a way to initiate something deeper it's good. After the initial explanation of how it works you can get into how you have used it and what kinds of issues you have solved with it. Then maybe look at an actual case and give your interpretation of the data. You could get into router hardware architectures, what the control/data planes are, why some drops in the output are not a problem and when they are, ECMP, why bidirectional traceroutes are useful, routing topology, flapping routes, etc.


I just realized I don't know traceroute as well as I thought I did. apparently Unix systems send UDP packets, while Windows sends ICMP. which maybe explains why I've had such a miserable experience with traceroute on the modern Internet, because many ISPs block ICMP from outside.


I’ve never encountered an ISP blocking ICMP that doesn’t also block all sorts of traffic. I can ping 8.8.8.8 or whatever from almost every connection I’ve used anywhere in the world.

Not every router will return a TTL expired, of course, and many ISPs route IPv4 traffic via RFC 1918 addresses nowadays, so you can get big gaps in their networks, but that applies whether your outbound packet is ICMP, TCP, UDP, or any other type.

Worth remembering that packets with different source and destination ports can route via different paths, so sometimes you need to be aware of the entire IP/port/protocol tuple for src/dst and configure them properly. NAT can cause problems there too when it changes your source ports.
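The reason ports matter is ECMP flow hashing: a router with several equal-cost next hops typically hashes fields of each packet to pick one, so packets of the same flow stay together. The sketch below is a simplified model; SHA-256 stands in for whatever vendor-specific hardware hash a real router uses, and `pick_path` is a hypothetical helper. It shows why classic UDP traceroute, which changes the destination port on every probe, can see hops from different paths, while port-less ICMP probes in this model all hash identically.

```python
import hashlib
from typing import Optional

def pick_path(src_ip: str, dst_ip: str, proto: int,
              src_port: Optional[int], dst_port: Optional[int],
              n_paths: int) -> int:
    """Choose one of n_paths equal-cost next hops by hashing the flow tuple.

    SHA-256 is only a stand-in for a router's hardware hash. For protocols
    without ports, such as ICMP, pass None for both ports so only the
    addresses and protocol feed the hash."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_paths

# UDP probes (protocol 17) with a new destination port per probe.
udp_paths = {pick_path("203.0.113.5", "198.51.100.7", 17, 50000, port, 4)
             for port in range(33434, 33464)}
# ICMP probes (protocol 1) carry no ports, so in this model they never spread out.
icmp_paths = {pick_path("203.0.113.5", "198.51.100.7", 1, None, None, 4)
              for _ in range(30)}
print(len(udp_paths), len(icmp_paths))
```

This is also why tools such as Paris traceroute deliberately keep the flow-identifying fields constant across probes, so that every probe follows the same path.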


+1 for this, I always recommend ras' tutorials at NANOG. He goes into a lot of detail that people just hand-wave over or don't even consider.


Hadn't come across NANOG before -- thanks for mentioning them.

Here's a link to their videos: https://www.youtube.com/playlist?list=PLO8DR5ZGla8iVN2v3UKkR...

This is the ras' "Troubleshooting with Traceroute" tutorial: https://www.youtube.com/watch?v=WL0ZTcfSvB4&list=PLO8DR5ZGla...

Slides: https://archive.nanog.org/meetings/nanog47/presentations/Sun...

(There are one or two other traceroute tutorials, not sure how different they are from the above, e.g. https://www.youtube.com/watch?v=4dUqVlZ6trU&list=PLO8DR5ZGla... ).


Unix sys admin, build & release, developer tooling, operations.

We use the Internet every day. I like folks to have an idea how it works, to be intellectually curious, and to be generally informed about the technology they use.

The traceroute man page explains how it works.

FWIW, my CS degree included a networking class.


fwiw, networking wasn't a required class at my school. neither was databases, weirdly, nor anything web-adjacent (no JS, no HTML or CSS.)

...huh. I'm realizing my CS department was pretty weak. but tbf I knew how traceroute worked by high school, so it didn't matter.


> Also, Linux/Unix traceroute by default uses UDP to a high-numbered (and usually closed) port for probe packets instead, since UDP historically is less likely to be dropped/filtered than ICMP.

Thanks! I've been wondering about this for ages!


Trivia: 32,768 + 666 is the usual starting port of most implementations
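32,768 + 666 = 33434. Because each probe goes to a different port, the destination port quoted back inside the ICMP error tells the tool which probe (and therefore which TTL) a reply belongs to. A minimal sketch, assuming three probes per hop and that the very first probe uses the base port itself (implementations differ on that off-by-one detail); `probe_port` and `decode_port` are hypothetical helpers:

```python
from typing import Tuple

BASE_PORT = 32768 + 666  # 33434

def probe_port(ttl: int, probe: int, probes_per_hop: int = 3) -> int:
    """UDP destination port for probe number `probe` (0-based) at `ttl`,
    assuming the first probe uses BASE_PORT and each later probe adds one."""
    seq = (ttl - 1) * probes_per_hop + probe
    return BASE_PORT + seq

def decode_port(port: int, probes_per_hop: int = 3) -> Tuple[int, int]:
    """Invert probe_port: recover (ttl, probe) from a quoted destination port."""
    seq = port - BASE_PORT
    return seq // probes_per_hop + 1, seq % probes_per_hop

print(probe_port(2, 1), decode_port(33438))  # 33438 (2, 1)
```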


Is that still true? https://www.rfc-editor.org/rfc/rfc3443.txt has no mention of seconds.

It just says that outputTTL is (inputTTL - 1). With some exceptions.

[edit: I missed that that RFC is for MPLS but would be interested in your comment anyway; the definitive version seems to be https://datatracker.ietf.org/doc/html/rfc1122]


RFC 791:

> The time is measured in units of seconds, but since every module that processes a datagram must decrease the TTL by at least one even if it process the datagram in less than a second, the TTL must be thought of only as an upper bound on the time a datagram may exist.

https://www.rfc-editor.org/rfc/rfc791.html

The equivalent field in IPv6 is named hop limit in recognition of how the TTL field is used in practice with IPv4:

https://www.rfc-editor.org/rfc/rfc2460.html
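In both headers the field is a single byte that routers decrement by one per hop, which is why IPv4 TTL and IPv6 Hop Limit behave identically in practice. A small sketch that reads it from a raw header; the offsets come from RFC 791 (IPv4, byte 8) and RFC 2460 (IPv6, byte 7; RFC 2460 has since been replaced by RFC 8200 with the same layout), and `ttl_or_hop_limit` is a hypothetical helper:

```python
def ttl_or_hop_limit(header: bytes) -> int:
    """Read the hop-count field from a raw IPv4 or IPv6 header."""
    version = header[0] >> 4          # top nibble of the first byte
    if version == 4:
        return header[8]              # IPv4 "Time to Live", byte offset 8
    if version == 6:
        return header[7]              # IPv6 "Hop Limit", byte offset 7
    raise ValueError(f"unsupported IP version {version}")
```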


BGP looking glasses are very fun, although a bit of a different concept.

And WOAH, that's really interesting about TTLs. Thanks for sharing! That's awesome and terrifying!


The looking glass sites often include a traceroute tool too, but "looking glass traceroute" may return better results. Unfortunately most of them have bitrotted. :-(


> many are unable to figure it out from first principles no matter how many questions I answer about TCP/IP

TBF it's neither TCP nor IP, but ICMP :-)

Edit: I meant to write "neither TCP nor UDP" -- but even though I could make the correction I'll leave my error in place.


It's IP and ICMP and possibly UDP, which are all part of TCP/IP, aka the Internet protocol suite.


The traceroute that ships with macOS supports ICMP, UDP, TCP, and GRE. As I recall, the one(s) available on most Linux distributions do as well.


> IPv4 TTL is de jure seconds

Huh. TIL. But how would a router know how long a packet took to traverse the hop? The packet doesn't have the information to figure that out … were they expecting people to configure routers to know how far away the previous hop was?

Also, >1 second is most of the way to the Moon. (Yes, yes, speed in a non-vacuum is blah blah and switches blah buffers blah…)


Here's what Jon Postel wrote in IEN 41[1] in June 1978:

> The time to live is set by the sender to the maximum time the segment is allowed to be in the internetwork system. If the segment is in the internetwork system longer than that the segment should be destroyed. This field should be decreased at each point that the internet header [is] processed to reflect the time spent processing the segment. Even if no local information is available on the time actually spent, the field should be decremented. The time is measured in units of seconds (i.e. the value 1 means one second). Thus the maximum time to live is 255 seconds or 4.25 minutes.

A router would have to keep track of when each packet enters and leaves and then round to the nearest number of seconds.

[1]: https://www.rfc-editor.org/ien/ien41.pdf
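Here is a sketch of that original "seconds" rule, assuming a router rounds its holding time to the nearest second and applies RFC 791's minimum decrement of one; `next_ttl` is a hypothetical helper, not taken from any real implementation. It shows why the TTL collapses into a hop count: with sub-second hops, the minimum of one always wins.

```python
import math

def next_ttl(ttl: int, seconds_held: float) -> int:
    """TTL after one router under the original 'seconds' reading:
    subtract the time the packet spent in the router, rounded to the
    nearest second, but never less than 1 (RFC 791's minimum)."""
    decrement = max(1, math.floor(seconds_held + 0.5))
    return max(0, ttl - decrement)

# A realistic path: 12 routers, each holding the packet for about 2 ms.
ttl = 64
for _ in range(12):
    ttl = next_ttl(ttl, 0.002)
print(ttl)  # 52: sub-second hops turn the "seconds" TTL into a hop count
```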


Cool project! Two suggestions, one serious, one frivolous:

- I wonder if you could get more accurate results by using TCP or UDP instead of ICMP. I think traditional traceroute has an option to use UDP, mtr [1] can use TCP or UDP, and tcptraceroute [2] can use TCP.

- This would be a perfect fit for some Talking Heads references. "And you may ask yourself, well, how did I get here?" [3]

[1] https://github.com/traviscross/mtr

[2] https://linux.die.net/man/1/tcptraceroute

[3] https://en.wikipedia.org/wiki/Once_in_a_Lifetime_(Talking_He...


1. ICMP seemed a lot faster / more reliable, but you're right that UDP does sometimes bring out better results. Something I want to do is make it use a hybrid approach — sending both ICMP and UDP packets and using the best of both.

2. Wayy ahead of you, check for HTML comments :))


ICMP isn't as reliable for trace routes. ICMP typically flows through the management plane instead of the data plane. The management plane is often driven by low-power MIPS processors, which can appear as additional latency in the traceroute results.

When a network device is busy, ICMP may be dropped entirely.

ICMP is a great tool to establish baseline connectivity, assuming the device responds to it. TCP will provide more accurate results (or UDP given the device responds to UDP packets) if you know a specific port is open.

traceroute uses UDP by default. tracert.exe only uses ICMP.


While ICMP isn't as reliable, I think the reasons are a little different. It's not that ICMP gets punted to the management plane in general, i.e. when you ping a host with an ICMP echo request, that packet doesn't go through the management plane of all the network devices in between. You can tell, because ping doesn't have the latency/loss problems traceroute does.

Rather, what is going on is that most network device data planes punt _all_ IP packets' "time-to-live exceeded" cases to the management plane, and so those have the problems you describe (latency, throttling).

So why is ICMP still worse? First of all, it lacks the UDP/TCP "port" information that allows packets to be flow-hashed through different routes. Secondly, if the route hits any firewalls, they often have very different configuration for ICMP than they do for the TCP/UDP traffic they are configured to let through.


This is sourced from https://archive.nanog.org/sites/default/files/traceroute-201..., Page 32-36. Feel free to disagree with the presentation, but it would be nice if you could offer a source rather than "I think".


You misunderstand how TTL errors are signaled.

In IPv4, ICMP is used to send TTL exceeded messages regardless of what upper level protocol was used in the packet that expired in transit. UDP, TCP, ICMP, doesn’t matter.

The slides you linked to are correct, in general, about router slow path behavior, but that isn’t what makes UDP ping “more reliable”. It’s “more reliable”, theoretically, because UDP is less likely to be subject to indiscriminate filtering than ICMP.

Additionally, generally only ICMP sent to or generated from a router goes through the slow path. An IP packet being sent through a router generally goes through that device’s fast path, regardless of the payload type.
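The "regardless of upper-level protocol" point is visible in the message format: per RFC 792, an ICMP Time Exceeded message carries the original IP header plus the first 64 bits of the original datagram. For a UDP probe those 8 bytes are exactly the UDP header, which is how traceroute matches a reply to the probe that triggered it. A minimal sketch, assuming `packet` is a raw IPv4 datagram with the outer IP header first; `quoted_udp_ports` is a hypothetical helper:

```python
import struct
from typing import Tuple

def quoted_udp_ports(packet: bytes) -> Tuple[int, int]:
    """From a raw IPv4 packet carrying ICMP Time Exceeded, return the
    (source port, destination port) of the UDP probe it quotes."""
    outer_ihl = (packet[0] & 0x0F) * 4
    icmp = packet[outer_ihl:]
    if icmp[0] != 11:                   # ICMP type 11 = Time Exceeded
        raise ValueError("not an ICMP Time Exceeded message")
    inner = icmp[8:]                    # quoted IP header follows the 8-byte ICMP header
    inner_ihl = (inner[0] & 0x0F) * 4
    if inner[9] != 17:                  # protocol field: 17 = UDP
        raise ValueError("quoted datagram is not UDP")
    # The first 8 bytes after the quoted IP header are the probe's UDP header.
    src_port, dst_port = struct.unpack_from("!HH", inner, inner_ihl)
    return src_port, dst_port
```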


We're not disagreeing with the presentation, you're misunderstanding it.

No matter which protocol you use, the packet only has one destination, the destination IP, so it does not go through the slow path on any hop along the way just because it's ICMP. When a packet hits the TTL limit on a hop it will go through the slow path, as the CPU generates the ICMP response. The protocol used for traceroute makes no difference here.


That's a nice reference, but those slides agree with me. He is explicitly talking about "ICMP Generation" getting punted, exactly as I am.


Interesting, I didn't know traceroute used UDP by default. Just did some testing with the best traceroute tool, mtr[0], and using the -u flag for UDP seemed to give the worst results. -t for TCP seemed OK. I never thought about the difference between how TCP and ICMP packets are handled.

[0] https://bitwizard.nl/mtr/


Yep! UDP/33434 by default.

You can use the -P flag to set the protocol (TCP, UDP, ICMP, GRE, or others) and -p to set the port number, if applicable.

Microsoft should have stuck to the BSD traceroute. I don't know why they "invented" tracert.exe (which I believe is _based_ on BSD traceroute, like most of those early network tools in Windows). I wonder if this was a Winsock (or STREAMS[0]!) limitation.

[0] https://web.archive.org/web/20151229084950/http://www.kuro5h...


I assume that tracert using ICMP is not so much a Winsock limitation as a limitation of the underlying NT security architecture without suid binaries, which was worked around with ICMP-only, non-privileged “raw sockets”. But that is mostly an educated guess.


This is a limitation that dates back to NT 3.1 where SOCK_RAW was callable by standard User accounts. Only starting with Windows 2000 [0] does creating a RAW socket require local administrator rights.

[0] https://learn.microsoft.com/en-us/windows/win32/winsock/tcp-...


"Reliable" may have been crufty wording but I did get more responsive results using it. MTR uses ICMP by default. Definitely want to do a hybrid thing in the future though!


Traffic to vs. through a router: UDP and ICMP probes to a given hop will both go to the control plane, which can be rate-limited or handled by a general-purpose CPU.

Probes to hops beyond will virtually always go through the fast data plane.


  <!-- This is not my beautiful house. -->
  <!-- This is not my beautiful wife.  -->
The kids are all right. :)


Every now and then, I am less afraid for them with examples like this. However, they do seem to be getting rarer.


You are working on a lot more cool things than I was when I was 17. Rock on! If you enjoyed "I, Robot", I recommend Ted Chiang's "Stories of Your Life and Others". The movie Arrival was based on one of the short stories in this collection.


Oh, thank you, I'll buy a copy! I thought Arrival was a fun movie


Seconding that... Ted Chiang is one of my favorite authors! His short stories are a really fun mix of technology/sociology/religion/history, and each one only takes like half an hour to read.


Good work, a nice site explaining the traceroute is very handy! I could see myself using this to explain what traceroute is for various technicians I'm training to maintain corporate networks.

(Now if only I could figure out how to enable traceroute to work on each hop from a given workstation through a corporate Cisco access switch, core switch, BGP tunnel to AWS Transit Gateways, and eventually land at the VPC route table on the EC2 instance, then I might actually be able to call myself a network guy)


You can't use traceroute for switching as it operates on layer 3. Switches use MAC addresses, not IP. Switches are essentially transparent as they just forward traffic, they don't route.


The switch and router terminology distinction has become fairly meaningless for enterprise-level hardware, as discussed in that comment. The same corporate access or top-of-rack "switch" is just as capable of layer 3 routing as it is of layer 2 network handling. This happened because the desire to expand a layer 2 fabric across multiple physical switches basically reinvented a lot of layer 3 concepts on layer 2. The silicon is the same for both, just different algorithms.


There are multiple motivations and resulting technologies there:

- “L3 Switches”, which in essence are routers that do not have discrete ports and instead have closely coupled switch. Alternatively the same thing can be viewed as switch with powerful enough control-plane CPU that it can do routing (pure L2 managed switches usually have ridiculously underpowered CPUs). This can be and often is combined with some amount of offloading the routing and even more complex upper-layer processing into hardware.

- things like Shortest Path Bridging, which uses L3-style routing protocol (IIRC it is IS-IS) in order to build L2 FIB tables for L2 switches. The idea there is to not only have (R)STP-like redundant paths for ethernet, but to use full capacity of such paths when it is available.

On the hardware level it is mostly about accelerating the fast path, where the hardware FIB contains a cache of bit patterns seen in received frames, where each should be forwarded, and maybe how it should be rewritten. Packets that are not matched by the HW FIB are passed to the CPU to be processed. In a “dumb” bridge (i.e. an L2 switch), such frames would simply be broadcast to all ports.

While the general idea is the same, there is material difference in the silicon complexity between matching destination MAC of ethernet frame and matching arbitrary bit patterns somewhere in TCP header of IPv6 packet.
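For the L3 side, the lookup a FIB performs is a longest-prefix match: of all prefixes containing the destination, the most specific one wins. This Python sketch only illustrates what is being computed; real hardware does it with TCAMs or tries rather than a linear scan, and `lookup` plus the example prefixes and next-hop names are hypothetical. A miss returns `None`, standing in for the "punt to the CPU" case described above.

```python
import ipaddress
from typing import Dict, Optional

def lookup(fib: Dict[str, str], dst: str) -> Optional[str]:
    """Longest-prefix match of `dst` against a FIB mapping prefix -> next hop.

    Returns None on a miss, the case a device would hand to its CPU."""
    addr = ipaddress.ip_address(dst)
    best: Optional[str] = None
    best_len = -1
    for prefix, next_hop in fib.items():
        net = ipaddress.ip_network(prefix)
        if net.version == addr.version and addr in net and net.prefixlen > best_len:
            best, best_len = next_hop, net.prefixlen  # a more specific match wins
    return best

fib = {"0.0.0.0/0": "upstream", "10.0.0.0/8": "core", "10.1.2.0/24": "rack-7"}
print(lookup(fib, "10.1.2.3"), lookup(fib, "10.9.9.9"), lookup(fib, "8.8.8.8"))
# rack-7 core upstream
```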


Cool! I was nervous it was gonna show my Tor circuit o_o

Unfortunately so many nodes ignore traceroute packets that it basically said my exit node connected to Linode and then Linode connected to your computer. I have the same experience with forward traceroutes, my router replies, my server replies, and if I'm lucky, one node in my ISP's network. The rest is locked up tight.


Tor works on the level of circuits and from the inside of the circuit has nothing to do with IP. The exit node is the first node where that is converted to an actual TCP connection. This among other things means that the only protocol on the public internet that works over Tor is TCP.


(Part of this might be that all this HN traffic is overwhelming my tiny server a bit, I'm working on getting it smoother live rn)


Would love an update for what you had to do to make it "smoother" under the HN load!


Since you're in the weeds with traceroute you might appreciate this one (if you haven't seen it already):

traceroute bad.horse


You used to be able to get Star Wars via traceroute or telnet:

https://itsfoss.com/star-wars-linux/

But when I tested just now it didn't work for me, so your mileage may vary.


if you don't have traceroute:

   tracepath -m128 bad.horse
works just as well.


Is there music that goes along with these lyrics?



okay, I had never seen this command used, but I got extremely lucky by it running pretty much in sync with this video. If I had tried to get it in sync, I couldn't have gotten any closer. Sometimes the internet smiles upon you and just gives you a moment.

thanks for the smile to all in this thread from here up!


+1, love it


I found out about "Lying for Money" on your website a few months ago when you posted your CPU thing. Thanks, that was totally worth a read. In a similar vein, I recommend the book "When McKinsey Comes to Town"; its tales of human mischief were entertaining as well.


Keep up the awesome work, Lexi. You are very skilled and I can tell that you will rise very quickly as an engineer and a teacher


This is incredible, Lexi. I have had a lot of fun learning about the low levels of networking; there are so many interesting details. Documenting those — and making them so intuitive, like you do here — is amazing.

Keep up the amazing work!


Really awesome project!

A technical nitpick though: Routes can be asymmetric—going across one path in one direction and another for the opposite. This means that your tool potentially doesn't show the route packets from the user took to reach your server, but rather the route packets took from your server to reach the user. I believe that querying with BGP looking glass tools would allow you to construct the route in either direction, but it is maybe a bit less cool looking than the real-time traceroute that is a result of actual traffic.


I'm also interested in this topic (since I was 17 and didn't lose interest in twenty years). Recently, I've created a hobby project - the atlas of the Internet: https://reversedns.space/

I'm posting here because it might be interesting for you. How it was built: https://presentations.clickhouse.com/meetup85/app/index.html


Great job!

I was reminded of working at a company in 1996... we had windows 95(!) with Trumpet WinSock and a dial-up modem (24k, IIRC). I was just learning how all this stuff worked and fit together. I stumbled on a traceroute screen that would slowly drip out each hop and... it was magical to me. Suddenly realizing the idea of 'a big global network' I'd read about was actually... right at my fingertips, and I could see which computers my traffic was being routed through... that kept me up at night for a while. Not sure I'll say it was life-changing, but it sort of felt like it for a bit at that time :)


This is absolutely amazing <3

The narrative based traceroute in green is something I’ve never seen before. How many providers, like CDNs, did you take the time to map into a narrative?

This feels targeted towards folks who kind of already understand computers. It'd be cool to repackage this in a way that can show non-technical people the stuff they take for granted. The mountains that move on a user’s behalf under every keystroke are humbling.

One of my favorite books on this topic is Interconnections: Bridges, Routers, Switches, and Internetworking Protocols by Radia Perlman, if you haven’t come across it.


> I felt a lot of pressure to make another Big Giant Thing but didn't really have anything compelling.

Easier said than done, but don't feel like you have to provide a constant flux of interesting things. That kind of pressure ends up being toxic pretty quickly. Do what you enjoy, if it hits an audience, great, but don't feel like you have to make it happen.


Nice. I'm wearing black jeans and I'm always surprised by how little developers know about basic networking. I was always interested in it but also learnt about it at uni. I don't remember all the details, of course, but just generally knowing what routing is, what IP is etc. pays dividends when building anything that runs on networks/internet.


Nice work, Lexi. Let me add that your design skills are pretty excellent as well.


Hey Lexi,

Total side note, your other post (about cpu.land) is at exactly 1337 upvotes now :D https://share.cleanshot.com/ktVWL2pr

How fitting.


Why link to that instead of:

https://news.ycombinator.com/item?id=37062422

I'm 17 and wrote this guide on how CPUs run programs (3 months ago)


The OP already linked to that, and it looks like it still has an upvote button? Couldn't the count go up?


I KNOW, right?! It's perfect.


Most of the really cool things on the interwebs started just like this.

Kudos and carry on!


Both the submission and the previous project are cool and very well made. Congrats!


Your projects are super inspiring!


> BGP is the protocol that gives the Internet its shape, and you can’t directly speak it yourself.

It's actually surprisingly easy to get an ASN for yourself and speak BGP. If you find building something like this tool interesting, you should give it a try. I wrote an introduction of sorts earlier (https://qt.ax/asn) if that interests you.


This is "how your server reaches my computer but reversed" rather than "how my computer reaches your server". The routing in each direction will most likely be very different.


Yeah, there is actually a whole section about this in the article! Called "Front to Back, Back to Front".

tl;dr in my experience the networks traversed are usually very similar, and the content is relevant and interesting either way around


Back in the days of bang paths, it was interesting to see how different a sender's bang path might be to the bang path for your reply and try to figure out what network connection differences caused the asymmetry.


Here's a paper you might be interested in with regards to how traceroute works. One thing that many (non-network) folks miss is that traceroute is not necessarily symmetrical -- the return path may differ.

https://archive.nanog.org/sites/default/files/traceroute-201...


Due to internet problems in Egypt earlier today with Telecom Egypt, I had a traceroute to the same destination going via _8_ different paths. Have to say that's twice the highest number of ECMP paths I've seen on the internet in the past.

And yes, traffic is routing different ways between my Cairo office and my UK core: London->Cairo is direct and still suffering massive loss, while Cairo->London is now routing via NTT and seems fine. If they haven't fixed it by tomorrow I might have to change some local prefs.


It's extremely rare, but the data paths can be asymmetrical too, e.g. satellite with a ground-based uplink.


TY, will read! I touched on this duality in the article actually :)


> WHOIS is actually an... interesting protocol to make a parser for.

It's actually impossible. Responses are essentially free-form (if the server responds at all). I tried my hand at this; you can make an ad-hoc "parser" that works for 90% of addresses/domains (or you could, ten years ago when I tried). But the remainder are intractable.
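For a sense of what such an ad-hoc parser looks like, here is a minimal Python sketch that handles only the common `Key: value` shape and silently skips everything else (comment lines, banners, free-form text). That skipping is precisely where the intractable remainder gets lost. `parse_whois` and the sample record are hypothetical.

```python
from typing import Dict, List

def parse_whois(text: str) -> Dict[str, List[str]]:
    """Best-effort parse of 'Key: value' lines from a WHOIS response.

    Keys are lower-cased; repeated keys collect multiple values. Lines that
    don't fit the pattern are ignored, which is this approach's weakness."""
    fields: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("%", "#", ">>>")):
            continue                       # blank lines, comments, banners
        key, sep, value = line.partition(":")
        if not sep or not value.strip():
            continue                       # not a "key: value" line
        fields.setdefault(key.strip().lower(), []).append(value.strip())
    return fields

sample = "% comment\nNetRange:  192.0.2.0 - 192.0.2.255\nOrgName:   Example Org\n"
print(parse_whois(sample))
# {'netrange': ['192.0.2.0 - 192.0.2.255'], 'orgname': ['Example Org']}
```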

Nowadays it's much worse; nearly everything is hidden behind privacy shields, which purport to protect PII. But WHOIS records aren't supposed to contain personal information; they're supposed to contain contact information for network operators.

This is ICANN's doing, I'm afraid. ICANN had a rule that networks should provide public WHOIS servers. They never enforced the rule, and now they've scrapped it.


> But WHOIS records aren't supposed to contain personal information; they're supposed to contain contact information for network operators.

Doesn't a whois have to include email, phone number and physical address? For a company that's not really PII, but I don't understand how it wouldn't be considered personal information in the whois for my personal website.


RDAP contains some of the WHOIS information in a machine-readable format (JSON). Not everything, but I think it's better.

Not everything runs an RDAP server, though; I do wish ICANN/IANA or whoever would enforce that.

> Nowadays it's much worse; nearly everything is hidden behind privacy shields, which purport to protect PII. But WHOIS records aren't supposed to contain personal information; they're supposed to contain contact information for network operators.

Network operator info can also be PII. My info is PII, but I have a domain name, so putting my info into WHOIS is putting PII into WHOIS.

The privacy guard just forwards everything to me, minus spam.

(If it's a corporation, I don't think there's a good reason to permit privacy guards. But not all domains are owned by BigCos, yet.)


Regarding RDAP, it actually is enforced by ICANN, it’s been mandatory for a few years for gTLDs (not sure if it is, or can be made mandatory for ccTLDs). All registrars handling gTLDs should now have an RDAP, otherwise they’re in breach of ICANN rules.

RDAP has the benefit of being JSON, but even then it’s a reaaally crappy format. For example, contacts are represented by the jCard pseudo-standard, which is a JSON version of vCard, and it’s completely awful and hard to deal with. Basically instead of a nice JSON object, it’s arrays in arrays in arrays…
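To show the "arrays in arrays" shape: in RDAP, an entity's contact data sits in a `vcardArray` member holding a jCard (RFC 7095), which is `["vcard", [property, ...]]`, where each property is `[name, parameters, value_type, value, ...]`. A minimal sketch that flattens it into a dict; `jcard_properties` and the sample registrar data are hypothetical, and structured values such as addresses are passed through as nested lists rather than interpreted.

```python
import json
from typing import Any, Dict, List

def jcard_properties(vcard_array: List[Any]) -> Dict[str, List[Any]]:
    """Flatten a jCard (RFC 7095) into {property name: [values]}."""
    if len(vcard_array) != 2 or vcard_array[0] != "vcard":
        raise ValueError("not a jCard")
    props: Dict[str, List[Any]] = {}
    for name, _params, _value_type, *values in vcard_array[1]:
        props.setdefault(name, []).extend(values)
    return props

if __name__ == "__main__":
    entity = json.loads("""
    {"vcardArray": ["vcard", [
        ["version", {}, "text", "4.0"],
        ["fn", {}, "text", "Example Registrar"],
        ["email", {"type": "work"}, "text", "abuse@example.net"]
    ]]}""")
    print(jcard_properties(entity["vcardArray"])["email"])  # ['abuse@example.net']
```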

RDAP should get better in the future versions, but I’m not sure registrars will follow in good faith because the initial specs were a bit of a shit show.
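To make the "arrays in arrays" complaint concrete: RDAP (RFC 9083) carries contact data in an entity's `vcardArray` member as a jCard (RFC 7095), shaped like `["vcard", [[name, params, type, value], ...]]`. Here is a minimal Python sketch that flattens that into a plain dict. The function name and sample data are hypothetical, and it deliberately throws away the parameters and value types that a real client may need.

```python
from typing import Any


def jcard_to_dict(vcard_array: list[Any]) -> dict[str, list[Any]]:
    """Flatten a jCard (as found in RDAP's vcardArray) into {property: [values]}."""
    # jCard shape: ["vcard", [[name, params, value_type, value, ...], ...]]
    tag, properties = vcard_array
    if tag != "vcard":
        raise ValueError(f"not a jCard: {tag!r}")
    out: dict[str, list[Any]] = {}
    for prop in properties:
        name, _params, _value_type, *values = prop
        # Most properties carry one value; keep multi-valued ones as a list.
        value = values[0] if len(values) == 1 else values
        out.setdefault(name.lower(), []).append(value)
    return out


# Hypothetical entity contact, in the shape an RDAP server returns it.
entity_vcard = [
    "vcard",
    [
        ["version", {}, "text", "4.0"],
        ["fn", {}, "text", "Example Network Operations"],
        ["email", {}, "text", "noc@example.net"],
        ["tel", {"type": "voice"}, "uri", "tel:+1-555-0100"],
    ],
]

print(jcard_to_dict(entity_vcard)["email"])  # ['noc@example.net']
```

Keeping every property as a list matters because jCard allows repeated properties (e.g. several `email` entries).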


> not sure if it is, or can be made mandatory for ccTLDs

ccTLDs don't. Proof by counter-example: .co, .io.

I don't know if that can be enforced, but it'd make my life easier.

(Technically, too, AIUI, it's the registries that run the RDAP servers, not the registrars.)

Yeah, the JSON is a bit of a crap-shoot, but I think it's maybe marginally better than trying to parse raw WHOIS text…? IDK. Probably depends on the exact datum you want to pull out of it.


> you can make an ad-hoc "parser" that works for 90% of addresses/domains (or you could, ten years ago when I tried). But the remainder are intractable.

Could generative AI help out these days? "Here's a whois, give me [the info I want]:"


You can go one step further: instead of a separate ICMP ECHO trace, use the existing HTTP TCP connection between the client's browser and your web server. That way you can traverse client-side NAT and/or stateful firewalls.


Ooh, that's compelling. I wonder how that would work for an actual traceroute.


called a "parasitic traceroute"



You might be interested to know that many times, packets in a TCP session take asymmetric routes across the internet. In my experience, the most common reasons for this are business rules related to cost and human error.

If you think about how IP works, you’ll see that this doesn’t particularly matter but that it can make understanding the routing more difficult.


Fun story time!

Boise State University and the University of Idaho are two schools at opposite ends of the state of Idaho. UIdaho in the north is close to Spokane, and almost all of its connectivity comes from Seattle. Boise is closer to Salt Lake, so most of its connectivity comes via Portland or Salt Lake City. The middle of the state between the two schools is mountains, with very, very little large-scale connectivity at all, except there was a small line way back in the day because UofIdaho had remote classrooms in the southern part of the state. Sometime in the late '90s, a network engineer from BSU and one from UofI realized that they both had switches and routing kit in the same building, so they ran an Ethernet cable between them.

The effect was catastrophic. It turns out that both networks happily started announcing BGP to each other, which in turn announced the connection to the Internet as a whole. Suddenly there was a very short jump between networks in Seattle and networks in Salt Lake City. That poor little T1 (iirc) was absolutely getting saturated. But, interestingly, only in one direction. See, Boise announced the route, but Idaho didn't, so the traffic was effectively only failing in one direction.

Needless to say, the cable was disconnected, and years later when I worked at UofIdaho it was still well known that the two networks shouldn't ever be connected again! (Which was ironic, because I was working on a program to set up Internet2 (I2) at both universities.)
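Why a single campus cable could suck in that much traffic: among otherwise-equal BGP routes, the one with the shorter AS_PATH wins. A heavily simplified Python sketch of just that tie-breaker; the ASNs and prefix are hypothetical documentation values, and real best-path selection weighs other attributes (e.g. LOCAL_PREF) before path length and more (origin, MED, ...) after it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple[int, ...]  # ASNs the announcement passed through, nearest first


def best_route(candidates: list[Route]) -> Route:
    """Pick the candidate with the shortest AS_PATH (only this rule is modelled)."""
    return min(candidates, key=lambda r: len(r.as_path))


# Hypothetical ASNs from the documentation range (RFC 5398).
via_transit = Route("198.51.100.0/24", (64496, 64497, 64498, 64499))
via_campus_link = Route("198.51.100.0/24", (64500, 64499))

print(best_route([via_transit, via_campus_link]).as_path)  # (64500, 64499)
```

Once the short path is announced upstream, every network that hears it prefers it, regardless of how thin the link behind it is.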


On my device, there are no intermediate steps shown between my device and the server. Just FYI.


Yeah, sorry, Hacker News has successfully made my server chug a lot :)

I'm working on it right now and hopefully will be working better soon! In the meantime I've increased timeouts so loading will be longer but it should work better.


Same here! But I'm gonna bookmark this and try it another day. Kudos for sharing this - love to see younger folks hacking. This is a cool idea.


Should work a little better now! It's still chugging, but I made the rest of the article load (although now it's a bit janky when you have JS disabled, but not much I can do)


Well maybe if you'd not browse HN from inside linode data centers it'd work better :P


I think it's also worth mentioning mtr, which is what I use much more frequently than traceroute. It can help diagnose intermittent packet loss, and gives you an averaged sense of how things flow.
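The "averaged sense" mtr gives boils down to per-hop statistics over repeated probes. A minimal Python sketch of that bookkeeping, assuming the probe RTTs (in ms) have already been collected per hop, with `None` for a probe that got no reply; the function name and sample numbers are hypothetical.

```python
from statistics import mean
from typing import Optional


def hop_stats(
    samples: dict[int, list[Optional[float]]],
) -> list[tuple[int, float, Optional[float]]]:
    """Return (hop, loss %, average RTT in ms) per hop, mtr-style."""
    rows = []
    for hop in sorted(samples):
        rtts = samples[hop]
        replies = [r for r in rtts if r is not None]
        loss = 100.0 * (len(rtts) - len(replies)) / len(rtts)
        avg = mean(replies) if replies else None
        rows.append((hop, loss, avg))
    return rows


samples = {
    1: [1.2, 1.0, 1.1, 1.3],
    2: [9.8, None, 10.2, 10.0],   # one probe lost
    3: [None, None, None, None],  # hop never answers (e.g. ICMP filtered)
}
for hop, loss, avg in hop_stats(samples):
    print(hop, f"{loss:.0f}%", "-" if avg is None else f"{avg:.1f} ms")
```

When reading the real thing, loss at a middle hop that doesn't carry through to later hops usually means that router is deprioritizing its own ICMP replies, not that it is dropping forwarded traffic.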

This article from APNIC explains more about mtr and how to read it (plus some interesting details about how MPLS can obscure true paths)

https://blog.apnic.net/2022/03/28/how-to-properly-interpret-....

Also worth noting: it's sometimes useful to trace with UDP, since many routers will selectively drop ICMP under strain.
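For the UDP case, classic Unix traceroute sends probes to a high, probably closed port starting at 33434 and increments it per probe, so replies can be matched back to probes. A minimal Python sketch of the sending half only; the function name is hypothetical, and reading the ICMP Time Exceeded / Port Unreachable replies needs a raw ICMP socket (usually root), which is left out here.

```python
import socket

BASE_PORT = 33434  # classic traceroute's default starting destination port


def send_udp_probe(dst: str, ttl: int, seq: int) -> socket.socket:
    """Send one traceroute-style UDP probe with the given TTL (send side only)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # The router where TTL hits zero drops the packet and sends ICMP Time Exceeded.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
    # A different destination port per probe lets replies be matched to probes,
    # since the ICMP error quotes the original UDP header.
    sock.sendto(b"", (dst, BASE_PORT + seq))
    return sock  # caller closes it (or keeps it around to correlate replies)
```

Sending every TTL at once, as suggested further down the thread, would just be a loop like `[send_udp_probe(host, ttl, ttl - 1) for ttl in range(1, 31)]`.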

Nice article, and excellent presentation!


As mentioned above, you are not measuring the route from the client to the server, but from the server to the client. It can be a completely different route, and internal routing behind the client's IP is hidden. You should try 0trace to get the real route.


Very cool way to present this. Obviously there's a lot of legwork here, but for major ISPs you could probably use some heuristics to give a little more detail on internal network structure. Comcast, for example, has their core network under ibone.comcast.net, but their HFC infrastructure directly under comcast.net. So you can tell when traffic makes the jump from their HFC consumer ISP network to their backbone. At one point I had some notes on patterns like this for a number of the majors, but I think I've lost them. :(
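A minimal Python sketch of that kind of heuristic, using only the two Comcast suffixes mentioned above; the rule table, function name, and example hostnames are hypothetical, and other ISPs would need their own hand-collected patterns. Rule order matters: the more specific `ibone` suffix has to be checked first.

```python
# Suffix rules, most specific first (hypothetical table built from the comment).
HOSTNAME_RULES: list[tuple[str, str]] = [
    (".ibone.comcast.net", "Comcast backbone"),
    (".comcast.net", "Comcast HFC access network"),
]


def classify_hop(hostname: str) -> str:
    """Label a hop's reverse-DNS name by matching known suffixes."""
    name = hostname.lower().rstrip(".")  # normalize case and trailing root dot
    for suffix, label in HOSTNAME_RULES:
        if name.endswith(suffix):
            return label
    return "unknown"


print(classify_hop("be-1-cr01.example.ibone.comcast.net"))  # Comcast backbone
print(classify_hop("po-1-ar01.example.comcast.net."))       # Comcast HFC access network
```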


> My claim that this website’s traceroute was the path your packets took to reach my server was a bit of a white lie. To calculate that, I would’ve had to be able to run a traceroute to my server from your computer. Instead, I ran the traceroute from my server to your computer and just reversed it. That’s also why the traceroute at the top seemingly loads in reverse order.

I was wondering if we'd address this. That was my first thought - how can you do this without initiating ICMP from my side?

> Does running a “reverse traceroute” sacrifice accuracy? A little, actually.

> As I said when describing Internet routing, each device a packet traverses makes a decision about where to send the packet next until it reaches its final destination. If you send a packet in the other direction, the devices might make different routing decisions… and if one device makes one different decision, the rest of the path will certainly be different.

> This reverse traceroute is still helpful. The paths will be roughly the same, likely differing only in terms of which specific routers see your packet.

Sure... But it's pretty common for multipathed ASes to traverse in all sorts of different ways. My experience (non-residential) is that more often than not, the trace and reverse trace were different. Your upstreams and my upstreams have very different commercial agreements, and both peer and buy transit in multiple places.
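Comparing a reversed server-to-client trace against a real client-to-server trace hop by hop is misleading, because a router usually answers from the interface the probe arrived on, so the same router shows different IPs in each direction. Comparing at the AS level is closer to the point made above. A minimal Python sketch, assuming each hop has already been mapped to an origin ASN (that lookup is not shown); the function names and ASNs are hypothetical.

```python
from itertools import groupby
from typing import Optional


def as_level(path: list[int]) -> list[int]:
    """Collapse consecutive hops in the same AS: [1, 1, 2, 2, 3] -> [1, 2, 3]."""
    return [asn for asn, _ in groupby(path)]


def first_divergence(
    client_to_server: list[int], server_to_client: list[int]
) -> Optional[int]:
    """Index where the reversed server->client AS path stops matching, or None."""
    fwd = as_level(client_to_server)
    rev = as_level(list(reversed(server_to_client)))
    for i, (a, b) in enumerate(zip(fwd, rev)):
        if a != b:
            return i
    # One path is a prefix of the other (or they match exactly).
    return None if len(fwd) == len(rev) else min(len(fwd), len(rev))


# Hypothetical ASNs: the two directions leave the client's ISP via different transits.
print(first_divergence([64496, 64496, 64497, 64499, 64500],
                       [64500, 64498, 64496, 64496]))  # 1
```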

Still cool though, well done!


SQUIRREL!!!

How did you manage to tilt the section header text? I've not seen that done before.


CSS transforms!

    transform: rotate(-2deg) skew(-2deg);
    transform-origin: bottom left;


Akamai blocks the return trace, which is no fun. But regardless, a cool project.

     9  a23-203-147-39.deploy.static.akamaitechnologies.com (23.203.147.39)  36.707 ms  36.783 ms  40.110 ms
    10  * * *
    11  * * *
    12  * * *
    13  * * *
    14  * * *
    15  * * *
    16  * * *
    17  * * *
    18  * * *
    19  * * *
    20  * * *
    21  * * *
    22  * * *
    23  * * *
    24  * * *
    25  * * *
    26  * * *
    27  * * *
    28  * * *
    29  * * *


IPv6 support would be great for this. I run my own AS (AS200676) and one of my prefixes is actually announced at my home. I'm really curious what the auto-generated green text would say about that route.

I announce only IPv6, because I don't currently have access to any v4 blocks, they are expensive and I have little need for one.


There should be a traceroute protocol whereby a "specially marked packet" is understood to be traceroute, and generates an ICMP response even though its TTL is still positive. The ICMP response will carry the observed TTL value.

Then one packet will get all the echoes in one go, instead of having to send a volley of packets with increasing TTL values.


Combined with IP source address spoofing, it would probably be a great help to DDoS amplification attacks, while saving legitimate users less than 50% of their packets.

If you care about time rather than packet count, you can send packets with all reasonable TTL values at once.
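Working through the "<50%" figure, assuming one probe per hop (classic traceroute actually sends three per hop by default, which doesn't change the ratio) and $N$ hops that all respond:

$$
\text{classic: } \underbrace{N}_{\text{probes}} + \underbrace{N}_{\text{replies}} = 2N,
\qquad
\text{proposed: } 1 + N = N + 1
$$

$$
\text{saving} = \frac{2N - (N + 1)}{2N} = \frac{N - 1}{2N} < \frac{1}{2},
\qquad
\text{amplification} = \frac{N \text{ replies}}{1 \text{ probe}} = N
$$

With $N = 19$ that is a saving of $18/38 \approx 47\%$ for a legitimate user, against a 19-to-1 gain for someone spoofing the source address.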


Oh, no question; the amplification is concerning. Send one packet, get 19 responses, versus having to send 19 packets to get 19 responses: that's a "gain" of 19 to 1.


We already have enough problems with routers not sending normal TTL-expired messages.

Just use a better client. Takes about 3 seconds to do an mtr -b over 15 responding hops from a server in London to something like 43.249.179.0 in the south pacific.


Great write up, thank you for making this! From a learning design perspective, the blend of static and dynamic prose based on my own packet’s journey was super cool, not something I’ve seen done quite that way before.

Makes me imagine an online programming textbook that could walk you through what your own custom code is doing. Very cool!


Doesn't work at all with Enhanced Tracking Protection enabled in Firefox, and times out when it is disabled.

I guess it's supposed to do something like this: https://dnschecker.org/online-traceroute.php


Shouldn't have anything to do with Enhanced Tracking Protection, but my server is buckling under the traffic pressure! I'm working on it, maybe check back in 40m or so (SORRY!)


Do you have a VPN on as well? It works for me on FF mobile with Enhanced protection


No I think I just tested it while his server was intermittently failing and it happened to correspond to switching this setting off and on.



There's no possible way to guarantee that traceroute is accurate. The same route is not necessarily taken in both directions.

"A Practical Guide to (Correctly) Troubleshooting with Traceroute" by Richard A. Steenbergen explains it well.


> "Hi! I'm Lexi. I'm 17..."

Holy shit. This girl's going places. I just skimmed https://kognise.dev and saw that in addition to the deep understanding of TCP/IP and all 7 layers of the OSI model she appears to possess, she also does front- and back-end development, embedded hardware, mobile apps, and compilers. She also rock climbs, can pilot a Cessna (all by herself), builds robots, and plays (and composes music for) the cello (since she was 5 years old, apparently).

Do I need to keep going? This is nothing short of incredible. If I did 1/10 of the things this kid's already done by the time I kick the bucket I would have lived a full life.


Wow! Impressive is an understatement; she has achieved so much at her age.


Do the h3 heading tags look like they're tilted a little bit? CSS/font trickery? Just me?


Yes, they are tilted:

    .text h3 {
        margin-top: 60px;
        margin-bottom: -4px;
        transform: rotate(-2deg) skew(-2deg);
        transform-origin: bottom left;
    }


Note: I geo-block IPs from outside Australia on my home network. This is why it failed for me.


Why?


I might not know any better, but this implementation seems so cool.

Can I trace the location of an AS?


You can even use traceroute to send short hidden messages! :D


Love the looooong furret in your header image! :D


It would be cool to add a map of the hops.


[flagged]


remember the avg person presses colorful squares on a screen to "operate a machine"

this type of self-directed exploration is still in the hacker spirit imo even if it's not the Next Big Thing according to greybeards

I enjoyed using it and i'm sure many others too, though i do agree we're in interesting times with how regressed the modern understanding of computing truly is


It makes me wonder what other random system utilities I could hook up to an API and be like “look! wow! netstat from another computer!”


You should be at YC with visions like that


If you had a non-traditional computer education (I definitely did) there will be huge holes because your skillset is going to be hyper practical.

All of these concepts are just somewhat arbitrary rules nerdy white guys came up with 30+ years ago, there is no way someone would understand them unless they were specifically taught them. Programming isn't like music or math where there is a natural order underlying the man-made rules.


> Programming isn't like music or math where there is a natural order underlying the man-made rules.

I'd like to reinforce your point by disagreeing on two counts: programming (in so far as it's related to computer science) is like music and math in that there is loads of underlying natural order that can be independently derived (think about the lambda calculus vs Turing machines which are proven to be identical, for one example). That underlying beauty is there, and I think it's there because computer science is a part of (or at the very least akin to) mathematics.

Secondly, both math and music are, like programming, not really that free of arbitrary rules nerdy white guys came up with 30(0)+ years ago. When you're tuning an instrument, you'll likely tune it to the Western chromatic scale, but that's just a convention; there are more scales! Just as much of a convention is to tune that scale to concert pitch, where A is tuned to 440 Hz. But that's not required: concerts will tune to 422.5 Hz when playing Handel.

There are all kinds of conventions all around us which are invisible because we're accepting them as a given of the world, more so if we have a traditional education in that field. That's why it's enjoyable to see stuff like this, what intel-VISA called self-directed exploration, it deepens your understanding and is just fun!


Not sure what this has to do with race but ok



