Hacker News

It reminds me of being a teenager in the late 90s and the early days of the internet. I discovered entirely on my own that I could telnet to port 80, type "GET / HTTP/1.1\n\n" and the server would send me the headers + page content. Shortly after I discovered the same worked for SMTP.

I was very far from the first person to have this revelation, but it was definitely an eye-opening "there is no magic, it's all just software" moment for me. It fundamentally changed the way I think about computers at every level and inspired me to look "under the hood" at everything: how CPUs implement superscalar out-of-order (OOO) execution, how atomic operations work at the CPU level, what a syscall actually is, how subroutines are called (and calling conventions), how dynamic linkers work.

You don't have to be an expert at all these things but it is like a superpower to understand the basics of every layer.
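
For anyone who wants to repeat the telnet experiment in code, here is a minimal sketch using Python's standard `socket` module. It sends the same kind of hand-typed request, with two additions the telnet version didn't have: a `Host:` header (required by HTTP/1.1) and `Connection: close`, so the client can simply read until the server hangs up. Lines end in CRLF, as the HTTP spec calls for. The function names are illustrative, not from any library.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    # Request line + headers, each ending in CRLF; the empty line ends the head.
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"          # mandatory in HTTP/1.1
        "Connection: close\r\n"      # ask the server to hang up after replying
        "\r\n"
    ).encode("ascii")


def raw_http_get(host: str, port: int = 80, path: str = "/") -> bytes:
    """Send one plain-text request and return the raw response, headers included."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while chunk := sock.recv(4096):  # b"" means the server closed
            chunks.append(chunk)
    return b"".join(chunks)


def split_response(raw: bytes) -> tuple[bytes, bytes]:
    # Headers and body are separated by the first blank line. The body may
    # still be chunk-encoded if the server used Transfer-Encoding: chunked.
    head, _, body = raw.partition(b"\r\n\r\n")
    return head, body
```

Against a real server, `head, body = split_response(raw_http_get("example.org"))` followed by `print(head.decode("latin-1"))` should print the status line and headers, the same thing telnet showed.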




> It fundamentally changed the way I think about computers at every level

I had the exact same revelation in the late 90s too! (~1998 for me.) I was already telnetting into servers a bunch and was getting into running an Apache server. I remember the moment I typed "GET / HTTP/1.1" into port 80 so clearly because it suddenly turned the "web" and "HTTP" into something comprehensible, with composable parts I could understand at a deeper level.

In our current world of SSH and HTTPS, it seems less likely the new generation will have the same experience. But we also have browser developer tools nowadays, which make it so much easier and more motivating for someone to start learning about the web and JavaScript. In the 1990s I had to use Proxomitron to painstakingly modify a webpage's JavaScript or HTML, but these days it's dead simple.


`openssl s_client` is underused, and it's not as clunky since they added support for host:port syntax over separate args. Encouraging thinking of that as the modern replacement would help.


Wow, thanks for showing me `openssl s_client`. I just gave this a try and you're right, it was quite easy!

   openssl s_client -connect example.org:443
   [TLS details...]
   GET / HTTP/1.1
   Host: example.org
   <newline>
   <newline>
   [headers and HTML response!]
The one thing trickier nowadays is that almost all of the time you need to send the `Host:` header for things to work. Took me a sec to realize that, since in 1998 it was almost never necessary.
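
The same session can be scripted with Python's `ssl` module, which does roughly what `openssl s_client` does: a TCP connection, a TLS handshake with SNI and certificate verification, then plain HTTP text written into the encrypted stream. This is a sketch with illustrative function names, not a replacement for s_client's diagnostics. (When typing interactively, s_client's `-crlf` option translates Enter into CR+LF for servers that insist on it.)

```python
import socket
import ssl


def make_context() -> ssl.SSLContext:
    # Verifies the server's chain against the system trust store and checks
    # that the certificate matches the hostname we connect to.
    return ssl.create_default_context()


def https_get(host: str, port: int = 443, path: str = "/") -> tuple[str, bytes]:
    """Return (negotiated TLS version, raw HTTP response bytes)."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    with socket.create_connection((host, port), timeout=10) as tcp:
        # server_hostname sets SNI and enables hostname verification.
        with make_context().wrap_socket(tcp, server_hostname=host) as tls:
            tls.sendall(request)
            chunks = []
            while chunk := tls.recv(4096):  # b"" means the server closed
                chunks.append(chunk)
            return tls.version(), b"".join(chunks)
```

Usage: `version, raw = https_get("example.org")` gives something like the TLS version s_client prints, plus the headers and HTML.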


Glad to spread the knowledge, and glad you gave it a shot! It really is that easy, and host headers have been pretty regularly required anyway.

Slightly more interesting is using it to access internal sites, and setting up your own TLS roots and chains for personal or corporate infrastructure. In practice, although private roots are useful internally, I generally recommend everyone use Let's Encrypt and public names even for internal APIs when they cross team boundaries, because it's just easier.


> In our current world of SSH and HTTPS, it seems less likely the new generation will have the same experience.

Pushing encryption into the transport layer à la QUIC could solve this, if not for the spurious dependency on user-hostile TLS instead of a simpler PKI. SSH would become telnet over a QUIC stream, which could be used with a QUIC-enabled netcat (say). HTTP/3 could simply have been HTTP/1.1 or HTTP/2 carried over QUIC, but this wasn't pursued.


I think this is one of the most fundamental things a person can learn about software engineering: there is no magic. If something happens, it was part of the operation of written code and an exchange of data somewhere.

My daughter is starting to get to an age where she’s inquisitive about how magical things work, and I usually respond by asking “how _could_ it work?” And we talk a lot about what actually does what.


Being even somewhat conversant in some of the more popular protocols is also like a superpower that the newbs don't really have. The ability to use telnet or nc to answer the question "aside from any other piece at any level in the stack that could be going wrong, can I even talk to the HTTP server?" helps you eliminate a lot of possibilities when troubleshooting something.
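
A scripted version of that "can I even talk to the HTTP server" check might look like the following sketch, which reports the first layer that fails: name resolution, the TCP connect, or getting an HTTP status line back. The `HEAD` request, the function name and the verdict strings are illustrative choices, not the output of any standard tool.

```python
import socket


def probe_http(host: str, port: int = 80, timeout: float = 5.0) -> str:
    """Return a short verdict naming the first layer that failed, or 'ok'."""
    # 1. Name resolution
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return f"dns: {exc}"
    family, socktype, proto, _, addr = infos[0]

    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        # 2. TCP handshake (refused, unreachable and timeouts are all OSError)
        try:
            sock.connect(addr)
        except OSError as exc:
            return f"tcp: cannot connect to {addr[0]}:{addr[1]} ({exc})"
        # 3. Does something answer with an HTTP status line?
        request = f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        try:
            sock.sendall(request.encode("ascii"))
            status_line = sock.recv(1024).split(b"\r\n", 1)[0]
        except OSError as exc:
            return f"http: no response ({exc})"
        if not status_line.startswith(b"HTTP/"):
            return f"http: unexpected reply {status_line!r}"
        return "ok: " + status_line.decode("latin-1")
    finally:
        sock.close()
```

A verdict starting with `tcp:` rules out everything above the transport; one starting with `http:` means the port is open but whatever is listening isn't speaking HTTP the way you expect.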


Same! I didn't even know anything about telnet or programming. I was using Klik&Play or Multimedia Fusion to make games, and some component supported TCP, so I opened a port with it and, just for fun, connected to it with a browser. I saw the browser's request, so I tried the same thing in reverse against a real web server, and it worked. Same thing with SMTP.
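
The same trick works without Klik&Play: listen on a port, point a browser at it, and print whatever arrives. Below is a minimal sketch with Python's `socket` module; the port, function names and canned reply are assumptions for illustration. Browsers often open extra connections (for example for a favicon), and this captures only the first one.

```python
import socket


def open_listener(port: int = 8080) -> socket.socket:
    """Listen on localhost; then visit http://127.0.0.1:<port>/ in a browser."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen(1)
    return srv


def capture_one_request(srv: socket.socket) -> bytes:
    """Accept one connection and return its request line + headers verbatim."""
    conn, _ = srv.accept()
    with conn:
        data = b""
        while b"\r\n\r\n" not in data:  # blank line ends the request head
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
        body = b"hello from a hand-rolled server\n"
        conn.sendall(
            b"HTTP/1.0 200 OK\r\n"
            b"Content-Type: text/plain\r\n"
            + f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
            + body
        )
    return data
```

Run `srv = open_listener()` and `print(capture_one_request(srv).decode("latin-1"))`, then load http://127.0.0.1:8080/ to see the request line and headers your browser actually sends.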


> Klik&Play

That's a program I haven't heard of in a long time. I wonder if you can still get it running in a VM, since it was just shareware with a nag screen if you said you were using it for "educational purposes".


> What a syscall actually is

I've only regarded it as a literal system call: the lowest-level, “language agnostic” API that does a thing in the OS. Do you have some deeper insight?


"Syscalls" is a topic in systems programming. You can Google "syscall faq" for example.

https://blog.packagecloud.io/the-definitive-guide-to-linux-s...

Understanding how syscalls are invoked from user space typically involves knowing what calling conventions are, knowing what an ABI is, etc.

https://man7.org/linux/man-pages/man2/syscall.2.html


At the most basic level a system call is: loading the syscall number and arguments into the ABI-specified registers, then triggering an interrupt. Some architectures have a dedicated syscall or syscall-like instruction that is more optimized than a generic software interrupt, but conceptually it is similar.
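
You can see this step from user space without writing assembly by going through the C library's generic `syscall()` wrapper, which loads the number and arguments into the registers the ABI specifies and executes the trap instruction. The sketch below assumes Linux with glibc on x86-64 or arm64 (syscall numbers differ per architecture; 39 and 172 are `getpid` on those two) and calls the wrapper from Python via `ctypes`.

```python
import ctypes
import os
import platform

# Linux getpid syscall numbers for two common architectures (from the
# kernel's per-architecture syscall tables).
SYS_GETPID = {"x86_64": 39, "aarch64": 172}

libc = ctypes.CDLL(None, use_errno=True)  # already-loaded symbols, incl. libc
libc.syscall.restype = ctypes.c_long
libc.syscall.argtypes = [ctypes.c_long]   # syscall(long number, ...)


def raw_getpid() -> int:
    # glibc's syscall() moves the number into the ABI's syscall-number
    # register (rax on x86-64, x8 on arm64) and executes the trap
    # instruction (`syscall` / `svc #0`); the PID comes back in a register.
    return libc.syscall(SYS_GETPID[platform.machine()])


if __name__ == "__main__":
    print(raw_getpid(), os.getpid())  # same process, so both print the same PID
```

`os.getpid()` reaches the same kernel entry point through libc's ordinary `getpid()` function; the raw version just skips the named wrapper.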

The syscall/interrupt instruction transitions to supervisor/kernel mode and moves the execution pointer to the configured location in the kernel.

If this sounds kinda like switching threads or processes, you would be right. But if you had to pay that context-switching cost twice on every syscall, it would kill performance. Most OSes use a split address space as an optimization here: every userspace process has the kernel's memory mapped in the upper half, but with protection bits that make it inaccessible to userspace. That way, when a syscall is issued there is no need to switch page tables or flush the TLB: the kernel is already mapped, and now that the CPU is in supervisor mode those kernel pages are accessible.

The CPU decides which code gets control by consulting the interrupt table, which itself can only be configured in supervisor mode. That is what prevents a userspace process from hijacking the CPU: user mode code doesn't have permission to modify the register that points at the interrupt handler table, or the memory containing it. Thus, by definition, any syscall/interrupt will jump to kernel code.

The kernel entrypoint then often has a COPYIN/COPYOUT process that will treat certain register values as pointers and copy the data into the kernel's address space when required (or copy it out to a caller-provided buffer).
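
Pointer arguments are where that copy step shows up. In the sketch below, `write(2)` receives the address of a user buffer that the kernel copies data in from, and `read(2)` receives the address of a caller-provided buffer that it copies data out to (on Linux the kernel helpers are `copy_from_user`/`copy_to_user`; COPYIN/COPYOUT is the BSD naming). It assumes Linux with glibc's `syscall()` wrapper on x86-64 or arm64, and the helper names are illustrative.

```python
import ctypes
import os
import platform

# Linux (read, write) syscall numbers for two common architectures.
SYS_NUMBERS = {
    "x86_64": {"read": 0, "write": 1},
    "aarch64": {"read": 63, "write": 64},
}

libc = ctypes.CDLL(None, use_errno=True)
libc.syscall.restype = ctypes.c_long


def raw_syscall(name: str, fd: int, buf: ctypes.Array, count: int) -> int:
    number = SYS_NUMBERS[platform.machine()][name]
    # The kernel only ever sees register-sized integers; the buffer is
    # passed as a plain address that it treats as a user-space pointer.
    address = ctypes.c_void_p(ctypes.addressof(buf))
    ret = libc.syscall(ctypes.c_long(number), ctypes.c_long(fd),
                       address, ctypes.c_size_t(count))
    if ret == -1:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def pipe_roundtrip(message: bytes) -> bytes:
    read_fd, write_fd = os.pipe()
    try:
        src = ctypes.create_string_buffer(message, len(message))
        # write(2): the kernel copies `count` bytes IN from our buffer.
        raw_syscall("write", write_fd, src, len(message))
        dst = ctypes.create_string_buffer(len(message))
        # read(2): the kernel copies bytes OUT into the caller-provided buffer.
        n = raw_syscall("read", read_fd, dst, len(message))
        return dst.raw[:n]
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

Passing a bad address here makes the kernel's copy fail with `EFAULT` rather than crash the kernel, which is exactly what the copy-in/copy-out step is there to guarantee.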

For reference, preemptive multitasking is related. The kernel's scheduler configures a hardware timer interrupt, and the configuration of this timer can only be done in supervisor mode. So once the current thread's timeslice is up, the timer fires and the CPU changes the instruction pointer to the kernel's configured timer interrupt handler. User mode code can't prevent the timer from firing, nor change what code the CPU will jump to. The scheduler routine saves the current context to memory, loads the next thread's context (registers, instruction pointer, page tables, etc.), updates the timer's next deadline, then "returns" from the interrupt... only the instruction pointer now belongs to a different thread (or a different process with different memory entirely), so the CPU "returns" to a different piece of code. If all goes correctly, it "returns" to the instruction after the last one that completed when that thread was last preempted, so from that thread's POV execution was continuous.
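
The save/restore cycle can be modeled in a few lines. This is a toy Python simulation, not kernel code: each "thread" is a list of pseudo-instructions, the "timer" is a counter that expires after `quantum` instructions, and the scheduler copies registers between a CPU object and each thread's saved context. The threads never yield voluntarily; only the timer decides when they stop, which is what makes it preemptive. All names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Context:
    """Saved per-thread state: what the kernel stores on a context switch."""
    name: str
    program: list
    pc: int = 0
    acc: int = 0


@dataclass
class CPU:
    pc: int = 0
    acc: int = 0
    program: list = field(default_factory=list)


def run(threads: list[Context], quantum: int) -> list[tuple[str, int]]:
    """Round-robin toy threads, preempting each after `quantum` instructions."""
    cpu = CPU()
    ready = deque(threads)
    trace = []
    while ready:
        current = ready.popleft()
        # Restore: load the thread's saved registers into the CPU.
        cpu.pc, cpu.acc, cpu.program = current.pc, current.acc, current.program
        timer = quantum  # arm the "hardware timer" for this timeslice
        while cpu.pc < len(cpu.program) and timer > 0:
            op, arg = cpu.program[cpu.pc]
            if op == "add":
                cpu.acc += arg
            elif op == "emit":
                trace.append((current.name, cpu.acc))
            cpu.pc += 1
            timer -= 1  # one tick per instruction
        # Timer fired (or the program ended): save registers back.
        current.pc, current.acc = cpu.pc, cpu.acc
        if current.pc < len(current.program):
            ready.append(current)  # unfinished: back of the run queue
    return trace
```

With two four-instruction threads and `quantum=2`, the trace interleaves them even though neither contains a yield, and each thread's accumulator survives its preemptions because it was saved and restored.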


The difference between a syscall and a library function call is that a syscall crosses a protection boundary. Implementations differ, but where a library (even the lowest-level OS library, like libc) runs in the context of the application and can be invoked with a regular call (push the return address and jump), a syscall usually involves transferring control to the kernel through a software interrupt or a dedicated syscall instruction.


I was lazier and just did "GET / HTTP/1.0\n" and saved one character :P

Edit: I'm probably misremembering the "1.0"; it might have been that I just did "GET /" and saved 8+ characters. I was just trying to make a funny remark about "single-line request" vs "multi-line request".



