How does mkdir() really work?

antics · on Aug 19, 2010

Linux is a monolithic kernel designed to operate on all sorts of hardware, in all sorts of environments, so it's complicated. In some cases it can be beneficial to see a simpler implementation, a task I think Minix [http://www.minix3.org/] is well suited for. Edit: Why? It's smaller, it's a microkernel, and there is a lot of documentation, both about the source and about the theory behind it. And why is that, you ask?

For those who are just starting, Minix's author (Tanenbaum) also wrote a book called Operating Systems: Design and Implementation, which is a great resource for learning the ins and outs of OSs. But you need to know C beforehand, and you should have a reasonable understanding of basic CS first, too.

Edit 2: Oh, also, Minix is in a lot of ways responsible for the genesis of Linux, for those who didn't know. [http://groups.google.com/group/comp.os.minix/msg/b813d52cbc5...]

recampbell · on Aug 19, 2010

I disagree: Don't wait to have an understanding of C or CS, just dive in.

Find something interesting and figure out how it works. The best learning happens when you get in over your head. Once you get sufficiently lost, go back and learn some C. You'll appreciate it more and have a context to apply what you learned.

jpcosta · on Aug 19, 2010

I disagree: That approach might work if a) you are a genius, b) the problem is simple enough that background knowledge won't make a difference, or c) you have loads of time to back out of all the dead ends you'll get yourself into.

The kernel code is complicated. Learning C and some CS concepts before diving in will save you a lot of time and headaches. What you could do is learn C and CS while trying to understand some bits and bytes of the kernel, but for that I guess you would need someone to guide you.

antics · on Aug 20, 2010

I tried that when I started in the fall of '09 and it failed miserably. When I came back this summer after intensively studying C and computer systems in general, things went a LOT smoother.

If you want to learn that way, you have to be incredibly tenacious. Some people are, some people aren't, but I think my time was better served by learning all the dependencies and then breezing through it when I was in the right place. I mean, you could learn Organic Chemistry and just backtrack to learn Chemistry where applicable, but that's a very hard way of doing it.

bconway · on Aug 20, 2010

> I tried that when I started in the fall of '09 and it failed miserably. When I came back this summer after intensively studying C and computer systems in general, things went a LOT smoother.

I'm not doubting your story, but it's not a hard-and-fast rule. Con Kolivas, who has written some interesting/thought-provoking schedulers for the Linux kernel (and stirred up some politics along the way) had never touched C before jumping in. (http://apcmag.com/interview_with_con_kolivas_part_1_computin...)

antics · on Aug 20, 2010

I completely agree. There certainly are bright people out there. What I'm saying is, I consider myself to be about average intelligence, and so my example is probably more typical, although I do concede there are probably outliers.

MarkBook · on Aug 19, 2010

I think there's a lot in what you say. It's more like the approach kids take to learning stuff when left to their own devices.

lolipop1 · on Aug 20, 2010

Not exactly the same thing.

In the end, you tend to reinvent the wheel if you learn that way. Learning about the existing wheels will save a lot of time and correct many errors that you might never even notice when working that way.

And kids' knowledge is generally supplemented by adults to complete the picture, and sometimes we have to lie to introduce them to some concepts.

ori_b · on Aug 19, 2010

I found the Minix source code quite terrible to follow. If you want a simple implementation, I'd suggest looking at the xv6 source code. It's far more comprehensible, cleaner, and better architected, in my opinion. For the code in question, look through sysfile.c.

http://pdos.csail.mit.edu/6.828/xv6/

For a real-world example, I second the recommendation of BSD, combined with the book below.

dododo · on Aug 19, 2010

linux was initially designed for just x86. that's the real reason it's complicated in this case: each architecture defines its own system call table and dispatch, for example, instead of having a common one shared among architectures.

other kernels (like NetBSD) were designed with portability across architectures in mind and so have much cleaner code paths (typically just one system call table and the bare minimum of arch-specific code to link it all together).

minix is okay but it's not really going to give you a grip on a real-world kernel. "design and implementation of 4.4BSD-lite" is a really good book for this: most BSDs are still very similar to this design in a lot of ways, and many of the ideas described (e.g., VFS) are also used in linux.
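To make the "one system call table" idea concrete, here is a toy sketch in C: a single array of function pointers indexed by system call number, with a bounds check that returns -ENOSYS for unknown numbers. The table, the numbers, and the functions (toy_getpid, toy_mkdir, dispatch) are hypothetical; real kernels wrap this in argument copying, auditing, and architecture-specific entry code.

```c
#include <errno.h>
#include <stddef.h>

/* Toy model of a single, architecture-independent system call table.
 * The functions, numbers and signatures are hypothetical; this is not
 * the Linux or NetBSD table. */

typedef long (*syscall_fn)(long a0, long a1, long a2);

static long toy_getpid(long a0, long a1, long a2) {
    (void)a0; (void)a1; (void)a2;
    return 42;                          /* pretend process id */
}

static long toy_mkdir(long path, long mode, long unused) {
    (void)mode; (void)unused;
    if (path == 0)
        return -EFAULT;                 /* errors come back as negative errno */
    return 0;                           /* pretend the directory was created */
}

/* One table shared by every architecture; only the trap entry code
 * that calls dispatch() would need to be written per architecture. */
static const syscall_fn syscall_table[] = {
    [0] = toy_getpid,
    [1] = toy_mkdir,
};

#define NR_SYSCALLS ((long)(sizeof syscall_table / sizeof syscall_table[0]))

long dispatch(long nr, long a0, long a1, long a2) {
    if (nr < 0 || nr >= NR_SYSCALLS || syscall_table[nr] == NULL)
        return -ENOSYS;                 /* unknown system call number */
    return syscall_table[nr](a0, a1, a2);
}
```

With this table, dispatch(0, 0, 0, 0) returns 42 and dispatch(7, 0, 0, 0) returns -ENOSYS.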

antics · on Aug 20, 2010

First, Linux and Minix (and BSD, as far as I know) are all POSIX compliant, so the syscall table will be more or less the same. The thing that changes between architectures is the instruction set, which is different no matter what. That's actually the point of POSIX compliance: to define *nix-like systems in a way that is predictable.

Second, actually the more architectures you are compatible with, the more code is involved, and it tends to be more, not less, complicated. This is especially the case because all OSs have some assembly in them, and that assembly DOES change per your architecture.

Third, Minix is a "working" kernel. The main difference between Minix and BSD/Linux/whatever is that it is a microkernel, which is easily the best to learn on, on account of things like the permissions structure being MUCH simpler (arguably one of the most difficult things to grasp), but not the best in terms of (for example) security.

dododo · on Aug 20, 2010

1. the system call tables of linux and minix are very different. posix specifies the minimum interface (and i don't think it specifies what is and what is not a system call). linux has a lot more system calls than minix. the system call table itself changes between architectures under linux since...

2. linux duplicates the system call dispatch table for each architecture. check it out:

http://lxr.linux.no/linux+v2.6.35/arch/x86/ia32/ia32entry.S#...

vs

http://lxr.linux.no/linux+v2.6.35/arch/avr32/kernel/syscall_...

whereas NetBSD has no such duplication. in fact, one complaint about the linux kernel is that it has too much per-architecture code.

3. who uses minix? the linux kernel is actually not too bad. ctags is your friend. i learnt on linux+netbsd, it's not really as bad as people make out.

ams6110 · on Aug 20, 2010

Tanenbaum's book and Minix was what we used for our undergrad operating systems class ca. 1990. To put it in perspective, Minix was a compact but fairly usable unix-like OS that booted and ran from a 5.25" floppy disk.

shadowfox · on Aug 20, 2010

The newer editions of the text feature a more full-featured microkernel Minix.

Locke1689 · on Aug 20, 2010

The first response about C not knowing how to do syscalls with multiple arguments is wrong. This kind of thing is done via the system's syscall interface. In fact, the more I think about it, the more wrong he becomes. System call interfaces don't have any such thing as "language dependence": if you can't do a syscall in straight object code, you're screwed no matter what language you're using. The syscall intrinsic in C and C++ is a nonstandard, Linux-defined bridge between the OS syscall interface (per the OS's conventions for the ISA) and the code being run.

Here's how you do a 2-arg mkdir:

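  push %rbx             # save the registers we overwrite (%rax carries the result)
  push %rcx
  movq $0x27, %rax      # 39 = mkdir in the i386 int $0x80 table
  movq $path, %rbx      # arg 1: address of the path string
  movq $mode, %rcx      # arg 2: permission bits
  int $0x80             # trap into the kernel; result (0 or -errno) in %rax
  pop %rcx              # restore in reverse order
  pop %rbx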

Here's how you do a 3-arg:

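  push %rbx             # save the registers we overwrite (%rax carries the result)
  push %rcx
  push %rdx
  movq $0x128, %rax     # 296 = mkdirat in the i386 int $0x80 table
  movq $dfd, %rbx       # arg 1: directory fd (e.g. AT_FDCWD for the cwd)
  movq $path, %rcx      # arg 2: address of the path string
  movq $mode, %rdx      # arg 3: permission bits
  int $0x80             # result (0 or -errno) in %rax
  pop %rdx              # restore in reverse order
  pop %rcx
  pop %rbx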
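For comparison, C code does not need hand-written assembly to pass multiple arguments: glibc exposes a generic syscall() function. A minimal Linux-only sketch of the three-argument mkdirat call through it, assuming glibc (raw_mkdirat and is_directory are made-up helper names):

```c
#define _GNU_SOURCE            /* glibc declares syscall() under _GNU_SOURCE */
#include <fcntl.h>             /* AT_FDCWD */
#include <sys/stat.h>          /* stat(), S_ISDIR */
#include <sys/syscall.h>       /* SYS_mkdirat */
#include <sys/types.h>
#include <unistd.h>            /* syscall(), rmdir() */

/* Issue mkdirat(AT_FDCWD, path, mode) directly, bypassing the mkdir()
 * wrapper. Arguments are widened to long because syscall() is variadic.
 * Returns 0 on success, or -1 with errno set (glibc does the conversion). */
long raw_mkdirat(const char *path, mode_t mode) {
    return syscall(SYS_mkdirat, (long)AT_FDCWD, path, (long)mode);
}

/* Helper to check the result: 1 if path exists and is a directory. */
int is_directory(const char *path) {
    struct stat st;
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}
```

SYS_mkdirat is used rather than SYS_mkdir because some newer architectures (arm64, for example) only provide the *at variant.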

Edit: Hmm, maybe he was referring to overloading the intrinsic. That kind of makes sense, although there's a standard way to do that if necessary: pass the number of arguments first and pop that many off the stack after the call (printf works similarly, except that it derives the count from its format string).
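A sketch of the count-first convention using <stdarg.h>. printf itself does not take an explicit count (it infers one from the format string), but it reads its arguments with the same va_arg machinery. collect_args is a hypothetical helper; the limit of six reflects the maximum number of arguments a Linux system call takes.

```c
#include <stdarg.h>

/* Hypothetical helper: copies up to six long arguments into out[] and
 * returns how many were copied. The caller passes the count first. */
int collect_args(long out[6], int nargs, ...) {
    va_list ap;
    if (nargs < 0)
        nargs = 0;
    if (nargs > 6)
        nargs = 6;                    /* arguments past the sixth are ignored */
    va_start(ap, nargs);
    for (int i = 0; i < nargs; i++)
        out[i] = va_arg(ap, long);    /* callers must pass long values */
    va_end(ap);
    return nargs;
}
```

A wrapper built this way could then load out[0] through out[nargs - 1] into whatever registers the platform's syscall convention expects.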

rwmj · on Aug 20, 2010

"int $0x80" still works for backwards compatibility, but it's not been used for making syscalls from modern code for many many years.

dododo · on Aug 20, 2010

oh? on a recent glibc:

  $ objdump -D /lib/libc.so.6 | grep 'int[[:space:]]*$0x80' | wc -l
  447

what were you thinking they used instead? sysenter? iirc, this turns out to be slower than "int $0x80."

Locke1689 · on Aug 20, 2010

I was also trying to be independent of the Intel/AMD ISA split: Intel's fast entry instruction is SYSENTER, AMD's is SYSCALL. He's right, though; I probably would have used SYSENTER in production code. You probably want to use int $0x80 in shellcode, though (fewer registers to save).

rwmj · on Aug 21, 2010

On i386 perhaps, but on modern 64 bit machines like the one in the example code above it'll be using sysenter or another method.

  $ objdump -D /lib64/libc.so.6 | grep 'int[[:space:]]+$0x80' | wc -l
  0

koenigdavidmj · on Aug 20, 2010

Apparently on old Unices, mkdir was not a system call, so the mkdir program had to mknod(2) the directory itself and hard-link '.' and '..' into it. This also required the binary to be setuid root.
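Those hard links are still visible today through link counts, even though mkdir is now a single system call. A small POSIX sketch (dir_link_counts is a hypothetical helper) that creates a temporary directory with one child and reports st_nlink. On filesystems that keep traditional counts, such as ext4 or tmpfs, the child typically shows 2 (its name in the parent plus its own '.') and the parent gains 1 for the child's '..'; some filesystems, btrfs for example, report 1 for every directory, so treat the exact numbers as filesystem dependent.

```c
#define _POSIX_C_SOURCE 200809L       /* for mkdtemp() */
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates /tmp/nlinkXXXXXX/child, records link counts, then removes both.
 * Returns 0 on success, -1 on error. */
int dir_link_counts(nlink_t *parent_before, nlink_t *parent_after,
                    nlink_t *child) {
    char parent[] = "/tmp/nlinkXXXXXX";
    char sub[sizeof parent + 8];
    struct stat st;
    int rc = -1;

    if (mkdtemp(parent) == NULL)
        return -1;
    snprintf(sub, sizeof sub, "%s/child", parent);

    if (stat(parent, &st) != 0) goto out;
    *parent_before = st.st_nlink;
    if (mkdir(sub, 0755) != 0) goto out;
    if (stat(sub, &st) != 0) goto cleanup;
    *child = st.st_nlink;             /* "child" entry + its own "." */
    if (stat(parent, &st) != 0) goto cleanup;
    *parent_after = st.st_nlink;      /* grows by one for child's ".." */
    rc = 0;
cleanup:
    rmdir(sub);
out:
    rmdir(parent);
    return rc;
}
```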

drv · on Aug 19, 2010

The question and answers are all missing the key "glue" between the C mkdir() function and the Linux syscall - glibc.

meastham · on Aug 20, 2010

I get this, but why is there a declaration of the libc mkdir() function inside of a kernel header?

caf · on Aug 20, 2010

There isn't - <sys/stat.h> is a glibc header, not a kernel one.
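One concrete piece of that glue is error reporting. The kernel returns a negative errno value (for example -EEXIST), while the mkdir() declared in <sys/stat.h> returns -1 and sets errno. A minimal sketch, with mkdir_errno as a hypothetical helper name:

```c
#include <errno.h>
#include <sys/stat.h>      /* declares mkdir(); a glibc (userspace) header */
#include <sys/types.h>

/* Calls the ordinary libc mkdir() and returns the errno value it
 * reports, or 0 on success. */
int mkdir_errno(const char *path, mode_t mode) {
    if (mkdir(path, mode) == -1)
        return errno;
    return 0;
}
```

Calling mkdir_errno twice on the same path should give 0 and then EEXIST; a path whose parent does not exist gives ENOENT.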

Seth_Kriticos · on Aug 20, 2010

True, to be more precise, the GNU C library has it in the sysdeps/unix/mkdir.c file:

  [..]
  char *cmd = __alloca (80 + strlen (path));
  /* (mkdir command line parsing) */
  status = system (cmd);
  [..]

That's right, it just relays. I'm not sure how it gets to the kernel. I suspect with a system call somewhere.

What I know is that it arrives in the kernel, in the fs source files: namei.c for the vfs part and <fs-name>/namei.c for filesystem specific implementations (that are called by the vfs code in the end, I guess).
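The vfs/filesystem split can be sketched as a generic function that does the shared checks and then calls through a per-filesystem table of function pointers, roughly the role inode operations play in the kernel. Everything below (toy_inode, toy_vfs_mkdir, toyfs_mkdir) is a hypothetical simplification, not the kernel's real API or its actual order of checks.

```c
#include <errno.h>
#include <stddef.h>

struct toy_inode;

/* Per-filesystem operations table. */
struct toy_inode_ops {
    int (*mkdir)(struct toy_inode *dir, const char *name, unsigned mode);
};

struct toy_inode {
    int is_dir;
    int writable;
    const struct toy_inode_ops *ops;   /* filled in by the filesystem */
    int created;                       /* counts successful mkdirs, for the demo */
};

/* Generic layer: checks every filesystem shares, then dispatch. */
int toy_vfs_mkdir(struct toy_inode *dir, const char *name, unsigned mode) {
    if (!dir->is_dir)
        return -ENOTDIR;
    if (!dir->writable)
        return -EACCES;
    if (dir->ops == NULL || dir->ops->mkdir == NULL)
        return -EPERM;                 /* this fs cannot create directories */
    return dir->ops->mkdir(dir, name, mode & 07777);
}

/* One filesystem-specific implementation ("toyfs"). */
static int toyfs_mkdir(struct toy_inode *dir, const char *name, unsigned mode) {
    (void)mode;
    if (name == NULL || name[0] == '\0')
        return -ENOENT;
    dir->created++;                    /* a real fs would allocate an inode here */
    return 0;
}

const struct toy_inode_ops toyfs_dir_ops = { .mkdir = toyfs_mkdir };
```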

P.S. Feel free to correct me. I only concluded this by poking around the sources a bit; I'm not into kernel development myself.

caf · on Aug 24, 2010

That's a fallback mechanism, used by glibc on any "unix" system that doesn't have a more specific implementation deeper in the sysdeps/ hierarchy (there's a Linux one that defers to the syscall somewhere in there).
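To show what a fallback like that amounts to, here is a sketch in the same spirit as the snippet quoted above: build a mkdir command line and hand it to system(). It is not glibc's code; mkdir_via_shell is a hypothetical name, and it simply rejects paths containing a single quote instead of attempting shell escaping.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/wait.h>

/* Runs: mkdir -m <mode> -- '<path>'. Returns 0 on success, -1 on failure. */
int mkdir_via_shell(const char *path, mode_t mode) {
    if (strchr(path, '\'') != NULL)
        return -1;                    /* would break the single quoting */
    size_t len = strlen(path) + 64;
    char *cmd = malloc(len);
    if (cmd == NULL)
        return -1;
    /* -m sets the permission bits; -- ends option parsing */
    snprintf(cmd, len, "mkdir -m %o -- '%s'", (unsigned)mode, path);
    int status = system(cmd);
    free(cmd);
    if (status == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return 0;
}
```

This also shows why it is only a last resort: every call starts a shell, and failures collapse into an exit status rather than a precise errno.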

jacquesm · on Aug 19, 2010

mkdir is fairly mild in that respect, though, compared to, say, fopen or fwrite.
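Part of what makes fwrite heavier is stdio's user-space buffering: data sits in a buffer inside the process and only reaches the kernel via write(2) when the buffer fills or is flushed. A small POSIX sketch (buffered_sizes is a hypothetical helper) that observes this through the file size:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* Writes n bytes (n < 4096) to a fully buffered stream and records the
 * on-disk size before and after fflush(). Returns 0 on success, -1 on error. */
int buffered_sizes(const char *path, const char *data, size_t n,
                   long *before, long *after) {
    static char buf[4096];
    struct stat st;
    FILE *f = fopen(path, "w");       /* creates or truncates: size 0 */
    if (f == NULL)
        return -1;
    if (setvbuf(f, buf, _IOFBF, sizeof buf) != 0 ||
        fwrite(data, 1, n, f) != n || stat(path, &st) != 0) {
        fclose(f);
        return -1;
    }
    *before = (long)st.st_size;       /* data still in buf, no write(2) yet */
    if (fflush(f) != 0 || stat(path, &st) != 0) {
        fclose(f);
        return -1;
    }
    *after = (long)st.st_size;        /* fflush() issued the write(2) */
    return fclose(f) == 0 ? 0 : -1;
}
```

Writing "hello" with n = 5 should report a size of 0 before fflush() and 5 after.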

probablyrobots · on Aug 20, 2010

Back in school I helped write a file system based on ramfs that used the ram in all of the computers in our lab as one big shared super fast hard disk. It was a great learning experience.

I wouldn't say it's necessary for every programmer to know how the linux VFS works and how files, directories and links are stored behind the scenes, but it is really interesting. Here's a good description of VFS: http://www.mjmwired.net/kernel/Documentation/filesystems/vfs... The implementation in ramfs is pretty simple (compared to ext3). You can find it in your kernel source in fs/ramfs/inode.c. There's a function in there called ramfs_mkdir that allocates and configures a new inode. Anyway, that's how I'd answer the question.

rnicholson · on Aug 19, 2010

Why wouldn't this question be on StackOverflow?

MarkBook · on Aug 19, 2010

There's a scavenging pack of higher-rep OCD types on SO who immediately kill any question they perceive to be weak.

philwelch · on Aug 20, 2010

StackOverflow has its own deletionists now? Is there any other site that does?

I think now that it's happened to more than one online community, it's worth a lot of careful thinking to figure out why and how these communities develop deletionist subcultures.

DougBTX · on Aug 20, 2010

If your "pack" exists, they are hard to come across by accident.