Unix Edition Zero (1971) (cat-v.org)
149 points by mbucc on Feb 15, 2023 | hide | past | favorite | 59 comments



You can run the PDP-7 version and an early slightly frankenstein'ed PDP-11 version in simh: https://github.com/DoctorWkt/pdp7-unix https://github.com/DoctorWkt/unix-jun72


Very happy to find this:

http://fortunes.cat-v.org/

I was looking for a source of good stuff to import into my Bugzilla "quips" database, and a lot of these UNIX'y "fortune" entries would make great grist for that mill.


TIL PDP-11 UNIX had an eight-character file/directory name limit https://retrocomputing.stackexchange.com/questions/23917/ori...


PDP-11 running RSTS/E had 6.3 filenames, and iirc no sub-directories (each user account (grouped by project) was a single directory). Remembering what “EDS023.DAT” was for could be a headache on large projects.


in 6th edition it was already 14

https://github.com/memnoth/unix-v6/blob/master/sys/ken/nami....

https://github.com/memnoth/unix-v6/blob/master/sys/param.h#L...

i guess this 'edition zero' is from 01971, and 6th edition is 01975


To be pedantic, V0 actually refers to the PDP-7 UNIX released in 1969.

1971 saw the port to the PDP-11/20, which was released as UNIX-11 and soon after, V1 UNIX.


The edition numbers refer to the manual pages more than the software, which in the early days was under continual development running on approximately one machine. It wasn’t released in 1971.


or 01969, though it did exist


Discussed at the time (of the McIlroy email):

Newly discovered earliest draft of a Unix manual (1971) - https://news.ycombinator.com/item?id=10794189 - Dec 2015 (40 comments)

The Unix Time-Sharing System, unpublished draft (1971) [pdf] - https://news.ycombinator.com/item?id=10660727 - Dec 2015 (1 comment)


Draft Unix manual written by Dennis Ritchie circa 1971---at a time when Unix had been running for a "few months" on the PDP-11.


seems extremely similar to modern unixes, though of course it's only a small subset

main differences seem to be

- filenames were only 8 bytes instead of 14 or 255

- devices were in the root directory instead of /dev: /ppt, /bppt, /rppt, /tty, /ctty, /tty1, /tty2, /rtty, /tap0, /tap1 (magtapes), /disk (the disk), and /system (the kernel memory)

- there were no groups, just six permission bits (u+r, u+w, o+r, o+w, a+x, and u+s)

- creat is spelled with an e (and evidently you couldn't open() nonexistent files)

- instead of lseek you only have seek, with a different argument order and presumably only able to handle offsets of up to 64KiB (less of a problem on a "256K word disk", which is presumably 524288 bytes, according to the note on p. 12 that a 64-word block was 128 bytes), which explains the name of lseek

- correspondingly, there were no doubly indirect blocks and so the maximum file size actually was 64 KiB

- the tty really was a tty, no video terminals!

- "i-number" and "i-node" had hyphens, and the "i" stood for "identification"

- the shell prompt was "@", and there was no control flow or pipes in the shell; it's not mentioned but I think that around this time there was a `goto` command which would seek standard input until it found the specified label before returning control to the shell. i/o redirection did exist, and so did argument quoting

- `cd` was spelled `chdir` even in the shell, and was the only shell builtin

- correspondingly, no standard error yet; the famous phototypesetter error printout had not yet happened because there were no pipes

- no shell wildcards

- no PATH, because no /usr/bin yet (they hadn't bought the /usr disk yet)

- no -o flag; `as` always wrote its output to `a.out` and you had to `mv` it if that wasn't what you wanted

- `fork()` returned to a different location in the child process, instead of returning a status value and making you test the return value. this would have saved me great embarrassment at my first sysadmin intern position when i accidentally wrote a fork bomb (by getting the test backwards) and ran it on the departmental nfs server

- no `execve`, just `execle` (called `execute`)

- no `waitpid()`, `wait3()`, etc., just `wait()`

- no `select()` or sockets of course, nor any of the other bsd niceties

- traps weren't handleable, so there was no SIGINT yet, just SIGQUIT, with the same ^\ keystroke it has today (but no way to attach a signal handler to it); no concept of tty process group (or indeed process groups at all) so ^\ could kill a random background job. but there was an `intr` system call to disable these breaks (presumably so you didn't kill the shell)

- no C yet, just B, but evidently the B compiler was already generating native code instead of stack bytecode as it had a few months before. consequently the system calls are documented in terms of machine registers and assembly instructions

- no `rename()` system call; `mv` linked and then unlinked the file

- no environment variables

- `time()` was still provided in sixtieths of a second since "the start of the current year" (so at the time the Unix epoch was the beginning of 01971, not the beginning of 01970) and was 32 bits (the AC and MQ registers)

- symbol tables were written to a separate `n.out` file

- no `ptrace()` and so no live debugging support (except for the kernel!), just inspecting core files

- `tar` is called `tap` and cannot put its archives in files other than actual physical magtapes

sadly page a7 is missing from the scan
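
a rough check of the arithmetic in the list above, as a sketch rather than anything taken from the manual. The constants come from the comment itself (16-bit words, a 256K-word disk, 64-word blocks, a 32-bit clock counting sixtieths of a second). The variable names are made up for illustration, and the 16-bit seek offset is the comment's presumption, not a confirmed fact.

```python
# Back-of-the-envelope check of the sizes quoted in the list above.
# Assumption: a PDP-11 word is 16 bits, i.e. 2 bytes.
WORD_BYTES = 2

disk_bytes = 256 * 1024 * WORD_BYTES   # "256K word disk"
block_bytes = 64 * WORD_BYTES          # "64-word block"

# Assumption (the comment's guess): seek() takes a 16-bit offset,
# so it can address at most 2**16 distinct byte positions.
max_seek_span = 2 ** 16

# A 32-bit counter of sixtieths of a second wraps after this long.
TICKS_PER_SECOND = 60
wrap_seconds = 2 ** 32 / TICKS_PER_SECOND
wrap_days = wrap_seconds / 86400

print("disk size in bytes:", disk_bytes)
print("block size in bytes:", block_bytes)
print("16-bit seek span in KiB:", max_seek_span // 1024)
print("32-bit 1/60 s clock wraps after about", round(wrap_days), "days")
```

if the clock really was a 32-bit count of sixtieths of a second, it would wrap after a bit more than two years, which fits with counting from the start of the current year rather than from a fixed epoch.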


My first Unix was v6. It had seek() but no lseek(), which was introduced in v7. We converted several programs to use it.

The tape archiver in v6 was called tp. I never used tap, which was earlier. Tar was introduced in v7. It was a complete re-write and used a different on-tape format.


thank you very much, i did not know these things


> no C yet, just B, but evidently the B compiler was already generating native code instead of stack bytecode

No, B always generated interpreted/threaded code on UNIX, certainly on the PDP-7. We have the runtime and B library from that time.


the manual seems to claim otherwise when speaking about the -11, in that it says the B compiler produced a.out and n.out files just like the assembler (presumably by invoking it)

you could do this just by aggregating the interpreter (which is never mentioned) with the bytecode in a single file, but that would imply making 10 copies of the interpreter if you had 10 programs compiled with B, on a 512-kibibyte disk; a highly suboptimal tradeoff under the circumstances

but even in such a case what symbols would you put in n.out


The interpreter is part of the binary yes. I have reversed what was available here: http://squoze.net/B/ and written my own B compiler that generates the same threaded code as well: https://github.com/aap/b


thank you very much for the extremely awesome correction

what do you suppose the b command put in n.out


I actually didn't know about n.out before. It wasn't mentioned in the v1 manual so they must have gotten rid of it by then. However it seems that it is just a symbol table that's generated by the assembler. B code is eventually also passed to the assembler so it would just contain whatever symbols were in the assembly i suppose.


Oddly enough, it looks both more and less featured than MS-DOS, and the PDP-11 it ran on was somewhat close to the first IBM PC in memory and CPU speed.

Some comparison points with MS-DOS 1.0:

- 8.3 filenames

- no subdirectories

- single user

- 32-bit file sizes and offsets (64k file size limit was definitely not enough by then)

- FCB-based API (not file handles), FAT12 filesystem with 32MB limit

- Hardcoded device names special-cased

- No pipes nor redirection either

- No multitasking nor TSRs

- No networking at all

- Environment variables present

- Supports dates up to 2099

- Kernel, shell, and utilities written in 100% Asm

- Live debugger (DEBUG.COM)


yeah, i think some of that was cp/m compatibility crap; too bad about 8.3

the other thing is that the pdp-11 had working segmentation, the 8086 didn't, so trapping faults in user processes so they couldn't break the kernel would have required some kind of interpretation or something

did you know microsoft was shipping xenix in 01981 (the same year they started shipping qdos/ms-dos/pc-dos) and shipping xenix for the 8086 in 01982, and that seattle computer products was selling 8086 xenix boxes in 01983


According to some sources that I sadly can't point to (just vague memories from somewhere), so take this with a grain of salt:

Early MS-DOS development used to be done in those Xenix environments: they would cross-compile to PCs, and only later did they migrate to developing on MS-DOS directly.

Most likely around MS-DOS 5, given the MS-DOS 3.3 resources and how MS-DOS 4 development went.

On the other hand, there is the what-if alternative universe of what would have happened had they kept Xenix around.


Hmm. Wasn't it done on a dec-10?


you're thinking of basic-80, which gates and allen (and davidoff) did (in 01976?) on the dec-10 at harvard, but we're talking about qdos, which tim paterson did in 01981 at seattle computer products before selling it to microsoft


Ahhh yes. The great distance of time made me forget that DOS is actually relatively new.


> pdp-11 had working segmentation

The earliest models didn't. See: https://gunkies.org/wiki/PDP-11/20


indeed, and https://www.bell-labs.com/usr/dmr/www/odd.html reminds us that in fact in the time period we're talking about, the pdp-11 that unix was running on didn't have working segmentation, so anyone could crash it easily and people often did so accidentally

this also explains a remark in the manual that puzzled me about the break() system call (p. 43, §A1.17):

> To save time, UNIX does not swap all of the 4K user core area when exchanging core images. The locations swapped are those from the beginning of the core image to the initial program break, and from the top of user core down to the stack pointer. The initial program break is determined by the size of the file containing the program. The system’s idea of how much to swap may be altered by using this call:

    sys break
    newbreak
> newbreak becomes the first location not swapped. If it points beyond the stack, or to the very first word in the core image, the entire core image is swapped.

in later versions of unix, of course, attempting to access memory after the break would result in a segmentation fault, but evidently in this version whatever you wrote there would just sometimes be lost when 'exchanging core images' — and presumably it wasn't just the segment limit that was missing on the -11/20, but also implicit indexing off the segment base pointer, which would imply that a mere context switch would require swapping the user program out in this way, just as on the pdp-7

i don't suppose anyone else here knows whether this inference is correct?

in any case, thank you very much for this correction!


The first time you zero-padded a year to five digits, I thought it was an accident, but there's two more instances of it. Why do you write 1983 as 01983?


Probably due to this: https://longnow.org/

Personally I think it's a harmless but also not useful affectation (but I also think our chances of making it into the 10,000s as anything still using equivalent dates are slim at best).


It's not harmless. 1) It's annoying. 2) It's off-topic - this isn't an article about the long now, or about dating systems. 3) It's inefficient to write dates not in the standard format - it makes everyone waste mental energy trying to figure out how to parse it.

The third point is especially bad for this article, where in context, the assumption should be that 0xxxx is octal, which isn't what he's actually doing at all.

So the upshot is that, rather than efficiently communicate what he's trying to say on this topic, he'd rather grind his axe on an unrelated topic. His choice, I guess, but I think it's a bad one. And an unfortunate one - he's got really good information on this topic, and his choice has hijacked us into talking about his date format.

And, thinking about it, it's not really the "long now". It's more like the "medium now". If it were the long now, he'd have several more zeroes in front.


You'd hate the Holocene Calendar then. (This year is the year 12023.)


> where in context, the assumption should be that 0xxxx is octal

That's what I thought it was too. This convention carried over into C and its derivatives today.


amusingly 8 and 9 would continue to be accepted as octal digits by the c (or b) compiler until ansi c forbade them

a priori, though, '01982' is not very likely to be intended as octal either
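
to make the octal point concrete, here is a small sketch. `strict_octal` is just Python's own base-8 parsing. `lenient_octal` is a hypothetical helper that models the pre-ANSI behaviour described above, where 8 and 9 were accepted inside an octal constant and simply contributed their face value. It is an illustration of that rule, not code from any real compiler.

```python
def strict_octal(text):
    """Parse text as base 8; raises ValueError if it contains 8 or 9."""
    return int(text, 8)


def lenient_octal(text):
    """Hypothetical model of the old rule: every decimal digit is accepted,
    so 8 and 9 act as 'octal digits' worth 8 and 9."""
    value = 0
    for ch in text:
        value = value * 8 + int(ch)
    return value


for literal in ("0755", "01971", "01982"):
    try:
        strict = strict_octal(literal)
    except ValueError:
        strict = "not valid octal"
    print(literal, "strict:", strict, "lenient:", lenient_octal(literal))
```

under the strict rule a year like 01971 or 01982 is not an octal number at all, which is one more reason a reader is unlikely to take it as one.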


the degree to which you are projecting your own antisocial behavior onto me is astounding

i decline to take responsibility for your emotions

i would appreciate it if you would stop harassing people because you don't like the date format they use, their haircut, or their clothes


Despite the fact that they've got a point, in my opinion (which I won't try to impose onto others, be reassured), I don't see how it could be considered harassing. He wrote one comment, that's all. Also, that comment didn't talk about haircuts or clothes, and I don't see how they could be seen as antisocial through that comment.


Thank you.

But in fairness to kragen, I have complained about his writing dates that way before. He might at least feel harassed.


Antisocial? If you do something in public, I can complain about it in public. If the complaining can be antisocial, so can the doing be antisocial. So think well before you throw that charge at me.


That's a bit like his signature but it's also fairly annoying, it doubles the time required to interpret the number as a year, and I usually have to read the number twice.


That's the main reason I asked. It took longer than normal just to read a date, and it apparently has no useful benefit (yes I saw the side-comment about the "Long Now").


> `tar` is called `tap` and cannot put its archives in files other than actual physical magtapes

Insofar as "everything is a file" applied at this point, could you just mv /tap0 out of the way and put a file there? If not, which of the few syscalls that had been invented so far was tap(1) using against /tap0, that couldn't be used against files?


it's hard to be sure from looking at a manual to a version of unix whose source code has been lost, but it seems that this might work

conceivably though /tap0 had different seek() semantics than regular files or something (it was a dectape)


Thanks for the summary!

> - creat is spelled with an e

You mean it was 'create'? Funny if 'creat' was a mistake that stuck.

My favourite in this genre is 'makunbound'.


right

some cases of this relate to identifiers only having 6 significant characters in some object file formats (typically a 32-bit or 36-bit field using rad50) but i don't know if lisp's makunbound was such an example, and creat surely was not


Amazing stuff, amazing job. Most of what’s written here is still relevant. It’s hard to conceive that all the unixies here are the rock-solid bases of current UNIXes


cat-v is such a website. i love it. nothing like it these days


Dive into the Inferno and Limbo documentation, discover where Go's roots originated, what Go still misses, and the ideas of an Android-like OS with managed userspace.


this is great. i realized that manuals and documentation might have used a lot of paper back then. the new PDF that was typeset using troff is half the length of the original!


I find reading the scan of the manual more pleasant and nostalgic than the troff version! Haha.


I don't think so. It's exactly the same as the original (give or take fonts). The original you see is a daisywheel printed version, so much less dense. If someone had output the doc on a phototypesetter way back then, it would have looked much the same as the "new PDF". fwiw the main point of developing Unix was to facilitate this kind of "room-top publishing".


I love reading this. It feels perfectly familiar.


Curiously, it describes create instead of creat.


and ken in the go source tree in 2009:

> spell it with an "e"

https://go.googlesource.com/go/+/c90d392ce3d3203e0c32b3f98d1...


Tangent: I've yet to find what I'd deem a "definitive" answer on why the 'e' was dropped. Sure, Ken said it was his biggest mistake, but every answer on why it's missing just points to the quote from him. I've seen suggestions related to a possible six-character limit (from Radix50 - "50" being octal for 64), but "creat" is only five. And being a mistake, why was it never fixed early on?


> And being a mistake, why was it never fixed early on?

Maybe because it became part of the jargon and served a useful purpose --- if you mention creat, it's certain you're talking about the system call and not creation in general.

Personally, I like names which are close enough to an existing concept to be evocative of the meaning, but also distinct enough that, just like "byte" vs "bite" and "nybble" vs "nibble", a (non-stupid) search engine can give relevant results.


I was going based on the theory of "5 characters plus leading underscore hits linker 6-character limit", but then this document talks about a system call named "execute".

If "execute" was really called "exec" in the source, then none of the syscall names I saw are longer than "creat".


Except in this document, it's actually documented as "create" (with an 'e')


Maybe this document predates (discovery of) the linker limitation?

My understanding is that the linker didn't really limit the identifier length, it's just that only 6 characters were used to match them. So, shortening names create->creat, execute->exec might have come afterward, as a clean up round to make it less surprising.


That would make sense, but it's still speculation. Preferably, we'd be able to get a definitive answer from Mr. Thompson himself.


> 50 being octal for 64

Looks like 40 to me... 64 would be 100.
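
A quick sketch to back up the arithmetic: octal 50 is decimal 40, and a 40-character alphabet is what lets three characters pack into one 16-bit word (so six characters fit in 32 bits). The character table below is my assumption of the usual PDP-11 RADIX-50 ordering, with `%` standing in for the unused code, and `rad50_encode` / `rad50_decode` are illustrative names, not functions from any real toolchain.

```python
# Assumed PDP-11 RADIX-50 ordering: space, A-Z, $, ., (unused, shown as %), 0-9.
RAD50_CHARS = " ABCDEFGHIJKLMNOPQRSTUVWXYZ$.%0123456789"


def rad50_encode(name):
    """Pack a name into 16-bit words, three characters per word."""
    padded_len = ((len(name) + 2) // 3) * 3
    name = name.upper().ljust(padded_len)
    words = []
    for i in range(0, len(name), 3):
        a, b, c = (RAD50_CHARS.index(ch) for ch in name[i:i + 3])
        words.append((a * 40 + b) * 40 + c)
    return words


def rad50_decode(words):
    """Unpack 16-bit words back into a space-padded string."""
    chars = []
    for word in words:
        chars.append(RAD50_CHARS[word // 1600])
        chars.append(RAD50_CHARS[(word // 40) % 40])
        chars.append(RAD50_CHARS[word % 40])
    return "".join(chars)


print("octal 50 in decimal:", 0o50)
print("decimal 64 in octal:", oct(64))
print("largest three-character code:", 40 ** 3 - 1)
for name in ("CREAT", "CREATE", "EXECUTE"):
    print(name, "needs", len(rad50_encode(name)), "16-bit words")
```

Both "creat" and "create" fit in two words under this scheme, so a six-character limit alone would not explain the missing 'e', while "execute" would not fit.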


it looks like this unix had B programming language (C was still not invented)



