I'm currently reading The Design of the UNIX Operating System [0], so this is super interesting for me to watch. I love how he creates a spellchecking program live, while being filmed.
Something that occurred to me when I first saw this video is the number of women featured; I can't say with certainty that I've seen a recent major tech demo video from a large company feature as many.
Thank you for mentioning this - there's a perception that CS and tech have always been male-dominated, but there were plenty of women programmers in the '80s, and the field has become even more male-dominated since. It's really annoying that one of the primary qualifications for tech CEOs is apparently 'be a tall man, ideally white but Asian is OK too'.
"In the United States, the number of women represented in undergraduate computer science education and the white-collar information technology workforce peaked in the mid-1980s, and has declined ever since"
Indeed! At one point a "computer" was a person, and this was also a field where women were relatively well represented (when compared to other fields during the same time periods).
I owned an AT&T 3B2/400 multi-user system in 1985 running Unix System V. It came with a bound set of loose-leaf manuals (might have been about 10 of them, iirc). [1]
I learned pretty much everything I needed from those manuals.
It's weird having never used a real terminal, because while I understand what it means that Unix is a multi-user system (and have obviously used terminal emulator programs as well as ssh-ing into servers, etc), the notion of a big physical box with arrays of dedicated ports that physical terminals plug in to is really alien.
It was really cool and fun even though it was just a simple green terminal screen. I had one terminal in front of me and one in back of me. The one in back was typically logged in as root. (It was an office with a lock, and a small enough place that I didn't have to worry about anyone knowing what that was.) The hard drive was 70MB, with 4MB of memory, and a tape drive for backup. (The system was perhaps $30k in '80s dollars, iirc.)
That said right now I'm staring at 3 Thunderbolt displays hooked up to a Mac Pro with over 25 terminal windows going. All in different colors. (Well actually not different colors for all of them).
But using even the simple green screen was really cool. Definitely got a buzz using that system.
I remember the first test to show that it was "multi user". I put two of the terminals next to each other, hit return at the same time, and ran two things concurrently. (Might have been ls -Rlt / or something like that.)
>Might have been ls -Rlt / or something like that.
Ha, I used to do that sometimes, to make long-running disk I/O happen so that I could check something or other that needed the I/O happening. Redirecting a process's output to /dev/null and putting it in the background with & can also be useful for many things.
I like that too, I just wish UNIX IPC pipes had more metadata than throwing raw strings at one another and hoping the receiving process understands.
I know people are going to roll their eyes at this, but Powershell's IPC constructs are a step forward. Everything inherits from Object which has the following prototype:
    public class Object
    {
        public Object();
        public string ToString();
        public bool Equals(object obj);
        public static bool Equals(object objA, object objB);
        public static bool ReferenceEquals(object objA, object objB);
        public int GetHashCode();
        public extern Type GetType();
    }
So if you want to just do strings you still can (string inherits from type object and implements ToString()).
However if you want to do a List<Object> you still can, and better still the receiving process can use GetType() which has to be implemented, and then have a few different workflows to handle the type (e.g. List, string, et al).
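Something like this, as a rough sketch of what a receiving command might do (the handled types are just examples, and type-pattern matching stands in for explicit GetType() checks):

    using System;
    using System.Collections.Generic;

    static class Receiver
    {
        // Branch on the runtime type of an incoming pipeline item;
        // anything unrecognized degrades to its string form, Unix-style.
        public static void Handle(object item)
        {
            switch (item)
            {
                case string s:
                    Console.WriteLine($"text: {s}");
                    break;
                case List<object> list:
                    Console.WriteLine($"list of {list.Count} items");
                    break;
                default:
                    Console.WriteLine(item.ToString());
                    break;
            }
        }

        static void Main()
        {
            Handle("hello");
            Handle(new List<object> { 1, "two", 3.0 });
            Handle(42);
        }
    }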
UNIX was created in 1969(!) so nobody can blame them for designing it the way they did. However if you were creating UNIX today you'd definitely want to look at OOP and inheritance as a core pillar.
Defaulting to a data serialization scheme like Avro or Protocol Buffers could be used here, especially with a small amount of shell support to automatically convert terminal-destined data serialization streams to a text string.
It would be interesting to see if any such projects exist, because it would fit very easily into the current pipe IPC infrastructure, with the addition of transformative pipelines for interfacing with the existing string-based utilities - but that's necessary anyway.
I mean to say that there's no need to modify existing IPC mechanisms, just a need to establish a common serialization standard and embed it into a shell.
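As a rough sketch of what I mean: structured records over an ordinary pipe, using JSON lines as a stand-in for Avro/protobuf (the record type and its fields are invented):

    using System;
    using System.Text.Json;

    // Invented record type; a real scheme would pin this down in a shared schema.
    class FileRecord
    {
        public string Name { get; set; }
        public long Size { get; set; }
        public string Owner { get; set; }
    }

    class Pipeline
    {
        static void Main(string[] args)
        {
            if (args.Length > 0 && args[0] == "--produce")
            {
                // Producer side: one serialized record per line on stdout.
                var rec = new FileRecord { Name = "README", Size = 1024, Owner = "dmr" };
                Console.WriteLine(JsonSerializer.Serialize(rec));
            }
            else
            {
                // Consumer side: rebuild typed records from stdin, line by line.
                string line;
                while ((line = Console.ReadLine()) != null)
                {
                    var rec = JsonSerializer.Deserialize<FileRecord>(line);
                    Console.WriteLine($"{rec.Owner} owns {rec.Name} ({rec.Size} bytes)");
                }
            }
        }
    }

A shell that understood the scheme could still render the same stream as plain columns whenever the destination is a terminal.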
>Defaulting to a data serialization scheme like Avro or Protocol Buffers could be used here
>I mean to say that there's no need to modify existing IPC mechanisms, just a need to establish a common serialization standard
PS gives you two things though. It doesn't just give you objects, it also gives you the behaviors of those objects as implemented by the runtime (CLR) that is common between both endpoints of the communication.
For example, consider a command that returns a list of the processes running in the current session as a List of Process objects. This list is then piped to a command that filters the list to a sublist of processes that have a particular Name property. Finally, this sublist is passed to a third command that calls the Process.Kill() method on each of those objects.
This requires that all three commands not only agree on the structure of a Process object (as protobuf would provide) but also that it has a Kill() method, etc. This is possible in PS because all three commands are running in the same common runtime, but not possible with generic IPC.
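Roughly what that pipeline amounts to when written directly against the same CLR Process type (the process name is just an example):

    using System;
    using System.Diagnostics;
    using System.Linq;

    class KillByName
    {
        static void Main()
        {
            // List processes, filter by name, then invoke behavior on each match -
            // the same shape as the list / filter / kill pipeline described above.
            var victims = Process.GetProcesses()
                                 .Where(p => p.ProcessName == "notepad")
                                 .ToList();

            foreach (var p in victims)
            {
                Console.WriteLine($"killing {p.Id} ({p.ProcessName})");
                p.Kill();   // behavior, not just data, shared via the common runtime
            }
        }
    }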
IPC of complex objects has existed on Windows even before the CLR with COM and DCOM, and again both of these are more than serialization mechanisms, since they must support cross-process data as well as behavior.
Interesting, in a unified object programming environment like the CLR I could see that being very useful.
Such live methods could also be implemented in, say, Python by constructing objects from an Avro data stream. Or structs/stubs could be auto-populated in C with the proper library, but that would require the entire POSIX C world to be codified into a standard serialization.
It would be a lot of work, but I think that it still fits inside an Avro-style data serialization scheme, plus proper runtime support. The runtime deserialization is really where the magic happens. And I think the serialization/deserialization step is necessary to maintain proper process separation, particularly in languages like C.
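A quick sketch of that point (in C#, to match the example upthread; the type is invented): the stream carries only the fields, and the "live" method works because the receiving side already has the same class definition, not because anything executable came through the pipe.

    using System;
    using System.Text.Json;

    // Invented type: the wire format below carries only data.
    class Temperature
    {
        public double Celsius { get; set; }

        // Behavior supplied by local code, not by the stream.
        public double ToFahrenheit() => Celsius * 9 / 5 + 32;
    }

    class Demo
    {
        static void Main()
        {
            string wire = "{\"Celsius\":21.5}";                    // what a pipe would carry
            var t = JsonSerializer.Deserialize<Temperature>(wire); // reconstruct the object
            Console.WriteLine(t.ToFahrenheit());                   // call a "live" method
        }
    }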
I can write *nix command-line utilities with ease in any language that I please. Can the same be said for powershell utilities? My understanding is that you are pretty much restricted to using CLR languages.
If we are accepting that sort of language lock-in, I'd rather get something like scsh off the ground.
That's a little circular. You can use any language you please because all languages support pipes as a concept and "understand" stdin/stdout/stderr, etc.
There's nothing inherently linking the CLR to using objects within pipes. You could use, for example, JSON if you really wanted.
Reading/writing to/from files is a FAR lower bar than linking with the CLR. It's a lower bar than even using JSON over pipes; most languages will have good JSON libraries but none of them are as trivial as reading/writing lines to a file.
I'm not opposed to lifting ourselves above text over unix pipes, but I think that if we are going to do it then we should go in for a penny, in for a pound. Such a system will be incompatible with existing unix tools, so we might as well drop all the technical debt. I don't think PowerShell is a large enough break from the traditional unix setup.
Aside: I've noticed that there is an open source implementation of Powershell for Mono called `pash`. Does anybody use this?
That is not language lock-in, it is ABI lock-in, and that is exactly the same with Unix pipes.
You can complain that the increase in ABI complexity isn't worth the extra features, but I don't think it is fair to complain about increased ABI complexity, period.
> "Reading/writing to/from files is a FAR lower bar than linking with the CRL."*
If I want to write a unix utility in C, or Java, or C#, or Lua, or Racket, or postscript, or brainfuck..., I can. Not just in an academic sense, but in a "it is actually reasonable and trivial to do so" sense. The list of programming language implementations that cannot read/write to/from files (knowledge of the concept of "pipes" is unnecessary, the user's shell opens those) is vanishingly small.
For the average developer who just wants to use Powershell/bash/whatever, it is absolutely language lock-in.
PowerShell utilities are implemented as CLR classes inheriting from a particular base class. As such, they must be written in a CLR language (or PowerShell itself, of course).
Edit: Of course, you have the option of writing a minimal wrapper in PS that shells out to your other-language command but the interface between PS and non-PS programs is still based on a string commandline.
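For concreteness, a minimal binary cmdlet looks roughly like this (the verb/noun and parameter are made up):

    using System.Management.Automation;

    // A CLR class deriving from Cmdlet, compiled into a DLL and loaded with Import-Module.
    [Cmdlet(VerbsCommon.Get, "Greeting")]
    public class GetGreetingCommand : Cmdlet
    {
        [Parameter(Position = 0)]
        public string Name { get; set; } = "world";

        protected override void ProcessRecord()
        {
            // WriteObject puts a live object on the pipeline rather than text.
            WriteObject($"Hello, {Name}");
        }
    }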
The other key limitation of PowerShell is that its commands all run in the same process (i.e. PowerShell is akin to a REPL). By contrast, UNIX programs each run in their own process, giving the programmer a much greater degree of freedom in the design and implementation of each program.
You can do IPC with PowerShell [1], but it's nowhere near as simple as UNIX-style piping.
PS has the concept of jobs: an amalgamation of code and environment that is serialized to a separate powershell.exe process (potentially even to a different networked Windows machine). Communicating with these remote commands is identical to communicating with a local command - piping objects in and out.
On the subject of IPC, that talk about the Lisp Machines' single namespace was interesting - passing pointers instead of serialized data. www.youtube.com/watch?v=o4-YnLpLgtk
> However if you were creating UNIX today you'd definitely want to look at OOP and inheritance as a core pillar.
I doubt it. Imposing structure and types on IPC would make it very difficult to compose separate programs.
Suppose I created UNIX-2, where the only difference between UNIX and UNIX-2 in principle is the fact that UNIX-2 programs all pipe serialized Objects to and from each other, instead of byte streams.
Now, the ls program obviously outputs more information than just a list of strings--it also outputs types of files, permissions, owners, inode numbers, sizes, timestamps, etc. I might be inclined to have ls output an lsOutputObject (derived from Object) that encapsulated all this information.
Suppose I wanted to pipe the output of ls into wc. How does wc handle an lsOutputObject? Either wc is programmed to know how to handle lsOutputObject, or it is not. Since we want an object-oriented environment, we'll assume the former case, so wc can call the appropriate object-specific methods and access the appropriate object-specific fields. But, now wc is tightly coupled to ls.
This problem generalizes. For a given program P in a set of N programs, wc will need to know how to access P's output-object-specific fields and methods. So, wc needs O(N) different subroutines to interact with the N other programs. This does not scale--each additional program I add to UNIX-2 will require me to write O(N) additional IPC handlers--one for each program.
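To make the coupling concrete, a hypothetical UNIX-2 wc might look like this (all the types are invented for illustration):

    using System;
    using System.Collections.Generic;

    // Hypothetical per-producer output types in "UNIX-2".
    record LsOutput(List<string> Entries);
    record PsOutput(List<int> Pids);
    record GrepOutput(List<string> Matches);

    class Wc
    {
        // wc ends up carrying one handler per producer it wants to compose with.
        static int CountRecords(object input) => input switch
        {
            LsOutput ls  => ls.Entries.Count,   // coupled to ls
            PsOutput ps  => ps.Pids.Count,      // coupled to ps
            GrepOutput g => g.Matches.Count,    // coupled to grep
            _            => throw new NotSupportedException("unknown producer"),
        };

        static void Main() =>
            Console.WriteLine(CountRecords(new LsOutput(new List<string> { "a.txt", "b.txt" })));
    }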
The only way to avoid this IPC-handler-explosion in the design is to define the set of IPC objects a priori and mandate all programs know how to handle them. Then, there are O(1) IPC handlers per program, and adding a new program does not require me to couple its implementation to any other programs. This is effectively what UNIX does: there is one IPC object--a string of bytes. In UNIX-2 I could have more types of objects, but the fact that they're defined independent of the programs means that I will still be "hoping the receiving process understands" when I give it data from arbitrary programs.
Suppose I relax this a priori object mandate above to allow programs to extend the base IPC object types. But then, programs that do so will only compose with programs that implement IPC handlers for their extended object types. In this scenario, I can expect there to be disjoint sets of programs that compose with one another, but not with others. This is effectively what happens in SOA/microservice architectures: you get a set of programs that are composable only with programs that speak their (arbitrarily-structured) messages (the set of which is much smaller than the global set of SOA/microservice programs).
My point is, trying to enforce OOP on IPC will take away universal program composability, which is the killer (if not defining) feature of UNIX.
But there is already a de facto data serialization in Unix: tabs/spaces as field separators and newlines as record separators. That's what allows ls and wc to interoperate, but it means that ls and cut don't interoperate without a transformation in between, because of ls's use of columnar layout rather than separator-based layout.
Or take the output of 'uniq -c': you can then 'sort -g' to order lines by number of occurrences, but if you want to take the top 5 lines and discard the counts with 'cut', you need whitespace transformations in between. (AWK would be an alternative to cut that performs the whitespace conversions on its own, but AWK is a full-blown programming language, so one may as well have something that deals with dictionaries, arrays, etc. as input types anyway.)
All this is to say that the untyped bytestream relies on conventions in Unix to make the bytestream usable between composable programs. These conventions are adequate for current uses, but show some weaknesses, and suggest that perhaps there are additional conventions that, if sufficiently simple, could be used to build composable programs that don't need to understand a format particular to just one program.
I totally agree that objects (meaning data + associated data-specific code) are probably overkill, though optional object interfaces may be nice if the programmer is willing to pay the computational cost, like one does when using AWK over cut. And I definitely think that inheritance is an idea best to avoid for data.
> But there is already a de facto data serialization in Unix
De facto is a far cry from "required by the OS". You'll notice that the programs we both mentioned (as well as programs that interpret whitespace this way) tend to operate on human-readable text, which uses whitespace to separate words. However, there are many other programs that do not operate on human-readable text, and do not rely on whitespace to delimit records in a pipe (or other file-like construct). Thus, the OS should not try to enforce a One True Record Delimiter, since there isn't one.
> ...and suggest that perhaps there are additional conventions, that if sufficiently simple, could be used to build composable programs that don't need to understand a format that is particular to just one program.
This really is the heart of the matter. There is a trade-off between the specificity of the conventions and the freedom of the program to interpret the data however it wants. UNIX is at (almost) the extreme right end of this spectrum--the only convention it imposes is that information must be represented as 8-bit bytes.
My question to those who feel that UNIX is too far to the right on this spectrum is, what are some conventions that can be adopted universally that won't break composability? I'm not convinced that there are any. Even simple things like requiring programs to communicate via a set of untyped key/value pairs (where each key and value is a string of bytes) would be risky, since it could easily lead to the creation of disjoint sets of programs which only work with members of their own sets (e.g. members of each set would require set-specific key/value pairs).
I mean we have to do the C->Bash->Awk->Bash->C precisely because nothing actually works together, and that's with just text. If we did mandate some IPC and ran across a program that couldn't speak it? Well, as long as our smart program had a toString equivalent we wouldn't be any worse off than we are now. But between programs that spoke it we'd be better off.
> The thing is we're already dealing with disjoint sets of programs and broken composability. That's why so many shell one-liners take the form...I mean we have to do the C->Bash->Awk->Bash->C precisely because nothing actually works together, and that's with just text.
You seem to contradict yourself :) Yes, you might have to munge some text to get the disparate programs to work together. But, at the end of the day, you get the result you want, because munging the data is possible. Contrast this to a world where munging the data is all but impossible, since each program speaks different objects. Then you'll have some real breakage on your hands, unless you can write O(N^2) inter-object mungers to replace the O(1) text-based mungers you were using before.
> If we did mandate some IPC and ran across a program that couldn't speak it? Well as long as our smart program had a toString equivalent we wouldn't be any worse off than we are now. But between programs that spoke it we'd be better off.
How would this be an improvement? Most programs (i.e. the ones that don't know your "smart" program's IPC object format) would fall back to using the toString() method. The comparatively small set of programs that can use your program's IPC objects would be tightly coupled to each other, meaning a change to the IPC object structure or semantics will require modifications to most/all of the programs. If anything, your proposal exacerbates the "disjoint sets of programs" problem, and has the effect of turning the legacy/compatibility option (toString()) into the ironically future-proof method for interacting with "smart" programs.
Oh, I totally agree that the byte stream should be the only stream that the kernel itself is concerned with. I'm merely arguing that there's room, in addition to the current situation, for more highly structured data streams between Unix CLI utilities. I've had great success with piping binary formats for very specialized use, both for IPC and across the network, and I'd like to see additions to the Unix toolset that take advantage of some of those greater capabilities, in a language- and OS-independent way.
> I'd like to see additions to the Unix toolset that take advantage of some of those greater capabilities, in a language- and OS-independent way.
Don't get me wrong, I won't turn down structured IPC just because it's structured :) However, I have yet to see an IPC system that (1) gives you more than a byte-stream, and (2) allows for universal composability. Every IPC system I have ever come across sacrifices one of these properties to make gains in the other. While I haven't ruled it out, I'm skeptical (but would love to be proven wrong) that there exists a form of IPC that can meet both requirements.
I sincerely hope you are joking. Dbus-as-IPC doesn't make for composability at all, since each program must be aware of (1) the names and signatures of each other program's dbus-accessible methods, and (2) each other program's behavioral semantics with respect to the method invocation.
> The only way to avoid this IPC-handler-explosion in the design is to define the set of IPC objects a priori and mandate all programs know how to handle them. Then, there are O(1) IPC handlers per program, and adding a new program does not require me to couple its implementation to any other programs. This is effectively what UNIX does: there is one IPC object--a string of bytes. In UNIX-2 I could have more types of objects, but the fact that they're defined independent of the programs means that I will still be "hoping the receiving process understands" when I give it data from arbitrary programs... My point is, trying to enforce OOP on IPC will take away universal program composability, which is the killer (if not defining) feature of UNIX.
Not necessarily. You could pass S-expressions instead of bytes. Parsing would be much easier, and there would be no loss of generality.
Either wc needs to specifically be aware of the structure of the S-expression ls emits in order to know which atom or subtree of atoms make up words and lines, or ls must emit an S-expression with a structure that is globally mandated a priori. The former case loses generality (wc must be aware of ls's behavior), and the latter case gives wc no help in interpreting the S-expression (which is the original problem OP pointed out).
He seemed to have said: "I worked entirely on Plan 9, which I still believe does a pretty good job of solving those fundamental problems."
I would like to hear what exactly was done better in Plan 9 in regards to IPC. If I recall correctly Plan 9 is open source now - is there more development going on now or is it stalled?
There are forks which are undergoing active development like 9front, although I don't think anything from Bell Labs is still coming with respect to plan 9.
There's a wonderfully entertaining moment around the 16 minute mark when Lorinda Cherry demoes a talking calculator with pipelines. Not satisfied with it saying "five" after evaluating 8-3, she commands it to raise 2 to the hundredth power, providing ample time to finish her coffee.
oh man, I wonder if that talk program was used by Kraftwerk for their great "Numbers" single, https://www.youtube.com/watch?v=4YPiCeLwh5o ? The two audio sources sound similar.
Try convincing an old VMS hand that Unix is easier to use. VMS's command interface is aimed at system administrators who haven't had enough coffee and are prone to making mistakes. Unix is notoriously intolerant of mistakes -- so much so that one fat-fingered keystroke can cause Bad Things to happen.
Lorinda Cherry et al's references to "pipelining" and "unix stream processing" reminds me of the language used to describe Gulp.js and node in general.
I wish they had patented truly novel ideas like pipes, hierarchical file systems and "everything is a file" and then licensed it so that anyone releasing software under an open source license can use them and not others.
That would have stopped the likes of Microsoft, Apple, etc. from taking all these innovations, further "innovating" with 'phones with rounded edges' or 'performing an action on a structure in computer-generated data', patenting them, and extorting money out of free software. Not that there was any way they could have known that.
I wouldn't say Microsoft or Apple stole these concepts. Patents are horrible. Those proprietary systems would exist anyway, but without patents, the users of them can also benefit from these ideas.
Are you aware that Apple's OS X is actually based on the original UNIX source code? There's nothing proprietary about that.
That isn't really true. OS X is based on BSD, which is not "the original UNIX." The original Unix was closed source and was actually quite proprietary.
None of the original code is left, though. Mac OS X is based on (for the Unixy bits) FreeBSD and NetBSD, which are based on 4.4BSD-Lite, which was BSD with all the AT&T-derived parts chopped off.
[0] http://www.amazon.com/The-Design-UNIX-Operating-System/dp/01...