Functional programming and the death of the Unix Way (newcome.wordpress.com)
90 points by jemeshsu on March 25, 2012 | 33 comments



The gist—as I understand it—is that Unix programs are essentially all equivalent to functions : ([String], TextStream) -> (TextStream, TextStream), which limits how one can tie them together and causes a proliferation of text-processing tools (awk, sed, grep, cut, &c.) that is not as elegant as functional programming. As mentioned, there are already some Unix command-line tools that work on structured data; e.g., jsawk[1] is an awk-inspired tool whose scripting language is JavaScript and which operates on JSON.
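
To make that concrete, here's a rough Haskell rendering of that view (my own sketch, not the article's): a pipe is just composition of such functions, with everything squeezed through text.

  -- Sketch: a Unix program as a function from (args, stdin)
  -- to (stdout, stderr), with everything forced through text.
  type Args       = [String]
  type TextStream = String

  type UnixProg = Args -> TextStream -> (TextStream, TextStream)

  -- Roughly what `p | q` does: feed p's stdout to q and drop
  -- p's stderr (in a real shell it goes to the terminal).
  pipe :: (Args, UnixProg) -> (Args, UnixProg) -> TextStream -> (TextStream, TextStream)
  pipe (pa, p) (qa, q) input =
    let (out, _err) = p pa input
    in  q qa out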

As an aside, I'd like to see people experiment with these concepts on an OS level while consciously targeting Xen or other virtualization systems. Already, Haskell can run barebones on Xen using HaLVM[2] and the successor to Plan 9, Inferno[3], was a virtual machine that could run either on bare metal or inside another OS. I can imagine an entirely new OS would meet some resistance—like Plan 9 did—but supplying an OS intended to be virtualized would let people experiment freely within their existing OS.

[1]: https://github.com/micha/jsawk
[2]: http://halvm.org/wiki/
[3]: http://code.google.com/p/inferno-os/


Just because data is in a text stream doesn't mean it is not structured. Awk is designed to handle structured text, for example, and the output of most (traditional) Unix tools is 'structured' too (e.g., ls and ps).

I recommend you read The Unix Programming Environment which gives many illustrations of how the Unix 'text streams' model works in practice.

One of the authors of that book, and other folks who created Inferno (and Plan 9 and Unix), are now at Google working on Go.

Go channels are obviously not the same as Unix pipes, but there are similarities.


Traditional Unix stream processing assumes a specifically line-oriented structure, though, and is somewhat awkward in other cases. The Plan 9 people recognized that, and made some attempts to improve it, for example with Rob Pike's concept of "structural regular expressions" to describe streams that have a different "shape" than an array of lines: http://doc.cat-v.org/bell_labs/structural_regexps/


Not really sure how you jumped from structured data in Unix to a bare-metal OS, but this project came up a few weeks ago: OCaml programs on Xen with no OS.

http://www.openmirage.org/wiki/papers (down now, doh)

Seems to have some similarities to HaLVM.


It's back up now; the ML TCP stack was running a pcap dumper for debugging, which didn't cope well with a Hacker News link.

Hacking is going pretty well on Mirage. The longer-term plan is to generalise the support libraries to work with other languages (particularly HaLVM and GuestVM for Java), but it's far simpler to work with just OCaml for getting the first cut out. The Xen Cloud toolstack (also written in OCaml) is currently being adapted to support low-latency microkernel establishment, which will remove much of the hassle of coordinating multiple Mirage 'processes'.

Another interesting performance-related aspect has been the heavy IPC workloads that result from using many VM-VM interconnects. Some early benchmarks are in http://anil.recoil.org/papers/2012-resolve-fable.pdf.


The (inadequately developed) point was that this idea could be built into an entirely new operating system with e.g. a different notion of a process—as opposed to building new tools on top of Unix—and that this "operating system" could be a lightweight layer like HaLVM or Mirage, so such ideas could be explored without committing to building an entire OS.


Oftentimes I’ll want to grab just one part of a command’s output to use as the input of another command. Sometimes I can use grep to do this, and sometimes grep isn’t quite flexible enough and sed is required. The regular expression required to get sed to do the right thing is often complex on its own, and of course the flags need to be set appropriately. If the data is in columns, sometimes cut can be simpler.

Master Foo nodded and replied: "When you are hungry, eat; when you are thirsty, drink; when you are tired, sleep."

Upon hearing this, the novice was enlightened.

-- http://catb.org/~esr/writings/unix-koans/shell-tools.html


Don't koans generally cast off preexisting notions? It seems the philosophy espoused by that story is "don't think too much about the design of your tools".


I read it as "there are many ways you could solve a problem; rather than obsess over which is The One True Way in All Situations, choose whichever is the simplest path to scratching your itch".


TermKit[1] was a proposal that tried to fix some of those problems by adding two new channels (separating terminal in/out from stdin/stdout data pipes) and applying MIME types (particularly JSON) to the latter.

It got a lot of flak, and I don't agree with everything the author proposes, but I think that part could definitely be improved.

The problem, of course, is backward compatibility: even if you can reimplement and/or wrap the core utils, what about the thousands of CLI programs in each distro's repository? You'll end up with a hybrid beast that doesn't really do anything well.

[1]: http://acko.net/blog/on-termkit/
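
For what it's worth, here's one way to picture that channel layout in Haskell (my reading of the proposal; all names are made up):

  import System.IO (Handle)

  -- MIME-ish tags for the data channels.
  data Mime = TextPlain | ApplicationJson | OctetStream

  -- The TermKit-style split as I read it: terminal I/O for
  -- humans, typed data pipes for programs, instead of one
  -- overloaded stdin/stdout pair.
  data ProcIO = ProcIO
    { termIn  :: Handle          -- interactive terminal input
    , termOut :: Handle          -- human-oriented display output
    , dataIn  :: (Mime, Handle)  -- typed replacement for stdin
    , dataOut :: (Mime, Handle)  -- typed replacement for stdout
    }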


So basically the author is saying:

The UNIX shell pipeline is really just a functional programming language whose functions can only operate on strings. Think how much more powerful, concise, readable, etc. it would be if other data types were supported.

I agree. I don't even think this requires changes at the OS level. Newlisp and Racket's shell attempt might be clunky, and Clojure certainly isn't ready for quick-and-dirty scripts, but it shouldn't be too hard to implement such a language if shell replacement is its main purpose.
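
As a sketch of what such a language could buy you (everything here is hypothetical, not an existing shell), stages pass records rather than strings, so nothing downstream has to re-parse columns:

  import Data.List (sortOn)
  import Data.Ord (Down (..))

  -- A record a structured `ps` might emit.
  data Process = Process { pid :: Int, cmd :: String, rss :: Int }

  -- Stand-in data; a real typed shell would query the OS.
  ps :: [Process]
  ps = [Process 1 "init" 1024, Process 42 "ghc" 524288]

  -- Morally `ps | sort | head`, but the pipeline is type-checked
  -- and nothing re-parses columns out of text.
  top3ByRss :: [Process]
  top3ByRss = take 3 (sortOn (Down . rss) ps)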


> Think how much more powerful, concise, readable, etc. it would be if other data types were supported.

One of the reasons Unix is alive and well today is because of its simplicity. I'm not sure about the wisdom of introducing a system-wide type system.

Let's see who's using Microsoft's PowerShell 40 years from now.


Unix pipes are about octet streams, not strings.


In most hackers' idiolect, the term "string" refers to an implementation-dependent representation of characters. An octet stream or byte stream is the abstraction that UNIX files and the standard I/O streams happen to be instances of. So it wouldn't be wrong per se to use "octet stream" here, but I'm treating each phase of the pipeline as a hypothetical function that operates on text. With less indirection, you could certainly view them as functions operating on streams, but I abstracted that away in my mental picture.
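
A small Haskell illustration of the distinction, just to pin down the terms (illustrative only):

  import qualified Data.ByteString as B
  import qualified Data.Text as T
  import Data.Text.Encoding (decodeUtf8')

  -- stdin as it really is: an octet stream. A "string" appears
  -- only after an explicit (and fallible) decoding step.
  readStdinText :: IO (Either String T.Text)
  readStdinText = do
    octets <- B.getContents
    pure (either (Left . show) Right (decodeUtf8' octets))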


I had an idea to write Clojure parsers for the standard Unix command-line tools that translate the output into s-expressions.

I got too frustrated: on the one hand, Clojure's slow startup time meant it was no fun to use in a simple pipe,

and on the other hand, even the simplest Unix util has surprisingly complicated behavior. Like `wc` - it outputs three columns of numbers, right? Well, unless it knows filenames, which go in a fourth column. Or unless you pass it flags, which can turn any set of the number columns on and off. And then I thought: wait, what happens if you make a file which has leading whitespace in the name? It turns out wc only outputs a single space in front of the filenames, so any whitespace after that is part of the name.
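
To make that concrete, the happy-path parser really is trivial; the trouble is everything outside it. Something like this (a Haskell stand-in for those Clojure parsers, wholly illustrative) mangles exactly the whitespace case just described:

  import Text.Read (readMaybe)

  -- Happy-path parse of `wc`'s default output for a file: three
  -- counts, then a name. Splitting on whitespace is precisely
  -- what mangles filenames containing runs of spaces.
  parseWc :: String -> Maybe (Int, Int, Int, FilePath)
  parseWc s = case words s of
    (l : w : c : rest@(_ : _)) ->
      (,,,) <$> readMaybe l <*> readMaybe w <*> readMaybe c
            <*> pure (unwords rest)
    _ -> Nothing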

Which led me to the conclusion: Unix tools aren't actually simple - they are actually really complicated, but you can construct a happy path of simplicity for most use cases

and functional languages are still monolithic when it comes to interacting with the outside world. Maybe someone will make a service I can run in the background that will run my command-line Clojure scripts in a pre-warmed JVM, but that's not a piece of technology that I want to try to write.


Not all functional languages run on the JVM or anything similar. In fact, most don't. See: Haskell, *ML, most Lisps and Schemes, ...


Yeah, it's true. But I still think it's much harder to compose two functional programs running in separate processes than to write one monolithic program composing functional libraries.

Compare to Perl, where piping text is so simple that chaining Perl scripts with pipes is no harder than writing functions.


Look for "nailgun", which prewarms the jvm. Also, in the interim, clojurescript->nodejs looks promising.


> I’m not advocating a return to Lisp machines here. We tried that and it didn’t work. Symbolics is dead, and no one even gave a eulogy at that funeral.

The hell? Lisp machines were awesome. I thought it was well known that they failed partly for political reasons, and partly because Symbolics was absolutely horrendous at business.

The only failing I'm aware of was the lack of multi-user support, which wasn't particularly unusual for the era, and even now we don't have anything which can compare to their high points.


Dataflow environments like PureData, Max/MSP, Quartz Composer and vvvv are very much like 2D GUI-driven shells that handle various data types. Max/MSP was, at least, explicitly conceived as a sort of UNIX for multimedia. After working with such tools for years, I must say that the tools professional programmers use seem comparatively quite primitive in many ways (though much more advanced in others). A graphical environment that used the same sort of dataflow model, but with a more general intended audience, a more extensible architecture, and some concepts from functional programming, would have the power to really change how programming is done.


The Unix Way, but piping Python objects instead of text: http://geophile.com/osh.


It's not the Unix way if it's restricted to one programming language.


It sounds like the OP wants scsh (The Scheme Shell): http://www.scsh.net


Steve Yegge's "The Emacs Problem" goes over some of the issues discussed here, although it focuses on text processing.

https://sites.google.com/site/steveyegge2/the-emacs-problem


Try PowerShell / Monad for Windows. It even solved Haskell's problem of Monad needing a name change :-) http://en.m.wikipedia.org/wiki/Windows_PowerShell


... and the evolution can continue along the same lines as Unix. I think I'll write a Haskell function that takes two strings and returns a string. The first string is a program in a new language I'll invent. The second is the input to the program, and the return value is the output of the program.

Maybe I could call it "herl".

  herl :: String -> String -> String


This article reminded me of something I read about the Unix model vs the lisp model. I managed to track it down, it's a comp.lang.lisp posting by Erik Naggum.

http://www.xach.com/naggum/articles/3245983402026014@naggum....

The interesting part is the 3rd paragraph of the answer.


I wonder, does this mean we actually want types as metadata on our shell commands? Type of args, type of stdin, type of stdout, or whatever we decide to call them?

All apps would have to speak a single, more complicated language, and we would have to do more explicit translation, but it might work out better.
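
One hypothetical shape for that metadata (nothing here exists; it's only a sketch of the idea): each command declares what it consumes and produces, and the shell checks a pipeline before running it.

  -- Hypothetical per-command type metadata.
  data StreamType = Bytes | Text | Json | Records [String]
    deriving (Eq, Show)

  data Command = Command
    { name     :: String
    , argTypes :: [StreamType]   -- types of the arguments
    , stdinTy  :: StreamType
    , stdoutTy :: StreamType
    }

  -- A pipeline is well-typed when each stage's output type
  -- matches the next stage's input type.
  wellTyped :: [Command] -> Bool
  wellTyped cmds = and (zipWith ok cmds (drop 1 cmds))
    where ok a b = stdoutTy a == stdinTy b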


Aren't Go channels similar to typed pipes?


I guess, but they're not exactly cross-language inter-process communication. Maybe you could work something like them into the OS, though.


Conal Elliott did something like this with Tangible Functional Programming.


True. However, about the issue that triggered this chain of thought: did you consider using the `cut` utility along with `grep`?




