Ryan Dahl addresses comments on his recent rant (plus.google.com)
60 points by collypops on Oct 2, 2011 | 32 comments



I agree with the general ideas in his original rant, but IMHO the problem is neither the POSIX API nor the C language (including "volatile"); both are well designed and simple, for the most part (the C standard library, unfortunately, is horrid).

IMHO most of the problems are about the other layers: tricks you need to know about the operating system's implementation of POSIX, or dynamic library loading, all the subtle things with different binary formats, and so forth.

A few of these things are easy to solve. For me it is impossible to understand how libc can be in this sad state, and how the replacements and improvements to it, like glib, are also a mess. If one day I no longer hack on Redis, my mission will be, assuming I have another way to pay my bills, to create a replacement for the C standard library.

While we are at it, neither C++ nor Objective-C is the final word on making C better and more comfortable to use (though I think the latter does a much better job than the former). This is surely an area where there is a lot to do. Unfortunately "D" diverges so much from C that it would have to replace it completely to go mainstream: very unlikely given the role C plays today and the existing code base. A backward-compatible improvement is still what we need, I think.


> operating system implementation of POSIX, or dynamic library loading, all the subtle things with different binaries formats, and so forth.

That is the real pain point. I embed SpiderMonkey on not just Linux, but also Solaris, AIX, and HP-UX. While they are all "POSIX", they all have their peculiar issues which make it harder to develop with them or write OSS. Autoconf, while hardly the best option, usually allows enough tweaking to get by in these strange worlds. Solaris is usually the nicest of the pack, while AIX is usually the oddball. Submitting patches to OSS to support these systems is easy enough most of the time, but the annoying part is that things usually regress again quickly, so you're left constantly bug-fixing when things should "just work". Part of the mess could be avoided if vendors all rallied around GCC/Clang/GDB and redirected their compiler efforts to optimizing open compilers for their hardware. Maybe POSIX needs to evolve faster -- if you view an autoconf test for something as an inconsistency that POSIX could standardize away.

EDIT: Here's a perfect example of something which should be "easy" and standard in POSIX. Getting the stack base address of your process. (search for XP_UNIX)

http://hg.mozilla.org/mozilla-central/file/90575e23ea93/js/s...
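To illustrate: even on Linux alone, getting your own stack bounds means reaching for a GNU extension whose very suffix, _np, marks it as non-portable; Solaris, AIX, etc. each need entirely different code. A hedged sketch (Linux/glibc only; compile with -pthread):

    #include <pthread.h>
    #include <cstdio>

    int main()
    {
        pthread_attr_t attr;
        void *stack_addr;
        size_t stack_size;
        pthread_getattr_np(pthread_self(), &attr);   // GNU extension, Linux/glibc only
        pthread_attr_getstack(&attr, &stack_addr, &stack_size);
        pthread_attr_destroy(&attr);
        std::printf("stack base %p, size %zu bytes\n", stack_addr, stack_size);
        return 0;
    }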


Objective-C is certainly a cleaner object layer over C, but I still find myself writing core data structures and algorithms in C++ on iOS a lot of the time. The Obj-C data structures are so awkward to use, and the lack of operator overloading results in a lot of really verbose code for things like intensive string manipulation and matrix math.


What is so bad about the C standard library? What would you change?

I find it lacking a lot of essential features -- can you believe strdup isn't part of the C standard? -- but simplicity and minimalism have always been C's strongest point.
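For instance, a strdup replacement is something nearly every C programmer has written at some point; a minimal sketch (my_strdup is a made-up name, since strdup itself comes from POSIX, not ISO C):

    #include <cstdlib>
    #include <cstring>

    // Allocate len+1 bytes and copy, including the terminating NUL.
    char *my_strdup(const char *s)
    {
        std::size_t len = std::strlen(s) + 1;
        char *copy = static_cast<char *>(std::malloc(len));
        if (copy != NULL)
            std::memcpy(copy, s, len);   // caller must free() the result
        return copy;
    }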


Lack of a real string type. String manipulation is a pain, and you have to allocate everything by yourself, which results in inefficient and dangerous code.
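Even plain concatenation shows it; a sketch (concat is a hypothetical helper, not a standard function) where the length arithmetic, the +1 for the terminator, and the ownership handoff are all on the programmer:

    #include <cstdlib>
    #include <cstring>

    char *concat(const char *a, const char *b)
    {
        std::size_t la = std::strlen(a), lb = std::strlen(b);
        char *out = static_cast<char *>(std::malloc(la + lb + 1)); // +1 for the NUL
        if (out == NULL)
            return NULL;
        std::memcpy(out, a, la);
        std::memcpy(out + la, b, lb + 1);  // copies b plus its terminator
        return out;                        // caller must remember to free()
    }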


Any code can be inefficient if performance wasn't a concern while writing it. I'd say in that case C string manipulation is more efficient by design because you see how many times you are copying something around. When things are abstracted away in something like std::string then you can wind up with code that creates multiple unnecessary copies by accident (e.g. forgetting a '&' to take a reference). Yes, it is more dangerous, but that can be a tradeoff with performance. For example, assume you have a std::hash_map<std::string, int>. There is no way to insert into this without one string copy (C++11 changes this to make zero-copy possible).
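A sketch of that tradeoff, using std::unordered_map (the standardized descendant of hash_map) -- the C++03-style insert copies the key, while C++11's emplace plus std::move avoids it:

    #include <string>
    #include <unordered_map>
    #include <utility>

    int main()
    {
        std::unordered_map<std::string, int> counts;

        std::string k1 = "a key long enough to live on the heap";
        std::string k2 = "another key long enough to live on the heap";

        counts.insert(std::make_pair(k1, 1)); // C++03 style: copies k1 into the map

        // C++11: move the string into the container -- no copy is made,
        // and k2 is left in a valid but unspecified (likely empty) state.
        counts.emplace(std::move(k2), 2);
        return 0;
    }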


You're imagining a false dichotomy. Having a proper string type doesn't stop you from representing strings as pointers to null-terminated buffers when you need that flexibility. Meanwhile, more or less encouraging all software written in the language to use an unsafe idiom (so unsafe that the library itself includes functions that should never be used, like gets()) is definitely responsible for a huge amount of harm.


Well, what are your thoughts on the example I gave for hash_map? There is always a tradeoff hidden away somewhere. I guess I'd like to see an example of a concrete string type that doesn't incur performance penalties when used in more complex structures. C++11 move semantics and emplace() on containers will fix some of the performance issues with std::string, but universal support for that is still a ways off. And in C, you'd be left with something like glib's GString, which isn't much more than an API over a struct.


I think your example is irrelevant to the general case, which is where a string type is useful. If you need to avoid copies that badly in a specific case, don't use std::string or anything like it.

For example, I know of one compiler architecture which scanned strings from the source, all the while calculating a hash, and basically interned the string (turning it into an index per unique string) without ever actually copying it. Thereafter, the program used the index (an integer) to represent the string, making for fast lookups and comparisons.
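A rough sketch of interning (unlike the scanner described above, this copies each unique string once into the table, but every later lookup and comparison is a plain integer operation):

    #include <string>
    #include <unordered_map>
    #include <vector>

    class Interner {
        std::unordered_map<std::string, int> ids_;
        std::vector<const std::string *> names_;  // id -> string, for printing
    public:
        int intern(const std::string &s) {
            std::unordered_map<std::string, int>::iterator it = ids_.find(s);
            if (it != ids_.end())
                return it->second;                // seen before: no copy at all
            int id = static_cast<int>(names_.size());
            it = ids_.insert(std::make_pair(s, id)).first;  // one copy, ever
            names_.push_back(&it->first);         // element references stay stable
            return id;
        }
        const std::string &name(int id) const { return *names_[id]; }
    };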


I guess I just take issue with the claim that "general case" string usage doesn't have to avoid copies. That is how we generally wind up with slow, bloated software.


I don't think most software is particularly slow or bloated. It's been quite some time since I thought to myself, "gee, this software could do with being a lot faster", outside of games and video transcoders (and perhaps iTunes on Windows). On the other hand, a lot of software has buffer overflow vulnerabilities; I see a lot of crashes when input data is fuzzed or corrupted slightly.


"Fast enough" is a very recent development. Until dual-core arrived around 2005, computers were not fast enough, especially computers running Objective-C.


Most C++03 std::strings (including g++'s) are copy on write, so inserting into a hash_map will not copy the string, only update some reference counts.

(Pedantry: assuming a few things, like that you don't have any references or pointers into the string.)


Yes, and it just so happens the STL I use switched from a ref-counted implementation to a short-string-optimized one, where tiny strings are stored inline in the string object itself (so on the stack, for locals). Again, it was done for large-scale tradeoffs in the app as a whole once memory usage was analyzed. So without the C++11 enhancements, which allow any string implementation to perform as well as possible, apps must be aware of how their particular STL works under the hood. (Or at least my apps do :))
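For the curious, the trick looks roughly like this -- a sketch of the general shape, not any particular vendor's actual layout:

    #include <cstddef>

    class SsoString {
        std::size_t size_;
        union {
            char inline_buf_[16];       // small strings live here: no heap allocation
            struct {
                char *ptr_;             // longer strings spill to the heap
                std::size_t capacity_;
            } heap_;
        };
        // constructors, assignment, etc. dispatch on size_ ...
    };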


Shouldn't string manipulation be more efficient if you have to do everything by hand? Besides, using a recursive memory allocator takes away most of the pain, and it's more efficient than a garbage collector.


It's quite easy to overuse strlen or strcat and otherwise turn an O(n) algorithm into O(n^2). But the efficiency I'd be more concerned with is programmer efficiency: finding and fixing security bugs caused by off-by-one buffer lengths, missing input length checks, and the like (e.g. copying k characters into a char buf[k], leaving no room for the terminator).
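The classic O(n^2) trap looks like this (join_slow and join_fast are made-up names; both assume dst is big enough, which is itself exactly the kind of manual contract that breeds the security bugs above):

    #include <cstring>

    // O(n^2): strcat re-scans dst from the start on every call
    // ("Schlemiel the Painter's algorithm").
    void join_slow(char *dst, const char *const *words, int n)
    {
        dst[0] = '\0';
        for (int i = 0; i < n; i++)
            std::strcat(dst, words[i]);   // each call walks the whole string so far
    }

    // O(n): remember where the string currently ends and append there.
    void join_fast(char *dst, const char *const *words, int n)
    {
        char *end = dst;
        for (int i = 0; i < n; i++) {
            std::size_t len = std::strlen(words[i]);
            std::memcpy(end, words[i], len);
            end += len;
        }
        *end = '\0';
    }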


Surely every time you re-invent an algorithm you make it more efficient? Part of the problem is that there isn't a standard set of "maximally efficient" algorithms for programmers to use, so anything they come up with could be better or worse, and for no real reason...


Can you not imagine a world where one could operate in native code without the concept of volatile variables?

Sometimes you want the compiler to do its best with register optimizations, code reordering, etc., and sometimes you have to explicitly tell it to go to memory every time. You could layer an abstraction over this, but some people would still need to burrow under it. The point of working at the C/C++ systems layer is that you have this control, but also the responsibility that comes with it.
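The canonical case is a flag written from a signal handler; a minimal sketch:

    #include <csignal>

    // volatile forces the compiler to re-read the flag from memory on
    // every loop iteration instead of caching it in a register.
    volatile std::sig_atomic_t stop = 0;

    extern "C" void on_sigint(int) { stop = 1; }

    int main()
    {
        std::signal(SIGINT, on_sigint);
        while (!stop) {
            // do work; without volatile the compiler may hoist the
            // load out of the loop and spin forever
        }
        return 0;
    }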


Is that a NO?

I think Ryan is just saying that we should continue to question the standards and architectures that have become deeply rooted in our systems. Yeah, POSIX is the best we have, but it's relatively young and there is so much room for new ideas.

Real innovators learn current paradigms and limitations, then forget and destroy them.


Or maybe he is hoping for something more elegant than plain shared memory, at the hardware level.


There are 3 problems with this statement that I can see.

1) People haven't even agreed on the best technique for utilizing multiple cores, or, heck, on the difference between concurrency and parallelism. Quite the contrary: I think the x86 architecture is too smart and carries a lot of baggage compared to the elegance and simplicity of pure RISC processors.

2) Most of the time you're not really working directly with hardware. For example, modern kernels do not let you access the memory of another process directly, and the value you wanted to keep in RAM may end up in a swap file on your hard disk.

3) What hardware does have is a separation between hard disk, RAM, L2 cache, L1 cache, and CPU registers. In a perfect world there would be only one type of memory; unfortunately, that would be too expensive and without much benefit versus the current state of the art.

For maximum performance (which is required in many instances) you do have to know about the difference between these multiple layers of memory and take appropriate action, even though many times this is abstracted away from you.

The problem with "volatile" is different in nature: its behavior is not really portable, and in practice it is also useless, as higher-level APIs like POSIX threads have better atomic and fence semantics that are more portable. Basically, "volatile" was added in for good measure, and it is now a legacy feature generating lots of problems.
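To make that concrete: volatile forces loads and stores to happen, but it gives you neither atomicity nor inter-thread ordering, which is why C++11's std::atomic (or pthread primitives) is the portable tool. A sketch of the difference:

    #include <atomic>
    #include <iostream>
    #include <thread>

    volatile int v_counter = 0;      // forces memory accesses, nothing more
    std::atomic<int> a_counter(0);   // C++11: atomic, with ordering guarantees

    void work()
    {
        for (int i = 0; i < 100000; i++) {
            ++v_counter;   // data race: increments from two threads can be lost
            ++a_counter;   // always ends at exactly 200000
        }
    }

    int main()
    {
        std::thread t1(work), t2(work);
        t1.join();
        t2.join();
        std::cout << v_counter << " vs " << a_counter.load() << "\n";
    }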


Volatile is useful in implementing things like lock-free queues, at least on x86. For some real-time applications you can't afford to take a kernel lock or spin on a spinlock.
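For what it's worth, here is roughly the shape of such a single-producer/single-consumer queue. I've written it with C++11 std::atomic, which expresses portably the ordering that volatile plus x86's strong memory model was historically relied on for (a sketch, not production code):

    #include <atomic>
    #include <cstddef>

    template <typename T, std::size_t N>
    class SpscQueue {
        T buf_[N];
        std::atomic<std::size_t> head_;   // read position, written by the consumer
        std::atomic<std::size_t> tail_;   // write position, written by the producer
    public:
        SpscQueue() : head_(0), tail_(0) {}
        bool push(const T &v) {           // call from the producer thread only
            std::size_t t = tail_.load(std::memory_order_relaxed);
            std::size_t next = (t + 1) % N;
            if (next == head_.load(std::memory_order_acquire))
                return false;             // full
            buf_[t] = v;
            tail_.store(next, std::memory_order_release);
            return true;
        }
        bool pop(T &out) {                // call from the consumer thread only
            std::size_t h = head_.load(std::memory_order_relaxed);
            if (h == tail_.load(std::memory_order_acquire))
                return false;             // empty
            out = buf_[h];
            head_.store((h + 1) % N, std::memory_order_release);
            return true;
        }
    };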


His first point here really hit home for me. Indeed, people like Ryan Dahl hate software for us so we don't have to.


"""We should program as if we were perched atop of 20 floors of bamboo scaffolding - because that's the state of the Unix userland."""

Genius.


POSIX is beautiful. d-bus != POSIX.


Why does he keep mentioning D-Bus and glib? I haven't been following node.js development, and last time I checked in, node was written in C++.


I wonder if perhaps he doesn't realize that Ted Dziuba is not a fan of Twisted either. Dziuba is generally recognized as a very belligerent, assertive personality, in the same vein as Zed Shaw, and you need a certain amount of thick skin when reading his commentary.

That said, the fact that Node doesn't provide the tools necessary to defer blocking JS code to a thread does pose a problem for these sorts of situations. Apparently (and correct me if I'm wrong; I'm not a Node expert!) Node won't let you run JS in any thread which is not the main thread. Twisted does let you run Python in non-main threads with the deferToThread()/callFromThread()[1] functionality.

I also agree with him about JS being a poor language for server-side work, but that's because I don't think JS's object model is well-suited to large, interface-driven/service-driven applications, and that isn't really a gripe with Node.

[1] http://twistedmatrix.com/documents/current/api/twisted.inter... and http://twistedmatrix.com/documents/current/api/twisted.inter... document threading in Twisted.


If you are interested in node beyond a single thread of execution, you are not alone. The core team is also interested. Although the wind is blowing in the direction of a multi-process instead of multi-thread architecture. (The distinction being mostly empty, for the kinds of distribution that make sense in node.)

If you want to get started on multi-process node.js, the http://hook.io project is sort of an Erlang-OTP for Node.js. I've used it on a couple of things, and the API is compelling.


My point is largely that this work is already done in Twisted, both at the multithread and the multiprocess level. This is the first time I've actually seen something in Twisted which simply isn't in Node, period.


What's wrong with JS's object model? It's certainly more flexible than say Java's. If the problem is the flexibility, do you feel the same way about Ruby?


There's plenty not to like about JavaScript, but the biggest mess is this:

    js> x = 1
    1
    js> z = x.y
    js> // WTF??

In other words: it's not an error to access an attribute of an object which isn't there. It's dicey enough that you can have attributes in Python and Ruby that can be misspelled, and mismatch their declarations, without an immediate compiler error. But, that is reasonable to deal with as long as you have pretty good unit test coverage: after all, if you run the code that is actually accessing the attribute, you'll quickly see that there's a run-time error and fix it:

    irb(main):001:0> x = 1
    => 1
    irb(main):002:0> z = x.y
    NoMethodError: undefined method `y' for 1:Fixnum
            from (irb):2

~or~

    >>> x = 1
    >>> z = x.y
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'int' object has no attribute 'y'

... and you need unit test coverage anyway, because the compiler can't save you from a huge variety of other violations, so it's not like this is really making much extra work for you.

In JavaScript, by the time you actually encounter an error, it's too late to figure out where the heck the erroneous object is getting generated. So, if you have some code like 'this.observers.push(object.somefunc);', every test case which adds an observer must also verify what happens when the observer gets called: and it has to be the same test so you have some idea where the observer came from, whereas you can easily make those things different tests in Python.

Then, in order to get reasonable error-reporting behavior from quick things which aren't tested, you have to have tons of manual type-checks anywhere that objects are put into a persistent container, because by the time you have some random 'undefined' in your list of observers, it's far too late to figure out how it got there.

This type of paranoid defensive programming is a bad idea in Python and Ruby, because you can just let the language runtime do its job and inform you if there's an error, and your stacktraces will give you a good idea where it is.

The fact that sometimes unknown things are 'undefined' and sometimes they're 'null' and sometimes they're '"undefined"' and sometimes they're the empty string and sometimes they're 0 really compounds this problem. Python has None, Ruby has nil, and nobody uses random ad-hoc sentinel values because why would you do that?

The thing that just blows my mind is that the designers of JavaScript must have _known_ that this was a terrible idea, because:

    js> undefined.undefined
    typein:1: TypeError: undefined has no properties
    js> null.nothing
    typein:2: TypeError: null has no properties

so when you are two steps out from the misspelling that caused your error, you can figure out that something has in fact gone wrong. And:

    js> blub
    typein:3: ReferenceError: blub is not defined

presumably because the assumption is that everything you really care about in JavaScript is going to be a global variable, and prototype attributes are kind of an afterthought. After all, why would you store data in organized structures when you can just stuff it all into an undifferentiated bag of crap in the global namespace? It's all going to go away when you reload the page, right?

Except then Node comes along and changes the equation so you actually have to live with your persistent data structures and you probably want to know when things go wrong in your long-running servers. Oops.

Here I really have to agree with Ted: despite the fact that there are advantages to be had from keeping your client and your server in the same language and leveraging your investment in utility libraries in both places, JavaScript really is a bad enough language that it's worth the effort to use something different when you can.


Implicit coercion is always iffy, and JS's coercion rules are broken along the same lines as PHP's, where == and === have to be carefully managed. There's no actual identity check, only two different strengths of equality.

Booleans coercing to strings instead of strings coercing to booleans is weird.

There's no operator overloading. Sometimes this is useful, in languages which have it. Notably, there's no way to override how equality, coercion, and arithmetic are handled.

There are no metaclass operations. The type model is incomplete; it's not possible to create new first-class types or query type information respecting inheritance. For that matter, there's no blessed way to have inheritance. Makes sense since the language doesn't have classes per se, but it's kinda annoying in an object-based language to not be able to actually examine objects in a unified way.

Those are the ones off the top of my head. There are others, but they're matters of opinion.





