His emphasis on simplicity is welcome, but I'm not so sure I agree that simple hashes are the best choice for internal data representations. Exposing all the fields of any particular datatype to all clients seems to invite exactly the kind of incidental "complecting" he decries.
Every non-trivial program is going to have to define abstract datatypes and I'm not sure how embedding and dispatching on a type tag in a hash is any better than using the more explicit support for dynamic dispatch you find in typical OO languages.
The real problem with most OO is that it mashes a lot of interdependent, mutable state together.
> I'm not so sure I agree that simple hashes are the best choice for internal data representations
I've been contributing to the Clojure community lately. My experience working with hash-maps as the primary data structure has been entirely liberating.
At my startup, we've got an app with Sinatra services, a Rails API, a Node.js web frontend, and Backbone client code. JSON gets passed between them. Being forced to encode keys as strings is a mild annoyance that Clojure's reader syntax avoids, but the real issue is that I've got raw JSON, Javascript domain objects (Backbone.Model), Ruby models (ActiveRecord), and Ruby hashes (hashie/mash/similar). Each has their own idiosyncrasies and interfaces. Of all of them, the raw JSON is most pleasurable to work with. CoffeeScript & Underscore.js roughly approximate 10% of the awesomeness that is Clojure's core data structures, including maps, sets, vectors, and lazy-seqs.
ActiveRecord, for example, makes it super easy to tangle a bunch of objects up. If we had a big bag of functions instead, they could operate on in-memory hashes, or on database rows, or on the result of an API call. It would be so much simpler to reuse code between our main Rails API and our Sinatra service, and we could translate functions one-for-one for non-Ruby services, instead of requiring a crazy tangled mess of polymorphism and mutable state.
> Every non-trivial program is going to have to define abstract datatypes
Absolutely true. However, Clojure has taught me that you really ought to define only a very small number of those. It's been said that it's much better to have 100 functions that operate on one data structure than 10 functions that operate on 10 data structures. Clojure's get-in function, for example: (get-in some-hash [:some :key :path]) is glorious compared to Ruby's some_hash[:some][:key][:path], because you don't need to go monkey-patch in a get_path method. And even if you did monkey-patch that in, it won't work for the some_object.some.key.path case, unless you got fancy with object.send and yet another monkey patch.
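To make that concrete, a minimal sketch (the data here is invented):

    (def person {:address {:city {:name "Springfield"}}})

    ;; one generic function works on any nested map
    (get-in person [:address :city :name])      ;=> "Springfield"

    ;; and the same function gives you defaults for free
    (get-in person [:address :zip] "unknown")   ;=> "unknown"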
Look at some of the substantial pieces of Clojure code out there. They may only define a small handful of data structures, and even those are mostly defined with defrecord, which produces a hash-like object that all those hundreds of functions still work on. The rest are tiny primitives that compose in powerful and interesting ways.
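A quick sketch of why that matters (the field names are made up):

    (defrecord Employee [name boss])

    ;; records participate in the map abstraction, so the
    ;; ordinary map functions work on them unchanged
    (def e (->Employee "Homer" "Burns"))
    (:name e)                ;=> "Homer"
    (assoc e :name "Max")    ;=> still an Employee record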
> I'm not sure how embedding and dispatching on a type tag in a hash is any better than using the more explicit support for dynamic dispatch you find in typical OO languages
Because you may want a different dispatch strategy later, and single-dispatch inheritance doesn't let you change your mind as easily. Those dynamic dispatches in Ruby/Python/whatever are simply hash lookups anyway, so you'll get the same performance either way. Look at the output of the ClojureScript compiler, for example. Most code paths dispatch on :op, but you could just as easily dispatch on some bit of metadata, maybe the [:meta :dynamic] key path, to have a function that runs differently on static vars than on dynamic ones. People are also working on advanced predicate dispatch systems.
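As a sketch of what tag-based dispatch looks like (the node shapes here are invented, not the actual ClojureScript compiler's):

    (defmulti emit :op)   ; dispatch on the :op tag in the node map

    (defmethod emit :constant [node]
      (pr-str (:form node)))

    (defmethod emit :if [node]
      (str "(" (emit (:test node)) " ? "
           (emit (:then node)) " : "
           (emit (:else node)) ")"))

    ;; changing the dispatch strategy later is a one-line change:
    ;; (defmulti emit (fn [node] (get-in node [:meta :dynamic])))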
> The real problem with most OO is that it mashes a lot of interdependent, mutable state together.
That's a real problem. But it's not the real one :-)
That style of programming was also popular back in the pre-OO days, with tagged unions in C. It's pretty easy to write fragile software that way, with a lot of incidental dependencies.
It basically makes every single field part of your public API. Once somebody starts consuming that field directly you can never remove it or change it or replace it with a synthetic property.
I'd argue it's better in the long term to change the API than to maintain an increasingly elaborate facade as your data structure changes. OOP doesn't solve the core problem; it just lets you sweep the issue under the carpet until it becomes too big to easily deal with.
It gives you the freedom to make a lot of smaller changes before you have to resort to explicit versioning. I think the benefits of encapsulation are pretty well established at this point.
For example, the entire Apple CoreFoundation API is exposed as operations on opaque C structs. Cocoa uses that abstraction to wrap them in a very nice, high level API. If the actual layout of those structs was exposed I can guarantee you people would start writing code that used that information directly and Apple's hands would be tied when they wanted to improve their implementations.
Where I agree with Rich is that we should work to minimize mutable state.
If the only benefit of encapsulation is to make small API changes easier, is it really worth the additional complexity it adds to your architecture?
A strong theme in many of Rich's talks is that we should prefer "simple" over "easy", as the former will benefit you more in the long run even if it's harder in the short term.
You also seem to be confusing encapsulation with abstraction. Working with data structures directly does not mean you need to operate at a low level.
> Changing software without breaking it is the central problem of software engineering
But encapsulation doesn't solve that problem, except in very minor cases. Is it worth complecting the architecture of our software for the development equivalent of a bandaid?
> In what sense can you have encapsulation when your entire data structure is exposed to every function that touches it?
I'm not saying you should have encapsulation... Did you mean to say "abstraction"?
> But encapsulation doesn't solve that problem, except in very minor cases.
The WIN32 API has remained largely unchanged over the last 15 years while changing dramatically under the hood in its implementation. With the possible exception of Apple's recent bull run, it is so far the most lucrative API in the history of the world. And that's made possible by the fact that all the key API elements are exposed only as opaque handles.
I could go on and on with examples like these that are about as far away from a minor bandaid as you can get.
I'll give an example of what I think Hickey is talking about. In an OO API you might have:
    class Employee {
        private Employee boss;
        public Employee getBoss() { return boss; }
    }
What IMO Hickey is saying is that you should instead have a map with a :boss field. Yes, you can't then change that field name, in the same way you can't change the method name getBoss in the OO API. But you can change the contents of that field.
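The map version, as a minimal sketch (names invented):

    (def employee {:name "Homer" :boss {:name "Burns"}})

    ;; callers read the field directly; no getter needed
    (get-in employee [:boss :name])   ;=> "Burns"

    ;; the *contents* of :boss can grow later without
    ;; breaking that lookup
    (assoc-in employee [:boss :title] "Owner")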
I'm not sure I'd classify Windows as changing dramatically in its implementation, and the cost of maintaining binary backward compatibility has been considerable. Only a company the size of Microsoft could afford to do it year after year.
There's also a difference between building good software and building profitable or even popular software. Sometimes bad design can even help maintain a product's market lead: if a design is complex, it's hard for competitors to make compatible products.
You're also still confusing abstraction and encapsulation. An API can be an abstraction over a lower-level system without being an example of encapsulation.
With web services, documents are your public API. Once a service is public, there is a contract between you and the consumers of the API that it will not change until a new version comes out. This is why we see the /v1/ convention in REST services. /v1/ MUST not change on people. You wouldn't do that with a public C library, you shouldn't do that with web services. It is anti-social to change that API on developers.
These documents are very complex. Just take a look at a twitter timeline JSON response. Having a getter and setter for every field in that document would get ridiculous.
The problem with JSON is it is so easy to change that it takes self discipline to enforce your own social contract. We all know that when faced with deadlines some developers get a little fast and loose to meet that deadline.
But you should not even have a /v1/ in your URLs. mysite.com/v1/people/john is probably not a different person from mysite.com/v2/people/john. Instead, you should have versioned content types. The URLs of a REST service can change as much as they like, the client should not assume any other URLs than the entry point.
The form of the response, be it JSON or something else, is the thing that should not change. And you are right that even with a documented content type keeping the promise is hard.
/v1/people/john and /v2/people/john are different representations of the same internal state.
REST = Representational state transfer. While a URI is a unique identifier, it uniquely identifies a representation of internal state. The structure of the representation under a version should stay constant or else you void your social contract with consumers of your service.
> The structure of the representation under a version should stay constant or else you void your social contract with consumers of your service.
Yes, but the version should be stated in the Content-Type header, not in the URI of the resource. URI identifies the resource and Content-Type identifies the representation.
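Concretely, that might look something like this (the vendor media type here is hypothetical):

    GET /people/john HTTP/1.1
    Accept: application/vnd.mysite.person.v2+json

    HTTP/1.1 200 OK
    Content-Type: application/vnd.mysite.person.v2+json

The URI identifies the person; the media type names (and versions) the shape of the document you get back.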
The problem with encapsulation is that invariably you will think something is an internal detail, and "protect" it. Later, I will come along and want to use your code to do something you did not envision, and the "protected" status of some part of your code will greatly complicate my life. I don't have a problem with separate name-spaces for "public" code versus "implementation" code, but limiting my ability to re-use implementation code is just plain wrong.
Ultimately, if someone uses something that is part of a transient namespace, they are responsible if it breaks at a later time. You should not handcuff everyone to save stupid people from themselves; stupid people will always find a way to shoot themselves in the foot.
Thanks, great to see more material along the same lines as your Simple Made Easy talk. The concepts you outlined in that talk (and in this one) have made a tremendous impact on how I think about creating software. Can't thank you enough.
I don't understand his point on encapsulation of information versus implementation. If anyone can help me here, I would be grateful.
* Should the user of your object/data have to make this distinction? Does he care if age is a piece of data or the result of a calculation?
* Should the user of your object/data know where a piece of information is? Suppose I start off with a birthdate attribute in my person hash. I later read something about CQRS and decide to build my person as an event store. My birthdate is now in some event hash inside an events list in person. If I encapsulate this birthdate information, the users of my object/data don't have to change.
Encapsulating information/implementation might "complect" person, but doesn't it make it a lot more simple for the users/callers?
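One way to sketch that in Clojure (hypothetical shapes, not from the talk):

    ;; callers only ever call this; they can't tell whether
    ;; :birthdate is a stored field or dug out of an event log
    (defn birthdate [person]
      (or (:birthdate person)
          (some :birthdate (:events person))))

The caller asks for information; whether it's stored or computed stays an implementation detail.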
"We can make the same exact software we are making today with dramatically simpler stuff—dramatically simpler languages, tools, techniques, approaches. Like really radically simpler—radically simpler than Ruby which seems simple.
Inertia. That uncomfortable feeling you get when you venture into a new paradigm that you don't understand acts as a barrier and an excuse for people to stay where they are.
I'm not sure that's it. I do 100% consulting, and so end up working with lots of programmers.
In my experience, younger programmers who are otherwise competent tend to create the most complicated, generalized systems imaginable.
It's only when you get older, and have done enough systems, that you start to favor the "simple" approach Rich is talking about.
If I had to guess, I'd say that younger, more inexperienced programmers know there is a problem with complexity, and believe that "general", non-simple solutions are the way to tackle that complexity.
A similar thing happens with college students introduced to a new topic. They'll write these elaborate, complicated papers about the simplest of topics -- because they don't understand it well, and don't want to miss anything. It looks and feels very similar to what inexperienced developers do.
It seems strange to me (maybe in a good way), but would it be preferable to use the object literal over the if statement? (For the sake of argument, assume JavaScript's if returns a value.)
    val =
      if (person === "Homer") {
        expr
      } else if (person === "Bart") {
        expr
      } // ...
----
    val = {
      "Homer": expr,
      "Bart": expr,
      // ...
    }[person];
Would the literal be simpler because it uses data over syntax, or rules instead of conditionals?
I frequently use the latter in Python. I like that it's easy to construct programmatically (because it's data) and that it handles the various cases more explicitly (especially the default case).