The Building Blocks of Ruby (yehudakatz.com)
93 points by wycats on Feb 7, 2010 | 39 comments



I don't find any of these examples particularly compelling, especially if you are familiar with languages beyond Java and Python, which the article uses for comparison.

In both the file handling and mutex examples, blocks seem to be serving as a substitute for proper lexically scoped variables. In Perl, for example, we would use a lexically-scoped file handle to ensure the file is closed when the variable goes out of scope. The same technique is the standard way of implementing mutexes in C++.

As for respond_to, I've never understood how it wasn't just a baroque rewriting of a case-statement, but perhaps someone can enlighten me as to why blocks are superior in this situation.


For the "resource acquisition is initialization" technique, you are limiting yourself to a reference-counting garbage collector, or to manual or stack allocation/deallocation (like in C++); otherwise you can't get deterministic behavior.

For example this technique wouldn't be possible on a virtual machine like the JVM or .NET. Yes, you could say something like "when a variable goes out of scope, you need to call the dispose method on the object".

But what happens in a case like this? ...

    {
        open my $fileh, '<', $path;
        $another_object->add_file_handle($fileh);
    }
The intention here is to preserve the file-handle in $another_object. So what happens? Is $fileh closed when going out of scope? Or is another file handle opened when the reference is copied?

It's not really clear when the handle is closed, at least not the way it is in C++. And Perl is a bad example anyway; what's really needed is something like the "use" binding we have in F# ...

    {
        use fileh = File.OpenText(path);
        ...
    }
But this is a special form. And that's the trouble ... Ruby's blocks solve a lot of problems that are solved in other languages with special forms (not that F# doesn't have closures or anonymous blocks).


Blocks are objects themselves and they preserve their access to the local variable scope (i.e. they are "lexical closures"). This allows you to write control structures yourself. E.g. if-statement:

  def my_if( cond )
    if cond
      yield
    end
  end
  x = 3
  my_if x < 5 do
    puts "#{x} is lower than 5"
  end
  # prints '3 is lower than 5'
Unfortunately the syntax for a control structure accepting more than one block is less elegant. But you don't need to compromise on the semantics:

  def my_if_else( cond, a, b )
    if cond
      a.call
    else
      b.call
    end
  end
  x = 2
  my_if_else( x < 0, proc do
    puts "#{x} is lower than zero"
  end, proc do
    puts "#{x} is greater or equal zero"
  end )
  # prints '2 is greater or equal zero'


Sure and I understand all that. My comment was just that the examples in the article are perhaps not the best ones to use because they are simply replicating what other languages do with lexically scoped variables, and users of those languages view the lack of such as a deficiency of Ruby.


How does a lexically scoped variable close the file? I couldn't find any documentation about this while Google searching. Is this a real language feature where you can specify what happens to a variable when it goes out of scope?

In other words, is it a baked-in convenience or a real extensible abstraction?


  #include <fstream>
  using namespace std;
  int main()
  {
    {
      fstream f( "test.txt", ios::out );
      f << "Hello" << endl;
    } // File gets closed
    // ...
    return 0;
  }


I haven't done C++ in a while but is it possible to allocate an fstream object on the heap instead of the stack? If so and you do not explicitly call delete on it then will it still close the file when going out of scope?

Something like:

  #include <fstream>
  using namespace std;
  int main()
  {
    {
      fstream *f = new fstream( "test.txt", ios::out );
      *f << "Hello" << endl; // f is a pointer, so dereference it first
    } // memory leak and unclosed file.
    // ...
    return 0;
  }


No. But you can put a smart pointer object on the stack.

  #include <boost/smart_ptr.hpp>
  #include <fstream>
  using namespace boost;
  using namespace std;
  int main()
  {
    {
      shared_ptr< fstream > f( new fstream( "test.txt", ios::out ) );
      *f << "Hello" << endl;
    } // File gets closed
    // ...
    return 0;
  }


You can allocate an fstream (or any other) object on the heap, and if you do not explicitly call delete, the file will not be closed. Resource acquisition is initialization (RAII) is a common C++ idiom, and it only works on stack variables.


It only works with stack variables, but you can wrap the construction and destruction of anything, including heap-allocated objects, with a stack variable. You tie the allocation of the heap object to the constructor of the stack variable, and the reverse for de-allocation/destructor.

The premise of all this is that the construction and destruction of stack variables both happen at well known times and are guaranteed to occur.


Yes, you can absolutely specify a destructor which will be called when a lexical / local variable goes out of scope in Perl or C++ (just to name the most well-known languages with this feature).


For the multi-block conditional, there is this idiom:

  def assuming cond
    Thread.current[:last_cond] = cond
    Thread.current[:last_value] = (yield if cond)
  end

  def otherwise
    if Thread.current[:last_cond]
      Thread.current[:last_value]
    else
      yield
    end
  end

  def alternately_if cond, &b
    otherwise { assuming cond, &b }
  end

  def how_big x
    assuming x < 3 do
      "small"
    end
    alternately_if x < 7 do
      "medium"
    end
    otherwise do
      "large"
    end
  end

  10.times {|z| puts how_big z }
Seems a bit sketchy though. It could be made cleaner by making them methods of an object that holds the state, but I guess that is starting to defeat the purpose.


Ruby does have a nice syntax for passing a single block. But like you say it starts to become a little inelegant when more than one block is used.

Strangely Perl copes with multi "blocks" far nicer IMHO:

    sub my_if_else {
        my ($cond, $then, $else) = @_;
        if ( $cond ) { $then->() }
        else         { $else->() }
    }

    my $x = 2;

    my_if_else $x < 0, sub { say "$x is lower than zero" }, 
                       sub { say "$x is greater or equal to zero" };
And it can be even more window dressed by using the fat comma:

    my_if_else $x < 0
        => sub { say "$x is lower than zero" }
        => sub { say "$x is greater or equal to zero" };


Well, you could do something like this in Ruby:

  def my_if_else( cond, a, b )
    if cond
      a.call
    else
      b.call
    end
  end
  x = 2
  my_if_else x < 0, proc { "#{x} is lower than zero" },
                    proc { "#{x} is greater or equal zero" }
My point is that you can't do without 'proc' (or 'sub' in your case).
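
For what it's worth, Ruby 1.9 and later also accept the shorter "->" lambda literal in place of "proc". A minimal sketch, reusing the my_if_else definition from above and returning the strings instead of printing them; the point about needing an explicit marker for each block still stands:

```ruby
# Same helper as above: call one of two callables depending on cond.
def my_if_else(cond, a, b)
  if cond
    a.call
  else
    b.call
  end
end

x = 2
# `-> { ... }` is the lambda literal; each block is still marked explicitly.
result = my_if_else(x < 0,
                    -> { "#{x} is lower than zero" },
                    -> { "#{x} is greater or equal zero" })
puts result # prints '2 is greater or equal zero'
```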


Oops, my last comment/code got cut off :(

Here goes again....

And going into sub prototype sublime you can also do:

    sub then   (&@) { @_ }
    sub elsedo (&@) { @_ }

    my $x = 2;

    my_if_else $x < 0,
        then   { say "$x is lower than zero" }
        elsedo { say "$x is greater or equal to zero" };
But yes, I do get your point about proc/sub. There is only so far you could (and should!) stretch the parser's syntax.

There are always macros (see Devel::Declare in Perl) if you're mad enough to want cosmetic purity :)


The problem using a case statement is: what are you casing on? In order to properly do content negotiation, you first need to know all of the possible provided formats, so you can negotiate against the "Accept"ed ones.

In order to do it with a case statement, you'd have to repeat the list of provided Mimes: once so the content_type method knew how to negotiate, and then once for each switch in the case statement. Using a block here allows us to make a native'ish case statement that eliminates the duplication.
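
A rough sketch of that idea, not Rails' actual implementation: the block is run once to collect the provided formats, and only then is one handler chosen. FormatCollector, respond_to_sketch and show are hypothetical names, and the "Accept" header is assumed to be already parsed into an ordered list of symbols.

```ruby
# Hypothetical collector: each method call registers a format and its handler.
class FormatCollector
  attr_reader :handlers

  def initialize
    @handlers = {} # format symbol => block, in registration order
  end

  def html(&block)
    @handlers[:html] = block
  end

  def json(&block)
    @handlers[:json] = block
  end
end

# Run the block once to learn which formats are provided, then call the
# handler for the first accepted format that is also provided.
def respond_to_sketch(accepted)
  collector = FormatCollector.new
  yield collector
  chosen = accepted.find { |format| collector.handlers.key?(format) }
  chosen ? collector.handlers[chosen].call : "406 Not Acceptable"
end

def show(accepted)
  respond_to_sketch(accepted) do |format|
    format.html { "<h1>Hello</h1>" }
    format.json { '{"greeting":"Hello"}' }
  end
end

puts show([:xml, :json]) # prints '{"greeting":"Hello"}'
puts show([:xml])        # prints '406 Not Acceptable'
```

The list of provided formats appears only once, inside the block, which is the duplication a plain case statement would not avoid.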


Thank you, that does make sense. Can I suggest adding that explanation to your article so it's clear that the example actually does more than simply switching on a single requested format?

(I'm actually curious how often, in practice, the "Accept" header is really used to specify the format, and how often the format simply comes from the extension on the requested path. I've only ever encountered the latter case, which is why I was thinking of the format as a single-valued variable you can switch on.)


Lexical scoping has almost nothing to do with Perl's ability to close file handles as you describe. As others point out, it's RAII with reference-counted garbage collection. This feature is not thread-safe, which is one reason why perl threads require making an entire copy of the whole program state and treating shared variables specially. The former makes perl threads a mostly useless abomination and the latter causes other problems.

For instance, a friend of mine at work who didn't know the full details of perl threads couldn't figure out why his threaded program was running abysmally slowly and using tons of memory. When he found out, he ended up just switching to a forking process model. As for the specialness of shared variables, another friend of mine wrestled with the fact that the perl serialization libraries, for whatever reason, couldn't handle serializing shared variables, and as far as I know he never found a way to solve the problem. Bottom line, Perl's morass of bad language design leads to lots of problems that more elegant solutions like ruby's blocks can avoid.

Also, reference counting is considered computationally less efficient than global garbage collectors. See the recent paper Myths and realities: the performance impact of garbage collection, which shows that a reference-counting garbage collector performs significantly worse than a mark-and-sweep algorithm.


It's true that lexically scoped variables can replace closures when all you're using closures for is to wrap a section of code in starting and ending sections. But blocks are more general purpose than this. Have a look at the clean API that blocks have enabled in Ruby's Enumerable module, for example. Once you're using blocks/closures, why not use them to acquire/release resources as well?
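
A small sketch of what that looks like in practice. Countdown is a hypothetical class; the only thing it has to provide is a block-taking each method, and Enumerable supplies the rest.

```ruby
# Hypothetical collection: defining `each` and including Enumerable
# gives us select, map, reduce and friends, all driven by blocks.
class Countdown
  include Enumerable

  def initialize(start)
    @start = start
  end

  # Yield start, start - 1, ..., 1 to the caller's block.
  def each
    @start.downto(1) { |n| yield n }
  end
end

countdown = Countdown.new(5)
puts countdown.select { |n| n.even? }.inspect # prints '[4, 2]'
puts countdown.map { |n| n * n }.inspect      # prints '[25, 16, 9, 4, 1]'
```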


In ruby you can create a file pointer as a lexically scoped object:

  def write(something)
    f = File.open("/path/to/somewhere", "w+")
    f << something
  end
The variable f is local to the write method. Without explicitly closing the file, it may not be flushed to disk until the object is garbage collected (I think).


That delay until the next garbage collection is a big caveat. While you might get away with this for file handles in some situations (honestly, I'm not sure that File#close really guarantees a write to disk anyway), it makes this technique essentially useless for mutexes.

(Edit: I should also point out that that's a function-scoped variable, not a lexically-scoped variable. Ruby doesn't have lexically-scoped variables.)
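
For the mutex case, the block form gives a deterministic release without waiting on the garbage collector. A minimal sketch using Ruby's built-in Mutex#synchronize; the helper names count_with_lock and released_after_error? are made up for illustration.

```ruby
# Increment a shared counter from several threads, holding the lock
# only for the duration of each block.
def count_with_lock(thread_count, per_thread)
  lock = Mutex.new
  counter = 0
  threads = Array.new(thread_count) do
    Thread.new do
      per_thread.times { lock.synchronize { counter += 1 } }
    end
  end
  threads.each(&:join)
  counter
end

# The lock is released when the block exits, even through an exception.
def released_after_error?
  lock = Mutex.new
  begin
    lock.synchronize { raise "boom" }
  rescue RuntimeError
    # the exception is expected; we only care about the lock state
  end
  !lock.locked?
end

puts count_with_lock(4, 1000) # prints '4000'
puts released_after_error?    # prints 'true'
```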


  File.open('path', 'w') do |f|
    f << something
  end
will do what you want


It might be good to read the whole discussion, particularly noting what I personally have said, before making conclusions about what I want.


I should have simply explained that my example will close the file handle and flush the changes out at the end of the block passed to the anonymous file handle, instead of making a statement about your wants.
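
As a sketch of how such a method can be written in plain Ruby (with_file is a hypothetical helper, not the actual File.open source): the handle is yielded to the block and closed in an ensure clause, so the close runs at a known point whether or not the block raises.

```ruby
require 'tmpdir'

# Hypothetical helper mirroring what File.open does when given a block:
# yield the handle, then close it in `ensure` so it runs even on exceptions.
def with_file(path, mode)
  file = File.open(path, mode)
  begin
    yield file
  ensure
    file.close
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "sketch.txt")
  handle = nil
  with_file(path, "w") do |f|
    handle = f
    f << "Hello"
  end
  puts handle.closed?  # prints 'true'
  puts File.read(path) # prints 'Hello'
end
```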


I've always felt that blocks were Ruby's gimmick. They're neat, and I'd like to have them in Python. But I get the feeling that this is just a bikeshed issue. People show them off because they're easy to understand. There's absolutely nothing wrong with that.

However, I get the feeling that they're not Ruby's strongest feature. Python's coolest features (metaclasses, descriptors, and other things) wouldn't make sense if you'd just read a blog post on them. I suspect Ruby is the same way.


It's not a bike-shed issue. Surely the other features are great, and Python has lots of powerful abstractions.

But once you have closures with a light-weight syntax, as Ruby has, the APIs start to look a lot different.

For example, take this example ...

    v = [ x for x in collection if x % 2 == 0 ]
    for item in v:
        print(item)
In Ruby the equivalent would be ...

    collection.find_all{|x| x % 2 == 0}.each do |item|
        puts item
    end
Yes, the Python example is elegant, but Ruby doesn't need extra baked-in features like list-comprehensions. It doesn't need a bunch of other features as well, like generators, or generator expressions, or with statements.

There are a lot of PEPs in Python that cover use-cases for Ruby's blocks, trouble is there are still use-cases that aren't covered.

Guido is partially right though ... adding blocks in Python wouldn't be pythonic because blocks wouldn't be orthogonal with lots of other Python features. They should've been added from the start, and now it's kind of late.


"However, I get the feeling that they're not Ruby's strongest feature."

Perhaps the strongest feature is the underlying architecture that allows for blocks, the general idea that behavior is a data type you can pass around.

And since you can alter the properties of objects at different levels (instances, classes, meta-classes), you can alter not just the more conventional values (int, strings) but an object's (and system of objects) very nature.
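
A minimal sketch of those three levels, using a hypothetical Greeter class and only standard Ruby calls (define_singleton_method, class reopening, singleton_class):

```ruby
class Greeter
  def greet
    "hello"
  end
end

a = Greeter.new
b = Greeter.new

# Instance level: only `a` gets the new behaviour (a singleton method).
a.define_singleton_method(:greet) { "HELLO!" }

# Class level: reopening the class affects every instance, existing or new.
class Greeter
  def farewell
    "bye"
  end
end

# Singleton-class (metaclass) level: add a class method after the fact.
Greeter.singleton_class.send(:define_method, :polite) { new.greet + ", please" }

puts a.greet        # prints 'HELLO!'
puts b.greet        # prints 'hello'
puts b.farewell     # prints 'bye'
puts Greeter.polite # prints 'hello, please'
```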


Very true, I think Ruby's metaprogramming is an example of the kind of strong feature you were talking about. I don't use Ruby, but I remember reading about it in the Poignant Guide and being really impressed at how it can make code look very different, basically one way to make a DSL.


Blocks push the language across some kind of line that makes it feel right to use λ-expressions everywhere. They may seem trivial but I've come to realize that such minutiae can make or break a programming language and can be pivotal in deciding how the language is used.


pedantic:

Digest::MD5.digest(x), not Digest::MD5.hexdigest(x). If humans aren't reading it, don't convert it to hex.
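
A quick sketch of the difference, using Ruby's standard digest library: the raw digest is 16 bytes, the hex form is the same data spelled out in 32 characters.

```ruby
require 'digest/md5'

data = "hello"
raw = Digest::MD5.digest(data)    # raw binary string
hex = Digest::MD5.hexdigest(data) # same digest, hex-encoded for humans

puts raw.bytesize                  # prints '16'
puts hex.length                    # prints '32'
puts raw.unpack("H*").first == hex # prints 'true'
```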


Blocks are also an extremely powerful tool when it comes to building DSLs, and if done correctly are a great alternative to complicated options files/hashes/etc. Rails' routes and config/initializers come to mind.


> Blocks are also an extremely powerful tool when it comes to building DSLs...

I can't read this as anything more insightful than functions are also an extremely powerful tool when it comes to building APIs. I first used Ruby in 2000. What am I missing?


"What am I missing?"

Nothing. The emphasis on DSLs is distracting.

Worse, it may lead people to think that certain coding techniques are only used when creating a "DSL".


ruby block syntax is clean, elegant and nest-able which allows creating good looking DSLs.

    foo do
      bar do
        ..
      end
      baz 123
    end
now try to do the same with some other language which supports something-kind-of-like-ruby-blocks but with a different syntax. it will not look nearly as good, so in those languages instead of creating DSL people usually implement some kind of config file format instead. or just use XML :)

if you had to use 'lambda' to define a block, for example, it would make a much worse DSL with lots of extra syntax noise.
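
a rough sketch of how the foo/bar/baz example above could be made to run. TreeBuilder is a hypothetical class, and instance_eval plus method_missing is just one common way to do this, not the only one:

```ruby
# Hypothetical builder: unknown method calls become nodes; a block nests.
class TreeBuilder
  attr_reader :nodes

  def initialize
    @nodes = []
  end

  # Record [name, arguments, children]; children come from the nested block.
  def method_missing(name, *args, &block)
    children = block ? TreeBuilder.build(&block) : []
    @nodes << [name, args, children]
  end

  # Evaluate the block with a fresh builder as self and return its nodes.
  def self.build(&block)
    builder = new
    builder.instance_eval(&block)
    builder.nodes
  end
end

tree = TreeBuilder.build do
  foo do
    bar do
    end
    baz 123
  end
end

p tree # prints '[[:foo, [], [[:bar, [], []], [:baz, [123], []]]]]'
```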


The lack of punctuation characters makes this a DSL and not bog-standard Ruby code?

> ... instead of creating DSL people usually implement some kind of config file format instead. or just use XML.

Writing a grammar or a parser means you haven't created a DSL?

Unless by "DSL" you mean "Ruby syntax and Ruby semantics with symbol names chosen by the programmer", I have no idea what you mean by "DSL".


How about this:

Ruby's syntactic flexibility and semantic model mean that many cases handled by a change to the language spec or delegating to another language (XML config files, for example) can be cleanly handled in Ruby itself. The end result is often something that looks like a little language for a specific task, such as Rake, Rails, etc. (I'm not a Ruby programmer, so I hope those are good examples.)

So with Ruby, there are very few cases where writing a custom grammar or parser is necessary. Ruby's flexibility gives you the ability to do things that might require a custom grammar or parser in other languages.


However even Jim Weirich doesn't like Rake being called a "DSL".

I think Piers Cawley was dead on in calling things like this a pidgin.

refs:

* http://www.infoq.com/interviews/jim-weirich-discusses-rake

* http://www.bofh.org.uk/2007/08/08/domain-specific-pidgin


Common usage of DSL seems to mean an API whose usage looks more like a secondary language than the host language. At least that's my interpretation. A case in point is jQuery: although it is JavaScript, idiomatic code using jQuery isn't written in the same style as traditional JavaScript.


Yehuda Katz is always able to remind me that there is so much Ruby I don't know.



