Making Coffeescript’s Whitespace More Significant

Groxx · on Nov 30, 2011

I like it a lot. It would also simplify the ".end()" function in jQuery[1], because you could control it with indentation:

  $('.class')
    .find('.sub')
      .remove()
    .find('.stuff')
      .addClass('other')
      .find('.more_stuff')
        .removeClass('things')
    .filter('li')
      .appendTo('#my_list')

[1]: http://api.jquery.com/end/

raganwald · on Nov 30, 2011

Funny you mention jQuery. I wrote “Significant Whitespace” in March of 2010, long before trying Coffeescript:

https://github.com/raganwald/homoiconic/blob/master/2010/03/...

dextorious · on Nov 30, 2011

Yes, it could control it with indentation. "Simplify" though? Are you sure?

This whitespace zig-zagging is barely readable.

Groxx · on Nov 30, 2011

Very much so. Otherwise you get this, at best:

  $('.class')
    .find('.sub')
      .remove()
      .end()
    .find('.stuff')
      .addClass('other')
      .find('.more_stuff')
        .removeClass('things')
        .end()
      .end()
    .filter('li')
      .appendTo('#my_list')

Or, to match jQuery's documentation style for that function, this:

  $('.class').find('.sub').remove()
    .end().find('.stuff').addClass('other').find('.more_stuff').removeClass('things')
    .end().end().filter('li').appendTo('#my_list')

Matching .end() with each selector change is equivalent to writing valid XML by hand, without the aid of an auto-tag-closer, and without a validator - you only see the error on run, and only if you hit that code path, and only if it does something you notice isn't correct.

None of this is meant to imply that chaining things like this is a Good Idea™, and I avoid .end() like the plague and use intermediate variables. But when you don't need the root or intermediate results for anything else, yes, this is more readable, more easily optimized (you can't get it wrong, every level is cached for you), less prone to error, and significantly fewer characters / lines of code. That's simplifying and improving.
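To make the bookkeeping concrete, here is a toy Python model of the selection stack that `.end()` unwinds. It is only a sketch, not jQuery itself: the `Selection` class and its `find`, `each`, and `end` methods are hypothetical stand-ins, and the "elements" are plain integers.

```python
class Selection:
    """Toy stand-in for a jQuery object: some items plus a link to the
    selection it was derived from."""

    def __init__(self, items, previous=None):
        self.items = list(items)
        self.previous = previous

    def find(self, predicate):
        # A traversal returns a NEW selection and remembers where it came
        # from, so that end() can get back to it.
        return Selection([i for i in self.items if predicate(i)], previous=self)

    def each(self, action):
        # A non-traversing call acts on the items and returns the SAME selection.
        for item in self.items:
            action(item)
        return self

    def end(self):
        # Pop one level: go back to the selection this one was derived from.
        return self.previous if self.previous is not None else Selection([])


log = []
root = Selection(range(10))

# Every find() is matched by an end(), so each find() starts from root.
balanced = (
    root
    .find(lambda n: n % 2 == 0).each(lambda n: log.append(("even", n)))
    .end()
    .find(lambda n: n > 7).each(lambda n: log.append(("big", n)))
    .end()
)

# Forgetting an end() raises no error; the second find() just searches
# the even numbers instead of the root.
unbalanced = (
    root
    .find(lambda n: n % 2 == 0)
    .find(lambda n: n > 7)
)
```

In the unbalanced chain nothing fails; the result is simply narrower than intended, which is the kind of mistake that only shows up when someone notices the wrong output.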

-- late-edit:

Less whitespace zigzaggery is also possible (I agree, not easy to read such dense zigzagging), and similarly easier with significant whitespace. My example was essentially just a trivial one; I tend to see larger ones wherever I see that kind of indentation at all. Is this better?

  $('.class')
    .find('.sub').remove()
    .find('.stuff').addClass('other')
      .find('.more_stuff').removeClass('things')
    .filter('li').appendTo('#my_list')

tordek · on Nov 30, 2011

In your last example, `.find('.more_stuff')` works on the value returned from `.addClass('other')` (or so it'd seem), so it behaves differently.

Groxx · on Nov 30, 2011

Which, with jQuery, is the same as the results from the most-recent selector in the chain (in this case, the .find('.stuff') before it). Normally though, you'd be absolutely correct, and that example would need to nest the .addClass('other') inside its .find so it doesn't pollute the next .find:

  $('.class')
    .find('.sub').remove()
    .find('.stuff')
      .addClass('other')
      .find('.more_stuff').removeClass('things')
    .filter('li').appendTo('#my_list')

wwweston · on Dec 1, 2011

  with($('.class')) {
  	find('.sub').remove()
  	with(find('.stuff')) {
  		addClass('other')
  		find('.more_stuff').removeClass('things')
  	}
  	filter('li').appendTo('#my_list')
  }

raganwald · on Nov 30, 2011

Great point:

    This whitespace zig-zagging is barely readable.

I considered an example like this in the post, but it confuses accidental with essential complexity. If you’re going to do all of that, then I suggest that significant indentation is a win over everything being a chain and using something like jQuery’s `.end()` to discriminate chaining from cascading.

It’s a lot easier to spot an error in indentation than a missing `.end()`. That being said... This could be a false dichotomy. Perhaps the right thing to do is use named temporary variables or structure the code another way.

Although I don’t see a ton of it in the wild, it’s 100% valid to write your own custom jQuery traversals, filters, and so forth. My own jQuery Combinators includes `.K` and `.T` combinators so that you can break code up into functions so you don’t need to write elaborate trees.

So... I agree that an elaborate tree is a difficult problem to handle, and perhaps neither significant whitespace nor `.end()` is the answer. But for the limited and possibly artificial choice of a fluent interface OR significant whitespace, I prefer significant whitespace.

itmag · on Dec 1, 2011

"Although I don’t see a ton of it in the wild, it’s 100% valid to write your own custom jQuery traversals, filters, and so forth. My own jQuery Combinators includes `.K` and `.T` combinators so that you can break code up into functions so you don’t need to write elaborate trees."

This is very interesting to me. Tell me more :)

raganwald · on Dec 1, 2011

https://github.com/raganwald/JQuery-Combinators

jashkenas · on Dec 1, 2011

@raganwald -- Lovely post, as always. One (somewhat important) question: If same-level continued calls mean chaining, and indented continued calls mean to use the value of the previous line, then what do outdented continued calls mean?

      first
    .second

Or heading 'offsides', in this fashion:

      first
        .second
    .third

Are those both syntax errors at compile time?

Also, how does this play with things that are not necessarily side-effect-ful? For example:

    object.property
    .value

Currently means `object.property.value`. Under your rubric, would it evaluate to `object.value`? Making any trailing access effectively a no-op?

raganwald · on Dec 1, 2011

Outdented calls use the last value in the column to their left. So:

      first
    .second

Never means `first.second`: either `.second` is an error, or it is applied to something above it, such as:

    someValue
      .methodName ->
        first
      .second

...which becomes:

    someValue.methodName(-> first)
    someValue.second

If such a value can't be found, it's an error. Of interest is the meaning of `.propertyName` in the leftmost column, as in:

    object
    .property

I suggest it's an error. As long as you can write:

    object.property # or...
    object
      .property

I don't see the value in

    object
    .property

This:

    object.property
      .value

Means `object.property.value`.
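One possible reading of the rule described in this comment is: a line that starts with `.` is applied to the nearest preceding line with strictly smaller indentation, and it is an error if no such line exists. The Python sketch below implements only that reading. The function name `resolve_receivers` is made up for illustration, and this is not how the CoffeeScript lexer works.

```python
def resolve_receivers(source):
    """For each non-blank line, return the index of the line whose value it
    is applied to, or None if the line does not start with a dot."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    stack = []       # (indent, line index) of enclosing lines, innermost last
    receivers = []
    for idx, line in enumerate(lines):
        indent = len(line) - len(line.lstrip(" "))
        # Drop every line that is not strictly to the left of this one.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if line.lstrip().startswith("."):
            if not stack:
                raise SyntaxError(f"no receiver for {line.strip()!r}")
            receivers.append(stack[-1][1])
        else:
            receivers.append(None)
        stack.append((indent, idx))
    return receivers


example = """
someValue
  .methodName ->
    first
  .second
"""
print(resolve_receivers(example))  # [None, 0, None, 0]
```

Under this reading `.second` is applied to `someValue` (line 0) rather than to `first`, and a leading-dot line in the leftmost column has no receiver and is rejected.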

satyr · on Dec 1, 2011

> I suggest it's an error.

So we'd have to give up:

    array
    .map (x) ->
      f x
    .reduce (a, b) ->
      g a, b

sausagefeet · on Dec 1, 2011

IMHO, Smalltalk got this right using ; rather than whitespace. And it's easier to read.

eric-hu · on Dec 1, 2011

What about maintaining backwards compatibility while adding an operator specifically for this purpose? Example:

     # same as first.second.third
       first
         .second
       .third

     # same as first.second, first.third
       first
         .second
       ^third

     # same as first.second.third
       first
         .second
     .third

     # syntax error
       first
         .second
     ^third

satyr · on Dec 1, 2011

There's no outdented continued call in current CoffeeScript; an outdented line always makes OUTDENT.

(Or are you suggesting to allow such code? I personally see no value in it.)

groby_b · on Dec 1, 2011

I read the article, and I simultaneously liked the idea and was fundamentally disturbed by it. At the same time, Smalltalk's cascading messages never disturbed me.

So after going through the initial two reactions I have to every thought-provoking blog post (#1: OMG - raganwald is insane!, followed by #2: OMG - raganwald is brilliant!), here's what actually doesn't work for me:

It is overloading whitespace with both control flow and data flow. Smalltalk's cascades neatly introduce a different symbol to sidestep that.

So, ultimately, I'd rather see cascades introduced into CS than overloading the meaning of whitespace.

wwweston · on Dec 1, 2011

Amen. The part about data flow is insightful. The idea of overloading it into whitespace is insane. There has to be a better way, and Smalltalk's might be it.

Blocks delimited by whitespace work because they rely on a familiar set of conventions that have already evolved among most programmers over 30 years or more (and interestingly, a lot of programmers still really hate the idea of having it enforced). Moving what is essentially the alleged JavaScript BadPart(TM) "with" into an overloaded whitespace/dot combo is going to throw at least as many programmers for a loop as "with" has.

I also think something smells wrong about the examples. For all the dataflow insight -- did I miss the part where he talks about how exactly we're keeping track of the destination of the return values? And the problem with the ".pop" example specifically might be less that you can't call it fluently three times than that you have to call it three times to get three items off the collection instead of ".pop 3".

groby_b · on Dec 1, 2011

There's no destination, exactly (I think). A "data scope", for lack of a better term, just implies that all functions are called on the object returned by the statement that introduced the scope.

It is _exactly_ like JS's 'with' statement, just that we let the language automatically infer that we probably meant "with".

iambot · on Nov 30, 2011

I actually agree with everything that is suggested in this submission. I wonder what the best way to get it implemented would be. Perhaps a form of feature request poll, to see what the support for it is. Or perhaps a call for people who agree to "watch" it on GitHub.

clutchski · on Nov 30, 2011

It's being discussed here (along with some alternate syntax ideas):

https://github.com/jashkenas/coffee-script/issues/1889

raganwald · on Nov 30, 2011

I think this is a little closer:

https://github.com/jashkenas/coffee-script/issues/1495

The above link is actually a discussion about Dart’s “Monocle Moustache,” so I console myself that when people say it’s ugly, they mean the moustache and not significant whitespace :-)

moomin · on Nov 30, 2011

I'm the originator of that issue. There have been previous proposals, and subsequent ones. I originally proposed different syntaxes, but the current proposal was superior.

I've been amazed at how much interest there has been in this. Every time I think it's over, more people pile in. The only catch is: there are a couple of people who still need to be convinced, and they're actually the important ones. I'm pretty sure that a pull request for this would not be accepted.

thedufer · on Nov 30, 2011

I like the idea, but the lack of backwards-compatibility is less than ideal. Some people who update CoffeeScript compilers will suddenly find their code mysteriously doing the wrong thing (when they wrote what this suggestion considers "cascading messages", but expect them to not cascade). I have worked on multiple projects that would fall prey to this issue.

Semiapies · on Nov 30, 2011

Perhaps a "legacy" option to disable cascading?

thedufer · on Nov 30, 2011

The problem is the time it will take people to realize that they need to turn on the "legacy" option. The failures this change could cause could be very difficult to track down.

davej · on Nov 30, 2011

Make it an opt-in feature for the moment and maybe make it the default for CS 2.0 or some other milestone.

Uncompetative · on Dec 2, 2011

@raganwald -- fascinating ideas

Whilst the 'staircase' form forces each message to await the return from the receiver, the 'cascade' form could be used to post commands to a concurrent process into a separate receiving processor's message queue with no need to await a reply - as in Eiffel's Command/Query Separation Principle.

Also, 'futures' could be used to decouple queries from having to await replies from the receivers of their messages. All that is needed is for variables defined through assignment to a query to remain potentially undefined until needed by some command. At this point all of the command's arguments would need to be defined and it would either have to await a reply from the queried process, or await some globally visible but yet to be defined thread to bind a value to the variable i.e. dataflow.

All of this hinges on using a language that doesn't freak out when processing undefined variables, but regards them as their symbolic names, reducing complex expressions with a collection of rewrite rules.

I'd be interested to know what you think about my proposal for these richer concurrent semantics.

-- Uncompetative

raganwald · on Dec 2, 2011

I like it! I’ve had some similar thoughts along slightly different lines recently.

lisper · on Nov 30, 2011

The problem with significant whitespace is that you can't count on whitespace to be preserved across many common protocols. Text editors will convert spaces to tabs and vice-versa. HTML rendering eats whitespace. Cut-and-paste may or may not preserve whitespace.

Python has had this problem since its inception. If you're editing a Python program in emacs python mode and you hit TAB at the wrong time you can inadvertently change the semantics of your code. And that's just the tip of the iceberg. I'm a big Python fan, but significant whitespace is a bad idea.

masklinn · on Nov 30, 2011

> Text editors will convert spaces to tabs and vice-versa.

Get a good text editor?

> HTML rendering eats whitespace.

Except when you tell it not to, of course.

> Python has had this problem since its inception.

problem being mostly encountered by those who never use it, interestingly.

> If you're editing a Python program in emacs python mode and you hit TAB at the wrong time you can inadvertently change the semantics of your code.

So can you if you hit "}" or ";" at the wrong time in a braceful language...

The primary (and as far as I'm concerned the only significant) issue of significant indentation (it's not even significant whitespace) is auto-generated code (which is why Haskell has a braceful syntax and an indentation-based transformation of it), as giving the right contextual indentation to a piece of code may make the code generator much more complex (codegen targeting Python should probably generate Python bytecode, rather than generating code).

And to support my claim that significant indentation is not effectively an issue, I will use Haskell: Haskell can be written using both a brace-and-semicolon syntax and an indentation-based one. Both forms are perfectly equivalent and can be translated into one another without loss of information.

I do not remember ever seeing a Haskell piece of code, article, demonstration or example which used braces except when the article was about the braceful syntax or about auto-generated code.

If significant indentation was such a crippling problem, would Haskell users not have coalesced around the "less problematic" braceful syntax?

lisper · on Nov 30, 2011

I use Python a lot. And I encounter these problems often enough for them to be very annoying.

> So can you if you hit "}" or ";" at the wrong time in a braceful language...

The difference is that when you hit "}" or ";" the effect is always the same, it's always visible, and it's always possible to undo by hitting DELETE. If you hit either of those characters N times, you can always undo that by hitting delete N times.

This is not true for the tab key. The effect of hitting TAB depends on the context. Determining whether your last press of the TAB key had an effect or not requires that you remember the previous state, and so undoing the effect (or lack thereof) of hitting TAB requires that you remember the previous state. And if you ever do a block auto-indent at the wrong time you are pretty much hosed.

snprbob86 · on Dec 1, 2011

What tools are you using?

I had a lot of problems with Python when I was trying to edit code in a variety of different IDEs, text editors, etc.

That was years before I discovered the beauty of Vim (sub in Emacs here, if you like).

For your vimrc:

    " Indentation
    set tabstop=2
    set shiftwidth=2 softtabstop=2
    set smarttab
    set expandtab
    set smartindent

    " Shed light on hidden things
    set list
    set listchars=tab:»»,trail:•
    set wrap
    set linebreak
    set showbreak=↳

This will use soft-tabs (assert (> spaces tabs)) and expose tabs and trailing spaces on lines using the » and • characters respectively. They show up as a nice, obvious blue in my theme.

Sometimes, this can be annoying for other people's code, who prefer tabs. Easy fix is to `:set nolist` on those buffers.

This also works with file formats that expect tabs, like Makefiles, which have plugins in most Vim distributions that will forcibly type a tab when required. Will be obvious when you see the ». If you ever want to explicitly type a tab, go to insert mode and type ^v<tab> (that is control+v, then press tab). ^v lets you disable custom mappings for the next chord, so instead of <tab> meaning "indent" it will mean "type a damn tab character!"

Meanwhile, our non-technical CEO does some Haml/Sass (both whitespace significant) using TextMate. I had to write some on-save scripts for him to make sure he doesn't submit any trailing whitespace and always ends his files with a trailing newline. grumble grumble

lisper · on Dec 1, 2011

I use emacs and python-mode. This is not an editor issue. It's more fundamental than that. The problem is this:

    block:
      stmt1
      stmt2
      stmt3
    stmt4
    stmt5

If your indenting gets screwed up for ANY reason there is not enough information left to reconstruct it. There is enough information to reconstruct the indent at stmt1 (thanks to the colon, which is essentially equivalent to a left curly brace), but not enough to reconstruct the outdent at stmt4. There are many, many ways for indentation to get screwed up.

jholman · on Dec 2, 2011

But to reiterate masklinn's point, in C/etc, if your braces get screwed up for ANY reason there is not enough information left to reconstruct them. So what's the difference between meaningful braces and meaningful indentation?

To that you replied that the output of the Tab key depends on context, and implied that in your editor(s), sometimes the result of the Tab is invisible, and/or cannot be reversed by hitting Delete (or Backspace). And snprbob86 pointed out that in his/her editor (and mine), this isn't a problem. Tab never does anything invisible, and it's always reversible with Backspace. So what's the problem?

And although I assume you noticed this too, just to be clear and err on the side of explicitness, it seems to me that there are two things going on here. One half is arguing about whether or not there's a fundamental problem with significant indentation that is not present in languages without significant indentation, and the other is an attempt to solve non-fundamental problems that others might have (e.g. complicated state in tabs, possibly due to using a poor editor).

lisper · on Dec 2, 2011

> if your braces get screwed up for ANY reason there is not enough information left to reconstruct them

That's not necessarily true. If my code is indented, then I can reconstruct the braces from the indentation. Also, it's a lot easier to inadvertently screw up whitespace than a brace because there are so many more things out there in the digital world (HTML, autoindent) that muck around with whitespace than things that muck around with braces.

The right answer is to SPECIFY block structure using braces (or something equivalent), but then RENDER the block structure using (automatically generated) indentation. It's perfectly fine for the compiler to complain if they don't match. This is one case where redundancy is a feature, not a bug.
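A minimal Python sketch of the "specify with braces, render with indentation, complain on mismatch" idea follows. It assumes a brace-delimited language with one statement per line, indentation in spaces, and no braces inside strings or comments. The name `check_layout` and the default width are illustrative choices, not part of any real compiler.

```python
def check_layout(source, width=2):
    """Return the 1-based numbers of lines whose indentation disagrees with
    the nesting depth implied by the braces."""
    mismatches = []
    depth = 0
    for number, line in enumerate(source.splitlines(), start=1):
        text = line.strip()
        if not text:
            continue
        # A line that starts by closing a block is rendered at the outer depth.
        line_depth = depth - 1 if text.startswith("}") else depth
        actual = len(line) - len(line.lstrip(" "))
        if actual != line_depth * width:
            mismatches.append(number)
        depth += text.count("{") - text.count("}")
    return mismatches


good = """\
foo() {
  if (baz) {
    bar();
  }
  boff();
}
"""

# Same braces, but the call to boff() has drifted one level to the right.
bad = good.replace("  boff();", "    boff();")

print(check_layout(good))  # []
print(check_layout(bad))   # [5]
```

Because the braces carry the structure, the drifted line is reported instead of silently changing what the program means.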

If you hate braces and love whitespace so much, why are you not urging Guido to get rid of the colon? It's essentially equivalent to an open brace. Why is an open brace more pythonic than a close brace?

> using a poor editor

I use emacs, but just to see if maybe I'm missing something I fired up vim and tried editing some Python code. AFAICT vim (at least out of the box on Snow Leopard) is not aware of Python syntax at all.

thedufer · on Nov 30, 2011

Presumably if it's a problem, the accidental change in indentation changed the code to a wrong form, rather than a form that can't be parsed (i.e. unindenting in the middle of a block, which will just cause easy-to-identify indentation errors).

Think about the same situation in, say, C. Now, your program still compiles correctly - but people read it as doing something different. I'd argue that it's worse to accidentally disconnect a reader's interpretation from a compiler's than to change both of them to something that's logically incorrect.

lisper · on Dec 1, 2011

I agree. But those are not our only choices.

The ironic thing about Python is that it actually does have an open-brace. It's the colon. The compiler can tell that this:

    def foo():
    baz()

is syntactically incorrect. And if you tab the second line, auto-indent can do the Right Thing. The screw case is this:

    def foo():
      if baz:
        bar()
        bing()
      boff()

If you auto-indent the last line, it will quietly change the semantics of your program. That's bad.
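The silent change can be shown directly. The Python sketch below runs both indentations of the example with stub functions that record their calls. The `run` helper and the stubs are scaffolding invented for this illustration.

```python
def run(source, baz):
    """Define foo() from source, call it, and return the names of the stub
    functions it called, in order."""
    calls = []
    namespace = {
        "baz": baz,
        "bar": lambda: calls.append("bar"),
        "bing": lambda: calls.append("bing"),
        "boff": lambda: calls.append("boff"),
    }
    exec(source, namespace)   # defines foo inside namespace
    namespace["foo"]()
    return calls


original = """
def foo():
  if baz:
    bar()
    bing()
  boff()
"""

# The same text after the last line has been pushed one level deeper.
reindented = """
def foo():
  if baz:
    bar()
    bing()
    boff()
"""

print(run(original, baz=False))    # ['boff']
print(run(reindented, baz=False))  # []
```

Both versions parse and behave identically when `baz` is true, so the difference only appears on the path where the condition is false.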

I always end my blocks with a PASS statement (or a return), i.e.:

    def foo():
      if baz:
        bar()
        bing()
        pass
      boff()
      return

If you do this, then auto-indent will always do the Right Thing. This is particularly beneficial if you want to take a big block of code and wrap it in an outer block. I can add two lines to the above code:

    def foo():
      while snoz:   # <-- added
      if baz:
        bar()
        bing()
        pass
      boff()
      pass       # <-- added
      return

Then I can just auto-indent the whole thing and be confident that the result will be correct.
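Why the convention makes re-indentation mechanical can be sketched in a few lines of Python. This is a toy re-indenter, not what emacs python-mode actually does. It assumes every block is opened by a line ending in `:` and closed by a bare `pass` or a `return`, and the name `auto_indent` is hypothetical.

```python
def auto_indent(source, width=2):
    """Rebuild indentation from ':' (open a block) and pass/return (close a
    block), ignoring whatever indentation the input had."""
    out = []
    depth = 0
    for line in source.splitlines():
        text = line.strip()
        if not text:
            continue
        out.append(" " * (width * max(depth, 0)) + text)
        if text.endswith(":"):
            depth += 1                     # the colon acts as the open brace
        elif text in ("pass", "return") or text.startswith("return "):
            depth -= 1                     # pass/return acts as the close brace
    return "\n".join(out)


# The wrapped example with all indentation thrown away.
flat = """\
def foo():
while snoz:
if baz:
bar()
bing()
pass
boff()
pass
return
"""

print(auto_indent(flat))
```

With the explicit terminators in place, the structure survives even if every leading space is lost, which is the property the plain indentation-only form lacks.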

Contrast this with the traditional method where you have to manually re-indent your code. If you accidentally select the wrong region to re-indent you can change the semantics of the code in a way that loses the information about what the semantics should have been. The only way to recover from this is to manually reconstruct the correct semantics. It may not happen very often, but when it does it's a colossal PITA. (Entering those examples was a colossal PITA too.)

thedufer · on Dec 1, 2011

Wait, so you reconstruct the end delimiter using pass? That's one of the least-pythonic things I've ever seen. I really don't see how this could end up being a problem - doing a block indent shouldn't be this difficult.

lisper · on Dec 1, 2011

> Wait, so you reconstruct the end delimiter using pass?

Yes.

> That's one of the least-pythonic things I've ever seen.

What can I say? It works.

> I really don't see how this could end up being a problem

Do you use emacs? Open up two windows, each with an emacs editing some python code in python mode. Cut and paste some code from one window into the other.

thedufer · on Dec 1, 2011

I don't use emacs, but I use vim, and I suspect I've seen the problem you're talking about. It's the one where auto-indent puts extra indentation in pasted code? This is a problem of not knowing how to use your editor. For example, in vim I either open files in tabs (and use the vim yank buffer/clipboard) in the same editor, or go into `paste` mode to paste from the OS clipboard. I'm not sure what the solution is in emacs, but I guarantee there is one.

snoble · on Dec 1, 2011

Ah ha! you guys weren't arguing about whitespace in code at all. This thread is just a proxy war for the vim/emacs holy war. I should have guessed.

itmag · on Dec 1, 2011

"Proxy war", lol :p

Does that mean that nano and notepad will be recruited to do the actual fighting in the jungles of Vi-ed-nam?

phzbOx · on Nov 30, 2011

There was a conversation on this recently on HN. IIRC jashkenas said he liked the idea but it would be better to encourage library authors to write in a functional style that enables chaining rather than adding a new feature to the language.

Btw, I found that missing in Python too and created Moka (http://www.phzbox.com/moka/). It's still under heavy construction.

moomin · on Nov 30, 2011

Actually, that was in response to a previous request for a dedicated chaining syntax. This syntax is actually more useful for dealing with cases where chaining is already implemented.

herge · on Nov 30, 2011

Saw your presentation at Montréal-Python, it was very good.

Have you played with pointfree (http://markshroyer.com/docs/pointfree/latest/module.html#mod...)? What do you think about it compared to Moka?

raganwald · on Nov 30, 2011

I recall this discussion:

http://news.ycombinator.com/item?id=3174442

There may be others...

quitedisgusted · on Dec 1, 2011

For posterity, the original title of this blog post was "White Power":

https://github.com/raganwald/homoiconic/commit/bd55e8ad731cc...

Reg Braithwaite doing the Clayton Bigsby. Stay classy.

scotty79 · on Nov 30, 2011

Where do I sign?

Do you think it would be hard to introduce such a feature to CoffeeScript on your own?

jashkenas · on Dec 1, 2011

No, it wouldn't be terribly hard. The source code is all annotated to make it easier for folks to get started trying out their own flavors:

http://coffeescript.org/documentation/docs/grammar.html

You'd probably want to start by altering the lexer to stop considering ...

    a
      .b
      .c
      .d

... as effectively a single line, and turn it into some sort of "chain" node. Then, the value of the expression "a" can be cached at the beginning, and all further operations in the chain can be performed against the original value.
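The "cache the value once, then run every operation against it" behaviour can be modelled as a small helper. This is a Python sketch of the semantics only, not of the CoffeeScript compiler. The name `cascade` is made up, and returning the receiver at the end is an assumption rather than something the thread specifies.

```python
def cascade(receiver, *operations):
    """Apply every operation to the same receiver, ignoring each operation's
    own return value, and hand the receiver back."""
    for operation in operations:
        operation(receiver)
    return receiver


made = []

def make_list():
    # Stands in for the expression "a"; records how often it is evaluated.
    made.append("evaluated")
    return [3, 1, 2]


result = cascade(
    make_list(),
    lambda xs: xs.append(0),   # like ".b"
    lambda xs: xs.sort(),      # like ".c": returns None, which is ignored
    lambda xs: xs.reverse(),   # like ".d"
)

print(result)  # [3, 2, 1, 0]
print(made)    # ['evaluated']
```

A plain chain would break at `.sort()` because it returns `None`; the cascade keeps operating on the original list because each step starts from the cached value.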

jewel · on Dec 1, 2011

I've thought about something similar, but for a different reason. I'd like to be able to omit the parentheses on multiline, chained statements, like this:

  $ 'class'
    .addClass 'babies'
    .removeClass 'kids'

alexyoung · on Nov 30, 2011

This is important to me:

s/Coffeescript/CoffeeScript/g

s/Javascript/JavaScript/g

morsch · on Nov 30, 2011

And s/Nvidia/NVIDIA/g? Ugh.

English has a normal way to deal with proper nouns: the first letter is capitalised, the others aren't. CamelCase is an exceptional phenomenon in written English. It's not unusual for exceptional stuff to get regularised, particularly (relatively) high-frequency words like Javascript. Especially since there is zero loss of information involved; there is no added ambiguity.

mhartl · on Nov 30, 2011

I don't know why so many people are cavalier about this. Proper capitalization is part of proper spelling, which is important for clear communication. Whether it's 37Signals, Github, or Javascript, it irks me every time.

This cavalier attitude is so entrenched that attempts to correct it are sometimes even met with hostility, which on HN manifests itself as downvotes. Apparently there are those who feel that comments on this subject (e.g., this one or its parent) don't add to the discussion. And yet, I sometimes stop reading otherwise interesting articles simply because they exceed my misspelling or typo threshold—reading badly edited copy is unpleasant, and the lack of attention to detail undermines its credibility. I'd much prefer to avoid the problem altogether. In that spirit, I'd like to offer some surefire advice on how to prevent this kind of nitpicking: Get it right in the first place. Anyone who can develop awesome web apps or write an optimizing compiler can surely spell 37signals, GitHub, and JavaScript correctly as well.

subsection1h · on Nov 30, 2011

    I don't know why so many people are cavalier about this.
    Proper capitalization is part of proper spelling [...]

My favorite example of this behavior is when copywriters, graphic designers, etc. are inconsistent regarding the capitalization of the name of their own organization. I can't count all the times I couldn't figure out how best to bookmark an organization's website because the copywriters, etc. referred to it as Organization, ORGANIZATION, OrganiZation, and Organi Zation.

JadeNB · on Dec 1, 2011

> I don't know why so many people are cavalier about this. Proper capitalization is part of proper spelling, which is important for clear communication. Whether it's 37Signals, Github, or Javascript, it irks me every time.

So, presumably, is the proper use of articles, but how many people do you know who insert 'The' before all and only weak proper nouns? (For example: Is your alma mater University of Blah or The University of Blah? I'll bet you don't know; I didn't.) I'm on your side, and I like to make this distinction; but it's obvious that we have to draw the line somewhere or spend so much time on pedantry that we have no time left for meaning, and I don't have a problem with people who choose to draw the line before worrying about internal capitalisation.

nkohari · on Dec 1, 2011

Sometimes punctuation and capitalization is important. sometimes its not

alexyoung · on Nov 30, 2011

"writing functions to return a certain thing just to cater to how you like to write programs is hacking around a missing language feature"

To me chaining demonstrates just how flexible JavaScript is, rather than pointing out a fundamental missing language feature. In fact, by avoiding adding language features like this, I feel like the language is simpler and allows me to be more creative within its constraints.

raganwald · on Nov 30, 2011

    Java(S|s)cript != Coffee(S|s)cript

:-)

Lisp does not consider indentation significant, Python does. Smalltalk has cascading messages, Ruby doesn’t. I think these are simply design choices, and the goal is to find a set of choices that work together harmoniously.

Note that my proposed syntax still allows you all the chaining you want.

PLejeck · on Nov 30, 2011

Nice title change, much better than the racism from before, but I'm afraid my spacebar, being white, is offended.

Additionally, my previous opinion (http://news.ycombinator.com/item?id=3296010) still stands, that whitespace-significance isn't such a good thing, and this whole "YAY TREES" stuff is overrated.

raganwald · on Nov 30, 2011

I regularly use languages where whitespace is not significant. However, in those languages, whitespace is not significant. It isn’t significant some of the time and not significant some of the time. It is a separator all of the time.

Coffeescript is a language where whitespace is held out to be significant, so I’m simply saying “Great! Well in that case, let’s make it more significant.”

I have no argument with the idea that perfectly good programming languages do not consider whitespace significant.

PLejeck · on Nov 30, 2011

I don't think whitespace should EVER be significant, it seems like a very bad setup that's more prone to issues. I also have an unnatural hatred of all things Compile-to-JS.

thedufer · on Nov 30, 2011

So, to take this to its natural conclusion, either you (1) don't think we should bother writing indentation, or you (2) think we should write indentation (presumably for our own benefit, since the language is indentation-ignorant) and then write some extra delimiter for the compiler.

In option (1), you're going against possibly the strongest majority opinion programmers hold. Feel free to make this case.

In option (2), you want us to write the same information in 2 different ways - one for the reader, one for the compiler/interpreter. If you want to make a case against DRY principles, I would love to hear that as well.

Is there a third option I'm missing?

fennecfoxen · on Nov 30, 2011

I'll make a case against DRY principles.

A programming language isn't just about the code. It's about the programmer. What the programmer reads and what the programmer writes are an essential part of programming, and we should look at it not just from a coder point of view (DRY principles) but from a user-experience point of view, where redundancy is frequently helpful!

Consider: Displaying whitespace (indentation whitespace, that is) to the user is surely essential for basic code readability; our human brains like this stuff. But the quality of our perception is limited. It's easy to tell when something is a few spaces further in, but can you tell at an instant's glance whether something's indented by 8 or 12 spaces when it's preceded by a paragraph indented by 32 spaces? Because that tells you which flow control construct you just closed. (For bonus points, the start of that construct is off the top of your screen.)

Does counting invisible spaces and lining things up with a ruler sound like a great way to figure out the flow of a block of code? I say Meh. Whitespace is a poor medium for communicating something precise like the flow control of a program. The end to code blocks is something important enough that the marker should be something visible, not invisible.

And if that means repeating myself, so be it. But this is a repetition that can be trivially automated. Instead of making whitespace into syntax, go the other way around and turn your syntax into whitespace with your IDE or a code prettifier.
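As a rough illustration of "turn your syntax into whitespace", here is a minimal prettifier sketch in Python. The function name `reindent` and the 4-space width are assumptions made for this example, and it naively counts braces, so braces inside strings or comments would confuse it.

```python
def reindent(source, indent_width=4):
    """Re-indent brace-delimited code so indentation follows brace depth.

    Hypothetical helper: it counts every '{' and '}' on a line, so it is
    only suitable for simple input without braces in strings or comments.
    """
    out = []
    depth = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            out.append("")
            continue
        # A line that starts with a closing brace belongs to the outer level.
        line_depth = depth - 1 if stripped.startswith("}") else depth
        out.append(" " * (max(line_depth, 0) * indent_width) + stripped)
        depth += stripped.count("{") - stripped.count("}")
    return "\n".join(out)


messy = "if (a) {\nb();\n      if (c) {\nd();\n}\n}"
print(reindent(messy))
```

The braces stay the single source of truth here; the indentation is derived from them rather than typed by hand.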

--

Now, from all I hear, the ever-popular Python programming language already does plenty with whitespace, so clearly it's certainly not impossible to work with. But I don't like it :)

icebraining · on Nov 30, 2011

> but can you tell at an instant's glance whether something's indented by 8 or 12 spaces when it's preceded by a paragraph indented by 32 spaces?

No, but the problem there isn't the whitespace per se, but the 32 spaces. To quote Linus on the Linux Coding Style guide:

    If you need more than 3 levels of indentation, you're screwed anyway,
    and should fix your program.

And the Zen of Python:

    Flat is better than nested.

In conclusion, if telling nested levels apart is a problem, that's a code smell, not a problem of the indentation system.

extension · on Nov 30, 2011

Is counting delimiters any easier than measuring indentation? Not really. You will just fall back on the indentation in the end anyway. The delimiters never make anything clearer, they just help push the opening line off the top of the screen.

ansible · on Nov 30, 2011

What we need are better editor setups that can slightly shade the backgrounds of nested blocks to clearly indicate indent levels and easily enable the programmer to see what matches to what.

Something like this for VIM:

https://github.com/nathanaelkane/vim-indent-guides

Screenshots at the bottom of the page.

jholman · on Dec 2, 2011

commenting to save for later (sorry)

jtc331 · on Dec 1, 2011

/Counting/ delimiters isn't necessarily any easier for our brains than measuring indentation.

However, it is (at least for me) much easier for the eyes to line up two delimiters than to line up a keyword/function call/literal/whatever with the end of the block it introduced, when that block's termination is only implied by indentation. With delimiters, I have two similar things to line up visually. With significant indentation, I have to line up a delimiter with blank space, which is difficult.

extension · on Dec 1, 2011

You're lining up the statement that opened the block with the statement that comes after the block. Should be easy.

And in the most common code style, delimiters don't line up anyway, because the opening delimiter is at the end of the line.

thedufer · on Nov 30, 2011

I guess my feeling is, why automate something that doesn't need to happen at all? If you need to see your whitespace, it should be trivial* to make it visible. Thus, I agree with the comment about brackets being no easier to count than spaces.

*Assuming you're using a reasonable editor.

fennecfoxen · on Dec 1, 2011

It's true; you could use syntax highlighting or IDE features to make things more obvious. You could also turn on syntax highlighting in Whitespace.

http://en.wikipedia.org/wiki/Whitespace_(programming_languag...

This is, of course, significantly more ridiculous than an IDE helping you out with your CoffeeScript, so don't take my teasing too seriously ;)

thedufer · on Dec 1, 2011

My response was meant in much the same tone as yours; I use vim and, while I have never needed visible whitespace, I don't have any indication that it's an easy thing to do.

mayoff · on Nov 30, 2011

Option (3): require both braces and indentation, with a syntax error when they don't match.

thedufer · on Nov 30, 2011

I don't see this as different from option 2. Failures will just happen at compilation-time rather than run-time. This has the slight advantage of making those errors easier to track down, but still violates DRY.

fennecfoxen · on Dec 1, 2011

Is DRY as important in syntax? I thought the main point of DRY is to avoid writing the same code over and over (code duplication is bug duplication, etc). Are you sure those reasons still apply here, or are we being overly dogmatic in our application of this principle?

(though obviously no one likes pointless repetition or obnoxious syntax.)

thedufer · on Dec 1, 2011

Yes, it is important in syntax. You're still entering the same information twice (this time in two different ways, rather than two different places), which allows the possibility that the two conflict (the main problem that DRY prevents), resulting in something bad.

fennecfoxen · on Dec 1, 2011

Some people would say that when a compiler blows up telling you "ur doin it rong", that's a good thing.

This is surely why there's so much type information floating around a programming language like Java. (Of course, Java is a little extreme. It's obnoxious to be forced to catch/declare-that-I-throw 50 different exceptions when I do any file I/O, and doesn't bring much to my typical use case. I don't use Java much. :)

artsrc · on Dec 1, 2011

These should all be compiler errors because the indentation and the delimiters don't match.

    if (expression)
        statement;
        statement;
    statement;

or:

    if (expression)
        statement;
        if (expression)
            statement;
    else
        statement;

or

    if (expression); {
        statement;
        statement;
    }

So curly brace languages should have a defined formatting standard and anything else should be a syntax error.
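A minimal sketch of what such a check might look like, written in Python. The rule assumed here is deliberately strict and hypothetical: every line must be indented by exactly 4 spaces per open brace. The function name is made up for the example, and braces inside strings or comments are not handled.

```python
def check_braces_match_indentation(source, indent_width=4):
    """Return (line_number, message) pairs where indentation and braces disagree.

    Assumed rule (not taken from any real compiler): each line is indented
    by indent_width spaces for every brace that is still open.
    """
    problems = []
    depth = 0
    for number, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue
        # A line that starts with a closing brace belongs to the outer level.
        line_depth = depth - 1 if stripped.startswith("}") else depth
        expected = max(line_depth, 0) * indent_width
        actual = len(line) - len(line.lstrip(" "))
        if actual != expected:
            problems.append((number, f"expected {expected} spaces, found {actual}"))
        depth += stripped.count("{") - stripped.count("}")
    return problems


misleading = "if (expression)\n    statement;\n    statement;\nstatement;\n"
print(check_braces_match_indentation(misleading))
```

Under this rule the first example above is rejected, because the indented statements are not enclosed in braces. The stray semicolon in the third example would not be caught by a check this simple; that needs knowledge of the grammar, not just brace counting.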

mikeklaas · on Nov 30, 2011

How is it "more prone to issues"? Have you had significant experience in an indentation-significant language like Python? As someone who has written 100k's of lines of code in Python, I'm surprised how rarely issues arise in practice.

The thing is, indentation needs to be correct in "brace" languages for them to be understood correctly, anyway. The famous "dangling if" problem for instance:

  if (condition) 
      x = 1
      y = 2

The better objection to indentation significance, in my opinion, is that it makes it very difficult to find an acceptable syntax for anonymous blocks (which is the main reason python doesn't have them, I reckon).

tomp · on Nov 30, 2011

Which programming language do you prefer then? Don't say C, because in C, whitespace obviously is significant:

  int a = 1;

vs.

  inta=1;

other languages, like CoffeeScript and Python, simply take it to the next level.

agscala · on Nov 30, 2011

I think your example is a little too much. Whitespace isn't significant in C. When talking about programming languages, I don't think whitespace refers to spaces between tokens. It's mostly a reference to indentation.

masklinn · on Nov 30, 2011

> Whitespace isn't significant in C.

As demonstrated, it is.

> When talking about programming languages, I don't think whitespace refers to spaces between tokens.

But that makes no sense, it is whitespace, and it has semantic significance (hence being significant). Whitespace was not significant in older versions of Fortran, and that allowed you to write

    DO30I=10,100

which was interpreted as

    DO 30 I = 10, 100

that is non-significant whitespace.

Ruby has long struggled with how it interpreted its whitespace, for a long time

   sin (x) + y

would be interpreted as

   sin(x + y)

for instance, rather than

    sin(x) + y

how is that not significant whitespace?

tordek · on Nov 30, 2011

But that makes no sense, it is whitespace, and it has semantic significance (hence being significant). Whitespace was not significant in older versions of Fortran, and that allowed you to write

    DO30I=10,100

which was interpreted as

    DO 30 I = 10, 100

An amusing bug I saw in Expert C Programming mentioned how somebody once typed a dot instead of a comma, and

    DO 30 I = 10. 100

ended up interpreted as a simple real assignment:

    DO30I = 10.1
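To make the mechanism concrete, here is a toy sketch in Python. It is not a real Fortran parser; the function name and the two regular expressions are assumptions chosen only to mimic "ignore all blanks, then decide what the statement is".

```python
import re


def classify_fixed_form(statement):
    """Toy classifier: drop all blanks, then decide loop vs. assignment.

    Illustrative assumption only; real fixed-form Fortran has many more
    statement kinds and rules than these two patterns cover.
    """
    squeezed = statement.replace(" ", "").upper()
    # DO <label> <variable> = <start>, <stop>   (needs the comma)
    loop = re.fullmatch(r"DO(\d+)([A-Z][A-Z0-9]*)=([^,]+),(.+)", squeezed)
    if loop:
        label, var, start, stop = loop.groups()
        return ("loop", label, var, start, stop)
    # <variable> = <expression>
    assign = re.fullmatch(r"([A-Z][A-Z0-9]*)=(.+)", squeezed)
    if assign:
        return ("assignment", assign.group(1), assign.group(2))
    return ("unknown", squeezed)


print(classify_fixed_form("DO 30 I = 10, 100"))
print(classify_fixed_form("DO 30 I = 10. 100"))
```

With the comma, the squeezed text matches the loop pattern. With the dot, the only pattern left is an assignment to a variable named DO30I.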

rayiner · on Nov 30, 2011

He has a point. Whitespace is definitely significant to C's lexer, if not its parser. When people say "significant whitespace" they're implicitly talking about whitespace that is "significant to the parser as well as the lexer."
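The lexer/parser distinction can be observed directly in Python, which stands in here for the general idea (this is not C or CoffeeScript). The sketch below uses the standard `tokenize` module; the helper name `tokens` is made up for the example.

```python
import io
import tokenize


def tokens(source):
    """Return (token name, text) pairs from Python's standard tokenizer."""
    readline = io.StringIO(source).readline
    return [(tokenize.tok_name[t.type], t.string)
            for t in tokenize.generate_tokens(readline)]


# Between tokens, whitespace is only a separator: the amount does not matter...
same = tokens("a = 0\n") == tokens("a      =   0\n") == tokens("a=0\n")
# ...but its presence can: "not x" is two tokens, "notx" is a single name.
differs = tokens("not x\n") != tokens("notx\n")
# Leading whitespace is different: the tokenizer turns it into INDENT and
# DEDENT tokens, which the parser then consumes much like braces.
block = [name for name, _ in tokens("if x:\n    y = 1\nz = 2\n")]
print(same, differs, block)
```

In that sense spacing between tokens matters only to the lexer, while indentation reaches the parser as explicit tokens.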

PLejeck · on Nov 30, 2011

When we say whitespace, we usually mean multiple spaces at the start of a line, something known as "indentation" ;)

cube13 · on Nov 30, 2011

Whitespace does not control program flow in C the way it does in CoffeeScript and Python. It is only used to separate tokens.

Note that:

  int a = 0;

and

  int a=0;

and

  int                       a             =       0;

are all equivalent.

thedufer · on Nov 30, 2011

s/whitespace/indentation/

You know that's what is being discussed.

dextorious · on Nov 30, 2011

"""Coffeescript is a language where whitespace is held out to be significant, so I’m simply saying “Great! Well in that case, let’s make it more significant.""""

Ever heard of the concept: "too much of a good thing"?

scott_s · on Nov 30, 2011

Responding to what you said before:

> I also believe that representing everything as a tree (why are programmers so obsessed with trees!?) is ridiculous: code is, quite simply, code. Outline structure or not, what matters is the actual meaning of the code, and this is part of your mental model of the program, not the syntax.

Compilers represent your code as a tree - a parse tree first, then an abstract syntax tree. That is how code is understood. It's not an arbitrary decision on raganwald's part.
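A small illustration, using Python's standard `ast` module as a stand-in for compilers in general (it exposes the abstract syntax tree, not the earlier parse tree). The variable names are only for this example.

```python
import ast

tree = ast.parse("x = 1 + 2 * 3")
assignment = tree.body[0]     # the single statement: an ast.Assign node
addition = assignment.value   # BinOp for 1 + (2 * 3)
product = addition.right      # BinOp for 2 * 3, nested under the addition

# Operator precedence is recorded purely as nesting in the tree.
print(ast.dump(tree))
```

The flat line of source text becomes a nested structure, and that nesting is what later compiler stages work from.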

perfunctory · on Nov 30, 2011

I like it