Never create Ruby strings longer than 23 characters (patshaughnessy.net)
357 points by there on Jan 4, 2012 | 109 comments



He's nearly there, but the reason for the number 23 is staring you right in the face. It's not arbitrary. A heap allocated string requires 8 bytes for the length, 8 bytes for the pointer to the string, and 8 bytes for the capacity / reference count. The sum is 24 bytes.

So you either use it as a length/pointer/capacity/refcount struct, or save the malloc and use the 24 bytes directly for 23 chars and a NULL terminator.
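
In struct form the overlap looks roughly like this (a sketch of the idea on a 64-bit machine, not MRI's exact declaration):

    /* Sketch only: both layouts occupy the same 24 bytes. */
    union string_body {
        struct {
            long  len;    /* 8 bytes: current length                  */
            char *ptr;    /* 8 bytes: pointer to a malloc'd buffer    */
            long  capa;   /* 8 bytes: capacity (or a reference count) */
        } heap;
        char ary[24];     /* embedded case: 23 characters + '\0'      */
    };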


I think Pat actually points that out:

The value of RSTRING_EMBED_LEN_MAX was chosen to match the size of the len/ptr/capa values. That’s where the 23 limit comes from.


I apologize, I accidentally downvoted you instead of upvoting.


use the 24 bytes directly for 23 chars and a NULL terminator.

Ruby strings aren't null terminated though

EDIT: So I went and looked at the code, they are null terminated, but as far as I can tell, Ruby doesn't rely on this directly.


Pardon?

The code is right there in the article -- in this case, for this specific kind of string, at the C level, they very clearly are null-terminated strings.

No, you don't get to see that at the Ruby level, it's all nicely abstracted away, but that is exactly what is happening inside the VM.


Not really. RString operates in the same way std::string does - it has a character array and it has a member variable denoting the length.

It's not null-terminated. You can store a sequence of nulls and that will not affect the result of std::string::size().

Coming from C, you'd be forgiven for thinking it was null-terminated, because assigning a std::string a value from a null-containing array of characters will terminate the copy upon reaching the null; but that's only because the original char array is read as a null-terminated C string.

However, you can manually construct a std::string with \0 sequences in the middle and that will not terminate the string, nor affect the separate length calculation. The same applies for Ruby's RString.

So that was the reason they're not null-terminated. Now the reason why they technically are (i.e. the reason NULLs are stored at the end of the string) is for compatibility and optimization. At the cost of one byte per string (for the trailing \0), we get instant compatibility with non-RString/std::string functions. If a function needs a C string, we can just pass the internal pointer to the character array - no need to copy the string to a temporary buffer and append a null.

Therefore, while null-termination is absolutely NOT required when dealing with an exclusively counted-length implementation of C strings (a la RString, CString, std::string, etc.) if you can just pass the pointer and the length separately, it would be a ridiculously foolish optimization for a general string implementation to NOT have the option of directly exposing the underlying null-terminated string to any functions that need it, with the caveat that null-containing counted strings will obviously terminate sooner than expected.
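
To make the compatibility point concrete, here's a hypothetical counted-string type (not MRI's actual API); because a '\0' is kept after the counted bytes, the buffer can be handed straight to any libc function:

    #include <stdio.h>

    /* Hypothetical embedded counted string that also keeps a trailing NUL. */
    struct counted_str { long len; char buf[24]; };

    void print_it(const struct counted_str *s)
    {
        /* No copy needed: buf is already a valid C string.            */
        /* Without the terminator we'd have to memcpy into a temporary */
        /* buffer and append '\0' before every such call.              */
        printf("%s\n", s->buf);
    }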


You seem to be confusing C and C++; there is no std::string in C, and MRI is written in C.

A 'C String' is by quasi definition, a segment of memory that can be properly processed by the string functions in the C standard library, which requires null termination.

>Now the reason why they technically are (i.e. the reason NULLs are stored at the end of the string) is for compatibility and optimization.

Really it's just because they are C strings, that is, they use the C standard library string functions; if you want to use those, you must null terminate.

>Therefore, while null-termination is absolutely NOT required when dealing with an exclusively counted-length implementation of C strings (a la RString, CString, std::string, etc.)

None of those are implementations of "C Strings", they aren't even available for C.

The determination as to whether you're using null-terminated strings or not comes down to the string library you're using. If you're on C, you're probably using the C standard library and need to null terminate your 'strings'. There really isn't much more to it than that.


No, I'm perfectly well-versed in the differences between C and C++, having written in one or the other for a long time. A trivial look-alike implementation of std::string can be written in C, and would look a lot like the RString class.

Your argument is essentially mine. The need to use the platform's string functions heavily swings (but does not force) the choice of null-terminating the RString members. As I mentioned, it would be really stupid but entirely possible to simply clone the non-null-terminated string into a temporary null-terminated char array every time you want to use a function that takes standard "C strings", if you really, truly, madly wanted an RString implementation that was one byte smaller to store. But that would be insane.


Well I dunno how they could possibly be null terminated if you can stick a null in any old string:

    >> "hello\0world".length
    => 11


That could be explained with ruby doing escaping of null characters automatically in the background, although I don't believe that to be the case.


It seems like the length is stored inside RString's RBasic->flags:

  #define RSTRING_EMBED_LEN(str) \
     (long)((RBASIC(str)->flags >> RSTRING_EMBED_LEN_SHIFT) & \
            (RSTRING_EMBED_LEN_MASK >> RSTRING_EMBED_LEN_SHIFT))


but as far as I can tell, Ruby doesn't rely on this directly

They may be null-terminated to make writing C extensions easier. If they weren't, you'd have to make a null-terminated copy every time you wanted to pass the bytes to a function expecting a null-terminated string, which most C APIs do.


They probably are when they are inlined like this, else, how would you know the length?


The length is stored in a flag.

include/ruby/ruby.h:

    #define RSTRING_EMBED_LEN(str) \
         (long)((RBASIC(str)->flags >> RSTRING_EMBED_LEN_SHIFT) & \
                (RSTRING_EMBED_LEN_MASK >> RSTRING_EMBED_LEN_SHIFT))

    #define RSTRING_LEN(str) \
        (!(RBASIC(str)->flags & RSTRING_NOEMBED) ? \
         RSTRING_EMBED_LEN(str) : \
         RSTRING(str)->as.heap.len)


I read the title and thought that I was going to see some really stupid design decisions. To the contrary, it's vary clean and smart.

It's not that strings of 24 characters or more are slow; it's that strings shorter than 24 characters get an extra optimization.

It's a great article, but I really, really despise the linkbait title.


I work with Pat and kicked around some title ideas with him yesterday. It's interesting to see a title that he created out of a sense of fun and excitement being perceived as linkbait. I can see why that perception would arise, but I hope people think of him as a good writer trying to have fun rather than some dude writing linkbait titles trying to get attention.

Some other title ideas:

* 23 characters ought to be enough for anybody

* 23: How Twitter could have solved their ruby scaling problem


Heh. I would have gone with "Ruby Optimizations: Why short strings are an order of magnitude faster" or something.


No better. It becomes a real problem when factually accurate titles don't get through. Imagine: I'm on a phone in China, most articles never load, and I have to rely on the title and comments to know what we are talking about.


> It's a great article, but I really, really despise the linkbait title.

Generally I feel the same. Although, I think exceptions can be made for articles with real content.


I enjoyed the article too, but the title is incongruous with the conclusion of the article, which openly states that it's ridiculous to suggest using only strings < 24 chars in Ruby applications; the point is just to be aware of what's going on under the hood.


s/vary/very/

Sorry! Can't edit it now, either. :$


Terrible title, but the content is quite good. Ruby programmers, at least here, should have enough foundations to be able to understand these "deep" dives into the interpreter, and the more you understand the hows and whys of the tools you build on, the better your end product will eventually be.


Language enthusiasts in general, not just Ruby programmers will enjoy this article, I think. In any case, I got to see a cool little optimisation that I hadn't thought of before.


Thanks a lot for the nice comments, guys! Sorry about the "link bait" - I really was just so surprised that the limit was 23, such a strange number, that I just had to put it in the title of the post. I didn't expect it to end up on HN.


Unicode of course takes up more space and fills up your buffer sooner. Looks like the jump happens after 8 chars.

  Benchmark.bm do |bench|
    run("と", bench)
    run("がと", bench)
    run("りがと", bench)
    run("ありがと", bench)
    run("ありがとあ", bench)
    run("ありがとあり", bench)
    run("ありがとありが", bench)
    run("ありがとありがと", bench)
    run("ありがとありがとあ", bench)
    run("ありがとありがとあり", bench)
    run("ありがとありがとありと", bench)
    run("ありがとありがとありがと", bench)
  end

               user     system      total        real
  2  chars  0.210000   0.000000   0.210000 (  0.212420)
  3  chars  0.200000   0.000000   0.200000 (  0.199957)
  4  chars  0.200000   0.000000   0.200000 (  0.199356)
  5  chars  0.200000   0.000000   0.200000 (  0.199142)
  6  chars  0.200000   0.000000   0.200000 (  0.198047)
  7  chars  0.190000   0.000000   0.190000 (  0.198984)
  8  chars  0.190000   0.000000   0.190000 (  0.196917)
  9  chars  0.250000   0.000000   0.250000 (  0.245808)
  10 chars  0.240000   0.000000   0.240000 (  0.247153)
  11 chars  0.250000   0.000000   0.250000 (  0.248083)
  12 chars  0.250000   0.000000   0.250000 (  0.247753)
  13 chars  0.240000   0.000000   0.240000 (  0.250674)
Grabbed those unicode chars from http://blog.trydionel.com/2010/03/23/some-unicode-tips-for-r... no clue what that says.


I'm going to split some hairs, because it matters for the topic at hand.

>Unicode of course takes up more space and fills up your buffer sooner. Looks like the jump happens after 8 chars.

It sounds like you are conflating Unicode with UTF-8. There is more than one way to represent the Unicode code points, and UTF-8 is one of them. Further, it seems like you assume that "Unicode characters" have a constant size. This is a potentially dangerous misunderstanding of how UTF-8 works. UTF-8 encodes code points using a variable number of bytes (from one to four, IIRC). You happen to have copied some code points that take 3 bytes each.

The UTF-8 encoding scheme is a great compromise, and the wikipedia article is easy to follow: http://en.wikipedia.org/wiki/UTF-8


I also used to believe Unicode and UTF-8 were different types of encoding until someone corrected me. I just remembered why I had thought such a thing in the first place:

http://msdn.microsoft.com/en-us/library/system.text.encoding...


You and probably everyone else the first time they encounter unicode / UTF-8. I wonder if it's because both terms start with 'U'.


Thanks for correcting my errors.


> no clue what that says.

I think it's supposed to be ありがとう ("arigatou"), which is Japanese for "thank you" (the informal version). The characters are hiragana.

I have no idea why they left off the う at the end, though, except that it just turns the last o into a long o and many foreign speakers are insensitive to the difference between the two because they sound almost the same.


It's pretty common to leave out the う in informal speech.



I don't use it. For Japanese, try Rikaichan (a Firefox translation plugin) which lets you look up unfamiliar kanji without leaving the website. You can also look words up on http://www.alc.co.jp/ but it might be hard to use unless you can read at least basic Japanese.


Yes, hiragana are from a Unicode range that encodes to 3 UTF-8 characters each, so Ruby appears to use UTF-8. ありがと is "arigato".


You probably mean "three bytes in UTF-8", talking about "UTF-8 characters" is really confusing.


You're absolutely right, I should have said "three bytes" or "three octets".


ありがとう -> thanks, note you are missing the う on the end.


You're welcome.

That's what I get for checking up on my limited unicode knowledge before posting, and then just C&Ping.

Thanks.


Cool! Thanks for taking the time to test this - while the idea of testing with unicode chars occurred to me while writing the post, I didn't have time to get to it...


Interesting deep dive; but remember that calling :+ to append to strings is a pessimization (in both Ruby and Python); :join'ing a list of 2 strings is about as fast as :+'ing them together, but :join'ing 3 strings is about twice as fast.


I was curious, so I benchmarked this with Ruby MRI. join('') beat reduce(:+) even on two strings, but is twice as fast only on four or more strings.

   $ ruby -v
   ruby 1.8.7 (2010-01-10 patchlevel 249) [universal-darwin11.0]
   $ ruby benchmark.rb
         user     system      total        real
   add  2  0.390000   0.000000   0.390000 (  0.385910)
   join 2  0.300000   0.000000   0.300000 (  0.305812)

   add  3  0.530000   0.000000   0.530000 (  0.527388)
   join 3  0.320000   0.010000   0.330000 (  0.329390)

   add  4  0.720000   0.000000   0.720000 (  0.720500)
   join 4  0.350000   0.000000   0.350000 (  0.353237)
Code: https://gist.github.com/1562617


In case you were wondering why, it is most likely* because `#join` is appending to a single string, whereas the `#reduce` call is creating intermediate strings for each step of the fold.

* I'd have to check the implementation of `#join` to be sure.


I thought so too, but appending to a single string does even worse than reduce(:+):

  add 2  0.390000   0.000000   0.390000 (  0.390529)
  +=  2  0.540000   0.000000   0.540000 (  0.537450)

  add 3  0.530000   0.000000   0.530000 (  0.534131)
  +=  3  0.760000   0.000000   0.760000 (  0.752280)

  add 4  0.660000   0.000000   0.660000 (  0.668154)
  +=  4  0.960000   0.000000   0.960000 (  0.954727)
Code: https://gist.github.com/1562994


+= is not append

  str += append_str
is the equivalent to:

  str.dup << append_str
That is, it copies the string and then appends to the copy. Benchmarking the speed of a true append is difficult because in order to preserve the original string in the benchmark you must dup it anyhow (done here outside of the bench.report block). However, the bigger the strings get, the more pronounced the advantage of appending is.

                user     system      total        real
  add  2      0.260000   0.000000   0.260000 (  0.263414)
  join 2      0.320000   0.000000   0.320000 (  0.325341)
  append 2    0.230000   0.010000   0.240000 (  0.235669)

  add  3      0.500000   0.000000   0.500000 (  0.497219)
  join 3      0.840000   0.020000   0.860000 (  0.866401)
  append 3    0.260000   0.010000   0.270000 (  0.268676)

  add  4      0.360000   0.040000   0.400000 (  0.397778)
  join 4      0.970000   0.030000   1.000000 (  0.997321)
  append 4    0.280000   0.000000   0.280000 (  0.281020)

  add  5      0.460000   0.000000   0.460000 (  0.454780)
  join 5      0.910000   0.030000   0.940000 (  0.946600)
  append 5    0.330000   0.000000   0.330000 (  0.336214)
Code: https://gist.github.com/1563290


I thought I remembered reading somewhere that Python optimizes this so that both forms are basically equivalent, performance-wise. At least in relatively recent versions of Python.


It's a bit more complex, it's implementation-specific, it's definitely a hack and it's not very reliable:

* It's a CPython-only optimization

* It does not work if the first member of the concatenation is an interned string (such as a literal string, so `s = s + "abc"` may trigger the optimization but `s = "abc" + s` will never do so)

* It only works if there's a single reference to the string in the system.

* It only works for the shapes `s = s + s1` and `s += s1` and no other[0], including (excluding?) `s = s + s1 + s2` or `someFunction(s + s1)`

The way it works is that, when it sees `s = s + "abc"` or `s += "abc"` and there's a single reference to `s` (this one), CPython will call `realloc(3)` on the string's internal buffer (technically it calls `_PyString_Resize` which calls `PyObject_Realloc` unless you've compiled python --without-pymalloc then it calls a macro over realloc(3), but you get the point) instead of allocating a brand new string. And it does not work on unicode objects in CPython 2 (only str, it works on both str and bytes in CPython 3).

Interestingly, this optimization was introduced by Armin Rigo[1], one of the lead PyPy devs; for PyPy, this CPython optimization is an issue (it leads developers to use more string concatenation, which PyPy can not really optimize, as it does not use refcounting semantics)[2].

[0] http://bugs.python.org/issue980695#msg46258

[1] http://bugs.python.org/issue980695

[2] https://bugs.pypy.org/issue814 (nb: pypy's bug tracker displays the oldest comment at the bottom)

PS: I don't think I've noted how much HN's comment-editing "facilities" suck today. They really, really suck. What's a man got to do for markdown support in order to make formatted comments readable?


You could hack markdown support into the HN code base. It's open source. And once you've written it, you should have a decent chance of lobbying for its inclusion into the copy running news.ycombinator.com


He would have very little chance of getting that support included; HN's code is "markdown" in name only, and is hand-parsed.


Why would the austerity of current support hinder his lobbying efforts for his improved version? If compatibility is a problem, you can just run comments before flag day through the old code. Or write a converter.


You are thinking about HN as if it was just a piece of software. It isn't just a piece of software; it's also:

* an important piece of YC's infrastructure, so it doesn't have a normal dev team; some of the data going through HN is "hazmat".

* a demonstration of a new programming language Graham is ostensibly working on, which implies that doing things "the right way" is more important than on a typical project.

* in part a secret (the published HN code is not the code running on HN).


Since it only supports emphasis, paragraphs and auto-recognizing links it's clearly not even remotely close to markdown, even in name.


This was interesting and caused me to go off on a neat little tangent too.

I was curious about the VALUE declaration in:

    struct RString {
      long len;
      char *ptr;
      VALUE shared;
    };
From the Hacker Guide referenced, I found this definition:

    typedef unsigned long VALUE;
This is then cast, when needed, to a pointer to whatever type of struct you are dealing with. How does that work?

Well, on a 32-bit machine, if I'm correct in my reading, an unsigned long is 4 bytes in size and can contain these numbers: 0 to 4294967295

That last number is 4 gigabytes, which is the size of byte-addressable memory on a 32-bit machine. So VALUE can point anywhere. Neat!
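
A minimal sketch of that cast (hypothetical helper name; the real interpreter wraps casts like this in macros such as RSTRING()):

    typedef unsigned long VALUE;

    struct RString { long len; char *ptr; VALUE shared; };

    /* A VALUE is just a word-sized integer, so recovering the struct is a plain cast. */
    struct RString *as_rstring(VALUE v)
    {
        return (struct RString *)v;
    }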


Fun fact: a VALUE is not only a pointer. Ruby takes advantage of the fact that pointers are aligned to a certain boundary (so a few of the least significant bits are always zero) and stores flags in the least significant bits.

For example, if the LSB is 1, then the VALUE isn't a pointer - it's a Fixnum. This is why the maximum size of a Ruby Fixnum is one bit less than the pointer size on the machine.
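
Roughly like this (a sketch of the scheme, not MRI's exact macros):

    typedef unsigned long VALUE;

    /* If the low bit is set, the VALUE is an immediate Fixnum rather than a pointer. */
    #define IS_FIXNUM(v)   ((v) & 1)
    #define TO_FIXNUM(i)   (((VALUE)(long)(i) << 1) | 1)   /* loses one bit of range */
    #define FROM_FIXNUM(v) ((long)(v) >> 1)                /* arithmetic shift back  */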


The Objective C runtime on MacOS 10.7 also does this for NSNumber objects.


This is called 'type tagging' or just 'tagging'; needless to say, Lisp and Smalltalk and other interpreted language implementations have been doing it for decades now, and at one time there was direct hardware support for it in some architectures.


Guile manual has nice details about data representation:

http://www.gnu.org/software/guile/manual/html_node/Data-Repr...

Tagging: http://www.gnu.org/software/guile/manual/html_node/Faster-In...

Another interesting technique is NaN-tagging (used in LuaJIT): http://wingolog.org/archives/2011/05/18/value-representation...


Right! The 68k supported type tagging by having 24 bit addresses on a 32 bit machine. I'm not sure if allowing tagging was intentional when Motorola designed the 68k, but Mac OS did store flags in the upper bits of a pointer at some stage.


It was a quirk due to the number of address lines on the low end models, and it was a massive code smell, as 68020 and up could use 32 bit addresses so any code that did this would fail on machines with the faster CPUs.

Amiga Basic, for example, wouldn't run on the 68020 and up because Microsoft used the upper 8 bits (amongst a whole slew of other horrible bugs and performance problems - it's probably the worst Microsoft product ever in terms of code quality).


> The 68k supported type tagging by having 24 bit addresses on a 32 bit machine.

Not quite: Having hardware support for type tagging means the machine knows that some bits are reserved for the tag and will therefore do arithmetic in a tag-preserving fashion. The 24-bit pointers in early m68k chips was just saving 8 bits worth of address lines.

The SPARC had tag support in hardware, in the form of special arithmetic opcodes. Further explanation:

http://compilers.iecc.com/comparch/article/91-04-088

> I'm not sure if allowing tagging was intentional when Motorola designed the 68k

Definitely not; it was a cost-saving measure, and a lot of software broke when a later processor made pointers fully 32-bit.

> but Mac OS did store flags in the upper bits of a pointer at some stage.

...and having a 32-bit-clean Finder was a big deal for a while because that meant it could run on the newer hardware.


That seems like a slightly dubious choice to me, since unsigned long isn't guaranteed to be as long as a pointer; for example, under Microsoft's 64-bit compiler it's still only 4 bytes, but pointers are obviously 8. Possibly there is some extra preprocessor cleverness to deal with it, or maybe they don't expect it to work under that compiler?


Yeah, there's preprocessor cleverness. It will choose either of unsigned, unsigned long or unsigned long long. There's a "#if defined HAVE_UINTPTR_T && 0" section; I'm curious why they're effectively disabling that.


Isn't this what C99's uintptr_t type tries to address?


Yes, exactly. I'm just a little curious since, without knowing anything about the code at all, unsigned long doesn't seem like a great choice. I'm sure the Ruby devs would know this though so I expect there must be some reason for it.


Here's the code: https://github.com/ruby/ruby/blob/trunk/include/ruby/ruby.h#...

It looks like uintptr_t support is there, just disabled for now.


Thanks - the fallback to long long explains it for me.


This article should really have the word "stack" in it someplace.


Not knowing much C, I was confused about why malloc() wouldn't get called for the RString structure. This is elucidating:

http://www.cs.usfca.edu/~wolber/SoftwareDev/C/CStructs.htm

Particularly this part: "// automatic allocation, all fields placed on stack"


Even if you're not putting the RString struct on the stack, the embedded string optimization means calling malloc() just once instead of twice.


You can allocate the string with one malloc:

    RString *s = malloc(sizeof(RString) + length); /* allocate 'length' bytes of extra memory past the end of 's' */
    s->data = (char *)(s + 1); /* point at the extra memory just past the struct itself */


Ruby strings are mutable, so you need to be able to free s->data without freeing the RString itself.


    s = realloc(s, sizeof(RString));


MRI objects are not relocatable so that won't work if realloc has to move the structure in memory


Nitpick: sizeof(RString) + length may overflow size_t.


Link bait titles irk me. Nevertheless nice post on some of the MRI internals.


Cool dip into Ruby internals.

If you roll your own ruby, instead of redoing all your strings, you could just change RSTRING_EMBED_LEN_MAX. This would cause more wasted memory if you have a lot of short strings (0 < len << RSTRING_EMBED_LEN_MAX), and probably isn't worth it since there isn't much performance improvement.

The most confusing part of this article was the actual RString struct implementation. Are the anonymous unions and structs used to control structure padding and alignment?


Values in interpreters written in C are frequently implemented as (manually) discriminated unions - i.e. unions that share a field at the start to indicate the type and contents of the remainder - because that's a handy way of implementing the polymorphism required for a straightforward interpreter. It's pretty much necessary to use structs inside unions in order to have more than one field per layout; the struct is just grouping, so it doesn't need a type name.

So without looking at any of MRI source, I'd be willing to guess that most, if not all, of its structures representing Ruby values start with a field of type RBasic, and that type contains information necessary to distinguish and interpret the remainder of the value.


Yeah. I just looked at the source because I was confused about how it knew how long the embedded string was (since the length field is in the other half of the union and ruby strings can have embedded \0 bytes), and RBasic is a struct containing a VALUE referring to a class and a VALUE "flags" that tends to have a lot of bit fiddling done to it.

Apparently out of the non-reserved bits, one is used to tell whether the string is embedded or not and five more are combined to give the string's length. Makes sense!


The "aux" union contains either the reference count or the string capacity. The idea must be that shared strings are immutable so the capacity is not meaningful. Conversely, if the string is not shared, the reference count is not meaningful.

The outer "as" union either contains the 24 byte char array or the 24 byte (len + ptr + aux) heap string information.


Thanks! Cool idea about recompiling with a new value for RSTRING_EMBED_LEN_MAX - never thought of that!

Yes, the inner unions/structs are used to tell the C compiler exactly where the different data values should go. They are in fact not anonymous: e.g. "heap" and "aux" - the names appear right after the closing brace.


It's too bad that Ruby requires strings to be GC-ed, otherwise you could get away with an even faster stack string up to a certain size. Basically you would do the same thing as RString, but when you declare a string on the stack, there's no malloc unless you need >N chars.

https://github.com/dblock/baseclasses/tree/master/String for an implementation (a bit thick).


Why would str2=str create a new string?

How are RStrings modified when they are referenced by other shared RStrings?

Is there any way to create a shared RString other than calling .dup?


Great questions! To be honest I don't have enough knowledge of the MRI internals yet to be able to answer these 100% correctly. Maybe I'll write a follow up post or an update to this one explaining these issues when I have time for more research.

But for today:

1. str2 = str doesn't actually create a new string. RString represents a string value, not a string object. So str2=str creates a new RString because that essentially defines what str2's value refers to… Think of RString as an internal pointer to the value that str or str2 is referring to. Sorry - not a good explanation :(

2. How are shared RStrings modified? Not sure yet, but I'm curious to find out. Will let you know on my site somehow.

3. Yes, simply calling str2 = str or in a variety of other ways will do it.


I'm 99% sure you're wrong about 1). The internal pointer is the VALUE that is pointing to the struct RString. `str2 = str` will make another VALUE point to the same struct RString. Consider

  str = "foo"
  str2 = str
  str.object_id == str2.object_id #=> true
I'm fairly sure that implies that str and str2 share everything including the RString object, so it couldn't really have one of str/str2 have a field that points to the other.

I am not sure about 2) and 3) but I'd expect some copy-on-write mechanism based on a flag field in the RBasic struct, and taking substrings might be O(1) thanks to sharing too.


Here's a better explanation for #1:

In Ruby, most classes (including String) are reference types. This is why `str2 = str` doesn't create a new string. It doesn't create a new RString either - str and str2 are both pointers to the same RString struct.


AFAIK str2 = str will just share the RVALUE-Pointer.

I think to actually see shared strings behavior, you would have to do str2 = str.clone or str2 = str[0..-1].


I'll admit to only having skimmed much of this article, but that's a lot of words to say this:

"It turns out that the MRI Ruby 1.9 interpreter is optimized to handle strings containing 23 characters or less more quickly than longer strings."

The rest of the article seems to back that some with benchmarking numbers that suggest allocating a 23 character string is about 50% faster than allocating a 24 character string, which in this particular test worked out to about 200 milliseconds difference in the time it takes to allocate 1 MILLION strings, which makes the time savings about 0.2 picoseconds (200 nanoseconds) per allocation if I remember my SI units right.


It is also a nice introduction to C for rubyists -- it explains the basic ruby c object model and how different kinds of strings are laid out in memory and the implications that has on performance. Ultimately, the author shares your conclusion.


200ms / 1million = 200 nanoseconds, yes, but that's 0.2 microseconds. If you really want it in picoseconds, it's 200000 of those.


Robert Anton Wilson would be proud (http://en.wikipedia.org/wiki/23_enigma).


Does that maximum length need to be a magic number, or could it be an expression written in terms of `offsetof` and `sizeof`?


It could be (and probably should be):

   sizeof(long) + sizeof(char *) + sizeof(long) - 1  /* minus one byte for the trailing NUL */
But it's possible the DEFINE for RSTRING_EMBED_LEN_MAX is that and not a fixed number. You need the DEFINE since it would be needed elsewhere to check the length of a string before storing it.
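
If I remember the 1.9 header correctly, the define is in fact expressed in terms of sizeof rather than hard-coded, something like:

    #define RSTRING_EMBED_LEN_MAX ((int)((sizeof(VALUE)*3)/sizeof(char)-1))

i.e. three VALUE-sized slots minus one byte reserved for the trailing NUL, which works out to 23 on a 64-bit build.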


Link-bait title…

Author directly contradicts the title in the tl;dr at the end.


The way wtn said it might be a bit provocative, but here's the last paragraph of the post:

""" I don’t think you should refactor all your code to be sure you have strings of length 23 or less. That would obviously be ridiculous. The 50% speed increase sounds impressive, but actually the time differences I measured were insignificant until I allocated 100,000s or millions of strings – how many Ruby applications will need to create this many string values? And even if you do need to create many string objects, the pain and confusion caused by using only short strings would overwhelm any performance benefit you might get. """

So, basically, "Never create Ruby strings longer than 23 characters" is not true at all. In some specific cases, it might be true. A more accurate title might have been "Why ruby strings with more than 23 characters are handled differently." (Or something similar)

But then, it was an interesting post and I enjoyed it; so it doesn't really matter I guess.


I have this sinking feeling reading that article took more time than I'll ever save by knowing this.

Computers go unimaginably fast now. Really. Humans can't intuitively comprehend how fast it is. I doubt that this fact will save perceptible time for more than a dozen of its readers.


Okay, knowledge of this specific optimisation wouldn't be that useful except in some very unlikely scenarios. Nevertheless, I found this to be an interesting read about some Ruby MRI internals.

But I have to disagree with your attitude toward optimisations. Algorithmic improvements (and implementation optimisations, but generally less so) can improve performance by many orders of magnitude. Sure, sometimes it just doesn't matter, but there are plenty of cases when you should pay attention to this. For example: any web service that runs on more than a couple of servers. At some point, it will be cheaper to spend more time writing efficient software than to add hardware. Another example: mobile devices. Can you make your software 20% faster? Great, that means 20% more battery life.

I have a recent example which left a strong impression on me. I implemented a bot which plays Kalah[1] for a school project. I used C and optimised the program as much as was reasonable in a week or so. After I was done, I looked on the web for a strong implementation to play against. I found a bot implemented in Ruby. To my surprise, my implementation was about 4 orders of magnitude faster. A friend implemented the same thing (including algorithmic optimisations) in Java. Guess what, his was still 2 orders of magnitude slower than mine.

TLDR: Yeah, most of the time computers are fast enough. But if you have hotspots, it often pays off to optimise that code.

[1] http://en.wikipedia.org/wiki/Kalah


>I have to disagree with your attitude toward optimisations

How can you, you don't know it?

This is a tiny little fact about string runtime speed in something that is not the tight loop of the program in question.

Optimizing on this basis is called a premature optimization. You've not pointed a profiler at the code, you don't know that a significant portion of the runtime is in this string creation function, so you make the code larger and more complicated to get around an implementation detail of the current Ruby interpreter.

Do you know if you made your program faster or slower by doing this change? No, you don't. You have to add this optimization at the end, if and only if you see that this routine is taking a bunch of time. Why?

Adding this extra code can make your program not fit in your processor cache, HUGE SLOWDOWN. Adding this extra code may create non-trivial extra work handling the data, dwarfing the small time saved by thousands of times (it wouldn't take much to do that). Adding this extra code can make your program require extensive weeks of rewrite and testing, costing thousands more dollars that could instead be spent on better hardware. Adding extra code to work around this could cause an optimization in a compiler or library to not kick in, slowing your code.

When optimizing, you should be hand optimizing only the hotspots, not the rest. This article is talking about something that has almost no chance of being the hot spot, and acts like it's essential to never do it. Hence, it is untrustworthy, as the author did not know enough to know how unimportant this fact is to most people, or is dishonest enough to not express that to his audience while knowing it himself.


>> I have to disagree with your attitude toward optimisations

> How can you, you don't know it?

This bit

> Computers go unimaginably fast now. Really. Humans can't intuitively comprehend how fast it is.

seems to imply that you think computers are so fast that performance shouldn't be more than an afterthought.

I'll repeat myself: I don't think anyone should explicitly try to exploit this implementation detail. It's just that you seemed to dismiss all performance considerations.

> When optimizing, you should be hand optimizing only the hotspots, not the rest. This article is talking about something that has almost no chance of being the hot spot, and acts like it's essential to never do it.

No, it doesn't. The first paragraph starts with "Obviously this is an utterly preposterous statement: it’s hard to think of a more ridiculous and esoteric coding requirement."


Maybe it's not entirely rational, but I'm not a fan of this attitude even disregarding the points about scaling and mobile devices in the sibling post.

Computers are unimaginably fast, but people still wince at software that feels unresponsive or has long startup times. Dynamic, interpreted languages like Ruby are still considered prohibitively slow and not (or rarely) used for system programming, fancy 3d engines or embedded programming. Some people allegedly still write web browsers in C++!

If dynamic languages want to grow their (unquestionably already significant) niche, they can't wait for hardware improvements to make the order-of-magnitude difference to C not matter, and they can't afford to pass up optimization opportunities (that their bare-metal competition is probably taking as well).

Now you can write that off as an implementation detail and ignore whether it buys you perceptible time savings or not, but I don't feel it's right to handwave away all optimisation considerations. The article is an interesting little insight in how things are done in the engine room and what kind of shenanigans are employed to ultimately make your RoR web app load a teensy bit faster, and I don't consider the time I spent reading it (and subsequently having a look at the surrounding areas in the code) wasted at all.


The fact that this is only a 50% time increase is why casting the piece as an important optimization frankly makes it illegitimate.

If it took 10x or 100x the time, then you have something to talk about. 1.5x? Not Worth Mentioning. Definitely crappy hyperbole that linkbaited a bunch of people to waste their time reading it.

If you're worrying about 1.5x speedups in non-tight-loop parts of C++ code, I would contend you're probably looking in the wrong place. Even in a tight loop, 33% faster isn't much to write home about.

The thing about all this is: It doesn't make your web app load a "teensy bit faster". The time required to make millions of strings like this is still far far far below that of human detection.

I'd contend that if it made the memory footprint of the program any larger (with larger code pages), accommodating this could very well slow your startup time down: making the code a bit bigger could cause a processor cache miss to load another module you had to write to handle using artificially small strings.

This isn't handwaving at the issue of dynamic language runtime speed. That's a strawman of your own construction. This is me pissed off at such a dramafilled title being slapped on an inconsequentially small speed difference in Ruby strings.

I've been similarly pissed off in meetings where people spent 2 hours arguing for an "optimization" that would have saved a total of ~400 ms total if we sold 100 million units and they ran for an average of 20 years each.

This isn't about scaling. This isn't about optimization. This isn't about making dynamic languages faster. This is about saving functionally no time, ever, and usually wasting people's time and making the code slower with premature optimization.


Did you read the article? The very first sentence is

"Obviously this is an utterly preposterous statement: it’s hard to think of a more ridiculous and esoteric coding requirement."

Or how about this:

"Don’t worry! I don’t think you should refactor all your code to be sure you have strings of length 23 or less. That would obviously be ridiculous. The speed increase sounds impressive, but actually the time differences I measured were insignificant until I allocated 100,000s or millions of strings – how many Ruby applications will need to create this many string values? And even if you do need to create many string objects, the pain and confusion caused by using only short strings would overwhelm any performance benefit you might get.

For me I really think understanding something about how the Ruby interpreter works is just fun! I enjoyed taking a look through a microscope at these sorts of tiny details. I do also suspect having some understanding of how Matz and his colleagues actually implemented the language will eventually help me to use Ruby in a wiser and more knowledgeable way. We’ll have to see… stay tuned for some more posts about Ruby internals!"

I truly do not understand the emotional reaction you were having when you wrote this comment. It sounds like you've had some issues in the past with your time being wasted debating pointless optimizations, and that's what you were reacting to.

The article is not advocating pointless optimizations. The article is simply exploring a cool little piece of MRI.

It sounds like you have a lot of experience with dealing with optimization, though. It'd be cool if you wrote something educational about optimization.


Except in the most egregious cases, how many optimization articles ever save you more time than it takes you to read them? As you say, computers are unimaginably fast.


Indeed. I am just continually amazed at the lack of caution people show when expressing these sentiments:

"Never use a ruby string longer than 23 characters!!!"....or you'll just take a slightly less infinitesimal amount of time.


I think the title is meant to be a hook.


So premature optimization in code is evil for more than one reason. First, it obfuscates the code. Second, the slowest processor around is the wetware, and it ain't getting any faster. Making the code simpler to understand/explain/write/compile/debug will save geometrically more time than the optimization saves, in almost every case.


Isn't this just the small string optimization with copy on write?


No - instead of allocating memory on the heap for the C character array aka "string", Ruby stores the text itself directly in the RString object.

These short strings would actually NOT have COW optimization because they're structs and cloning one would clone its embedded string as well (as there are no pointers involved in < 24 byte strings).


isn't that the definition of the small string optimization?

This is how the dinkumware implementation of std::string has behaved for years. The basic form of the structure is a buffer and a pointer. If the pointer is filled in, it points to a heap string and follows cow semantics. Otherwise, the buffer is used, accomplishing the small string optimization.
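
i.e. something of this shape (a generic sketch of the idea, not dinkumware's actual code):

    #include <stddef.h>

    /* Generic small-string-optimization layout. */
    struct sso_string {
        size_t len;
        char  *ptr;     /* NULL => the text lives in buf; otherwise it's on the heap */
        char   buf[16];
    };

    static const char *sso_cstr(const struct sso_string *s)
    {
        return s->ptr ? s->ptr : s->buf;
    }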

I suppose I should have elaborated more. It just feels like the OP is "discovering" the wheel.


Absolutely. Keep in mind that the OP is coming from a Ruby background and TFA targets people more familiar with high-level interpreted languages than with C/C++.

BTW, I think though I cannot be sure that both the GCC and MSVC std::string implementations use this optimization in release mode, but I gotta dash and don't have time to verify this fact, so take it with a grain of salt, if you will.


good to know - thanks



