
Very much this: it depends on the use case. Statistically, my use case is 95% cat cat cat cat... (hundreds of them) followed by a sequential write of some form, never random access. I'm generating JSON, XML, SQL, etc. to send over a socket. Sometimes a utility library does it, sometimes I do (or I'm the one writing the utility library). My string type is always an array, but this type shift is not obvious to new coders.

"Why StringBuilder" was once a very popular question on SO; not sure about now. I oversimplified the editors example, since their structures are not for everyday use, but I think a simple linked list inside the string type would wipe out the entire obvious class of cat-in-loop performance issues.




> Statistically, my use case is 95% cat cat cat cat... (hundreds of them)

If you use a linked-list representation like you proposed, your performance would die a death by 1000 allocations. A simple string class backed by an exponentially growing contiguous memory buffer (i.e., what 99% of string classes do) sounds much better from a performance standpoint. Especially if you are able to know (or estimate) the size of your response, and allocate once upfront.
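
To make the "exponentially growing contiguous buffer" idea concrete, here is a minimal Python sketch. The class name `GrowBuffer`, its fields, and the default 1.5x factor are illustrative assumptions, not any particular library's implementation; real string classes differ in the details.

```python
class GrowBuffer:
    """Hypothetical append-only byte buffer with geometric growth."""

    def __init__(self, capacity=16, factor=1.5):
        self.buf = bytearray(capacity)   # contiguous backing storage
        self.length = 0                  # bytes actually in use
        self.factor = factor             # growth multiplier (assumed 1.5x)
        self.reallocations = 0           # how many times the storage was regrown

    def append(self, data: bytes) -> None:
        needed = self.length + len(data)
        if needed > len(self.buf):
            # Grow geometrically until the new data fits.
            new_cap = len(self.buf)
            while new_cap < needed:
                new_cap = int(new_cap * self.factor) + 1
            new_buf = bytearray(new_cap)
            new_buf[:self.length] = self.buf[:self.length]
            self.buf = new_buf
            self.reallocations += 1
        # Copy the new piece into the free tail of the buffer.
        self.buf[self.length:needed] = data
        self.length = needed

    def to_bytes(self) -> bytes:
        return bytes(self.buf[:self.length])
```

Because capacity grows by a constant factor, the number of regrowths is logarithmic in the final size, and passing a known or estimated size as `capacity` avoids regrowth entirely.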


I'm surely able to write a C routine that uses a custom 1.5x-growing buffer and does everything, including subexpression indentation shifts etc., in one pass.

The point is that I cannot delegate this work to a newbie or low-cost hire who will cat-cat-cat strings allocated by another newbie a thousand times, with no idea that it is slow and memory-hungry. Just explaining how to do it properly would cost me more than the machine cycles wasted by shipping it to production as is. Even then, the source strings themselves are not under my control, and I'm no world dictator who can tell a bunch of maintainers who don't even know me what they should do for my performance case. This is the real world, not my cool homegrown toolkit.

This is not a how-to discussion; the question is why not make an already inefficient method slightly more forgiving by appropriate means.

If you don't like a list, take an array, but I'd like to see tests before accepting that "death" argument. Maybe tomorrow I'll test all three (multicat, list, array) on immutable strings here.


https://pastebin.com/APHt8C5V (valid for 1 month)

In short: 1M iterations of 1) s=s..x, 2) insert(t,x) + concat(t), 3) l={l,x} + concat(flatten(l)). Times are elapsed whole seconds.

  > lua5.1.exe x.lua
  ..      1850000 bytes, 10000 iterations         -- n / 100, since it never ends with 1M
  time:   23
  array   185000000 bytes, 1000000 iterations
  time:   5
  list    185000000 bytes, 1000000 iterations
  time:   6                                       -- seems pretty alive
edit: table.concat() seems to eat most of the time here, so lists are effectively instant, as are arrays, at 1M. (A full GC is included in all test times.)

edit2: at n=10M without the actual concat(), the array takes 10s and the list 19s. Conclusion: malloc() is not too slow compared to the 2x realloc step used in Lua tables.
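
For readers who cannot reach the pastebin, the three strategies compared above can be sketched roughly as follows. This is a Python analogue written for illustration, not the original Lua script; the function names are made up, and timings will not match the Lua numbers (CPython, for instance, can sometimes optimize repeated string concatenation in place).

```python
def build_by_concat(pieces):
    # 1) s = s .. x : a new immutable string on every step
    s = ""
    for x in pieces:
        s = s + x
    return s


def build_by_array(pieces):
    # 2) insert(t, x) + concat(t) : collect pieces in an array, join once
    t = []
    for x in pieces:
        t.append(x)
    return "".join(t)


def build_by_list(pieces):
    # 3) l = {l, x} + concat(flatten(l)) : one small cell per piece
    l = None
    for x in pieces:
        l = (l, x)          # cell holds (rest of list, newest piece)
    # Flatten iteratively: walk from the newest cell back to the oldest.
    out = []
    while l is not None:
        l, x = l
        out.append(x)
    out.reverse()           # restore original order
    return "".join(out)
```

All three return the same string; they differ only in how many intermediate allocations and copies happen along the way.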



