Hacker News new | past | comments | ask | show | jobs | submit login

Personally I see no difference between the current statement of

  sep.join(list) is idiomatic
and possible statement of

  s.flatten() before BM/SIMD is idiomatic
except that join is ubiquitous and BM/SIMD searching is never heard of by a regular developer in modern boring language. Not to mention that few of us who aware can implement or use it properly.

edit: searching/iterating methods could simply flatten() string on demand, because one doesn't search in semi-composed string [to trash performance back again]. If string doesn't have to be searched, but is for socket/file, writing methods could `s.fetch_next(BUFSIZ)` and skip huge concatenation before BUFSIZ-chunking-again at all. What's the reason to concatenate prematurely AND by hand, if it could be done automatically and only if needed?




> BM/SIMD searching is never heard of by a regular developer in modern boring language

Of course not, because it's usually implemented in the standard library of whatever language you're using. But if your strings aren't stored in contiguous memory, then having to flatten every string into contiguous memory before searching also presents a significant performance cost.

> What's the reason to concatenate prematurely AND by hand, if it could be done automatically and only if needed?

I feel like you're missing my point. My point isn't that "contiguous memory is better than not contiguous memory." My point is that "contiguous memory strings are bad design" is completely unwarranted outside of the context of a specific use case. Outside of specific use cases, the only thing we should be doing is discussing trade offs, and not declaring a monopoly on what constitutes "bad design."


> What's the reason to concatenate prematurely AND by hand, if it could be done automatically and only if needed?

First, it's simplicity (of implementation, explanation, etc. by far).

Second, it depends very significantly on your use case.

If your strings are of an average length of 4 bytes (not uncommon, even much shorter in code that does ''.join('<',tag,'>',body,'</',tag,'>') if bodies are short) then you are very likely to take some x4 to x10 memory compared to what you actually need (e.g. if you have allocations and 64-bit pointers).

With respect to your socket example, iovecs could be used instead of flattening if you have access to low-level OS primitives, but (a) it would be much slower if the sections are short, and (b) most programming languages and frameworks do not actually give you access to iovec (or equivalent) io calls, for either sockets or files.

Ropes have been available since forever in C++, going as far back as the original SGI STL in 1995, RogueWave in 1991 IIRC, and countless other independent implementations. for your typical use case, the complexity (in time and space) that they provide is apparently a net negative.

The "s.flatten() before XYZ is idiomatic" is exactly the kind of leaky abstraction that everyone hates, and it requires an understanding that is lacking among most programmers (guaranteeing half the flatten()s will be useless and half of those needed will be missing).




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: