No he did not benchmark. Thus my remark on no technical argument.
Linus asserted that the number of CPU instructions retired stays the same, so there would be no sizeable performance improvement. That's hand-waving, not a benchmark. Somebody else posted a micro-benchmark showing better performance in his particular case.
((in the glibc discussion)) Linus stood for users in the short term (it works here and now), not for good development practices (read the manuals, fix bugs, use modern hardware facilities -- here the HAS_FAST_COPY_BACKWARD CPU feature).
EDIT:
N.B. the ratio of instructions retired to CPU clock ticks depends on caching effects and the like -- thus one memory access pattern will perform better than another. That's what the HAS_FAST_COPY_BACKWARD CPU feature indicates.
EDIT 2:
I'm aware I'm making risky remarks here; Linus worked at Transmeta for several years, so he knows far more about CPU design and internals than I do.
> ((in the glibc discussion)) Linus stood for users in the short term (it works here and now), not for good development practices (read the manuals, fix bugs, use modern hardware facilities -- here the HAS_FAST_COPY_BACKWARD CPU feature).
If you're creating an API (e.g. glibc) that everyone and their brother uses, it's a bad idea to break binary compatibility in an update. You might be able to get away with it for a full version number change, but for point updates, it's a bad idea. If you're going to change it, add a new function.
You don't break binary compatibility in a major component without an extremely good reason (e.g. it doesn't work anymore). Forcing everyone to recode or even recompile to work with an OS-level component is just idiotic.
> If you're creating an API (e.g. glibc) that everyone and their brother uses, it's a bad idea to break binary compatibility in an update.
Nobody is arguing for breaking binary compatibility, because that's not what happened here. Both functions remain perfectly compatible with client applications -- unless those apps are buggy themselves.
What got changed was an implementation detail -- the order in which bytes are copied. Think of it as a private member or method of an object, if you're an OOP type: outside code must not touch it, and must not rely on its internal behavior.
Certain applications made a silent assumption about the internal workings of the function -- and they were using the wrong function to begin with. Working around that is not preserving binary compatibility; that's promoting bugs. [0]
> If you're going to change it, add a new function.
That's the current state of affairs. The memmove(3) vs. memcpy(3) distinction was created decades ago for this very reason. memcpy(3) admits extra optimization, because it is allowed to assume the source and target memory regions do not overlap. memmove(3) may be easier to use, because it does the right thing even when the source and target overlap. The developer picks one or the other. There's no other difference between them -- especially not in the API. The best part is, replacing one with the other takes nothing: you just change the function name; the arguments are exactly the same. Perhaps precisely for the ease of swapping one for the other.
From the man pages:
The memmove() function copies n bytes from memory area src to memory area dest. The memory areas may overlap.
The memcpy() function copies n bytes from memory area src to memory area dest. The memory areas must not overlap.
See? Nothing confusing there.
----
[0] A few years ago everybody and their brother were deriding Microsoft for pushing hacks into Windows XP to remain backward compatible with selected applications -- like the memory-allocation workaround for the SimCity game. They did unsound engineering for business reasons -- to increase the adoption rate of Windows XP. Now don't ask us to backpedal on that and push bug compatibility into open-source software. Don't put the cart before the horse -- Flash Player before every non-buggy program. Don't ask Linux to follow in Microsoft's every footstep -- let's learn from their mistakes.
please also note, the memcpy(3) and memmove(3) functions aren't glibc's invention. They aren't specific to GNU, Linux or glibc. You'll find the functions in every modern UNIX and similar OS, no matter if based on glibc or some other system libraries.
Again, from the man page:
CONFORMING TO
SVr4, 4.3BSD, C89, C99, POSIX.1-2001.
> A few years ago everybody and their brother were deriding Microsoft for pushing hacks into Windows XP to remain backward compatible with selected applications -- like the memory-allocation workaround for the SimCity game. They did unsound engineering for business reasons -- to increase the adoption rate of Windows XP. Now don't ask us to backpedal on that and push bug compatibility into open-source software. Don't put the cart before the horse -- Flash Player before every non-buggy program. Don't ask Linux to follow in Microsoft's every footstep -- let's learn from their mistakes.
You do realize that the reason MS was derided for that is that MS doesn't believe in backwards compatibility at all, right? They updated their API completely, which broke SimCity, then went and added hacks to make it run. They had two more correct options: either tell Maxis that it was their problem and they needed to fix it, or actually maintain backwards compatibility.
Linux strives to maintain backwards compatibility unless it absolutely must break it. That means maintaining buggy behavior if it's commonly relied upon. I work on multiplatform code, and moving to new Linux versions is almost trivially simple for us: 99% of the time, we just turn on the new compiler and OS combination and build. Most of the time, any actual porting effort goes toward new OS features we can leverage for performance.
You have completely misstated the problem with SimCity.
"I first heard about this from one of the developers of the hit game SimCity, who told me that there was a critical bug in his application: it used memory right after freeing it, a major no-no that happened to work OK on DOS but would not work under Windows"
Still, the benchmark was for one particular case, and inconclusive (as in, "testing can show X, but not the lack of X"). It doesn't invalidate the finding of Ma Ling, comment #99, who got an improvement in another test case.