Hacker News new | past | comments | ask | show | jobs | submit login

Apple M1 supports optional x86-style memory event ordering, so that its x86 emulation could be made to work without penalty.

When SPARC got new microcode supporting unaligned access, it turned out to be a big performance win, as the alignment padding had made for a bigger cache footprint. That was an embarrassment for the whole RISC industry. Nobody today would field a chip that enforced alignment.

The alignment penalty might have been smaller back when clock rates were closer to memory latency, but caches were radically smaller then, too, so even more affected by inflated footprint.




> as the alignment padding had made for a bigger cache footprint

I argued with some of the Rust compiler team the other day about wanting to ditch almost all alignment restrictions because of this exact thing. They laughed and basically told me I didn't know what I was talking about. I remember about 15 years ago, when I worked at a market-making firm, we tested this and it was a great gain - we started packing almost all our structs after that.

Now, at another MM shop, we're trying to push the same thing but are having to fight these arguments again (the only alignments I want to keep are for AVX and hardware-accessed buffers).
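To make the trade-off concrete, here's a minimal sketch (names are illustrative, sizes assume a typical 64-bit target) of what packing buys in raw footprint:

```rust
use std::mem::size_of;

// C layout pads each field up to its alignment; repr(C, packed) drops all
// padding at the cost of potentially unaligned field accesses.
#[repr(C)]
struct CLayout {
    tag: u8,    // 1 byte + 7 bytes padding so `value` lands on an 8-byte boundary
    value: u64, // 8 bytes
    len: u16,   // 2 bytes + 6 bytes of tail padding
}

#[repr(C, packed)]
struct PackedLayout {
    tag: u8,
    value: u64,
    len: u16,
}

fn main() {
    println!("C layout: {} bytes", size_of::<CLayout>());     // 24
    println!("packed:   {} bytes", size_of::<PackedLayout>()); // 11
}
```

More than half the padded struct is padding here, which is the cache-footprint argument in miniature.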


There are other things you need to take into account too - padding can make it more likely for a struct to divide evenly into cache lines, which can trigger false sharing. Shrinking a struct from 128 bytes to 120 or 122 bytes will misalign it relative to cache lines, reducing the impact of false sharing, and that can significantly improve performance.

The last time I worked on a btree-based data store, changing the nodes from ~1024 bytes to ~1000 delivered something like a 10% throughput improvement. This was done by reducing the number of entries in each node, and not by changing padding or packing.


True. Another reason to avoid too much aligning is to help reduce reliance on N-way cache collision avoidance.

Caches on modern chips are set-associative: they can keep at most some small fixed number of lines, often 4, whose addresses fall at the same offset into a page, and performance may collapse if that number is exceeded. It is quite hard to tune for this directly, but by making things not line up on power-of-two boundaries, we can at least avoid outright inviting it.
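A toy model makes the failure mode visible. Assuming a hypothetical L1-like cache with 64-byte lines and 64 sets, objects spaced at a power-of-two (page) stride all map to the same set, while a non-power-of-two stride spreads them out:

```rust
// Toy cache model: 64-byte lines, 64 sets (so 4 KiB of address aliases back
// onto the same set). Real caches differ; this just shows the index math.
fn cache_set(addr: usize) -> usize {
    const LINE: usize = 64;
    const SETS: usize = 64;
    (addr / LINE) % SETS
}

fn main() {
    // Eight objects, each at the same offset into a 4 KiB page:
    let same: Vec<usize> = (0..8).map(|i| cache_set(i * 4096 + 0x40)).collect();
    println!("{:?}", same); // all identical: they fight over the set's few ways

    // Nudging the stride off the power of two spreads them across sets:
    let spread: Vec<usize> = (0..8).map(|i| cache_set(i * 4160 + 0x40)).collect();
    println!("{:?}", spread); // all different sets
}
```

With 4-way associativity, the first placement can keep at most 4 of the 8 objects cached at once; the second has no such conflict.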


FWIW, it's still better to lay out your critical structures carefully, so that padding isn't needed. That way, you win both the cache efficiency and the efficiencies for aligned accesses.
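A quick illustration of that point (hypothetical structs, repr(C) so declaration order is honored, sizes assuming a typical 64-bit ABI): simply sorting fields largest-first can remove the padding without packing anything.

```rust
use std::mem::size_of;

// Careless ordering forces interior padding before `b` and before `d`.
#[repr(C)]
struct Careless {
    a: u8,  // + 7 bytes padding
    b: u64,
    c: u8,  // + 3 bytes padding
    d: u32,
} // 24 bytes

// Same fields, largest-first: only 2 bytes of tail padding remain.
#[repr(C)]
struct Careful {
    b: u64,
    d: u32,
    a: u8,
    c: u8,
} // 16 bytes

fn main() {
    println!("careless: {} bytes", size_of::<Careless>());
    println!("careful:  {} bytes", size_of::<Careful>());
}
```

Every field in `Careful` is still naturally aligned, so there is no unaligned-access cost to go with the smaller footprint.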


One of the forms of 'premature optimization' that's often worth doing. Just align everything you can to the biggest power-of-two data size you can. Also, always use fixed-size data types, e.g. (modular) uint32 or (signed two's-complement) sint32 rather than int.


WebAssembly ditched alignment restrictions and we don't regret it. There is an alignment hint, but I am not aware of any engine that uses it.


Superstition is as powerful as it ever was.


It's definitely received wisdom that may once have been right and no longer is.

Most people are not used to facts having a half-life, but many facts do, or rather, much knowledge does.

We feel very secure in knowing what we know, and the reality is that we need to be willing to question a lot of things, like authority, including our very own. Now, we can't be questioning everything all the time because that way madness lies, but we can't never question anything we think we know either!

Epistemology is hard. I want a doll that says that when you pull the cord.


Sort of depends on the knowledge.

It's certainly true that in the tech industry things are CONSTANTLY shifting.

However, talk physics and you'll find that things rarely change, especially the physics that most college graduates learn.


There was a famous study about the half-lives of "facts" in different fields. They do seem to vary by field.


Is this superstition, or more received wisdom, which may have been true at one point in the past but is now just orthodoxy?


Fifty bucks says it isn't even about performance, but is instead about passing pointers to C code. Zero-overhead FFI has killed a lot of radical performance improvements that Rust could have otherwise made.

I don't know, because nobody's actually posting a link to it.


This strikes me as likely. Bitwise compatibility with machine ABI layout rules has powerful compatibility advantages even in places where it might make code slower. (And, for the large majority of code, slower doesn't matter anyway.)

Of course C and C++ themselves have to keep to machine ABI layout rules for backward compatibility to code built when those rules were (still) thought a good idea. Compilers offer annotations to dictate packing for specified types, and the Rust compiler certainly also offers such a choice. So, maybe such annotations should just be used a lot more in Rust, C, and C++.

This is not unlike the need to write "const" everywhere in C and C++ because the inherited default (from before it existed) was arguably wrong. We just need to get used to ignoring the annotation clutter.

But there is no doubt there are lots of people who think padding to alignment boundaries is faster. And, there can be other reasons to align even more strictly than the machine ABI says, even knowing what it costs.


Rust structs have non-C layouts by default. You can optionally specify that a struct should be laid out the same way C does it.


Structs aren’t the problem here. It’s the primitives.

You can take a pointer to an i32 and pass that pointer to C code as int32_t. This means it has to have the same alignment in Rust that it has in C.
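A small check of what that contract implies (the 4-byte figure assumes a mainstream ABI):

```rust
use std::mem::{align_of, size_of};

fn main() {
    // For an &i32 to be handed to C as const int32_t*, Rust's i32 must match
    // C's int32_t exactly: same size and same ABI-mandated alignment. That is
    // the contract that blocks relaxing alignment for the primitive itself.
    println!("i32: size={}, align={}", size_of::<i32>(), align_of::<i32>());
}
```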


The topic at hand is, specifically, that nobody makes cores that enforce alignment restrictions anymore. So, it doesn't matter where the pointer goes. All that matters is if your compiler lays out its structs the same way as whoever compiled the code a pointer to one ends up in.

There are embedded targets that still enforce alignment restrictions, but you are even less likely to pass pointers between code compiled with different compilers, there.


> The topic at hand is, specifically, that nobody makes cores that enforce alignment restrictions anymore. So, it doesn't matter where the pointer goes.

Compilers can rely on alignment even if the CPU doesn't. LLVM does, which is why older versions of rustc had segfaults when repr(packed) used to allow taking references. While it would be pretty easy to get rustc to stop emitting aligned loads, getting Clang and GCC to stop emitting aligned loads might be trickier. https://github.com/rust-lang/rust/issues/27060
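This is why modern rustc rejects plain references to `repr(packed)` fields and pushes you toward raw pointers, which carry no alignment assumption. A minimal sketch (struct names are illustrative):

```rust
// Taking &m.value below would be rejected: it would create a misaligned &u32
// that LLVM is entitled to assume is aligned. addr_of! yields a raw pointer
// instead, and read_unaligned compiles to an explicitly unaligned load.
#[repr(packed)]
struct Msg {
    tag: u8,
    value: u32, // sits at offset 1, so it is misaligned
}

fn main() {
    let m = Msg { tag: 1, value: 0xDEADBEEF };
    // let r = &m.value; // error: reference to packed field
    let p = std::ptr::addr_of!(m.value);
    let v = unsafe { p.read_unaligned() };
    println!("{:#x}", v);
}
```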


People arguing against changing struct layout rules are probably mainly interested, then, in maintaining backward compatibility with older Rust, itself.

Anyway, same policy applies: annotate your structs "packed" anywhere performance matters and bitwise compatibility with other stuff doesn't.


Rust does not keep backward compatibility like that. In the absence of any statement forcing a specific ABI, the only guaranteed compatibility is with code that's part of the same build, or else part of the current Rust runtime and being linked into said build.


Even in Rust, mapping memory between processes, or via files between past and future processes, happens. Although structure-blasting is frowned upon in some circles, it is very fast where allowed.

But... when you are structure-blasting, you are probably also already paying generous attention to layout issues.
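A sketch of what structure-blasting looks like when the layout attention has been paid (illustrative names; `repr(C, packed)` pins the layout and leaves no padding bytes to leak):

```rust
use std::mem::size_of;

#[repr(C, packed)]
#[derive(Clone, Copy)]
struct Record {
    id: u32,
    flags: u16,
    len: u8,
} // exactly 7 bytes, no padding

fn main() {
    assert_eq!(size_of::<Record>(), 7);
    let r = Record { id: 7, flags: 0b10, len: 3 };

    // Blast out: a byte-for-byte copy of the struct (e.g. into a mapped file).
    let mut bytes = [0u8; 7];
    unsafe {
        std::ptr::copy_nonoverlapping(
            &r as *const Record as *const u8,
            bytes.as_mut_ptr(),
            size_of::<Record>(),
        );
    }

    // Blast back in; read_unaligned because the bytes may sit at any address.
    let back = unsafe { (bytes.as_ptr() as *const Record).read_unaligned() };
    // Braces force copies of the packed fields before comparing.
    assert_eq!({ r.id }, { back.id });
    assert_eq!({ r.flags }, { back.flags });
    assert_eq!({ r.len }, { back.len });
    println!("round-tripped {} bytes", bytes.len());
}
```

No serialization code runs at all; the speed comes from the copy being the entire "format".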


TSO has a performance cost: on M1, enabling TSO for native arm64 code (not emulated) costs 10-15% [1].

[1]: https://blog.yiningkarlli.com/2021/07/porting-takua-to-arm-p...


Yes, there are sound reasons for it to be optional. It is remarkable how little the penalty is, on M1 and on x86. Apparently it takes a really huge number of extra transistors in the cache system to keep the overhead tolerable.


TIL. I should have known this... Maybe I'll start packing my structs too.



