
It is not possible to specify bitfields in a way that makes sense, is consistent with a bytewise view of memory, and is portable between big-endian and little-endian processors.

Say you have, for instance (using C notation)

  struct {
    unsigned one : 8;
    unsigned two : 8;
  };
The fields are supposed to be represented in memory in the same order they are declared, so one is the first byte and two is the second byte. This should have the same representation as if I had declared two uint8_t fields. If I type-pun it and load it into a register as a uint16_t, it depends on the hardware whether the low and high bytes are one and two, or two and one.
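
A minimal sketch to watch this happen on a given compiler (bitfield layout is implementation-defined, so the expected output describes typical LE/BE ABIs rather than a guarantee; the memcpy sidesteps strict aliasing):

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  struct pair {
    unsigned one : 8;
    unsigned two : 8;
  };

  int main(void) {
    struct pair p = { .one = 0x11, .two = 0x22 };
    uint16_t v;
    memcpy(&v, &p, sizeof v);          /* pun the first two bytes */
    printf("%04x\n", (unsigned)v);     /* typically 2211 on LE, 1122 on BE */
    return 0;
  }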

It gets trickier when you consider arbitrary bit widths.

  struct {
    unsigned u4 : 4;
    unsigned uC : 12;
  };
If the fields are allocated in order, is u4 the low bits of the first byte or the high bits? If you require it to be the low bits, then it works ok on a little-endian machine, but on a big-endian machine the uC field ends up split, so the 16-bit view looks like:

  CCCC4444CCCCCCCC



> It is not possible to specify bitfields in a way that makes sense, is consistent with a bytewise view of memory, and is portable between big endian and little endian processors.

I'm not sure I accept "consistency with a bytewise view of memory" as a well-defined, reasonable concept. I do expect to give a list of bit widths and get fields packed in consecutive bit order. Why would it randomly do weird things at an 8-bit boundary?

> If the fields are allocated in order, is u4 the low bits of the first byte or the high bits?

It's the low bits on LE, and the high bits on BE.

> If you require it to be the low bits, then it works ok on a little-endian machine, but on a big-endian machine the uC field ends up split

That's why the direction is defined to match the endianness; you get a consecutive chain of bits in either case.

> so the 16-bit view looks like: CCCC4444CCCCCCCC

It's 4444CCCCCCCCCCCC on BE, and CCCCCCCCCCCC4444 on LE. If you need something else, it's no longer a question of defining an ABI-consistent structure, but rather expressing a representation of an externally given constraint.
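
You can check this against a given compiler with a sketch like the one below; the field values are chosen so the hex digits mirror the bit labels above, and the comments describe mainstream LE/BE ABIs (the layout is still formally implementation-defined):

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  struct mix {
    unsigned u4 : 4;
    unsigned uC : 12;
  };

  int main(void) {
    struct mix m = { .u4 = 0x4, .uC = 0xCCC };
    uint16_t v;
    memcpy(&v, &m, sizeof v);
    printf("%04x\n", (unsigned)v);     /* typically ccc4 on LE, 4ccc on BE */
    return 0;
  }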


> > If the fields are allocated in order, is u4 the low bits of the first byte or the high bits?

> It's the low bits on LE, and the high bits on BE.

You are advocating for the current rule in C. As I said, C's rule implies 8-bit fields will be in different orders in memory on machines of different endianness, which makes it very difficult to use bitfields for exact, portable control over memory layout.


No, I'm advocating that the current behavior of C compilers is the only thing that really makes sense for ABI considerations. I opened my comment with: I'm not sure I accept "consistency with a bytewise view of memory" as a well-defined, reasonable concept. You're starting from the assumption that there is something "important" about 8-bit fields, and that bitfields are a tool to express some externally defined memory layout. But unless your language also has "endianed" types for larger integers, that's already impossible. And most languages don't claim or try to work with externally defined memory layouts.

This entire topic splits into two distinct categories: deterministic packing for architecture ABIs, which needs to be consistent but can be arbitrary; and representing externally defined structures, which is a matter of exact representation capabilities.
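
The "endianed types" point in miniature: even with no bitfields at all, a plain uint32_t field already has an endian-dependent byte layout, so bitfields alone could never pin down an external format:

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  int main(void) {
    uint32_t x = 0x11223344;
    unsigned char b[4];
    memcpy(b, &x, sizeof b);
    printf("%02x %02x %02x %02x\n", b[0], b[1], b[2], b[3]);
    /* "44 33 22 11" on LE, "11 22 33 44" on BE */
    return 0;
  }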


Is that not why functions like htonl exist: to convert between the host architecture's endianness and a platform-independent representation?

Your binary isn't going to be portable. The only reason your memory layout should be is if you intend to serialize it. But if you're going that extra step, you _need_ to convert it to a platform-independent format regardless -- otherwise, not even your ints deserialize correctly.
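
The classic round trip, for reference (htonl/ntohl live in <arpa/inet.h> on POSIX systems, <winsock2.h> on Windows):

  #include <arpa/inet.h>
  #include <stdint.h>
  #include <stdio.h>

  int main(void) {
    uint32_t host = 0x11223344;
    uint32_t wire = htonl(host);       /* network order is big-endian */
    uint32_t back = ntohl(wire);       /* back to host order */
    printf("%08x\n", (unsigned)back);  /* 11223344 on either endianness */
    return 0;
  }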


I agree: if you're changing endianness, then bitwise compatibility between in-memory formats is out the window by definition. Even if we're just declaring a packed struct containing a single int32_t, it's not gonna match at the bit/byte level.

Unless you define a single 'right' bit order and then swizzle/unswizzle every value being written to or read from a packed struct, but then that's more of a serialize/deserialize step, which is a different thing.
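
A sketch of that serialize/deserialize step for the 4/12-bit example upthread, with the wire layout pinned down by hand (here u4 goes in the low nibble of byte 0; the names and layout are just illustrative):

  #include <stdint.h>
  #include <stdio.h>

  /* Wire format: u4 = low 4 bits of b[0]; uC = high 4 bits of b[0]
     plus all of b[1]. Fixed explicitly, so host endianness is moot. */
  static void pack(uint8_t b[2], unsigned u4, unsigned uC) {
    b[0] = (uint8_t)((u4 & 0xF) | ((uC & 0xF) << 4));
    b[1] = (uint8_t)((uC >> 4) & 0xFF);
  }

  static void unpack(const uint8_t b[2], unsigned *u4, unsigned *uC) {
    *u4 = b[0] & 0xF;
    *uC = (b[0] >> 4) | ((unsigned)b[1] << 4);
  }

  int main(void) {
    uint8_t b[2];
    unsigned u4, uC;
    pack(b, 0x4, 0xCCC);
    unpack(b, &u4, &uC);
    printf("%02x %02x -> %x %03x\n", b[0], b[1], u4, uC);  /* c4 cc -> 4 ccc */
    return 0;
  }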


Variations due to endianness (the hardware platform) are to be expected, but variations due to the compiler (implementation-defined behavior) can be avoided if the spec says so. The fact that so many CPUs are little-endian these days certainly doesn't make things easier.



