Packed structs in Zig make bit/flag sets trivial (hexops.com)
187 points by slimsag on Aug 30, 2022 | 142 comments



if (mask & (WGPUColorWriteMask_Alpha|WGPUColorWriteMask_Blue)) { // alpha and blue are set.. }

Doesn't this give you if alpha OR blue is set?

Errors like this are another reason syntactical sugar for readability is important.


I’m seeing a lot of versions of the correct logic under this comment, so here’s mine!

    if((mask & WGPUColorWriteMask_Alpha) && (mask & WGPUColorWriteMask_Blue)) { //… }
This is the least confusing form I’ve seen that doesn’t require a function, macro, custom operator, etc.


You don't need syntactical sugar. Just write a function.

    static bool all_bits_set(i32 value, i32 mask) { return (value & mask) == mask; }
Here I'm assuming 32-bit values. In C, with relatively little support for generics, you can consider making multiple versions, possibly using _Generic (note, I haven't evaluated the sanity of using _Generic).

Alternatively you can use a #define. However, you need to use "mask" twice, so that gets tricky: either it requires care to keep the corresponding expression at the call site side-effect free, or the macro needs to be written using compiler extensions like statement expressions and typeof() variable declarations à la Linux kernel.
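
To illustrate the hazard (my sketch, with a hypothetical next_mask()): the naive macro expands "mask" twice, so a side-effecting argument runs twice:

    /* Naive version: "mask" is expanded twice. */
    #define ALL_BITS_SET(value, mask) (((value) & (mask)) == (mask))

    /* next_mask() would be called twice here: */
    /* if (ALL_BITS_SET(flags, next_mask())) { ... } */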


Yeah, the usual way in C would be to use a statement expression (which is a GCC extension, but also supported by clang):

    #define all_bits_set(value, mask) ({__typeof__(value) v = (value), m = (mask); (v & m) == m; })
I personally would go for the extension above or a single 64-bit function, but here's the _Generic version:

    static bool all_bits_set32(i32 value, i32 mask) { return (value & mask) == mask; }
    static bool all_bits_set64(i64 value, i64 mask) { return (value & mask) == mask; }
    #define all_bits_set(value, mask) _Generic((value), int32_t: all_bits_set32, int64_t: all_bits_set64)(value, mask)


The nice thing about this is that it's simple and works pretty much in any language so there's very little cognitive strain if you work with multiple languages at the same time.


Yes it would need to be:

    if ((mask & (WGPUColorWriteMask_Alpha|WGPUColorWriteMask_Blue)) == (WGPUColorWriteMask_Alpha|WGPUColorWriteMask_Blue)) { ... }
...which is quite a mouthful.


I’ve definitely desired an operator that combines these before. I’ve settled on &== as the best syntax in C-style languages, with a &== b being equivalent to (a & b) == b (but with a single evaluation of b).

  if (mask &== WGPUColorWriteMask_Alpha|WGPUColorWriteMask_Blue) { … }
I find only two very minor problems with it: firstly, that it’s not commutative (that is, a &== b is not equivalent to b &== a). Secondly, that it can be confused with the bitwise-and assignment operator &= (a &= b being equivalent to a = a & b for singly-evaluated lvalue a).

I’d then add |== for consistency and because it is conceivably useful. ^== is tempting, but since a ^== b would be just another spelling of a == 0, I’d skip it.


Fascinating: I also thought about exactly '&==' in the past. It seems to be the natural choice for this.

But then, continuing that thinking, if you have '&==', you'd also need '&!=' -- that looks really confusing.

Can't we turn that into a '==0' test somehow? Maybe '^&', because '((a ^ b) & b) == 0' is equivalent to '(a & b) == b'. And, as someone said, '~&' also works: '(~a & b) == 0' is also equivalent to '(a & b) == b'.

    if ((a ~& b) == 0) { ... }
Wait -- we can swap that into '&~' and we're back in C. However, it reverses the logical order, with the test bits first:

    if ((b & ~a) == 0) { ... }

    if (((WGPUColorWriteMask_Alpha|WGPUColorWriteMask_Blue) & ~mask) == 0) {
        ...
    }
So we're back -- this is standard C now. But I find it incomprehensible.
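
For what it's worth, the equivalence is easy to convince yourself of by brute force (a quick C sketch of mine, not from the comment):

    #include <assert.h>
    #include <stdint.h>

    int main(void) {
        /* (a & b) == b and (b & ~a) == 0 agree for all 8-bit pairs. */
        for (uint32_t a = 0; a < 256; a++)
            for (uint32_t b = 0; b < 256; b++)
                assert(((a & b) == b) == ((b & ~a) == 0));
        return 0;
    }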


Now I’m trying to decide if you make it better or worse by replacing the == 0 with simple logical negation:

  if (!(b & ~a)) { … }
  if (!((WGPUColorWriteMask_Alpha|WGPUColorWriteMask_Blue) & ~mask)) { … }

  if (!(~a & b)) { … }
  if (!(~mask & (WGPUColorWriteMask_Alpha|WGPUColorWriteMask_Blue))) { … }
Actually, the latter makes a little more intuitive sense to me because of how ! and ~ are both negation¹, so they kinda cancel out and leave just the masking. Kinda.

—⁂—

¹ Fun fact: Rust uses ! for both logical and bitwise negation, backed by core::ops::Not, with bool → bool and {integer} → {integer}. Since bool is a proper type and there’s no boolean coercion anywhere, you would have to stick with `== 0` in Rust. In practice, though, you’d probably go all typey with the bitflags crate’s macro to generate good types with some handy extra methods, and write `mask.contains(WgpuColorWriteMask::Alpha | WgpuColorWriteMask::Blue)`.


You could also play the game of pretending that the scalar is an object and allow for syntactical sugar like:

    if (mask.contains(WGPUColorWriteMask_Alpha) || mask.contains(WGPUColorWriteMask_Blue)) { ... }


Fun fact: Rust’s bitflags macro crate generates just this, https://docs.rs/bitflags/latest/bitflags/example_generated/s....

  // To check for either flag:
  if mask.contains(WgpuColorWriteMask::Alpha) || mask.contains(WgpuColorWriteMask::Blue) { … }
  if mask.intersects(WgpuColorWriteMask::Alpha | WgpuColorWriteMask::Blue) { … }

  // To check for both flags:
  if mask.contains(WgpuColorWriteMask::Alpha | WgpuColorWriteMask::Blue) { … }


> I find only two very minor problems with it: firstly, that it’s not commutative

Why is that a problem? That is, if a has additional bits set compared to b, then ((a & b) == b) != ((b & a) == a), no?


Of the common operators, / and - are the most familiar ones that aren’t commutative.

For equality comparisons, there are two opposing conventions: actual == expected (the more popular, in my experience), and expected == actual (by no means rare). Having &== and |== be order-sensitive (though there’s absolutely no question in my mind about what the ordering should be) is mildly unfortunate.

It’s very minor.


Yeah ok. I'd consider that just a thing you gotta know. Like how for certain floating point values (a / b) * (c / d) can be fine while (a * c) / (b * d) gives you +/- infinity.

It's a separate operator after all, one I also miss a lot...


You can also do it as follows:

    if (!(~mask & (WGPUColorWriteMask_Alpha|WGPUColorWriteMask_Blue))) { ... }
Not quite as long but perhaps less readable.


I use:

    if (all_of(mask, WGPUColorWriteMask_Alpha | WGPUColorWriteMask_Blue))
with all_of() being a #define. Likewise none_of(), any_of().

No need for special operators.
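
One plausible way to write those (my sketch; the commenter doesn't show their definitions):

    #define all_of(v, m)  (((v) & (m)) == (m)) /* note: m expands twice */
    #define any_of(v, m)  (((v) & (m)) != 0)
    #define none_of(v, m) (((v) & (m)) == 0)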


If you are willing to assume that the masks each only have 1 bit set, this would be less of a mouthful although it would draw other objections:

  if (@popCount(mask & (WGPUColorWriteMask_Alpha|WGPUColorWriteMask_Blue)) == 2)
I do not know Zig, so the syntax might not be right. I did check to see that it has popcount [1].

If it has some concise way to flip all the bits, then this would be another possibility that isn't too verbose, but might raise other objections. Let fmask be mask with all the bits flipped (how would one do that in Zig?).

  if ((fmask & (WGPUColorWriteMask_Alpha|WGPUColorWriteMask_Blue)) == 0)
[1] https://ziglang.org/documentation/master/#popCount


I think the point of the example is that in zig you'd just do

  if (mask.alpha and mask.blue) {


In C I'm using a macro for stuff like that (improving readability, avoiding errors). In Zig I could use a `comptime` function, I presume?


It depends on what else you’re doing. In this situation normal function would suffice in either C or Zig.


What about?

    if (mask & WGPUColorWriteMask_Alpha & WGPUColorWriteMask_Blue) {
        // alpha and blue are set..
    }


> `mask & WGPUColorWriteMask_Alpha & WGPUColorWriteMask_Blue`

If WGPUColorWriteMask_Alpha and WGPUColorWriteMask_Blue don't share bits, isn't this guaranteed to be false 100% of the time?


Indeed, you would need to check that the mask ends up being exactly alpha and blue; otherwise it's just an automatic conversion from integer to boolean, which many languages are doing away with due to all the bugs it produces.


Yeah, you can't do that in Go, even explicitly:

    // cannot convert i (variable of type int) to type bool
    bool(i)
you'd need to use a function:

    func to_bool(i int) bool { return i != 0 }


Seems a bit mean-spirited to provide casts but not that one.


Not really. Go has no concept of truthy:

https://developer.mozilla.org/docs/Glossary/Truthy

and I fully support that decision. If you want to use a boolean, you need to be explicit about it.


A statically typed language shouldn't indulge in truthiness, no. But a cast isn't truthiness, it's truth: the meaning of "make this int impenetrable to all but truth and falsehood" is in practice well defined; there's nothing implicit happening here.

The compiler isn't going to make that function, it's going to optimize back to a cast to boolean. Why make the poor shmoe user type it out?


Casting int8 to int16 isn't the same thing as casting int to boolean. In the first case, you get the same number (ignoring wrapping issues). In the second case, you're asking the compiler to make an arbitrary decision about which numbers map to which booleans.


There is nothing arbitrary whatsoever about the conversion of any string of bits to a boolean: it is false iff none of the bits are 1.

If you want to call it arbitrary, it was arbitrated decades ago, but Boolean logic is much older than computers and works as it does for a reason. I suspect you know that.


The number 9 is not the same thing as "true". Some people may want it to be, and it might be convenient for some people if a compiler treats them as the same thing, but they aren't. In my opinion, it's an anti-pattern to have the compiler make these kinds of decisions. Instead of writing and reading code that is explicit, you're writing an abbreviated version of what you want and assuming that the compiler will know what you mean.


It's Go pretending it's a kind of language it isn't, and making the user do the teapot dance instead of cooperating.

People keep telling me this koolaid is delicious but I just don't see it.


It seems you ran out of actual arguments, sorry to hear that.


You can't argue with someone who doesn't understand logic, which is literally true here. I have run out of jokes at Go's expense.


I think I made my point known. Converting int to boolean is not a settled task, as different people might not agree on what the correct way to do that is. False could be zero, or negative one, or any negative number, or even one. So it's better not to have the compiler make that decision, and just have the code be explicit about what it's trying to do.


"WGPUColorWriteMask_Alpha|WGPUColorWriteMask_Blue" creates a number with both the alpha and blue bits set, then "mask &" (that number) ands the mask with that number in which both bits are set, so it'll only return true if both alpha and blue are set. The parenthetical grouping of the operators is key here.

edit: oh, i think i'm wrong, nevermind.


No it won't. & is "binary AND", and the result will be WGPUColorWriteMask_Alpha or WGPUColorWriteMask_Blue if only one of them is set, which is non-zero and so evaluates to true. So it checks for either of those flags.

Correct usage, if you want both flags:

    (flag & (WGPUColorWriteMask_Alpha|WGPUColorWriteMask_Blue)) == (WGPUColorWriteMask_Alpha|WGPUColorWriteMask_Blue)


I believe it would need to be something like if ((mask & (WGPUColorWriteMask_Alpha|WGPUColorWriteMask_Blue)) == (WGPUColorWriteMask_Alpha|WGPUColorWriteMask_Blue)) for both bits to be set here.


No, the expression will resolve to 'true' if any of the Alpha or Blue bits are set.


This is why bit sets are far better at this and should be used, i.e.:

    if WGPUColorWriteMask.Alpha in mask and WGPUColorWriteMask.Blue in mask: ...


That's a 'user error'. "Nothing wrong with the language"


If that’s user error, then there’s nothing wrong with C, C++, Java, JavaScript and myriad other languages. Then we don’t need any new ones on the basis of readability, elegance, expressiveness and myriad other subjective properties. Yet many discuss these aspects primarily.


I believe parent was sarcastic and agrees with you.


I have to go to Chandler's sarcasm class. Can I BE any more obvious?


That's a user error because there very much is something wrong with the language. The error is that it deals in truthy and falsy rather than a strict true^false dichotomy. So 1 is just as truthy as 2 is. And -234123 for that matter.


This is a nice syntax, but the functionality seems essentially the same as bit fields in C. Does this provide anything additional?


The in-memory representation of bit fields is implementation-defined. Therefore, if you're calling into an external API that takes a uint32_t like in the example without an explicit remapping, you may or may not like the results.

In practice, everything you're likely to come across will be little endian nowadays, and the ABI you're using will most likely order your struct from top to bottom in memory, so they will look the same most of the time. However, it's still technically not portable.
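
For what an "explicit remapping" might look like, here's a sketch (the bit positions for the WGPU flags are my assumption):

    #include <stdint.h>

    struct color_mask { unsigned red : 1, green : 1, blue : 1, alpha : 1; };

    /* Build the uint32_t with shifts instead of type-punning the struct,
       so the result doesn't depend on the compiler's bit-field layout. */
    static uint32_t to_wgpu_mask(struct color_mask m) {
        return (uint32_t)m.red   << 0 |
               (uint32_t)m.green << 1 |
               (uint32_t)m.blue  << 2 |
               (uint32_t)m.alpha << 3;
    }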


I've dealt with oddities talking between big endian PowerPC using these. It's been a few years, but the difference wasn't just the endianness, I think? Still, dealing with the mapping was way easier than masking for large structs. Is big endian really dead now?


As I said, it's also ABI. Though admittedly, endianness would be encompassed by ABI, so it's all really just ABI.


> In practice, everything you're likely to come across will be little endian nowadays

The internet?


Not really relevant to a discussion about CPUs and compiler implementations...


It's a language design feature that makes some sorts of networking code much easier to write. Why wouldn't that be relevant?


Nobody ever writes code for memory mapped network devices!!


Here is a sample of how you do this in C++ https://godbolt.org/z/W3bMcqM5P


Wow... I thought I had seen a lot of strange C++ code but I have _never_ seen packed structs using this syntax:

    struct X { type alias : numeric_value = false; ... };

I love the power of C++, but there is _so much_ to the language. I'm sure there would also be some template metaprogramming solution even if this syntax was available.


Uh, that's standard C89. You know, the bit fields? Except for the "false" keyword, that's from stdbool.h and you need C99 for that to reliably exist.


The only real problem with C (and C++) bitfields is that the standard doesn’t say anything about how they’re laid out in memory, which unfortunately makes them mostly useless unless you either don’t care about layout or you’re targeting one specific compiler and that compiler documents its choices and doesn’t break them in a few years.

Sign-extension of 1-bit fields also messes people up all the time, but that’s “just” an easy-to-fix bug.
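
The trap in a minimal sketch (whether a plain int bit-field is signed is itself implementation-defined, which is part of the problem):

    #include <stdio.h>

    struct S { int flag : 1; }; /* a signed 1-bit field holds only -1 and 0 */

    int main(void) {
        struct S s = { 0 };
        s.flag = 1;
        /* Where int bit-fields are signed, flag now holds -1, not 1. */
        if (s.flag != 1) puts("surprise: flag != 1");
        return 0;
    }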


That is a really big problem in some applications. I wish the C committee would come out and say "use the layout that GCC 11.x.x does" to make these a lot more usable.


I can't remember if GCC has actually documented their layout choices. Ideally we would pick something documented =)


The rough C equivalent looks like https://godbolt.org/z/YqGPa7ecd , using _Static_assert (available from C11 onwards, and as a GCC built-in since 4.6) with no default values for the struct members.
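
Roughly this shape, I'd guess (a sketch, not the exact godbolt contents):

    struct color_write_mask {
        unsigned red    : 1;
        unsigned green  : 1;
        unsigned blue   : 1;
        unsigned alpha  : 1;
        unsigned unused : 28;
    };
    _Static_assert(sizeof(struct color_write_mask) == 4, "unexpected size");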


It's probably tough to find a metaprogramming solution to this, for two reasons:

1. Addressing happens at the byte level, not the bit level, so a type can't begin on any bit. You'd have to do your own addressing, such as in std::bitset.

2. For now, there's no reflection in the language, so you can't really assign names to members in a general way (hello, preprocessor). A solution might be to index by type; something like the following:

    struct BitA {}; struct BitB {};
    using ExampleBitFields = BitFields<BitA, BitB>;
    bool checkBitA(const ExampleBitFields& x) { return x[BitA{}]; }

However, this has a lot of downsides.


I have done similar things a few times, and like JSON handling, the optimal API does not seem to be the same for every intended use. Size-focused cases want different things than speed-focused ones, and you need to deal with sign extension, etc.


Yup, I'd be very wary of all the template instantiation going on here. Sadly, I often find codegen is the most reasonable solution in a given situation. I wish we had a decent macro system, like Rust's


I thought you were not able to initialize bitfield members like that in a struct. Am I misremembering, was that something else, or is it compiler/C++-version dependent?


This is a C++20 feature.


Ah right, I tried again

  <source>(6): error C7582: 't': default member initializers for bit-fields requires at least '/std:c++20'
Nice to see they also improved such "legacy" stuff


You can absolutely initialize bitfield members in structs, not sure where your memory is from… maybe it was broken on some specific compiler vendors/versions?


<source>(6): error C7582: 't': default member initializers for bit-fields requires at least '/std:c++20'

It is C++20, apparently: https://godbolt.org/z/qvso544dr


Ah, I misunderstood / didn't look properly and failed to notice this is a default value in the struct definition. My bad.


Possible they were just pre-C++11


This has been possible since C89. See 3.5.2.1 of the C89 rationale [0].

It is architecture/compiler dependent though. This is explicitly acknowledged in the rationale document:

“Since some existing implementations, in the interest of enhanced access time, leave internal holes larger than absolutely necessary, it is not clear that a portable deterministic method can be given for traversing a structure field by field.”

[0] https://www.lysator.liu.se/c/rat/c5.html


I think they were referring to the default member initializer, a feature added in C++11 and not exclusive to bitfields. I couldn't see any reference to anything similar in what you linked (admittedly, I skimmed, though).

For example:

    struct X {
        int defaulted = 1; // 1 is the default member initializer
    };


>I think they were referring to the default member initializer

Ah, that would make more sense. The combination of bitfield and default initializer syntax does look especially odd, since both features are rarely used IME.

>I couldn't see any reference to anything similar in what you linked

Indeed, there is no way to specify default values for a C struct at definition time.


Bit fields in C are notoriously non-portable (exact physical layout depends on the compiler). Is Zig any better (apart from the fact that there is only one Zig compiler for now)?


That's the point of a packed struct: to explicitly declare the physical layout (instead of letting the compiler optimize it). If a packed struct ends up being non-portable between Zig implementations, then my impression is that one and/or the other must have a bug that must be fixed in order to actually comply with the language spec.


It is not possible to specify bitfields in a way that makes sense, is consistent with a bytewise view of memory, and is portable between big endian and little endian processors.

Say you have, for instance (using C notation)

  struct {
    unsigned one : 8;
    unsigned two : 8;
  };
The fields are supposed to be represented in memory in the same order they are declared, so one is the first byte and two is the second byte. This should have the same representation as if I declared it as two uint8_t fields. If I type pun it and load it into a register as a uint16_t then it depends on the hardware whether the low and high bytes are one and two or two and one.

It gets more tricky when you consider arbitrary bit widths.

  struct {
    unsigned u4 : 4;
    unsigned uC : 12;
  };
If the fields are allocated in order, is u4 the low bits of the first byte or the high bits? If you require it to be the low bits, then it works ok on a little endian machine, but on a big endian machine the uC field ends up split, so the 16 bit view looks like:

  CCCC4444CCCCCCCC


> It is not possible to specify bitfields in a way that makes sense, is consistent with a bytewise view of memory, and is portable between big endian and little endian processors.

I'm not sure I accept "consistency with a bytewise view of memory" as a well-defined, reasonable concept. I do expect to give a list of bit widths, and get a field that has these in consecutive order. Why would it randomly do weird things on an 8-bit boundary?

> If the fields are allocated in order, is u4 the low bits of the first byte or the high bits?

It's the low bits on LE, and the high bits on BE.

> If you require it to be the low bits, then it works ok on a little endian machine, but on a big endian machine the uC field ends up split

That's why the direction is defined to match the endianness; you get a consecutive chain of bits in either case.

> so the 16 bit view looks like: CCCC4444CCCCCCCC

It's 4444CCCCCCCCCCCC on BE, and CCCCCCCCCCCC4444 on LE. If you need something else, it's no longer a question of defining an ABI-consistent structure, but rather expressing a representation of an externally given constraint.


> > If the fields are allocated in order, is u4 the low bits of the first byte or the high bits?

> It's the low bits on LE, and the high bits on BE.

You are advocating for the current rule in C. As I said, C's rule implies 8 bit fields will be in different orders in memory on machines of different endianness, which makes it very difficult to use bitfields to get exact control over memory layout in a portable manner.


No, I'm advocating that the current behavior of C compilers is the only thing that really makes sense for ABI considerations. I opened my comment with: I'm not sure I accept "consistency with a bytewise view of memory" as a well-defined, reasonable concept. You're starting from an assumption that there is something "important" about 8-bit fields, and that bitfields are a tool to express some externally defined memory layout. But unless your language also has "endianed" types for larger integers, it's already impossible to do that. And most languages don't claim or try to work with externally defined memory layouts.

This entire topic disaggregates into 2 distinct categories: deterministic packing for architecture ABIs, which needs to be consistent but can be arbitrary. And representing externally defined structures, which is a matter of exact representation capabilities.


Is that not why functions like htonl exist: to convert between the host architecture's endianness and a platform-independent representation?

Your binary isn't going to be portable. The only reason your memory layout should be is if you intend to serialize it. But if you're going that extra step, you _need_ to convert it to a platform-independent format regardless -- otherwise, not even your ints deserialize correctly.
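
I.e., serialize byte by byte rather than dumping struct memory; a sketch of the point being made:

    #include <stdint.h>

    /* Write a 32-bit value in a fixed (here big-endian) byte order,
       independent of host endianness: */
    static void put_u32_be(uint8_t out[4], uint32_t v) {
        out[0] = (uint8_t)(v >> 24);
        out[1] = (uint8_t)(v >> 16);
        out[2] = (uint8_t)(v >> 8);
        out[3] = (uint8_t)v;
    }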


I agree, if you're changing endianness then bitwise compatibility between in-memory formats is out the window by definition. Even if we're just declaring a packed struct containing a single int32_t, it's not gonna match at the bit/byte level.

Unless you define a single 'right' bit order and then swizzle/unswizzle every value being written to or read from a packed struct, but then that's becoming more of a serialize/deserialize step, which is a different thing.


Variations due to endianness (hardware platform) are to be expected but variations due to compiler (implementation-defined) can be avoided if the spec says so. The fact that so many CPUs are little-endian these days certainly doesn't make things easier.


Bit fields have been portable for ages, since C89. Just pack it, use the smallest base type, and don't leave holes. We were using them in perl5 forever, and this compiles on more platforms with more compilers than you know. Just use -mms-bitfields on mingw, and use unsigned short instead of just unsigned.

E.g. https://www.nntp.perl.org/group/perl.perl5.porters/2008/02/m... for tricks.


> Bit fields are portable since ages, since C89.

Wrong. N1256 (ISO C99 spec), for instance, explicitly states in 6.7.2.1.10:

  The order of allocation of bit-fields within a unit
  (high-order to low-order or low-order to high-order)
  is implementation-defined.
Look it up.


In theory yes, in practice no. And the committee usually has no idea.


> Bit fields in C are notoriously non-portable (exact physical layout depends on the compiler).

The word "non-portable" is being thrown around here too cheaply. By the definition used here, technically every C program that puts two integers consecutively in a struct is non-portable too, not just due to alignment but due to endianness, etc.


Notoriously? I can't name any architecture/platform/compiler where bits aren't allocated consecutively in order of definition, without holes, in the same direction as system endianness...


Try this in clang and gcc:

    #pragma pack(1)
    struct S {
        char c:1;
        char d:1 __attribute__((aligned(2)));
        char e:1;
    };
    _Static_assert(sizeof(struct S) == 1, "wrong size");


gcc?

MSVC does things simply, gcc does not


D just went ahead and implemented bit fields. They work, and it's hard to find any way to improve on them.


It's hard but they aren't perfect. You shouldn't rely on them for packing memory at all if you want to talk to other programs.

Similarly bitfields mean you end up with "ints" that are actually 3 bits wide and so on.


> You shouldn't rely on them for packing memory at all if you want to talk to other programs.

While the layout is indeed implementation dependent, pragmatically if you stick to using ints the layouts are portable as far as I can tell. Just like the size of ints is implementation dependent, but is reliably 32 bits on 32 and 64 bit machines.


It also specified the layout (little endian) and disallowed reordering / padding / etc.


How bitfields are allocated within a byte is related, but independent from endianness.

endianness: how an array of bytes is interpreted as an integer/how an integer is laid out in memory as an array of bytes.

bitfield allocations: how subsequent bitfields are allocated within an integer, typically starting from least significant bit to most significant, or the other way around.
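
A small probe makes the distinction visible (a sketch; the printed value is implementation-defined, which is exactly the point):

    #include <stdio.h>
    #include <string.h>

    struct S { unsigned lo : 4; unsigned hi : 4; };

    int main(void) {
        struct S s = { .lo = 0x1, .hi = 0x2 };
        unsigned char byte;
        memcpy(&byte, &s, 1);
        /* 0x21 if fields are allocated LSB-first (typical on little-endian),
           0x12 if the implementation allocates MSB-first. */
        printf("0x%02x\n", byte);
        return 0;
    }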


Fair, and a good point.

In this particular case, the two are related, because the LSBit-first bitfield allocations can spill over between bytes, giving LSByte-first endianness as well.


Admittedly I didn't know C had bit fields like that, so I didn't cover them. I've never seen them used in practice and I've used quite a lot of C libraries. I wonder why that is?

One major difference appears to be that C bitfields' memory layout is compiler-dependent. The other major difference is Zig's arbitrary-bit-width integer types leading to fewer footguns, I would speculate.


I don't know why you haven't come across them but personally I've used bitfields a lot over my career as a video game programmer at different companies. You can have arbitrary bit width integers as well. It's not really very different to zig.


> You can have arbitrary bit width integers as well.

This is news to me, how? AFAIK they'll only eventually make it into C23 with _BitInt(N).

...and apart from building a C++ wrapper class of course, but how would this pack with data outside the class - like the 4 + 28 bits example in the blog post.

Also, IIRC when I tinkered with C/C++ bitfields, some compilers (at least MSVC, I think?) didn't properly pack the bits (e.g. a single bit would be padded to a full byte), or they couldn't agree on a common size of the containing integer (e.g. one compiler packing <8 bits into a uint8_t, and another into a uint32_t). In the end, C/C++ bitfields weren't all that useful for the use case described in the blog post, at least if portability across the three big compilers is needed (gcc, clang, msvc).


My apologies, I may have misunderstood what OP meant! I thought they assumed bitfields were always 1-bit in size.

BitInt looks neat. It sounds kind of like a bit array which we use quite a lot (which is as you suggest a class wrapper over an array of uint32_t templated on a size).

We use bitfields quite a bit across clang and MSVC targeting mobile, PC, and console, and haven't had any problems as far as I know.


Maybe other members in the same struct were enforcing such packing? Such as https://godbolt.org/z/8G4v8McsP

You probably want static asserts for sizes in your code if you are trying to optimize your struct padding.


Could also be that MSVC has improved in this area. I think I experimented with VS2015 (since this was my 'base line compiler' at the time).


The order of a bitfield is explicitly implementation defined, at least in C++ [0] and C99. I have been bitten by this when compiling bit-fiddling code with a different compiler (for the same platform).

[0] section "Notes" in https://en.cppreference.com/w/cpp/language/bit_field


Interesting, I saw some low level code that assumes that bits are neatly packed.

Some C decisions really confuse me. What would be the point of this one?


The alignment restrictions for memory accesses on some architectures may prevent bitfields from crossing certain address boundaries, hence implementations may need to reorder and/or pad the fields. The alternative would have been either to fail compilation for bitfield combinations that are incompatible with the target platform (arguably worse, because then you couldn't portably use bitfields at all), or to force implementations to generate code with case distinctions, possibly depending on the actual alignment at runtime, performing the bitfield accesses using multiple memory accesses if necessary.


>when I tinkered with C/C++ bitfields, some compilers (at least MSVC I think?) didn't properly pack the bits

There's #pragma pack for that.


Far as I can tell 'don't use bit fields' is something students learn from their CS professors along with don't use goto, multiple returns, continue statements, pass structs by value, and always convert to big endian when sending data over a wire.

My experience is that C compilers have ways of packing and defining the order of bitfields and structs.


Most, even all, of these sound to me like cargo-culting. I certainly use multiple returns and continue statements all the time. I pass structs by value when I want to stress that they represent a value type that can / should be copied around (for example, a slice). As for the format to send data over a wire, I will either use the ntoh()/hton() macros or move to a more standard binary format such as protobufs.

The only thing I don't usually run into is the very topic of this article: bit fields.


The (IIRC) ELF specification explicitly recommends against using bitfields because of implementation defined layout.

The compilers often barely document exactly how they lay things out.


Yeah no, bitfields are notoriously implementation defined.

In the embedded world you often have to deal with vendor-specific toolchains, and with their finicky compilers you'd be surprised at the weirdness you run into when using bitfields.

You will learn to question everything not specifically defined by the c standard.


>always convert to big endian when sending data over a wire

What is this awful advice. Only convert to big-endian where legacy demands it.


Meh. Network byte order is big endian. On the one hand, close to every device uses little endian internally, so the conversion is pointless except for convention. On the other hand, it doesn't really matter, because the conversion is extremely fast (including hardware support in common processors).

Endianness is really perfectly named: a meaningless difference that generations of people fight holy wars over.


Network byte order only applies to information in packet headers. How you actually pack bytes into your packets is entirely up to you and/or the protocol you're using.

It was a flip of a coin choice.


The LoRaWAN standard uses big endian.


My impression was that for interfacing with hardware, the memory layout isn’t always contiguous, leaving gaps for reserved fields, so people just use the explicit syntax shown in the article. I had been under the impression the layout of bit fields was guaranteed (much like struct fields) but if that is truly compiler specific as you say, I can see why it would not be useful for interfacing with hardware. I had always thought it was because embedded c sometimes deals with less capable compilers, so people often limit themselves to c89 or even older standards.


C bitfields are fairly common in embedded code. While layout can vary across compilers and archs, hardware driver writers assume a known arch and compiler. You really can't compile embedded targets with, say, MSVC or clang. Also, gcc seems to be the de facto compiler for most embedded nowadays.


cf K&R 6.9 "Bit-fields"


C bitfields are problematic and often aren't used in places where you'd think they'd be useful. Zig seems to be doing its best to get bitfields right.

Just one thing that Zig improves: in C if you need the bit offsets of the fields you'll still need lots of defines with the offsets/masks and such. In Zig it can be extracted from the packed struct at comptime
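
For contrast, the C side usually ends up as hand-maintained defines, something like this (illustrative; the names are made up):

    #define COLOR_MASK_RED_SHIFT   0
    #define COLOR_MASK_RED_MASK    (1u << COLOR_MASK_RED_SHIFT)
    #define COLOR_MASK_GREEN_SHIFT 1
    #define COLOR_MASK_GREEN_MASK  (1u << COLOR_MASK_GREEN_SHIFT)
    /* ...and so on, kept in sync with the struct by hand. */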


Don’t get me wrong, it’s a cool feature; the only problem I see here is readability. If I’m looking at this code, or trying to figure out the value of a register mangled by such a call, I find it very hard to read and understand what is happening there. With hex offsets and masks I can tell what is happening to the bitfield at any given time… just MHO.


The language I'm working on has the following syntax

  bitfield ColorWriteMaskFlags : u32 {
    red 1 bool,
    green 1 bool,
    blue 1 bool,
    alpha 1 bool,
  };
The number after the field name is the bit size. An (offset, size) pair can be used instead (offsets start at 0 when not explicit). After that can be nothing (the bit field is an unsigned number), or the word 'bool' (the bit field is a boolean) or the word 'signed' (the bit field is a signed number in two's complement). The raw value of the bitfield can always be accessed with 'foo.#raw'.

EDIT: There's also no restriction to how multiple fields can overlap, as long as they all fit within the backing type.


The one thing that would worry me here is the endianness, e.g. is red the LSB or the MSB?

Also, what happens to the padding (bits not covered by subfields)? How's that going to look when shoved into a file or over a socket? How does the language handle overflow (more bits in the bit fields than there are in the parent field)?


Red is the least significant bit, because offsets start at bit #0. Endianness is not a concern because you have to specify the underlying integer type, and bit endianness is not a thing, so it's just the native endianness. (If you, e.g., need a network format with specific endianness and a 32-bit bitfield, you can instead use 4 consecutive 8-bit bitfields laid out in a consistent way.) Unspecified bits are ignored: they'll be zero if the bitfield is initialized through conventional means, but they can be accessed directly through the raw value (or e.g. through memcpy, pointer casting, and any other direct memory access means). When reading a field, only its specified bits are read, so unspecified bits don't change the result. When writing to a field, the source value is truncated to the field size, so you never end up writing to other bits.


> Red is the least significant bit, because offsets start at bit #0.

So red is the LSB because you decided it was the LSB. That is not a by-definition thing.

> Endianness is not a concern because you have to specify the underlying integer type, and bit endianness is not a thing

That’s not actually true. There are formats which process bytes LSB to MSB, and formats which process them MSB to LSB. E.g. in git’s offset encoding, the leading byte is a bitmap of continuation bytes: bit 0 indicates whether byte 7 is present.

Both are perfectly justifiable, one is offset-based, while the other is visualisation-based as bytes are usually represented MSB-first (as binary numbers, essentially).

> but when reading a field only its specified bits are read so unspecified bits doesn't change that result. When writing to a field, the source value is truncated to the field size, so you never end up writing to other bits.

I’m quite confused by “bitfield” do you mean the container field (the one that’s actually defined by the `bitfield` keyword) or the contained sub-fields?


> I’m quite confused by “bitfield” do you mean the container field (the one that’s actually defined by the `bitfield` keyword) or the contained sub-fields?

By 'field' I meant the contained sub-fields


One area where this is still less flexible than regular bit masking is that quite often I want to access groups of bits as a value. For instance if the bits represent in/out pins on a chip emulator, sometimes I want to access unique data bus pins, and sometimes I want to set or get the data bus value.

Maybe Zig can handle this case with unions though, I haven't tried this yet (this would require that unions can work on the 'bit level' in packed structs).
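
For comparison, the mask/shift version of that dual view (pin positions invented for the sketch):

    #include <stdint.h>

    #define DBUS_SHIFT 16                     /* data bus at pins 16..23 */
    #define DBUS_MASK  (0xFFull << DBUS_SHIFT)

    static inline uint8_t get_data_bus(uint64_t pins) {
        return (uint8_t)((pins & DBUS_MASK) >> DBUS_SHIFT);
    }

    static inline uint64_t set_data_bus(uint64_t pins, uint8_t data) {
        return (pins & ~DBUS_MASK) | ((uint64_t)data << DBUS_SHIFT);
    }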


Swift has OptionSet. See https://developer.apple.com/documentation/swift/optionset. It allows setting, testing, or clearing multiple fields with a nice syntax.

One thing that is a bit noisy is that you have to specify the bit index when you name the individual bits, as in (from that article):

  static let secondDay = ShippingOptions(rawValue: 1 << 1)
The upside is that it makes it clear which bit each value specifies, and you won't easily change them accidentally when you reorder definitions, or insert or remove them. I can see why they made that choice.

I also guess OptionSet could have been implemented in Swift in a third party library, whereas this Zig feature cannot.

I also think/guess neither language guarantees multiple bits would get read or written in one instruction. If your hardware needs that, you probably have to go down a level, or look at the disassembly of your code to check what your compiler did.


I've come to really like Nim's built-in `set` type for that sorta scenario. It's just a bit-vector but it took me months before I realized that. You can cast an `int32` to a `set[MyEnum]` and be able to do set unions, differences, etc. Makes it easy to define a mask just by `const myMask = {A, B, E}`.

Making a Zig BitSet would probably be doable, for those cases where bitfields are overkill.

1: https://nim-lang.org/docs/manual.html#types-set-type


You can do this and more in Zig. The only caveat is that for MMIO you need a volatile pointer.

https://www.scattered-thoughts.net/writing/mmio-in-zig/

It gets tricky if you need more control over loads and stores, or if there are different address spaces.


I don't have experience with this area, but...

> this would require that unions can work on the 'bit level' in packed structs

Just as Zig has packed structs, it also has packed unions. So that part shouldn't be an issue.


`@bitCast` addresses this use case:

    const std = @import("std");
    const expect = std.testing.expect;

    test {
        const group1: P = .{ .a = true, .b = true };
        const group2: P = .{ .c = true };
        const group3 = @bitCast(P, @bitCast(u4, group1) | @bitCast(u4, group2));
        try expect(group3.a);
        try expect(group3.b);
        try expect(group3.c);
        try expect(!group3.d);
    }

    const P = packed struct {
        a: bool = false,
        b: bool = false,
        c: bool = false,
        d: bool = false,
    };


So `unsigned flag:1`?


After you set all the pragmas or attributes to not add any additional padding/alignment, and declare that you really, really mean it. Maybe add a few static asserts so you know at compile time if you got the size right.

These are not standard, so you need some preprocessor magic to choose the right thing. And so on...


I just use 'bool flag:1' in C.


> This all works, people have been doing it for years in C, C++, Java, Rust, and more. In Zig, we can do better.

We can also do better in those other languages, too. For example, in Rust, I can use a crate like `bitfield` which gives me a macro with which I can write

    bitfield! {
        pub struct Color(u32);
        red, set_red: 0;
        green, set_green: 1;
        blue, set_blue: 2;
        alpha, set_alpha: 3;
    }
Don't get me wrong: it's cool that functionality like this is built-in in Zig, since having to rely on third-party functionality for something like this is not always what you want. But Zig is not, as this article implies, uniquely capable of expressing this kind of thing.


Is it something I'd ever want to rely on third-party functionality for?


C has them and their implementation seems to be universally disliked. Using an external crate with an implementation that people do like, with the possibility to substitute another if you disagree, seems better than to force a specific implementation into the language (that people will then replace with external dependencies or that people will learn to avoid as a concept).


Sometimes, there's value in providing a standard way of doing things. Even if it isn't perfect in all cases (or even a median case), then at least most people coalesce around how it's used and its limitations.

But yeah, sometimes it's better to have options. If it's common functionality, though, there will likely be 1000 different implementations of it that all differ just slightly [0]. Perhaps it would be better for that effort to be put into making the standard better.

I don't think there's a universally correct answer by any means, but for something so common as bitflags, I think I personally lean towards having a standard. Replacing an implementation wholesale feels like it should be reserved as a last resort.

Either way, I think mature pieces of software (languages especially) strive to provide a good upgrade path. Inevitably, the designers made something that doesn't match current needs. Even if it's just that "current needs" changed around them.

[0]: And if we subscribe to Sturgeon's Law, 90% of those are crap, anyway. Though they might not appear so on the surface...


As long as the compiled code is just as efficient as it would be had it been built-in to the language, I don't see the issue? Bitfield ops are very low-level constructs that are only useful in very specific project types. Their portable usage can be tricky, it's not something a majority of coders should reach for.

That's the luxury of a standard build system: essential but rarely used features can be left out of the core language / lib because adding them back in is just a crate import away.


Yes? If you have lots of bitfields and the convenience / readability is worth it, why wouldn’t you?


I meant as opposed to having it built in.


There are lots of reasons to not want it built-in e.g. it makes the language more complex, and if bad semantics are standardised you’re stuck with them.

If the language supports implementing a feature externally then it’s a good thing, as it allows getting wide experience with the feature without saddling the language with it, and if the semantics are fine and it’s in wide-spread use, then nothing precludes adding it to the language later on.

It’s much easier to add a feature to a language than to remove it.


I’ve got to admit, having used packed structs in Rust quite a bit, I’m also a little confused about what Zig is improving on.

Many years ago I wrote a rust program to decode some game save data and it looks like what you’d expect.

https://github.com/aconbere/monster-hunter/blob/master/src/o...


In my opinion, syntactic sugar is, in the end, more useful and positive than "powerful" or "expressive" features that can lead to shorter but harder-to-decipher code.

Syntactic sugar never hurts.


I wouldn't say this is syntactic sugar here. Packed structs are the most valuable tool in the language for handling interop with other languages and building protocols. The fact that you can use that tool to build bitfields feels like an interesting, cool, and useful side effect. However, it's not their purpose.

Note that the main difference between packed structs and regular structs is not the dense bit packing, but rather that regular structs are allowed to reorder the fields however they wish in memory: the Zig compiler is free to optimize your struct's layout, which compilers for other languages (like C, for example) are not free to do. Thus you get a way to define structs exactly, with bit-level precision, to build complex protocols where you can decide what every bit means.



