Packed structs in Zig make bit/flag sets trivial (hexops.com)
187 points by slimsag on Aug 30, 2022 | 142 comments



if (mask & (WGPUColorWriteMask_Alpha|WGPUColorWriteMask_Blue)) { // alpha and blue are set.. }

Doesn't this give you if alpha OR blue is set?

Errors like this are another reason syntactical sugar for readability is important.


I’m seeing a lot of versions of the correct logic under this comment, so here’s mine!

    if((mask & WGPUColorWriteMask_Alpha) && (mask & WGPUColorWriteMask_Blue)) { //… }
This is the least confusing form I’ve seen that doesn’t require a function, macro, custom operator, etc.


You don't need syntactical sugar. Just write a function.

    static bool all_bits_set(i32 value, i32 mask) { return (value & mask) == mask; }
Here I'm assuming 32-bit values. In C, with relatively little support for generics, you can consider making multiple versions, possibly using _Generic (note, I haven't evaluated the sanity of using _Generic).

Alternatively you can use a #define. However, you need to use "mask" twice, so that gets tricky: either it requires care to keep the corresponding expression at the call site side-effect free, or the macro needs to be written using compiler extensions like statement expressions and typeof() variable declarations à la Linux kernel.
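
To illustrate the hazard (my sketch, with a hypothetical next_mask()): the naive macro expands "mask" twice, so a side-effecting argument runs twice:

    /* Naive version: "mask" is expanded twice. */
    #define ALL_BITS_SET(value, mask) (((value) & (mask)) == (mask))

    /* next_mask() would be called twice here: */
    /* if (ALL_BITS_SET(flags, next_mask())) { ... } */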


Yeah, the usual way in C would be to use a statement expression (which is a GCC extension, but also supported by clang):

    #define all_bits_set(value, mask) ({__typeof__(value) v = (value), m = (mask); (v & m) == m; })
I personally would go for the extension above or a single 64-bit function, but here's the _Generic version:

    static bool all_bits_set32(i32 value, i32 mask) { return (value & mask) == mask; }
    static bool all_bits_set64(i64 value, i64 mask) { return (value & mask) == mask; }
    #define all_bits_set(value, mask) _Generic((value), int32_t: all_bits_set32, int64_t: all_bits_set64)(value, mask)


The nice thing about this is that it's simple and works pretty much in any language so there's very little cognitive strain if you work with multiple languages at the same time.


Yes it would need to be:

    if ((mask & (WGPUColorWriteMask_Alpha|WGPUColorWriteMask_Blue)) == (WGPUColorWriteMask_Alpha|WGPUColorWriteMask_Blue)) { ... }
...which is quite a mouthful.


I’ve definitely desired an operator that combines these before. I’ve settled on &== as the best syntax in C-style languages, with a &== b being equivalent to (a & b) == b (but with a single evaluation of b).

  if (mask &== WGPUColorWriteMask_Alpha|WGPUColorWriteMask_Blue) { … }
I find only two very minor problems with it: firstly, that it’s not commutative (that is, a &== b is not equivalent to b &== a). Secondly, that it can be confused with the bitwise-and assignment operator &= (a &= b being equivalent to a = a & b for singly-evaluated lvalue a).

I’d then add |== for consistency and because it is conceivably useful. ^== is tempting, but since a ^== b would be just another spelling of a == 0, I’d skip it.


Fascinating: I also thought about exactly '&==' in the past. It seems to be the natural choice for this.

But then, continuing that thinking, if you have '&==', you'd also need '&!=' -- that looks really confusing.

Can't we turn that into a '==0' test somehow? Maybe '^&', because '((a ^ b) & b) == 0' is equivalent to '(a & b) == b'. And, as someone said, '~&' also works: '(~a & b) == 0' is also equivalent to '(a & b) == b'.

    if ((a ~& b) == 0) { ... }
Wait -- we can swap that into '&~' and we're back in C. However, it reverses the logical order, with the test bits first:

    if ((b & ~a) == 0) { ... }

    if (((WGPUColorWriteMask_Alpha|WGPUColorWriteMask_Blue) & ~mask) == 0) {
        ...
    }
So we're back -- this is standard C now. But I find it incomprehensible.
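
For what it's worth, the equivalence is easy to convince yourself of by brute force (a quick C sketch of mine, not from the comment):

    #include <assert.h>
    #include <stdint.h>

    int main(void) {
        /* (a & b) == b and (b & ~a) == 0 agree for all 8-bit pairs. */
        for (uint32_t a = 0; a < 256; a++)
            for (uint32_t b = 0; b < 256; b++)
                assert(((a & b) == b) == ((b & ~a) == 0));
        return 0;
    }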


Now I’m trying to decide if you make it better or worse by replacing the == 0 with simple logical negation:

  if (!(b & ~a)) { … }
  if (!((WGPUColorWriteMask_Alpha|WGPUColorWriteMask_Blue) & ~mask)) { … }

  if (!(~a & b)) { … }
  if (!(~mask & (WGPUColorWriteMask_Alpha|WGPUColorWriteMask_Blue))) { … }
Actually, the latter makes a little more intuitive sense to me because of how ! and ~ are both negation¹, so they kinda cancel out and leave just the masking. Kinda.

—⁂—

¹ Fun fact: Rust uses ! for both logical and bitwise negation, backed by core::ops::Not, with bool → bool and {integer} → {integer}. Since bool is a proper type and there’s no boolean coercion anywhere, you would have to stick with `== 0` in Rust. In practice, though, you’d probably go all typey with the bitflags crate’s macro to generate good types with some handy extra methods, and write `mask.contains(WgpuColorWriteMask::Alpha | WgpuColorWriteMask::Blue)`.


You could also play the game of pretending that the scalar is an object and allow for syntactical sugar like:

    if (mask.contains(WGPUColorWriteMask_Alpha) || mask.contains(WGPUColorWriteMask_Blue)) { ... }


Fun fact: Rust’s bitflags macro crate generates just this, https://docs.rs/bitflags/latest/bitflags/example_generated/s....

  // To check for either flag:
  if mask.contains(WgpuColorWriteMask::Alpha) || mask.contains(WgpuColorWriteMask::Blue) { … }
  if mask.intersects(WgpuColorWriteMask::Alpha | WgpuColorWriteMask::Blue) { … }

  // To check for both flags:
  if mask.contains(WgpuColorWriteMask::Alpha | WgpuColorWriteMask::Blue) { … }


> I find only two very minor problems with it: firstly, that it’s not commutative

Why is that a problem? That is, if a has additional bits set compared to b, then ((a & b) == b) != ((b & a) == a), no?


Of the common operators, / and - are the most familiar ones that aren’t commutative.

For equality comparisons, there are two opposing conventions: actual == expected (the more popular, in my experience), and expected == actual (by no means rare). Having &== and |== be order-sensitive (though there’s absolutely no question in my mind about what the ordering should be) is mildly unfortunate.

It’s very minor.


Yeah ok. I'd consider that just a thing you gotta know. Like how for certain floating point values (a / b) * (c / d) can be fine while (a * c) / (b * d) gives you +/- infinity.

It's a separate operator after all, one I also miss a lot...


You can also do it as follows:

    if (!(~mask & (WGPUColorWriteMask_Alpha|WGPUColorWriteMask_Blue))) { ... }
Not quite as long but perhaps less readable.


I use:

    if (all_of(mask, WGPUColorWriteMask_Alpha | WGPUColorWriteMask_Blue))
with all_of() being a #define. Likewise none_of(), any_of().

No need for special operators.
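
One plausible way to write those (my sketch; the commenter doesn't show their definitions):

    #define all_of(v, m)  (((v) & (m)) == (m)) /* note: m expands twice */
    #define any_of(v, m)  (((v) & (m)) != 0)
    #define none_of(v, m) (((v) & (m)) == 0)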


If you are willing to assume that the masks each only have 1 bit set, this would be less of a mouthful although it would draw other objections:

  if (@popCount(mask & (WGPUColorWriteMask_Alpha|WGPUColorWriteMask_Blue)) == 2)
I do not know Zig, so the syntax might not be right. I did check to see that it has popcount [1].

If it has some concise way to flip all the bits, then this would be another possibility that isn't too verbose, but might raise other objections. Let fmask be mask with all the bits flipped (how would one do that in Zig?).

  if ((fmask & (WGPUColorWriteMask_Alpha|WGPUColorWriteMask_Blue)) == 0)
[1] https://ziglang.org/documentation/master/#popCount


I think the point of the example is that in zig you'd just do

  if (mask.alpha and mask.blue) {


In C I'm using a macro for stuff like that (improving readability, avoiding errors). In Zig I could use a `comptime` function, I presume?


It depends on what else you’re doing. In this situation normal function would suffice in either C or Zig.


What about?

    if (mask & WGPUColorWriteMask_Alpha & WGPUColorWriteMask_Blue) {
        // alpha and blue are set..
    }


> `mask & WGPUColorWriteMask_Alpha & WGPUColorWriteMask_Blue`

If WGPUColorWriteMask_Alpha and WGPUColorWriteMask_Blue don't share bits, isn't this guaranteed to be false 100% of the time?


Indeed, you would need to check that the mask ends up being exactly alpha and blue; otherwise it's just an automatic conversion from integer to boolean, which many languages are doing away with due to all the bugs it produces.


Yeah, you can't do that in Go, even explicitly:

    // cannot convert i (variable of type int) to type bool
    bool(i)
you'd need to use a function:

    func to_bool(i int) bool { return i != 0 }


Seems a bit mean-spirited to provide casts but not that one.


Not really. Go has no concept of truthy:

https://developer.mozilla.org/docs/Glossary/Truthy

and I fully support that decision. If you want to use a boolean, you need to be explicit about it.


A statically typed language shouldn't indulge in truthiness, no. But a cast isn't truthiness, it's truth: the meaning of "make this int impenetrable to all but truth and falsehood" is in practice well defined; there's nothing implicit happening here.

The compiler isn't going to make that function, it's going to optimize back to a cast to boolean. Why make the poor shmoe user type it out?


Casting int8 to int16 isn't the same thing as casting int to boolean. In the first case, you get the same number (ignoring wrapping issues). In the second case, you're asking the compiler to make an arbitrary decision about which numbers map to which booleans.


There is nothing arbitrary whatsoever about the conversion of any string of bits to a boolean: it is false iff none of the bits are 1.

If you want to call it arbitrary, it was arbitrated decades ago, but Boolean logic is much older than computers and works as it does for a reason. I suspect you know that.


The number 9 is not the same thing as "true". Some people may want it to be, and it might be convenient for some people if a compiler treats them as the same thing, but they aren't. In my opinion, it's an anti-pattern to have the compiler make these kinds of decisions. Instead of writing and reading code that is explicit, you're writing an abbreviated version of what you want and assuming that the compiler will know what you mean.


It's Go pretending it's a kind of language it isn't, and making the user do the teapot dance instead of cooperating.

People keep telling me this koolaid is delicious but I just don't see it.


It seems you ran out of actual arguments, sorry to hear that.


You can't argue with someone who doesn't understand logic, which is literally true here. I have run out of jokes at Go's expense.


I think I made my point known. Converting int to boolean is not a settled task, as different people might not agree on what the correct way to do that is. False could be zero, or negative one, or any negative number, or even one. So it's better not to have the compiler make that decision, and just have the code be explicit about what it's trying to do.


"WGPUColorWriteMask_Alpha|WGPUColorWriteMask_Blue" creates a number with both the alpha and blue bits set, then "mask &" (that number) ands the mask with that number in which both bits are set, so it'll only return true if both alpha and blue are set. The parenthetical grouping of the operators is key here.

edit: oh, i think i'm wrong, nevermind.


No it won't. & is "binary AND", and the result will be WGPUColorWriteMask_Alpha or WGPUColorWriteMask_Blue if only one of them is set, which is non-zero and so evaluates to true. So it checks for either of those flags.

Correct usage, if you want both flags:

    (flag & (WGPUColorWriteMask_Alpha|WGPUColorWriteMask_Blue)) == (WGPUColorWriteMask_Alpha|WGPUColorWriteMask_Blue)


I believe it would need to be something like if ((mask & (WGPUColorWriteMask_Alpha|WGPUColorWriteMask_Blue)) == (WGPUColorWriteMask_Alpha|WGPUColorWriteMask_Blue)) for both bits to be set here.


No, the expression will resolve to 'true' if any of the Alpha or Blue bits are set.


This is why bit sets are far better at this and should be used, i.e.:

    if WGPUColorWriteMask.Alpha in mask and WGPUColorWriteMask.Blue in mask: ...


That's a 'user error'. "Nothing wrong with the language"


If that’s user error, then there’s nothing wrong with C, C++, Java, JavaScript and myriad other languages. Then we don’t need any new ones on the basis of readability, elegance, expressiveness and myriad other subjective properties. Yet many discuss these aspects primarily.


I believe parent was sarcastic and agrees with you.


I have to go to Chandler's sarcasm class. Can I BE any more obvious?


That's a user error because there very much is something wrong with the language. The error is that it deals in truthy and falsy rather than a strict true^false dichotomy. So 1 is just as truthy as 2 is. And -234123 for that matter.


This is a nice syntax, but the functionality seems essentially the same as bit fields in C. Does this provide anything additional?


The in-memory representation of bit fields is implementation-defined. Therefore, if you're calling into an external API that takes a uint32_t like in the example without an explicit remapping, you may or may not like the results.

In practice, everything you're likely to come across will be little endian nowadays, and the ABI you're using will most likely order your struct from top to bottom in memory, so they will look the same most of the time. However, it's still technically not portable.
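
For what an "explicit remapping" might look like, here's a sketch (the bit positions for the WGPU flags are my assumption):

    #include <stdint.h>

    struct color_mask { unsigned red : 1, green : 1, blue : 1, alpha : 1; };

    /* Build the uint32_t with shifts instead of type-punning the struct,
       so the result doesn't depend on the compiler's bit-field layout. */
    static uint32_t to_wgpu_mask(struct color_mask m) {
        return (uint32_t)m.red   << 0 |
               (uint32_t)m.green << 1 |
               (uint32_t)m.blue  << 2 |
               (uint32_t)m.alpha << 3;
    }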


I've dealt with oddities talking between big endian PowerPC using these. It's been a few years, but the difference wasn't just the endianness, I think? Still, dealing with the mapping was way easier than masking for large structs. Is big endian really dead now?


As I said, it's also ABI. Though admittedly, endianness would be encompassed by ABI, so it's all really just ABI.


> In practice, everything you're likely to come across will be little endian nowadays

The internet?


Not really relevant to a discussion about CPUs and compiler implementations...


It's a language design feature that makes some sorts of networking code much easier to write. Why wouldn't that be relevant?


Nobody ever writes code for memory mapped network devices!!


Here is a sample of how you do this in C++ https://godbolt.org/z/W3bMcqM5P


Wow... I thought I had seen a lot of strange C++ code but I have _never_ seen packed structs using this syntax:

    struct X { type alias : numeric_value = false; ... };

I love the power of C++, but there is _so much_ to the language. I'm sure there would also be some template metaprogramming solution even if this syntax was available.


Uh, that's standard C89. You know, the bit fields? Except for the "false" keyword, that's from stdbool.h and you need C99 for that to reliably exist.


The only real problem with C (and C++) bitfields is that the standard doesn’t say anything about how they’re laid out in memory, which unfortunately makes them mostly useless unless you either don’t care about layout or you’re targeting one specific compiler and that compiler documents its choices and doesn’t break them in a few years.

Sign-extension of 1-bit fields also messes people up all the time, but that’s “just” an easy-to-fix bug.
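
The trap in a minimal sketch (whether a plain int bit-field is signed is itself implementation-defined, which is part of the problem):

    #include <stdio.h>

    struct S { int flag : 1; }; /* a signed 1-bit field holds only -1 and 0 */

    int main(void) {
        struct S s = { 0 };
        s.flag = 1;
        /* Where int bit-fields are signed, flag now holds -1, not 1. */
        if (s.flag != 1) puts("surprise: flag != 1");
        return 0;
    }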


That is a really big problem in some applications. I wish the C committee would come out and say "use the layout that GCC 11.x.x does" to make these a lot more usable.


I can't remember if GCC has actually documented their layout choices. Ideally we would pick something documented =)


The rough C equivalent looks like https://godbolt.org/z/YqGPa7ecd , using _Static_assert (available from C11 onwards, and as a GCC built-in since 4.6) with no default values for the struct members.
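
Roughly this shape, I'd guess (a sketch, not the exact godbolt contents):

    struct color_write_mask {
        unsigned red    : 1;
        unsigned green  : 1;
        unsigned blue   : 1;
        unsigned alpha  : 1;
        unsigned unused : 28;
    };
    _Static_assert(sizeof(struct color_write_mask) == 4, "unexpected size");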


It's probably tough to find a metaprogramming solution to this, for two reasons:

1. Addressing happens at the byte level, not the bit level, so a type can't begin on any bit. You'd have to do your own addressing, such as in std::bitset.

2. For now, there's no reflection in the language, so you can't really assign names to members in a general way (hello, preprocessor). A solution might be to index by type; something like the following:

    struct BitA {}; struct BitB {};
    using ExampleBitFields = BitFields<BitA, BitB>;
    bool checkBitA(const ExampleBitFields& x) { return x[BitA{}]; }

However, this has a lot of downsides.


I have done similar things a few times, and like JSON handling, the optimal API does not seem to be the same for every intended use. Size-focused cases want different things than speed-focused ones, and you need to deal with sign extension, etc.


Yup, I'd be very wary of all the template instantiation going on here. Sadly, I often find codegen is the most reasonable solution in a given situation. I wish we had a decent macro system, like Rust's


I thought you were not able to initialize bitfield members like that in a struct. Am I misremembering, was that something else, or is it compiler/C++-version dependent?


This is a C++20 feature.


Ah right, I tried again

  <source>(6): error C7582: 't': default member initializers for bit-fields requires at least '/std:c++20'
Nice to see they also improved such "legacy" stuff


You can absolutely initialize bitfield members in structs, not sure where your memory is from… maybe it was broken on some specific compiler vendors/versions?


<source>(6): error C7582: 't': default member initializers for bit-fields requires at least '/std:c++20'

It is C++20, apparently: https://godbolt.org/z/qvso544dr


Ah, I misunderstood / didn't look properly and failed to notice this is a default value in the struct definition. My bad.


Possible they were just pre-C++11


This has been possible since C89. See 3.5.2.1 of the C89 rationale [0].

It is architecture/compiler dependent though. This is explicitly acknowledged in the rationale document:

“Since some existing implementations, in the interest of enhanced access time, leave internal holes larger than absolutely necessary, it is not clear that a portable deterministic method can be given for traversing a structure field by field.”

[0] https://www.lysator.liu.se/c/rat/c5.html


I think they were referring to the default member initializer, a feature added in C++11 and not exclusive to bitfields. I couldn't see any reference to anything similar in what you linked (admittedly, I skimmed, though).

For example:

    struct X {
        int defaulted = 1; // 1 is the default member initializer
    };


>I think they were referring to the default member initializer

Ah, that would make more sense. The combination of bitfield and default initializer syntax does look especially odd, since both features are rarely used IME.

>I couldn't see any reference to anything similar in what you linked

Indeed, there is no way to specify default values for a C struct at definition time.


Bit fields in C are notoriously non-portable (exact physical layout depends on the compiler). Is Zig any better (apart from the fact that there is only one Zig compiler for now)?


That's the point of a packed struct: to explicitly declare the physical layout (instead of letting the compiler optimize it). If a packed struct ends up being non-portable between Zig implementations, then my impression is that one and/or the other must have a bug that must be fixed in order to actually comply with the language spec.


It is not possible to specify bitfields in a way that makes sense, is consistent with a bytewise view of memory, and is portable between big endian and little endian processors.

Say you have, for instance (using C notation)

  struct {
    unsigned one : 8;
    unsigned two : 8;
  };
The fields are supposed to be represented in memory in the same order they are declared, so one is the first byte and two is the second byte. This should have the same representation as if I declared it as two uint8_t fields. If I type pun it and load it into a register as a uint16_t then it depends on the hardware whether the low and high bytes are one and two or two and one.

It gets more tricky when you consider arbitrary bit widths.

  struct {
    unsigned u4 : 4;
    unsigned uC : 12;
  };
If the fields are allocated in order, is u4 the low bits of the first byte or the high bits? If you require it to be the low bits, then it works ok on a little endian machine, but on a big endian machine the uC field ends up split, so the 16 bit view looks like:

  CCCC4444CCCCCCCC


> It is not possible to specify bitfields in a way that makes sense, is consistent with a bytewise view of memory, and is portable between big endian and little endian processors.

I'm not sure I accept "consistency with a bytewise view of memory" as a well-defined, reasonable concept. I do expect to give a list of bit widths, and get a field that has these in consecutive order. Why would it randomly do weird things on an 8-bit boundary?

> If the fields are allocated in order, is u4 the low bits of the first byte or the high bits?

It's the low bits on LE, and the high bits on BE.

> If you require it to be the low bits, then it works ok on a little endian machine, but on a big endian machine the uC field ends up split

That's why the direction is defined to match the endianness; you get a consecutive chain of bits in either case.

> so the 16 bit view looks like: CCCC4444CCCCCCCC

It's 4444CCCCCCCCCCCC on BE, and CCCCCCCCCCCC4444 on LE. If you need something else, it's no longer a question of defining an ABI-consistent structure, but rather expressing a representation of an externally given constraint.


> > If the fields are allocated in order, is u4 the low bits of the first byte or the high bits?

> It's the low bits on LE, and the high bits on BE.

You are advocating for the current rule in C. As I said, C's rule implies 8 bit fields will be in different orders in memory on machines of different endianness, which makes it very difficult to use bitfields to get exact control over memory layout in a portable manner.


No, I'm advocating that the current behavior of C compilers is the only thing that really makes sense for ABI considerations. I opened my comment with: I'm not sure I accept "consistency with a bytewise view of memory" as a well-defined, reasonable concept. You're starting from an assumption that there is something "important" about 8-bit fields, and that bitfields are a tool to express some externally defined memory layout. But unless your language also has "endianed" types for larger integers, it's already impossible to do that. And most languages don't claim or try to work with externally defined memory layouts.

This entire topic disaggregates into 2 distinct categories: deterministic packing for architecture ABIs, which needs to be consistent but can be arbitrary. And representing externally defined structures, which is a matter of exact representation capabilities.


Is that not why functions like htonl exist: to convert between the host architecture's endianness and a platform-independent representation?

Your binary isn't going to be portable. The only reason your memory layout should be is if you intend to serialize it. But if you're going that extra step, you _need_ to convert it to a platform-independent format regardless -- otherwise, not even your ints deserialize correctly.
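
I.e., serialize byte by byte rather than dumping struct memory; a sketch of the point being made:

    #include <stdint.h>

    /* Write a 32-bit value in a fixed (here big-endian) byte order,
       independent of host endianness: */
    static void put_u32_be(uint8_t out[4], uint32_t v) {
        out[0] = (uint8_t)(v >> 24);
        out[1] = (uint8_t)(v >> 16);
        out[2] = (uint8_t)(v >> 8);
        out[3] = (uint8_t)v;
    }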


I agree, if you're changing endianness then bitwise compatibility between in-memory formats is out the window by definition. Even if we're just declaring a packed struct containing a single int32_t, it's not gonna match at the bit/byte level.

Unless you define a single 'right' bit order and then swizzle/unswizzle every value being written to or read from a packed struct, but then that's becoming more of a serialize/deserialize step, which is a different thing.


Variations due to endianness (hardware platform) are to be expected but variations due to compiler (implementation-defined) can be avoided if the spec says so. The fact that so many CPUs are little-endian these days certainly doesn't make things easier.


Bit fields have been portable for ages, since C89. Just pack it, use the smallest base type, and don't leave holes. We were using them in perl5 forever, and this compiles on more platforms with more compilers than you know. Just use -mms-bitfields on mingw, and use unsigned short instead of just unsigned.

E.g. https://www.nntp.perl.org/group/perl.perl5.porters/2008/02/m... for tricks.


> Bit fields are portable since ages, since C89.

Wrong. N1256 (ISO C99 spec), for instance, explicitly states in 6.7.2.1.10:

  The order of allocation of bit-fields within a unit
  (high-order to low-order or low-order to high-order)
  is implementation-defined.
Look it up.


In theory yes, in practice no. And the committee usually has no idea.


> Bit fields in C are notoriously non-portable (exact physical layout depends on the compiler).

The word "non-portable" is being thrown around here too cheaply. By the definition used here, technically every C program that puts two integers consecutively in a struct is non-portable too, not just due to alignment but due to endianness, etc.


Notoriously? I can't name any architecture/platform/compiler where bits aren't allocated consecutively in order of definition, without holes, in the same direction as system endianness...


Try this in clang and gcc:

    #pragma pack(1)
    struct S {
        char c:1;
        char d:1 __attribute__((aligned(2)));
        char e:1;
    };
    _Static_assert(sizeof(struct S) == 1, "wrong size");


gcc?

MSVC does things simply, gcc does not


D just went ahead and implemented bit fields. They work, and it's hard to find any way to improve on them.


It's hard but they aren't perfect. You shouldn't rely on them for packing memory at all if you want to talk to other programs.

Similarly bitfields mean you end up with "ints" that are actually 3 bits wide and so on.


> You shouldn't rely on them for packing memory at all if you want to talk to other programs.

While the layout is indeed implementation dependent, pragmatically if you stick to using ints the layouts are portable as far as I can tell. Just like the size of ints is implementation dependent, but is reliably 32 bits on 32 and 64 bit machines.


It also specified the layout (little endian) and disallowed reordering / padding / etc.


How bitfields are allocated within a byte is related, but independent from endianness.

endianness: how an array of bytes is interpreted as an integer/how an integer is laid out in memory as an array of bytes.

bitfield allocations: how subsequent bitfields are allocated within an integer, typically starting from least significant bit to most significant, or the other way around.
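
A small probe makes the distinction visible (a sketch; the printed value is implementation-defined, which is exactly the point):

    #include <stdio.h>
    #include <string.h>

    struct S { unsigned lo : 4; unsigned hi : 4; };

    int main(void) {
        struct S s = { .lo = 0x1, .hi = 0x2 };
        unsigned char byte;
        memcpy(&byte, &s, 1);
        /* 0x21 if fields are allocated LSB-first (typical on little-endian),
           0x12 if the implementation allocates MSB-first. */
        printf("0x%02x\n", byte);
        return 0;
    }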


Fair, and a good point.

In this particular case, the two are related, because the LSBit-first bitfield allocations can spill over between bytes, giving LSByte-first endianness as well.


Admittedly I didn't know C had bit fields like that, so I didn't cover them. I've never seen them used in practice and I've used quite a lot of C libraries. I wonder why that is?

One major difference appears to be that C bitfields' memory layout is compiler-dependent. The other major difference is Zig's arbitrary-bit-width integer types leading to fewer footguns, I would speculate.


I don't know why you haven't come across them but personally I've used bitfields a lot over my career as a video game programmer at different companies. You can have arbitrary bit width integers as well. It's not really very different to zig.


> You can have arbitrary bit width integers as well.

This is news to me, how? AFAIK they'll only eventually make it into C23 with _BitInt(N).

...and apart from building a C++ wrapper class of course, but how would this pack with data outside the class - like the 4 + 28 bits example in the blog post.

Also, IIRC when I tinkered with C/C++ bitfields, some compilers (at least MSVC, I think?) didn't properly pack the bits (e.g. a single bit would be padded to a full byte), or they couldn't agree on a common size of the containing integer (e.g. one compiler packing <8 bits into a uint8_t, and another into a uint32_t). In the end, C/C++ bitfields weren't all that useful for the use case described in the blog post, at least if portability across the three big compilers is needed (gcc, clang, msvc).


My apologies, I may have misunderstood what OP meant! I thought they assumed bitfields were always 1-bit in size.

BitInt looks neat. It sounds kind of like a bit array which we use quite a lot (which is as you suggest a class wrapper over an array of uint32_t templated on a size).

We use bitfields quite a bit across clang and MSVC targeting mobile, PC, and console, and haven't had any problems as far as I know.


Maybe other members in the same struct were enforcing such packing? Such as https://godbolt.org/z/8G4v8McsP

You probably want static asserts for sizes in your code if you are trying to optimize your struct padding.


Could also be that MSVC has improved in this area. I think I experimented with VS2015 (since this was my 'base line compiler' at the time).


The order of a bitfield is explicitly implementation defined, at least in C++ [0] and C99. I have been bitten by this when compiling bit-fiddling code with a different compiler (for the same platform).

[0] section "Notes" in https://en.cppreference.com/w/cpp/language/bit_field


Interesting, I saw some low level code that assumes that bits are neatly packed.

Some C decisions really confuse me. What would be the point of this one?


The alignment restrictions for memory accesses on some architectures may prevent bitfields from crossing certain address boundaries, hence implementations may need to reorder and/or pad the fields. The alternative would have been either to fail compilation for bitfield combinations that are incompatible with the target platform (arguably worse, because then you couldn't portably use bitfields at all), or to force implementations to generate code with case distinctions, possibly depending on the actual alignment at runtime, performing the bitfield accesses using multiple memory accesses if necessary.


>when I tinkered with C/C++ bitfields, some compilers (at least MSVC I think?) didn't properly pack the bits

There's #pragma pack for that.


Far as I can tell 'don't use bit fields' is something students learn from their CS professors along with don't use goto, multiple returns, continue statements, pass structs by value, and always convert to big endian when sending data over a wire.

My experience is that C compilers have ways of packing and defining the order of bitfields and structs.


Most, even all, of these sound to me like cargo-culting. I certainly use multiple returns and continue statements all the time. I pass structs by value when I want to stress that they represent a value type that can / should be copied around (for example, a slice). As for the format to send data over a wire, I will either use the ntoh()/hton() macros or move to a more standard binary format such as protobufs.

The only thing I don't usually run into is the very topic of this article: bit fields.


The (IIRC) ELF specification explicitly recommends against using bitfields because of implementation defined layout.

The compilers often barely document exactly how they lay things out.


Yeah no, bitfields are notoriously implementation defined.

In the embedded world you often have to deal with vendor-specific toolchains, and with their finicky compilers you'd be surprised at the weirdness you run into when using bitfields.

You will learn to question everything not specifically defined by the c standard.


>always convert to big endian when sending data over a wire

What is this awful advice. Only convert to big-endian where legacy demands it.


Meh. Network byte order is big endian. On the one hand, close to every device uses little endian internally, so the conversion is pointless except for convention. On the other hand, it doesn't really matter, because the conversion is extremely fast (including hardware support in common processors).

Endianness is really perfectly named: a meaningless difference that generations of people fight holy wars over.


Network byte order only applies to information in packet headers. How you actually pack bytes into your packets is entirely up to you and/or the protocol you're using.

It was a flip of a coin choice.


The LoRaWAN standard uses big endian.


My impression was that for interfacing with hardware, the memory layout isn’t always contiguous, leaving gaps for reserved fields, so people just use the explicit syntax shown in the article. I had been under the impression the layout of bit fields was guaranteed (much like struct fields) but if that is truly compiler specific as you say, I can see why it would not be useful for interfacing with hardware. I had always thought it was because embedded c sometimes deals with less capable compilers, so people often limit themselves to c89 or even older standards.


C bitfields are fairly common in embedded code. While layout can vary across compilers and archs, hardware driver writers assume a known arch and compiler. You really can't compile embedded targets with, say, MSVC or clang. Also, gcc seems to be the de facto compiler for most embedded nowadays.


cf K&R 6.9 "Bit-fields"


C bitfields are problematic and often aren't used in places where you'd think they'd be useful. Zig seems to be doing its best to get bitfields right.

Just one thing that Zig improves: in C if you need the bit offsets of the fields you'll still need lots of defines with the offsets/masks and such. In Zig it can be extracted from the packed struct at comptime
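
For contrast, the C side usually ends up as hand-maintained defines, something like this (illustrative; the names are made up):

    #define COLOR_MASK_RED_SHIFT   0
    #define COLOR_MASK_RED_MASK    (1u << COLOR_MASK_RED_SHIFT)
    #define COLOR_MASK_GREEN_SHIFT 1
    #define COLOR_MASK_GREEN_MASK  (1u << COLOR_MASK_GREEN_SHIFT)
    /* ...and so on, kept in sync with the struct by hand. */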


Don’t get me wrong, it’s a cool feature; the only problem I see here is readability. If I’m looking at this code, or trying to figure out the value of a register mangled by such a call, I find it very hard to read and understand what is happening there. With hex offsets and masks I can tell what is happening to the bitfield at any given time… just MHO.


The language I'm working on has the following syntax

  bitfield ColorWriteMaskFlags : u32 {
    red 1 bool,
    green 1 bool,
    blue 1 bool,
    alpha 1 bool,
  };
The number after the field name is the bit size. An (offset, size) pair can be used instead (offsets start at 0 when not explicit). After that can be nothing (the bit field is an unsigned number), or the word 'bool' (the bit field is a boolean) or the word 'signed' (the bit field is a signed number in two's complement). The raw value of the bitfield can always be accessed with 'foo.#raw'.

EDIT: There's also no restriction to how multiple fields can overlap, as long as they all fit within the backing type.


The one thing that would worry me here is the endianness, e.g. is red the LSB or the MSB?

Also, what happens to the padding (bits not covered by subfields)? How's that going to look when shoved into a file or over a socket? How does the language handle overflow (more bits in the bit fields than there are in the parent field)?


Red is the least significant bit, because offsets start at bit #0. Endianness is not a concern because you have to specify the underlying integer type, and bit endianness is not a thing, so it's just the native endianness. (If you, e.g., need a network format with specific endianness and a 32-bit bitfield, you can instead use 4 consecutive 8-bit bitfields laid out in a consistent way.) Unspecified bits are ignored: they'll be zero if the bitfield is initialized through conventional means, but they can be accessed directly through the raw value (or e.g. through memcpy, pointer casting, and any other direct memory access means). When reading a field, only its specified bits are read, so unspecified bits don't change the result. When writing to a field, the source value is truncated to the field size, so you never end up writing to other bits.


> Red is the least significant bit, because offsets start at bit #0.

So red is the LSB because you decided it was the LSB. That is not a by-definition thing.

> Endianness is not a concern because you have to specify the underlying integer type, and bit endianness is not a thing

That’s not actually true. There are formats which process bytes LSB to MSB, and formats which process them MSB to LSB. E.g. in git’s offset encoding, the leading byte is a bitmap of continuation bytes: bit 0 indicates whether byte 7 is present.

Both are perfectly justifiable, one is offset-based, while the other is visualisation-based as bytes are usually represented MSB-first (as binary numbers, essentially).

> but when reading a field only its specified bits are read so unspecified bits doesn't change that result. When writing to a field, the source value is truncated to the field size, so you never end up writing to other bits.

I’m quite confused by “bitfield” do you mean the container field (the one that’s actually defined by the `bitfield` keyword) or the contained sub-fields?


> I’m quite confused by “bitfield” do you mean the container field (the one that’s actually defined by the `bitfield` keyword) or the contained sub-fields?

By 'field' I meant the contained sub-fields


One area where this is still less flexible than regular bit masking is that quite often I want to access groups of bits as a value. For instance if the bits represent in/out pins on a chip emulator, sometimes I want to access unique data bus pins, and sometimes I want to set or get the data bus value.

Maybe Zig can handle this case with unions though, I haven't tried this yet (this would require that unions can work on the 'bit level' in packed structs).
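
For comparison, the mask/shift version of that dual view (pin positions invented for the sketch):

    #include <stdint.h>

    #define DBUS_SHIFT 16                     /* data bus at pins 16..23 */
    #define DBUS_MASK  (0xFFull << DBUS_SHIFT)

    static inline uint8_t get_data_bus(uint64_t pins) {
        return (uint8_t)((pins & DBUS_MASK) >> DBUS_SHIFT);
    }

    static inline uint64_t set_data_bus(uint64_t pins, uint8_t data) {
        return (pins & ~DBUS_MASK) | ((uint64_t)data << DBUS_SHIFT);
    }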


Swift has OptionSet. See https://developer.apple.com/documentation/swift/optionset. It allows setting, testing, or clearing multiple fields with a nice syntax.

One thing that is a bit noisy is that you have to specify the bit index when you name the individual bits, as in (from that article):

  static let secondDay = ShippingOptions(rawValue: 1 << 1)
The upside is that it makes it clear which bit each value specifies, and you won't easily change them accidentally when you reorder definitions, or insert or remove them. I can see why they made that choice.

I also guess OptionSet could have been implemented in Swift in a third party library, whereas this Zig feature cannot.

I also think/guess neither language guarantees multiple bits would get read or written in one instruction. If your hardware needs that, you probably have to go down a level, or look at the disassembly of your code to check what your compiler did.


I've come to really like Nim's built-in `set` type for that sorta scenario. It's just a bit-vector but it took me months before I realized that. You can cast an `int32` to a `set[MyEnum]` and be able to do set unions, differences, etc. Makes it easy to define a mask just by `const myMask = {A, B, E}`.

Making a Zig BitSet would probably be doable, for those cases where bitfields are overkill.

1: https://nim-lang.org/docs/manual.html#types-set-type


You can do this and more in Zig. The only caveat is that for MMIO you need a volatile pointer.

https://www.scattered-thoughts.net/writing/mmio-in-zig/

It gets tricky if you need more control over loads and stores, or if there are different address spaces.


I don't have experience with this area, but...

> this would require that unions can work on the 'bit level' in packed structs

Just as Zig has packed structs, it also has packed unions. So that part shouldn't be an issue.


`@bitCast` addresses this use case:

    const std = @import("std");
    const expect = std.testing.expect;

    test {
        const group1: P = .{ .a = true, .b = true };
        const group2: P = .{ .c = true };
        const group3 = @bitCast(P, @bitCast(u4, group1) | @bitCast(u4, group2));
        try expect(group3.a);
        try expect(group3.b);
        try expect(group3.c);
        try expect(!group3.d);
    }

    const P = packed struct {
        a: bool = false,
        b: bool = false,
        c: bool = false,
        d: bool = false,
    };


So `unsigned flag:1`?


After you set all the pragmas or attributes to not add any additional padding/alignment, and declare that you really, really mean it. Maybe add a few static asserts so you know at compile time if you got the size right.

These are not standard, so you need some preprocessor magic to choose the right thing. And so on...


I just use 'bool flag:1' in C.


> This all works, people have been doing it for years in C, C++, Java, Rust, and more. In Zig, we can do better.

We can also do better in those other languages, too. For example, in Rust, I can use a crate like `bitfield` which gives me a macro with which I can write

    bitfield! {
        pub struct Color(u32);
        red, set_red: 0;
        green, set_green: 1;
        blue, set_blue: 2;
        alpha, set_alpha: 3;
    }
Don't get me wrong: it's cool that functionality like this is built-in in Zig, since having to rely on third-party functionality for something like this is not always what you want. But Zig is not, as this article implies, uniquely capable of expressing this kind of thing.


Is it something I'd ever want to rely on third-party functionality for?


C has them and their implementation seems to be universally disliked. Using an external crate with an implementation that people do like, with the possibility to substitute another if you disagree, seems better than to force a specific implementation into the language (that people will then replace with external dependencies or that people will learn to avoid as a concept).


Sometimes, there's value in providing a standard way of doing things. Even if it isn't perfect in all cases (or even a median case), then at least most people coalesce around how it's used and its limitations.

But yeah, sometimes it's better to have options. If it's common functionality, though, there will likely be 1000 different implementations of it that all differ just slightly [0]. Perhaps it would be better for that effort to be put into making the standard better.

I don't think there's a universally correct answer by any means, but for something so common as bitflags, I think I personally lean towards having a standard. Replacing an implementation wholesale feels like it should be reserved as a last resort.

Either way, I think mature pieces of software (languages especially) strive to provide a good upgrade path. Inevitably, the designers made something that doesn't match current needs. Even if it's just that "current needs" changed around them.

[0]: And if we subscribe to Sturgeon's Law, 90% of those are crap, anyway. Though they might not appear so on the surface...


As long as the compiled code is just as efficient as it would be had it been built-in to the language, I don't see the issue? Bitfield ops are very low-level constructs that are only useful in very specific project types. Their portable usage can be tricky, it's not something a majority of coders should reach for.

That's the luxury of a standard build system: essential but rarely used features can be left out of the core language / lib because adding them back in is just a crate import away.


Yes? If you have lots of bitfields and the convenience / readability is worth it, why wouldn’t you?


I meant as opposed to having it built in.


There are lots of reasons to not want it built-in e.g. it makes the language more complex, and if bad semantics are standardised you’re stuck with them.

If the language supports implementing a feature externally then it’s a good thing, as it allows getting wide experience with the feature without saddling the language with it, and if the semantics are fine and it’s in wide-spread use, then nothing precludes adding it to the language later on.

It’s much easier to add a feature to a language than to remove it.


I’ve got to admit, having used packed structs in Rust quite a bit, I’m also a little confused about what Zig is improving on.

Many years ago I wrote a rust program to decode some game save data and it looks like what you’d expect.

https://github.com/aconbere/monster-hunter/blob/master/src/o...


In my opinion, syntactic sugar is, in the end, more useful and positive than "powerful" or "expressive" features that can lead to shorter but harder-to-decipher code.

Syntactic sugar never hurts.


I wouldn't say this is syntactic sugar here. Packed structs are the most valuable tool in the language for handling interop with other languages and building protocols. The fact that you can use that tool to build bitfields feels like an interesting, cool, and useful side effect. However, it's not their purpose.

Note that the main difference between packed structs and regular structs is not the dense bit packing, but rather that regular structs are allowed to reorder the fields however they wish in memory: the Zig compiler is free to optimize your struct's layout, which compilers for other languages (like C, for example) are not free to do. Thus you get a way to define structs exactly, with bit-level precision, to build complex protocols where you can decide what every bit means.



