Hacker News new | past | comments | ask | show | jobs | submit login
A Better Way to Implement Bit Fields (andrewkelley.me)
159 points by luu on Nov 14, 2018 | hide | past | favorite | 33 comments



IMHO it would be better to capitalize the OTG structure field names according to the standard. This would turn "avalidoven" into "AvalidOvEn", which makes much more sense: it's the "A-side session valid override enable bit", which allows the software to override the "Avalid" signal with the value from AvalidOvVal.

Also, this needs a (2017) tag.


>which allows the software to override the "Avalid" signal with the value from AvalidOvVal.

How is "Avalid" any more meaningful though?

"The AValid signal is used to indicate if the session for an A-peripheral is valid. This signal is 1b when Vbus is above 2V"

They could use some better naming.


How is your comment even remotely relevant to the post topic?


Read the article. At the end the author says

> By the way it's nice to know that the USB protocol has a bit to indicate "a valid oven". I wonder when that is used.

The bit fields he uses as an example related to USB protocol implementation.


The post discusses bit-fields in the context of implementing some part of the USB OTG specification. Author makes a joke about USB having bits for “a valid oven”.


Bitfields are integrated in Ada and created through representation clauses : http://sworthodoxy.blogspot.com/2014/03/ada-vs-c-bit-fields....

You can also set the endianness (bit and byte order) : https://www.adacore.com/gems/gem-140-bridging-the-endianness... (I think it was added in a recent language revision).

Add to that the 'Component_Size attribute and the ability to specialize representation information easily : https://www.adacore.com/gems/gem-27 and https://www.adacore.com/gems/gem-28) and also the ability to recursively bound-check record structures (see https://docs.adacore.com/gnat_rm-docs/html/gnat_rm/gnat_rm/i... and https://dwheeler.com/lovelace/s17s4.htm for some background on 'valid and specific Ada safety extensions) you get a pretty elegant and powerful way to interact with low-level constructs.


Ah Ada. Reminded me of my high school years playing with Turbo pascal before moving to c/c++. First concurrent language I ever worked with. Thought that feature was brilliant, so much nicer than pthreads. I took a small threaded c project of mine and replaced the pthreading with Ada and linked the c logic to it using gnat/gcc. Then along came Go, et al, and my interest in Ada unfortunately faded.

Now you got me thinking about Ada again.


To jog your memory and for the curious reading this, a chapter of the new 'learn Ada' interactive tutorial by AdaCore introduces tasking: https://learn.adacore.com/courses/intro-to-ada/chapters/task...

The whole interactive book deserves a look. You can then jump to the Spark book :-D https://learn.adacore.com/courses/intro-to-spark/index.html that sadly doesn't seem to talk about tasking but Spark supports part of it.

I'm actually looking forward the next few years, as AdaCore seems to be working on memory ownership-based guarantees a la rust (at least partially) and this might make Ada (or Spark in a first pass) a good candidate for truly fearless concurrency :-D.


I recall that Herb Sutter's C++ Metaclasses proposal has an example of bitfield metaclass [1]. Zig's design is reasonable if it also makes use of those non-2^k-bit integer types in other places, and indeed it does (e.g. alignment is commonly `u29`). However if it is not doable, I think Sutter's design (auto-generated overlay on normal integers) makes more sense.

[1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p070...


I have several questions. If I have too bool bit fields next to each other, how are these ordered in a given byte? Even when I have a less then 8 bit field it can span over a byte boundary. Are the low bits go into the next byte or the high bits? The blog post only mentions fields greater than 8 bits.

If a field greater than 8 bits is byte-aligned, it is represented in memory with the endianness of the host. If the field is not byte-aligned, it is represented in memory in big-endian.

So is a 12 bit fild is byte-aligned if it starts at the "first" bit of a byte (whatever that is)? Is there a way to manually specify the endianness of a field? There are certainly cases where a byte-aligned big-endian field is needed on a little-endian host.

I also don't like that keeping track of the fields' address happens in comments. It would be great if there were a language element that would specify the address of a field inside a packed struct, so the compiler could verify its correctness.


Now if only Rust had bitfields like this. It was proposed years ago but nothing has come from it.


Yes, well, now that Rust is aiming for first class support for embedded by the end of 2018, I expect there to be more pressure to find a good solution for bitfields. For that target audience, many will consider the current situation a show stopper.


Bitfield support is far more useful for emulators and for dealing with binary file formats and binary network protocols.

Embedded stuff typically means kernels and drivers. For that, the bitfields are in hardware registers. This is troublesome because the hardware might not act like normal RAM. Issues you will encounter:

1. The hardware demands a particular access size. For example, a nice big array of 4-bit values may need to be accessed only by 16-bit operations because it is on a 16-bit data bus that lacks byte enable lines. Another example is that different access sizes may go to different registers.

2. The hardware takes an abnormal action when accessed. Reading and writing might go to separate registers. Reading could fetch a byte from a FIFO, so the language/compiler is not free to do speculative or repeated reads. Writing could output to a FIFO. Reading or writing could clear bits, possibly in a different register, or initiate a coprocessor action.

3. The bus might allow reordering that can affect the device. For example, the bus might cancel operations that seem to be superseded by later operations (write after write, or read after read) but enforce ordering in other cases. The compiler must respect this in some way, possibly by suppressing optimization or by generating a compile error when the constraints appear to be violated.

You also get ugly layout issues like split fields, but that hits emulators and file formats too.


There are a few crates around that help you get over the hurdle with some (but not too much) awkwardness, like https://github.com/dzamlo/rust-bitfield


C bitfields are terrible. But Erlang has brilliant handling of bitfields, I think he should look at those and compare his implementation to that.


Erlang's bit syntax is indeed brilliant, but it's a bit different from C's bit fields: bit fields are both mapping and storage, Erlang's bit syntax is only mapping (parsing and serialisation) e.g. your bit-wide flag is going to expand to a full integer (erlang so infinite precision & heap allocated) at runtime, which may or may not be desirable.

Furthermore IIRC an Erlang pattern can be either matching or creation, you can't "abstract" a pattern to do both, so if you need to both parse and generate your binary thing will need to be written twice (at least).

This may or may not be an issue, but at the end of the day it's not quite equivalent to C's bitfields. It's very convenient though.


Lua does these very well too in my opinion. No messing around with bit-wise operators or masks - it's just an array of bits!


Could you give an example? E.g. if you have a 3 bits signed integer V followed by two 1-bit flags (F1 and F2), followed by a 3-bits unsigned integer L, followed by binary data D of length L bytes, followed by a little-endian signed integer I over two bytes, how do you parse it?

IIRC in Erlang it's something along the lines of

    <<V:3/signed, F1:1, F2:1, L:3, D:L/binary, I:16/signed-little>>


I went to look this up... I think actually the features I describe are more features of Wireshark which provides a Lua API, which is what confused me.

That Erlang syntax reminds of the python 'struct' [0], except it's generating the unpacking logic at compile time, am I right? Yes that is indeed very cool - it doesn't even seem that hard to do I'd wonder why other languages don't have a similar facility!

[0] https://docs.python.org/2/library/struct.html


> That Erlang syntax reminds of the python 'struct' [0], except it's generating the unpacking logic at compile time, am I right?

Kinda, however there are a few superior items to the bit syntax:

* In Erlang, the bindings are in the expression itself (whether packing or unpacking) which makes the correspondance easier.

* Because the bindings are embedded and set left to right, it becomes possible to use one binding as parameter to the next pattern e.g. `L:3, D:L/binary` will first parse 3 bits as an unsigned integer L, then extract the next L bytes as a binary substring.

* Also Erlang's bit syntax can do sub-byte matching, I don't think python's struct can.

* And Erlang's bit syntax is much clearer and orthogonal (modulo knowing the — convenient — defaults) e.g. where extracting a little-endian 32-bit signed integer with struct is "<i" with Erlang it's :32/integer-signed-little (32 units, integer, signed, little-endian; the default unit size for integers is the bit). More verbose, but also completely explicit and more flexible, and very easy to change.


That is wonderful ... I’ve been looking for something like this for years. Don’t suppose you know off the top of your head are there any Erlang compilers for java and if they’re any good?


I wrote ocaml-bitstring (based on Erlang) and it also works by generating the code to handle the bits at compile time. It's fully general too - eg. if you describe a structure of 1 bit followed by an array of bytes then it will add the necessary bit shifts automatically when reading the bytes. Or if you describe a struct where everything is aligned then at compile time it will generate the most efficient code. You can even have variable length bitfields (a huge headache when implementing, but it does deliver the best possible code based on what it can prove at compile time).

[1] https://rwmj.wordpress.com/2010/02/03/on-the-awesomeness-of-...


> I'd wonder why other languages don't have a similar facility

Actually this is the kind of thing lex/yacc/Bison, javacc etc. do ... except I think only for text ...


Hm. I like the approach except for the overloaded meaning of packed. In C, packing means byte-aligned packing, rather than bit-packing. These seem like separate concepts. (Both are useful! I just think bit-packing should get its own special name.)


Yeah, packing is a pretty well defined to mean adding additional empty bits to keep fields word-aligned.


In my experience that's referred to as padding, not packing. A packed struct is what you get when you remove all of the padding. Packing is therefore removing padding, not adding it.


Word-aligned? No, that's the opposite of packing!

I'd say "packing" is ambiguous and could refer to either bit-aligned or byte-aligned data. But if your fields are less than 8 bits wide, I would assume "packing" means bit-packing.


E.g., if your packed struct is { bool a; uint32_t b; }, you probably don't want a totally misaligned-by-one-bit 'b'.


Byte-aligned, in C.


I can't remember what programming language it was, but I recall one that had managed to eliminate primitive numeric and bitwise logical operations by treating them as arrays of bits (booleans).


Verilog?


I think BitC might have done something like this.


Excellent work with Zig and great documentation with detailed examples. Beautiful syntax.




Consider applying for YC's W25 batch! Applications are open till Nov 12.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: