The object size has to be at least the alignment so that arrays work properly: &somearray[1] needs to be properly aligned, and that only works if the object size is a multiple of the alignment. For a type myint, that means sizeof(myint) >= _Alignof(myint) && (sizeof(myint) % _Alignof(myint)) == 0.
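Here's a minimal C11 sketch of that rule. sample_t and elements_aligned are made-up names for illustration, and the address check assumes the implementation provides the (optional) uintptr_t type and that converting a pointer to it yields the address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: the compiler adds tail padding so arrays stay aligned. */
typedef struct {
    long long wide;
    char narrow;
} sample_t;

/* Both halves of the rule, checked at compile time. */
_Static_assert(sizeof(sample_t) >= _Alignof(sample_t),
               "size is at least the alignment");
_Static_assert(sizeof(sample_t) % _Alignof(sample_t) == 0,
               "size is a multiple of the alignment");

/* Returns 1 if every element of a small array sits on an aligned address. */
int elements_aligned(void) {
    sample_t arr[4];
    for (size_t i = 0; i < 4; i++) {
        /* Assumption: the uintptr_t value of a pointer is its address. */
        if ((uintptr_t)&arr[i] % _Alignof(sample_t) != 0) {
            return 0;
        }
    }
    return 1;
}
```

If the size were not a multiple of the alignment, arr[1] would start at an offset that breaks the alignment of its first member.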
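As the proposal says, the bit alignment of these types is min(64, next power-of-2(>=N)). (Of course, the alignment can't be smaller than 8 bits, which the proposal fails to account for.) Assuming CHAR_BIT==8, it follows that the object size is N bits rounded up to a multiple of that alignment: a 17-bit type occupies 32 bits, and a 65-bit type occupies 128 bits.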
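A rough sketch of that arithmetic in C, assuming CHAR_BIT==8 and assuming the object size is simply N rounded up to the next multiple of the alignment (that rounding is my reading of the rule above, not something quoted from the proposal). The function names are made up for illustration.

```c
#include <assert.h>

/* Alignment in bits: min(64, next power of two >= n), but never below 8. */
unsigned extint_align_bits(unsigned n) {
    assert(n >= 1);
    unsigned a = 8;              /* floor: one 8-bit byte */
    while (a < n && a < 64) {    /* double until we cover n or hit the cap */
        a <<= 1;
    }
    return a;
}

/* Object size in bits: n rounded up to a multiple of the alignment. */
unsigned extint_size_bits(unsigned n) {
    unsigned a = extint_align_bits(n);
    return (n + a - 1) / a * a;
}

/* Padding bits: whatever the object holds beyond the n value bits. */
unsigned extint_padding_bits(unsigned n) {
    return extint_size_bits(n) - n;
}
```

Under those assumptions, N=17 gives a 32-bit object with 15 padding bits, N=33 gives 64 bits with 31 padding bits, and N=65 gives 128 bits with 63 padding bits.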
So the amount of padding can be considerable. But that doesn't matter much. What they're trying to conserve is the number of value bits that need to be processed, and in particular to minimize the number of logic gates required to process the value. Inside the FPGA, the value can presumably be represented with exactly N bits, regardless of how many padding bits there are in external memory.
Where does the spec say that it does that? As far as I can tell, C only allows objects to have sizes that are a whole number of bytes, and that includes booleans.
A _Bool can be used as a bit-field with a width of 1 bit, but you can't apply sizeof to a bit-field.
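A small sketch of the difference, assuming a C99-or-later compiler. struct flags and the helper functions are hypothetical names, not anything from the spec.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical struct with two 1-bit bool bit-fields. */
struct flags {
    bool ready : 1;
    bool done  : 1;
};

/* A standalone bool is a full object: a whole number of bytes. */
size_t bool_size_bits(void) {
    return sizeof(bool) * CHAR_BIT;
}

/* The enclosing struct has a size, even though its bit-fields do not. */
size_t flags_size_bytes(void) {
    return sizeof(struct flags);
}

/* Bit-fields still store and return values normally. */
int bitfield_roundtrip(void) {
    struct flags f = { .ready = true, .done = false };
    /* sizeof f.ready would be rejected here: sizeof cannot be applied
       to a bit-field member, only to the struct that contains it. */
    assert(sizeof f >= 1);
    return f.ready && !f.done;
}
```

So the 1-bit width only exists inside the struct's layout; anything you can take sizeof of is still measured in whole bytes.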
A byte is CHAR_BIT bits, where CHAR_BIT is required to be at least 8 (and is exactly 8 for the vast majority of implementations).
The word "byte" is commonly used to mean exactly 8 bits, but C and C++ don't define it that way. If you want to refer to exactly 8 bits without ambiguity, that's an "octet".
I think you worded this pretty well. One thing I'd add (and that annoys me about C & C++) is how little the size guarantees for the integer types boil down to: sizeof(char) == 1, CHAR_BIT >= 8, and each of short, int, and long has to cover at least the range of the type before it (with minimum widths of 16, 16, and 32 bits). sizeof(T*) (for any T) is implementation-defined and can be OS/compiler specific. That makes cross-platform 32/64-bit support painful, especially because there were no fixed-width integer types before C99 & C++11. And even those come with a catch: int32_t and int64_t are exactly 32 and 64 bits wide, but they're optional, so an implementation that can't provide them simply doesn't define them. The types that always exist are int_least32_t and int_fast32_t, and those only have to be at least 32 bits wide. So, on a hypothetical 40-bit CPU, int32_t might not exist at all, and int_least32_t could very well be 40 bits wide, if that's the natural "word" size for the CPU.
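For what it's worth, here's a sketch of which of those guarantees can be checked at compile time, assuming a C11 compiler with <stdint.h>. The helper function names are made up; the INT32_MAX test relies on that macro being defined only when int32_t exists.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Guaranteed on every conforming implementation. */
_Static_assert(sizeof(char) == 1, "char is one byte by definition");
_Static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");
_Static_assert(sizeof(int_least32_t) * CHAR_BIT >= 32,
               "least type holds at least 32 bits");
_Static_assert(sizeof(int_fast32_t) * CHAR_BIT >= 32,
               "fast type holds at least 32 bits");

/* int32_t is optional; when present it is exactly 32 bits, no padding. */
#ifdef INT32_MAX
_Static_assert(sizeof(int32_t) * CHAR_BIT == 32,
               "exact-width type is exactly 32 bits");
#endif

/* Storage width of int_least32_t on this platform, in bits. */
unsigned least32_storage_bits(void) {
    return (unsigned)(sizeof(int_least32_t) * CHAR_BIT);
}

/* Reports whether this implementation defines the exact-width int32_t. */
int has_exact_int32(void) {
#ifdef INT32_MAX
    return 1;
#else
    return 0;
#endif
}
```

On mainstream platforms least32_storage_bits() comes out as 32, but portable code shouldn't rely on more than "at least 32".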
The devil is always in the details, and the devil is very, very annoying...