Since we are talking about container formats, I would recommend something in the IFF/RIFF family, which includes "true" four-byte-type IFFs like RIFF (the basis of WAV and WebP) and variants like PNG, MOV, and MP4. They are about as simple as a container format gets. The approach has its limitations, but it's basically the simplest "boxed" format you can implement: Type-Length-Value.
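To make Type-Length-Value concrete, here is a minimal Python sketch of a RIFF-style chunk writer and reader. The chunk names `NAME` and `DATA` and the helper names `write_chunk` / `read_chunks` are made up for illustration. The little-endian length and the pad byte after odd-sized payloads follow RIFF conventions; original IFF uses big-endian lengths, and other family members differ in details.

```python
import io
import struct

def write_chunk(out, chunk_type: bytes, payload: bytes) -> None:
    """Write one chunk: 4-byte type, 4-byte little-endian length, payload."""
    if len(chunk_type) != 4:
        raise ValueError("chunk type must be exactly 4 bytes")
    out.write(chunk_type)
    out.write(struct.pack("<I", len(payload)))
    out.write(payload)
    if len(payload) % 2:
        out.write(b"\x00")  # RIFF-style pad byte, not counted in the length

def read_chunks(data: bytes):
    """Yield (type, payload) for every chunk found in data."""
    pos = 0
    while pos + 8 <= len(data):
        chunk_type = data[pos:pos + 4]
        (length,) = struct.unpack_from("<I", data, pos + 4)
        start = pos + 8
        end = start + length
        if end > len(data):
            raise ValueError("truncated chunk")
        yield chunk_type, data[start:end]
        pos = end + (length % 2)  # step over the pad byte, if any

buf = io.BytesIO()
write_chunk(buf, b"NAME", b"demo")          # hypothetical chunk types
write_chunk(buf, b"DATA", b"\x01\x02\x03")
chunks = list(read_chunks(buf.getvalue()))
```

That is more or less the whole format: everything else (nesting, checksums, versioning) is layered on top of this loop.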
I would absolutely recommend basing your format on some sort of container format (vs. "raw" data structures). It makes it really easy to add functionality without breaking backwards compatibility, because an old reader can skip any box type it doesn't recognize.
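A rough sketch of why this helps with backwards compatibility, assuming a simple type + little-endian length + payload layout (padding left out for brevity). The reader `parse_v1` and the chunk types `NAME`, `DATA`, and `META` are hypothetical; the point is only that a reader can step over a chunk it has never heard of using the length field alone.

```python
import struct

def chunk(chunk_type: bytes, payload: bytes) -> bytes:
    """Build one chunk: 4-byte type, 4-byte little-endian length, payload."""
    return chunk_type + struct.pack("<I", len(payload)) + payload

def parse_v1(data: bytes) -> dict:
    """Hypothetical 'version 1' reader that only understands NAME and DATA."""
    known = {b"NAME", b"DATA"}
    result = {}
    pos = 0
    while pos + 8 <= len(data):
        chunk_type = data[pos:pos + 4]
        (length,) = struct.unpack_from("<I", data, pos + 4)
        if chunk_type in known:
            result[chunk_type] = data[pos + 8:pos + 8 + length]
        # Unknown types fall through: the length is enough to skip them.
        pos += 8 + length
    return result

old_file = chunk(b"NAME", b"demo") + chunk(b"DATA", b"\x01\x02")
# A later writer adds a META chunk that the v1 reader knows nothing about.
new_file = (chunk(b"NAME", b"demo")
            + chunk(b"META", b"added later")
            + chunk(b"DATA", b"\x01\x02"))

old_view = parse_v1(old_file)
new_view = parse_v1(new_file)
```

Both files parse to the same result in the old reader, which is the property you lose with a fixed "raw" struct layout.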
Basically all container formats follow this "collection of boxes" pattern. The IFF family is box-length based. Ogg instead marks each page with a capture pattern ("OggS") and, I think, never records a total size up front: sizes only appear per page, in a small segment table. The former means you have to know the box size ahead of time (or seek back and write it in); the latter is better for streaming, but harder to seek in (without indexes).
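The "seek back and write it in" trick for length-based boxes looks roughly like this in Python. It is a sketch under the same assumed layout (4-byte type, 4-byte little-endian length); `write_streamed_chunk` is a made-up helper name, and it only works when the output is seekable, which is exactly why this style is awkward for pipes and network streams.

```python
import io
import struct

def write_streamed_chunk(out, chunk_type: bytes, pieces) -> int:
    """Write a chunk whose payload size is not known until the end."""
    out.write(chunk_type)
    size_pos = out.tell()
    out.write(struct.pack("<I", 0))      # placeholder length
    total = 0
    for piece in pieces:                 # payload arrives incrementally
        out.write(piece)
        total += len(piece)
    end_pos = out.tell()
    out.seek(size_pos)                   # go back to the length field
    out.write(struct.pack("<I", total))  # patch in the real length
    out.seek(end_pos)                    # resume at the end of the chunk
    return total

buf = io.BytesIO()
pieces = (bytes([i]) * 3 for i in range(4))  # generator: size unknown up front
n = write_streamed_chunk(buf, b"DATA", pieces)
raw = buf.getvalue()
```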