If you use protobufs, you already have a system for internal auto-schematization. Why not pack all the fields together and use a bit-vector header to signal which fields are present and which take their default values? I'd LOVE to see a document DB with ~1 bit of overhead per field.
Yes, that's pretty much what we're going to do. It's hard to guarantee everything in a fully concurrent, sharded environment, so it'll take some time, but that's basically the plan.
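For a single record, the scheme from the question might look roughly like the Python sketch below. It assumes a fixed schema that both writer and reader already know. `SCHEMA`, `encode`, and `decode` are hypothetical names, and only fixed-width numeric fields are handled. This is not how protobuf or any particular document DB lays out records, and it ignores the concurrency and sharding issues mentioned in the reply. Each field gets one header bit, set only when the field holds a non-default value. Only those values are written after the header. The decoder fills every unset field with its default.

```python
import struct
from typing import Any, Dict, List, Tuple

# Hypothetical schema: (field name, struct format code, default value).
# The field's position in this list is its bit position in the header.
SCHEMA: List[Tuple[str, str, Any]] = [
    ("id", "q", 0),       # 8-byte signed int
    ("score", "d", 0.0),  # 8-byte float
    ("count", "i", 0),    # 4-byte signed int
    ("flags", "B", 0),    # 1-byte unsigned int
]


def header_len(schema=SCHEMA) -> int:
    # One bit per field, rounded up to whole bytes.
    return (len(schema) + 7) // 8


def encode(doc: Dict[str, Any], schema=SCHEMA) -> bytes:
    """Pack non-default fields behind a bit-vector presence header."""
    bits = 0
    body = bytearray()
    for i, (name, fmt, default) in enumerate(schema):
        value = doc.get(name, default)
        if value != default:
            bits |= 1 << i  # mark field i as stored explicitly
            body += struct.pack("<" + fmt, value)  # little-endian, no padding
    return bits.to_bytes(header_len(schema), "little") + bytes(body)


def decode(data: bytes, schema=SCHEMA) -> Dict[str, Any]:
    """Rebuild the full record, substituting defaults for unset bits."""
    hlen = header_len(schema)
    bits = int.from_bytes(data[:hlen], "little")
    offset = hlen
    doc: Dict[str, Any] = {}
    for i, (name, fmt, default) in enumerate(schema):
        if (bits >> i) & 1:
            (value,) = struct.unpack_from("<" + fmt, data, offset)
            offset += struct.calcsize("<" + fmt)
            doc[name] = value
        else:
            doc[name] = default
    return doc
```

For example, `encode({"id": 42, "count": 7})` produces 13 bytes: a 1-byte header (bits 0 and 2 set), 8 bytes for `id`, and 4 for `count`. Fields at their defaults (`score`, `flags`) cost only their header bit. The "~1 bit per field" overhead therefore holds up to the final byte of rounding in the header.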