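The "missing 1" is a reference ("waste") category whose score is pinned at 0, so it contributes exp(0) = 1 to the denominator. When that 1 is not visible, the same category is still there, just implicitly rescaled: multiplying the numerator and denominator by a common factor turns the 1 into an ordinary exponential term.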
The explicit-1 formulation is used in binary softmax (the logistic function), and the implicit form, where the 1 is not visible, is used in multinomial softmax. I suspect this is the old case of "notation B looks silly by notation A's standards."
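To make the equivalence concrete, here is a minimal sketch in plain Python. The function names (`softmax`, `softmax_with_reference`, `sigmoid`) and the choice of the last class as the reference category are my own assumptions for illustration, not anything taken from a particular library. It checks numerically that pinning one class's score to 0 gives the same probabilities as the usual multinomial form, and that the binary case reduces to the logistic function.

```python
import math

def softmax(logits):
    """Implicit form: every class has its own exponential term, no visible 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_with_reference(logits):
    """Explicit form: the last class (an arbitrary choice here) is treated as
    the reference category with score 0, so it shows up as a literal 1."""
    ref = logits[-1]
    shifted = [z - ref for z in logits[:-1]]  # scores relative to the reference
    denom = 1.0 + sum(math.exp(s) for s in shifted)
    return [math.exp(s) / denom for s in shifted] + [1.0 / denom]

def sigmoid(z):
    """Binary case written with the explicit 1."""
    return 1.0 / (1.0 + math.exp(-z))

logits = [2.0, -1.0, 0.5]
implicit = softmax(logits)
explicit = softmax_with_reference(logits)
print(all(math.isclose(a, b) for a, b in zip(implicit, explicit)))
print(math.isclose(sigmoid(1.3), softmax([1.3, 0.0])[0]))
```

Both checks should agree up to floating-point tolerance, because softmax is unchanged when the same constant is subtracted from every score. Subtracting the reference class's score is what turns its term into the 1.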