I find its documentation quite poor though: "If specified, adds a new batch of zeros to the key and value sequences at dim=1."
Doesn't describe the implications even briefly. If they add just your second sentence to that description, it'll immediately become so much more useful.
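For anyone else puzzling over it: I'm assuming the quote is the docstring of the `add_zero_attn` flag on PyTorch's `torch.nn.MultiheadAttention`. If I read the source correctly, the "batch of zeros" is really one extra all-zero key and one all-zero value appended along the sequence axis, which is dim=1 in the per-head layout used internally. Below is a toy single-head sketch in plain Python. It is not PyTorch's code, and the helper names `attend` and `add_zero_slot` are my own. It shows the effect: the zero key scores 0 against every query, so it always takes some softmax weight, while the zero value adds nothing to the output. Attention can therefore "attend to nothing" and scale down the contribution of the real tokens.

```python
import math

def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query (plain lists of floats)."""
    d = len(query)
    # Raw scores q.k / sqrt(d) for every key in the sequence.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Numerically stable softmax over the sequence axis.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of value vectors.
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return weights, out

def add_zero_slot(keys, values):
    """Hypothetical helper: append one all-zero key and one all-zero value to the sequence."""
    return keys + [[0.0] * len(keys[0])], values + [[0.0] * len(values[0])]

query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0]]

w_plain, out_plain = attend(query, keys, values)
w_zero, out_zero = attend(query, *add_zero_slot(keys, values))

# With the zero slot, the weights on the real tokens sum to less than 1,
# so the output is the plain output scaled down by that same factor.
print(w_plain, out_plain)
print(w_zero, out_zero)
```

The zero key always scores exactly 0, so how much weight it takes depends on how strongly the query matches the real keys. A weak match leaves more weight on the "null" slot.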