Unfortunately, the SHA CPU extensions that will soon be available in Skylake Xeon parts (and the crypto extensions in ARMv8-A) only support SHA-256 (and SHA-224, and the ill-advised SHA-1... why is Intel adding instructions or microcode for a hash function that's being phased out?). So you have a choice between SHA-512/256, which is faster in software on 64-bit CPUs, and SHA-256, which is faster in hardware. Unless there's some way to get a partial speed-up of SHA-512 using the SHA-256 instructions, but at a glance they don't look low-level enough to apply to SHA-512... do they?
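For what it's worth, the software side of that trade-off is easy to measure on your own machine. Here is a rough sketch in Python. It assumes `hashlib.sha512` is a fair stand-in for SHA-512/256 (same compression function, different initial values and a truncated output, so the per-block cost should be about the same). The helper names `throughput_mb_s` and `compare` are made up for illustration, and the numbers will depend on whether your OpenSSL build already uses hardware SHA instructions.

```python
import hashlib
import time


def throughput_mb_s(name, size=1 << 20, rounds=20):
    """Rough MB/s for one hashlib algorithm over `rounds` buffers of `size` bytes."""
    data = b"\x00" * size
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.new(name, data).digest()
    elapsed = time.perf_counter() - start
    return (size * rounds) / elapsed / 1e6


def compare(names=("sha256", "sha512")):
    # Assumption: sha512 stands in for SHA-512/256 (same per-block cost).
    return {name: throughput_mb_s(name) for name in names}


if __name__ == "__main__":
    for name, rate in compare().items():
        print(f"{name}: {rate:.0f} MB/s")
```

Treat the output as a ballpark only; a single timing loop like this is noisy and says nothing about short-message performance.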
Or you can ignore both SHA-256 and SHA-512/256 and use something else like SHA-3 or BLAKE2b. BLAKE2b has had less attention on it, so it's more likely to harbor an undiscovered weakness, but it's fast in software. And SHA-3 will get CPU extensions eventually, and hopefully that will be a better thought-out implementation than just support for the 256-bit variant.
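If you go that route, both alternatives can give you a 256-bit digest without much ceremony. A minimal sketch using Python's standard `hashlib`; the `digest_256` wrapper and its `algo` names are hypothetical, just there to show the two calls side by side:

```python
import hashlib


def digest_256(data: bytes, algo: str = "blake2b") -> bytes:
    """Return a 32-byte digest using BLAKE2b or SHA-3 (hypothetical helper)."""
    if algo == "blake2b":
        # BLAKE2b takes the output length as a parameter (1 to 64 bytes).
        return hashlib.blake2b(data, digest_size=32).digest()
    if algo == "sha3":
        return hashlib.sha3_256(data).digest()
    raise ValueError(f"unsupported algorithm: {algo}")


if __name__ == "__main__":
    for algo in ("blake2b", "sha3"):
        print(algo, digest_256(b"hello", algo).hex())
```

Note that BLAKE2b with `digest_size=32` is a distinct parameterization, not a truncation of the 64-byte output, so the two are not interchangeable.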
In that case HMAC-SHA-256 may be a good choice. It too is immune to length extension attacks, and the HMAC construct has proven to greatly augment the strength of the underlying hash (e.g. MD5 is considered broken, but HMAC-MD5 is not). It does cost two hash invocations (an inner and an outer one), so for short messages it's about twice as expensive as SHA-256, and I'm not sure whether SHA-512 in software or HMAC-SHA-256 in hardware comes out faster.
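To make the cost concrete, here is a sketch of the HMAC construction written out by hand next to the standard-library call. The function names (`hmac_sha256_manual`, `tag`, `verify`) are my own; the manual version is only there to show the inner and outer hash, and in real code you'd use the `hmac` module.

```python
import hashlib
import hmac

BLOCK = 64  # SHA-256 block size in bytes


def hmac_sha256_manual(key: bytes, msg: bytes) -> bytes:
    """HMAC-SHA-256 spelled out: one inner hash plus one outer hash."""
    if len(key) > BLOCK:
        key = hashlib.sha256(key).digest()  # long keys are hashed first
    key = key.ljust(BLOCK, b"\x00")         # then zero-padded to one block
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(ipad + msg).digest()
    return hashlib.sha256(opad + inner).digest()


def tag(key: bytes, msg: bytes) -> bytes:
    """Same result via the standard library."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def verify(key: bytes, msg: bytes, expected: bytes) -> bool:
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(tag(key, msg), expected)
```

The outer hash only ever processes the padded key block plus a 32-byte inner digest, so the extra work is a fixed amount per message rather than proportional to its length.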