I've already responded regarding the need to calculate the gradient, but after reading up on the affine-invariant sampler I'm wondering whether a hybrid setup might work: run several chains side by side, as the affine-invariant sampler does, and use them to approximately solve the Hamiltonian dynamics. I'm not sure what will happen, but it would remove some of the arbitrariness in how the affine-invariant method proposes steps, and it would actually use the already computed values of the distribution, which should (in theory) be better.
I suppose I'll have to try it now.
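
As a starting point, here is a rough sketch of one way the idea could look. It is not a tested sampler, and whether it gives correct samples is exactly the open question. It assumes the gradient of the log-density can be approximated by a least-squares linear fit through the values already computed at the other walkers, and that the estimate is then plugged into an ordinary leapfrog step. The names `ensemble_gradient` and `leapfrog_step` are made up for illustration and do not come from any library.

```python
import numpy as np

def ensemble_gradient(walkers, logp_values, x):
    """Estimate grad log p at x from values already computed at the walkers.

    Assumption: log p is roughly linear over the spread of the ensemble,
    so we fit logp ~ c + g . (w - x) by least squares and return g.
    Needs at least d + 1 walkers in d dimensions.
    """
    n_walkers = walkers.shape[0]
    # Columns: intercept, then the offsets of each walker from x.
    design = np.hstack([np.ones((n_walkers, 1)), walkers - x])
    coeffs, *_ = np.linalg.lstsq(design, logp_values, rcond=None)
    return coeffs[1:]  # drop the intercept, keep the slope (gradient)

def leapfrog_step(x, p, grad_fn, step_size):
    """One standard leapfrog step for H = -log p(x) + |p|^2 / 2."""
    p_half = p + 0.5 * step_size * grad_fn(x)
    x_new = x + step_size * p_half
    p_new = p_half + 0.5 * step_size * grad_fn(x_new)
    return x_new, p_new

# Hypothetical usage: move walker 0 using the rest of the ensemble.
rng = np.random.default_rng(0)
walkers = rng.normal(size=(20, 3))
logp_values = -0.5 * np.sum(walkers**2, axis=1)  # standard normal target
others, other_logp = walkers[1:], logp_values[1:]
grad_fn = lambda x: ensemble_gradient(others, other_logp, x)
x_new, p_new = leapfrog_step(walkers[0], rng.normal(size=3), grad_fn, 0.1)
```

The linear fit is only exact when the log-density really is linear, so for anything curved the estimate is a smoothed average over the ensemble. A Metropolis accept/reject step using the true density would still be needed on top of this.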