Correction: Finding the optimal algorithm (minimal number of operations) for computing a Jacobian is NP-complete, but evaluating it at a small constant multiple of the cost of a forward evaluation is standard.
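A minimal sketch of that "small constant multiple" point, assuming JAX (the function names `loss` and `residuals` are illustrative, not from the original comment):

```python
import jax
import jax.numpy as jnp

def loss(w):
    # Hypothetical scalar-valued function of a large parameter vector.
    return jnp.sum(jnp.tanh(w) ** 2)

w = jnp.ones(1_000_000)

value = loss(w)              # one forward evaluation
grad = jax.grad(loss)(w)     # full gradient via reverse mode,
                             # a small constant multiple of the forward cost

def residuals(w):
    # Vector-valued output: the full Jacobian needs one reverse sweep
    # per output (or one forward sweep per input), so keep the output small.
    return jnp.tanh(w[:5])

J = jax.jacrev(residuals)(w)  # shape (5, 1_000_000)
```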
Also, many optimizers that are popular in ML only need gradients (in which case the Jacobian is just the gradient vector). Second-order methods are important in applications with ill-conditioning (such as bundle adjustment or large-scale GPR), but they have lots of exploitable structure/sparsity; see the sketch below. The situation is not nearly as dire as you suggest.
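One hedged illustration of that exploitable structure, again assuming JAX: matrix-free Hessian-vector products let Newton-CG or trust-region solvers work without ever forming the dense Hessian.

```python
import jax
import jax.numpy as jnp

def loss(w):
    # Hypothetical scalar objective; stands in for a real ill-conditioned problem.
    return jnp.sum(jnp.log(1.0 + jnp.exp(w)) * jnp.arange(w.shape[0]))

def hvp(f, x, v):
    # Hessian-vector product H(x) @ v without materializing H:
    # forward-mode differentiation of the reverse-mode gradient along v.
    return jax.jvp(jax.grad(f), (x,), (v,))[1]

w = jnp.linspace(-1.0, 1.0, 10_000)
v = jnp.ones_like(w)
Hv = hvp(loss, w, v)  # cost: a small multiple of one gradient evaluation
```

This is the kind of trick that keeps second-order methods tractable at scale: an iterative linear solver only needs `Hv` products, never the full matrix.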