Not my area of expertise, but I was under the impression that finite element analysis, like other sparse linear algebra problems, is reasonably well suited to GPUs, which are much more parallel than 2 or 3 cores. Have you looked into that?
The time-domain codes work well with GPUs and multiple cores, but the frequency-domain ones don’t. I don’t know enough about what’s going on under the hood to say why, but it’s like that for all of them.