But there isn't really such a thing as a "general CPU", is there? Some are more general than others, but just like VMs, CPUs are designed and optimized for a particular execution model. You're expected to adapt your problem to the architecture. Tagged memory and a different underlying architecture could make FP considerably faster (in an ideal world; I'm not saying it'd be easy at all), but instead we have to find ways of shoehorning one paradigm on top of another.