It's not that simple. On architectures like AMD's Fusion platform, differences in latency and bandwidth play a huge role, which makes it a much more nuanced story. One of the papers showing this: http://link.springer.com/article/10.1007/s00450-012-0209-1#p...
The paper's unfortunately paywalled. But it's two years old, and AMD's Fusion platform has added a few interesting features in the interim, like being able to just throw a pointer over the wall to the GPU instead of copying the entire data structure over, so I wonder whether the paper needs to be revised to address this. GPGPU is a rapidly changing field, and even a couple of years makes a big difference.