It is notoriously hard to extract high performance from GPGPUs. If you're an enterprise customer with no readily-available GPGPU code, Xeon Phi makes much more sense than GPUs for a few reasons.
First, the talent pool of HPC x86 programmers is an order of magnitude larger than that of expert GPGPU programmers - a Xeon Phi is essentially a virtual x86 server rack with TCP/IP messaging.
Second, extracting useful performance from GPGPUs takes considerable time and effort; if the code is for internal use and you're not selling it to the masses, you're likely to reach the same performance in less time on the Phi, unless the goal is "the best, regardless of money and time".
Last, most enterprise customers will want ECC and other compute features. Those are only sold on the pro-level $3k+ Teslas, which happen to be more expensive than the Phi.
Where GPGPU does make sense: consumer-level hardware running already-written software (workstations and hobbyists in particular), and businesses where performance per watt is crucial whatever the development cost.
The Phi architecture is closer to a GPU than to a rack of x86 servers.
With 60 cores reading memory over a common ring bus, latency will kill you unless you tile your loops to maximize cache reuse [1], at which point you might as well write GPU code that preloads blocks of data into local memory and works there.
Also, to beat the performance of a normal x86 CPU you must use vector instructions, which brings all the little problems GPU warps are known to cause.
A Phi is not quite as you imagine it; it's more like a single machine with 60/61 cores (when you cat /proc/cpuinfo, there are 60/61 entries).
The main optimisation techniques for GPUs aren't difficult to grasp (in my opinion), although not all classes of problem are suited to execution on a GPU.