Did you use a wrapper or work directly with the solver? If you need runtimes that small, the time spent compiling the model into the proto-model fed to the solver is likely a bottleneck.
Usually the time horizon for solving these kinds of problems is on the scale of minutes/hours so the tool chain isn't optimized for that time span.
I tried both ways: compiling into the proto model to use the newer solver, and also using the older solver directly. But in each case, profiling showed that solving time dominated. I don't think it was an issue of what the library is optimized for so much as the problem being too big to solve in that little time. Even my hand-rolled solution, which gives non-optimal answers, can take 5-10ms on difficult queries.
We ended up requiring users to adjust their queries a little to make it easier to spot the targeted preintersections. This seemed like a decent trade-off considering the latency impact of the alternative.
Actually, we tried both exhaustive search and sufficient search (where we had a soft deadline and accepted the best solution reached within it). Even so, our dark launch found queries in our stream with unacceptable delays. We were in contact with the team throughout development, so we at least had some reason to believe we were following best practices. And you're right that ultimately the CP solver wasn't the right tool for this job: simplifying the problem and giving up some capabilities was the solution. We had hoped that a general solution would be feasible, as that would have let us express some things efficiently that we couldn't in our existing setup. But it was not to be.
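For anyone unfamiliar with the "sufficient search" idea, here is a minimal, solver-agnostic sketch. It assumes candidate solutions can be enumerated lazily and scored by a cost function; `best_within_deadline` and its parameters are hypothetical names for illustration, not any CP solver's API. The deadline is "soft" because it is only checked between candidates, so one slow evaluation can overrun it. Passing an infinite deadline turns it into exhaustive search.

```python
import time
from typing import Callable, Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")


def best_within_deadline(
    candidates: Iterable[T],
    cost: Callable[[T], float],
    deadline_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Optional[T], bool]:
    """Return (best candidate seen, whether the search ran to completion).

    Hypothetical helper: the deadline is soft, checked only after each
    candidate is scored, so a single expensive candidate can overrun it.
    """
    start = clock()
    best: Optional[T] = None
    best_cost = float("inf")
    for cand in candidates:
        c = cost(cand)
        if c < best_cost:  # keep the cheapest solution found so far
            best, best_cost = cand, c
        if clock() - start >= deadline_s:
            return best, False  # deadline hit: accept the best so far
    return best, True  # enumerated everything: result is optimal


# Exhaustive search is just an unbounded deadline.
optimal, complete = best_within_deadline([5, 3, 8, 1], cost=lambda x: x, deadline_s=float("inf"))
```

The trade-off described above shows up directly here: a tight deadline bounds latency but may return a worse answer, while an unbounded one guarantees the optimum but inherits the full worst-case search time.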