My guess is that this would still be a bottleneck in practice. LuaJIT has a similar feature, and Mike Pall has said several times that if you care about performance, you're almost certainly better off using 100% Lua, because calling into C is slow.
That said, it seems much easier for the implementation to work around than the existing Python C API.
With LuaJIT, calling C through the FFI is fast (from JIT-compiled code); it's only the traditional Lua C API that is slow. Of course, pure Lua code can still be faster, since the JIT can optimise through it instead of stopping at the boundary into C.