That sounds like a possible solution, but a bit overcomplicated. In my opinion, you might as well stick to something like C++ rather than risk the additional complexity.
Sure. You will still have uncertainty around memory allocation timing, even in C++. For example, tcmalloc or jemalloc may need to take a shared lock (on a central free list or an arena) to satisfy a heap allocation once the thread-local cache is exhausted.
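To make the fast-path/slow-path split concrete, here is a toy sketch of the pattern. It is not how tcmalloc or jemalloc are actually implemented. Real allocators add size classes, batching heuristics and, in some versions, per-CPU rather than per-thread caches. Everything in the `toy` namespace, including the block size and batch size, is a hypothetical name or value chosen for illustration. Most allocations pop from a thread-local list with no locking. Only when that list runs dry does the thread take the shared mutex, and that is the step whose latency you can't easily predict.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

namespace toy {

constexpr std::size_t kBlockSize = 64;  // hypothetical: every block has the same size
constexpr std::size_t kBatch = 32;      // hypothetical: blocks moved per refill

std::mutex g_central_mu;                                 // the shared lock
std::vector<void*> g_central_free;                       // central free list
std::vector<std::unique_ptr<unsigned char[]>> g_chunks;  // owns all backing memory
std::size_t g_lock_acquisitions = 0;                     // how often the slow path ran

// Per-thread free list; touching it needs no lock. (Its own push_back may still
// hit the real allocator when it grows, which a real allocator would avoid.)
thread_local std::vector<void*> t_cache;

// Slow path: take the shared lock and move a batch of blocks into this thread's cache.
void refill() {
    std::lock_guard<std::mutex> lock(g_central_mu);
    ++g_lock_acquisitions;
    if (g_central_free.size() < kBatch) {
        // Central list is short too: carve a fresh chunk into fixed-size blocks.
        g_chunks.push_back(std::make_unique<unsigned char[]>(kBatch * kBlockSize));
        unsigned char* base = g_chunks.back().get();
        for (std::size_t i = 0; i < kBatch; ++i) {
            g_central_free.push_back(base + i * kBlockSize);
        }
    }
    for (std::size_t i = 0; i < kBatch; ++i) {
        t_cache.push_back(g_central_free.back());
        g_central_free.pop_back();
    }
}

// Fast path unless the thread-local cache is empty; only then do we contend on the lock.
void* allocate() {
    if (t_cache.empty()) refill();
    void* p = t_cache.back();
    t_cache.pop_back();
    return p;
}

// Freed blocks go back to this thread's cache without locking.
void deallocate(void* p) { t_cache.push_back(p); }

std::size_t lock_acquisitions() {
    std::lock_guard<std::mutex> lock(g_central_mu);
    return g_lock_acquisitions;
}

}  // namespace toy
```

With a batch of 32, a single thread making 100 allocations takes the lock 4 times, on the 1st, 33rd, 65th and 97th calls. Those few calls are the ones with unpredictable latency, especially when other threads hold the mutex. Preallocating before a latency-sensitive section keeps you on the lock-free path, but it doesn't remove the slow path entirely.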