As I understand it, kernel preemption of a user thread happens (and has to happen) on the core that's running the thread. The kernel is not a separate process, but rather a part of every process. What you're describing sounds more like a hypervisor than a kernel. That distinction isn't purely semantic; the two operate at different security levels on the CPU and use different instructions and mechanisms to do their respective jobs.
edit: That having been said, I may be misinterpreting what you described; there's a comment in another thread by @zeusk which says to me that more or less this (single core used/reserved for making priority decisions) is already the case on many multi-core systems anyway, thanks to IPI (inter-processor interrupts). So, presumably, the prioritization core handles the preemption interrupts, then runs decision logic on what threads actually need to be preempted, and sends those decisions out to the respective core(s) using IPI, which causes the kernel code on those cores to unconditionally preempt the running thread.
However, I'd wonder still about the risk of memory barriers or locks starving out the kernel scheduler in this kind of architecture. Maybe the CPU can arbitrate the priority for these in hardware? Or maybe the kernel scheduler always runs for a small portion of every time slice, but only takes action if an interrupt handler has set a flag?
edit: That having been said, I may be misinterpreting what you described; there's a comment in another thread by @zeusk which says to me that more or less this (single core used/reserved for making priority decisions) is already the case on many multi-core systems anyway, thanks to IPI (inter-processor interrupts). So, presumably, the prioritization core handles the preemption interrupts, then runs decision logic on what threads actually need to be preempted, and sends those decisions out to the respective core(s) using IPI, which causes the kernel code on those cores to unconditionally preempt the running thread.
However, I'd wonder still about the risk of memory barriers or locks starving out the kernel scheduler in this kind of architecture. Maybe the CPU can arbitrate the priority for these in hardware? Or maybe the kernel scheduler always runs for a small portion of every time slice, but only takes action if an interrupt handler has set a flag?