
With the right cooling setup I've been able to get Xeons to run permanently in turbo mode, kind of a back door overclock. You would have to experiment.

For lowest-latency applications I avoid using RT priorities. It's better to run each core at 100% with busy waiting; if you do that with RT priority, you can prevent the kernel from running its own tasks such as the vmstat update, which leads to lockups. Out of the box there is currently no way to 100% isolate cores in Linux. There is some ongoing work on that: https://lwn.net/Articles/816298/
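A minimal sketch of the pin-and-spin approach at normal priority, assuming Linux and Python's os.sched_setaffinity. The helper names pin_to_cpu and spin_until are made up for illustration, and a real low-latency loop would be written in C or similar.

```python
import os
import time

def pin_to_cpu(cpu):
    """Restrict the calling process to one CPU and return the resulting mask."""
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)

def spin_until(predicate, timeout_s):
    """Busy-wait on predicate() without sleeping; True if it fired in time."""
    deadline = time.perf_counter() + timeout_s
    while time.perf_counter() < deadline:
        if predicate():
            return True
    return False
```

Because the thread stays SCHED_OTHER, the kernel can still preempt it for its own housekeeping, which is the point of avoiding RT priority here.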

Oh yeah, the whole NUMA page migration stuff is not well documented. You'll also have proactive compaction to deal with in the future: https://nitingupta.dev/post/proactive-compaction/




If you can make sure something else won't emit watts (other cores, something using AVX) then yeah turbo can work.

You can patch the kernel thread creation to not use specific CPUs... IDK of anyone publicly maintaining a patchset, but look at https://github.com/torvalds/linux/blob/master/kernel/kthread... line 386.

We've only been using SCHED_FIFO on Linux, which is basically busy waiting if you only have one thread scheduled. Though I am interested in trying SCHED_DEADLINE.

I hope someone adds a kcompactd processor mask / process state disabling compaction.


There was also this recent patch https://lwn.net/Articles/816211/ to deal with kthread affinities. Even with isolcpus I find I still need to run pgrep -P 2 | xargs -i taskset -p -c 0 {} and deal with the workqueues.
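For reference, a rough Python equivalent of that pgrep/taskset one-liner, assuming the usual /proc/<pid>/stat layout. The function names are hypothetical, and changing the affinity needs root.

```python
import os

def parse_stat(stat_line):
    """Return (pid, comm, ppid) from one /proc/<pid>/stat line.

    comm is wrapped in parentheses and may contain spaces or ')',
    so split on the last ')' instead of on whitespace.
    """
    head, _, tail = stat_line.rpartition(")")
    pid_str, _, comm = head.partition("(")
    fields = tail.split()  # fields[0] is the state, fields[1] is the ppid
    return int(pid_str), comm, int(fields[1])

def kthread_pids(proc="/proc"):
    """PIDs whose parent is kthreadd (PID 2), like `pgrep -P 2`."""
    pids = []
    for name in os.listdir(proc):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc, name, "stat")) as f:
                pid, _, ppid = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # task exited or entry unreadable
        if ppid == 2:
            pids.append(pid)
    return sorted(pids)

def move_kthreads(cpus=frozenset({0})):
    """Try to re-pin every kthread to `cpus`; return the PIDs that refused."""
    refused = []
    for pid in kthread_pids():
        try:
            os.sched_setaffinity(pid, cpus)
        except OSError:  # per-CPU kthreads reject this, as does a non-root caller
            refused.append(pid)
    return refused
```

Per-CPU kthreads will end up in the refused list no matter what, and this does nothing about the workqueues.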

Have you tried "A full task-isolation mode for the kernel": https://lwn.net/Articles/816298/ ?

Running only a single thread per core, I see no difference between SCHED_FIFO and SCHED_OTHER. Except that SCHED_FIFO can cause lockups if running at 100%, since cores are not completely isolated (i.e. the vmstat timer and some other stuff).
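If you want to compare the two policies yourself, here is a small sketch using Python's os.sched_setscheduler, assuming Linux. The helper names are made up, and without root or CAP_SYS_NICE the SCHED_FIFO request will normally just be refused.

```python
import os

def try_sched_fifo(priority=1):
    """Request SCHED_FIFO for the calling thread; return True if granted."""
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except OSError:  # typically PermissionError without CAP_SYS_NICE
        return False

def policy_name(pid=0):
    """Human-readable name of the current scheduling policy."""
    names = {
        os.SCHED_OTHER: "SCHED_OTHER",
        os.SCHED_FIFO: "SCHED_FIFO",
        os.SCHED_RR: "SCHED_RR",
    }
    return names.get(os.sched_getscheduler(pid), "unknown")
```

Run the same busy loop once with and once without try_sched_fifo() and compare latencies; keep a shell open on another core in case the FIFO run starves something.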

Yes, it's annoying you cannot disable compaction. There is also work on pro-active compaction now: https://nitingupta.dev/post/proactive-compaction/


I like the Tosatti/WindRiver/Lameter patch. (Except the naming: _possible has the same meaning as _available, but they mean different things here depending on whether it's kthreads or user threads?) It just needs a proc interface.

With a patch like this you can force bottom half (interrupts), top half (kthreads) and user threads to all be on different cores.

The 'full task-isolation mode' seems weird, because why should you drop out of isolation because of something outside your control like paging or TLB? Anyway, mlockall that. It's fine to be told I guess (except signals take time) but why drop out of isolation and risk glitches in re-isolating? It doesn't seem very polished.
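For the mlockall part, a minimal sketch via ctypes, assuming Linux on x86 or arm where MCL_CURRENT is 1 and MCL_FUTURE is 2 (other architectures use different values). The wrapper names are hypothetical, and the call will normally fail unless RLIMIT_MEMLOCK is raised or you run as root.

```python
import ctypes
import os

# Assumption: flag values from <sys/mman.h> on x86 and arm Linux.
MCL_CURRENT = 1
MCL_FUTURE = 2

_libc = ctypes.CDLL(None, use_errno=True)

def lock_all_memory(flags=MCL_CURRENT | MCL_FUTURE):
    """Call mlockall(2); return (ok, error_message)."""
    if _libc.mlockall(flags) == 0:
        return True, ""
    return False, os.strerror(ctypes.get_errno())

def unlock_all_memory():
    """Call munlockall(2); return True on success."""
    return _libc.munlockall() == 0
```

Call lock_all_memory() once at startup, before entering the latency-critical loop, so page faults on already-mapped memory are out of the picture.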

Something else occurred to me: you still have to be careful about data flow and having enough allocatable memory. E.g., a lot of memory local to core 0 will be consumed by buffer cache, so it can be beneficial to drop it (free; numastat -ms; sync; echo 1 > /proc/sys/vm/drop_caches; free; numastat -ms).
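To watch how much of each node is going to page cache before and after dropping it, a sketch that reads the per-node meminfo files in sysfs, assuming lines of the form "Node 0 FilePages: 123456 kB". The function names are made up for illustration.

```python
import glob
import re

def parse_node_meminfo(text):
    """Map field name -> integer value (kB for sized fields) for one node."""
    out = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: "Node 0 FilePages:  123456 kB"
        if len(parts) >= 4 and parts[0] == "Node":
            out[parts[2].rstrip(":")] = int(parts[3])
    return out

def page_cache_by_node(pattern="/sys/devices/system/node/node*/meminfo"):
    """Return {node_id: {"MemFree": kB, "FilePages": kB}} for each NUMA node."""
    result = {}
    for path in sorted(glob.glob(pattern)):
        node = int(re.search(r"node(\d+)/meminfo$", path).group(1))
        with open(path) as f:
            info = parse_node_meminfo(f.read())
        result[node] = {k: info.get(k, 0) for k in ("MemFree", "FilePages")}
    return result
```

Print page_cache_by_node() before and after the drop_caches write to see whether the node your isolated cores sit on actually got its memory back.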

I find it bizarre that all this THP compaction stuff is for workloads that are commonly run under a virtual machine, i.e. another layer of indirection.


As I understand it, 'full task-isolation mode' will prevent compaction, completely disable the vmstat timer, etc. So it provides additional isolation. Since you have already switched into kernel mode, it might as well deliver a signal to let you know it happened. If the signal is masked there should be no overhead at all except a branch to check the signal mask.



