There are fewer context switches because you can write data to several distinct file descriptors using a single system call.
You can also read data from several file descriptors in a single system call.
This way you significantly reduce the number of system calls compared with doing one blocking read() per connection.
I believe a context-switch round trip (from user space to the kernel and back to user space) is much more expensive than simply switching between goroutines of the same process.
Memory usage per thread is a property of the GC, not M:N threading. You can have very small stacks in a 1:1 implementation too.