As I recall the minimum total user + kernel stack size is 10kB in the Linux kernel, so 1M threads is 10GB of space. It should be doable, though you will probably have to bump up kernel limits.
A million threads is an extreme case, though. No system can reliably spawn that many threads that are actually doing something interesting without a very large amount of memory. When you leave the realm of microbenchmarks you have to expect that an unknown quantity of threads will have deep call stacks at any given time, so you really need to give yourself leeway to avoid the risk of OOM.
So a big advantage that Go's M:N goroutine model brings to the table is how cheap they are. Cheap enough that tons of concurrency-related stuff you want to do in application code, like implementing a highly-concurrent algorithm, can be done with goroutines directly, without having to think too hard about mechanical sympathy and e.g. translate logical concurrency to physical threading. Go processes commonly have 1M or even 10M active goroutines at once.
So I don't think it's fair to say POSIX threads are comparable or whatever if they don't have this property.
Go processes do not typically have 1M or 10M active goroutines at once. The initial stack size for goroutines is 2kB, so 10M goroutines would mean 20GB just for stacks, even assuming that the goroutines never grow their stack (which cannot be assumed for anything nontrivial). The 2kB minimum stack size is on the same order of magnitude as the 10kB POSIX thread stack size.