this article is oddly both very knowledgeable and very unknowledgeable at the same time. despite its length, it doesn't explain what an initramfs is or how it's created until basically the end (excluding the appendix). an initramfs is simply a cpio archive containing at least one file, an executable "/init", which is then compressed with a standard compressor. currently gzip, xz, and lz4 are the useful ones.
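to make the "it's just a cpio archive" point concrete, here is a minimal sketch in Python that builds such an archive in memory and gzips it. it assumes the "newc" cpio layout (magic 070701, thirteen 8-digit hex fields, 4-byte padding) and stores the file as "init" without the leading slash. the function names and the sample /init script are made up for illustration, and this is not how mkinitcpio, dracut, or the article's tool do it.

```python
import gzip


def _pad4(data: bytes) -> bytes:
    """Pad with NUL bytes up to the next multiple of 4, as newc requires."""
    return data + b"\0" * (-len(data) % 4)


def newc_entry(name: str, data: bytes = b"", mode: int = 0o100755, ino: int = 0) -> bytes:
    """Return one cpio "newc" record: 110-byte header, name, then file data."""
    name_bytes = name.encode() + b"\0"  # namesize counts the trailing NUL
    fields = [
        ino, mode,
        0, 0,             # uid, gid (root)
        1,                # nlink
        0,                # mtime
        len(data),        # filesize
        0, 0, 0, 0,       # devmajor, devminor, rdevmajor, rdevminor
        len(name_bytes),  # namesize
        0,                # check (unused in the 070701 variant)
    ]
    header = b"070701" + b"".join(b"%08X" % f for f in fields)
    return _pad4(header + name_bytes) + _pad4(data)


def build_initramfs(init_script: bytes, level: int = 6) -> bytes:
    """Build a gzip-compressed archive holding a single executable "init"."""
    archive = newc_entry("init", init_script, mode=0o100755, ino=1)
    archive += newc_entry("TRAILER!!!", mode=0)  # end-of-archive marker
    return gzip.compress(archive, compresslevel=level)


if __name__ == "__main__":
    image = build_initramfs(b"#!/bin/sh\nexec /bin/sh\n")
    print(len(image), "bytes")
```

a real image would also need the interpreter, its libraries, and usually some device nodes, but the container format itself is no more than this.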
this means that an optimized initramfs creator ought to take about as long as a tar file creator, because it basically is a glorified tar file creator. on my system, with a fully warm cache, mkinitcpio takes about 1.5 seconds to run. my custom initramfs generator (not released yet) takes 0.03 seconds to run using "cat" as the compressor, i.e. with no compression at all. it is written entirely in POSIX shell and uses only the external commands ldd, gen_init_cpio, and the compressor. however, if gzip -9 is used as the compressor, then compressing the 12 MB takes 1.8 seconds. so, we can see that compression can significantly inflate (pun intended) the time consumed. as a matter of fact, looking at https://github.com/distr1/distri/blob/master/cmd/distri/init..., it appears to pass no options to gzip, so I assume the default level is used, probably -6. http://man7.org/linux/man-pages/man8/dracut.8.html indicates that dracut uses gzip -9 by default, which would increase the time required. if dracut is configured in a non-default manner (e.g. lz4 -12), that would further increase the time.
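a rough way to see how much of the total is the compressor is to time it alone at several levels. the sketch below uses Python's gzip module on a synthetic, highly compressible payload, so the absolute numbers will not match the 1.8 seconds quoted above for a real 12 MB image. the helper name time_levels and the payload are assumptions for illustration only.

```python
import gzip
import time


def time_levels(payload: bytes, levels=(1, 6, 9)):
    """Return {level: (seconds, compressed_size)} for each gzip level."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = gzip.compress(payload, compresslevel=level)
        results[level] = (time.perf_counter() - start, len(compressed))
    return results


if __name__ == "__main__":
    # Synthetic stand-in for an uncompressed cpio archive (2 MiB, very
    # repetitive). Substitute the bytes of a real archive for realistic numbers.
    payload = bytes(range(256)) * 8192
    for level, (seconds, size) in time_levels(payload).items():
        print(f"gzip -{level}: {seconds:.3f}s, {size} bytes")
```

if the compressor accounts for most of the wall time, speeding up the archive-building step changes little.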
the author credits Go, concurrency, no external dependencies, and threads (?!) for the improved performance. this is manifestly unnecessary: no threads are needed to improve the performance of "tar", firstly because it is I/O bound, and secondly because the compressor is usually the slowest part anyways. it might be possible to slightly improve the performance by using native code, but there is no substantial difference between running mkinitcpio in 1.0 vs 1.5 seconds, and there's definitely no difference between 10.0 and 11.5 seconds (lz4 -12 takes about 10 seconds to run, and I assume the remaining 1.5 seconds can be fully optimized to require 0 seconds).
regarding the rest of the post, it's... a little weird. I don't understand why anyone would manually trudge through modules.dep files when resolving those dependencies is the main purpose of the "modprobe" command, as opposed to "insmod", which does no dependency resolution. (modprobe also supports config files, aliases, and some other things, but its main and original purpose is dependency resolution.) it's also a bad idea to reimplement libblkid: it supports a ton of filesystems, many of which one might actually want to use as a root filesystem but which are not supported by this basic implementation, including xfs, btrfs, and zfs. the reimplementation also doesn't appear to support LUKS2. libblkid isn't even necessary: busybox findfs works perfectly well for most common filesystems, and runs very quickly (0.02 seconds on my machine). but, again, the main cost when booting with a necessarily cold cache is likely to be I/O, not spawning a process. I also don't understand why someone would faff about with manually parsing ELF files when ldd works just fine with simple text parsing and handles the interpreter, rpath, and transitive dependencies out of the box.
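to show what "simple text parsing" of ldd amounts to, here is a small sketch in Python. it assumes glibc-style ldd output ("name => /path (addr)" lines, plus a bare absolute path for the interpreter); the function names are hypothetical and this is not the code from my generator, which is shell.

```python
import subprocess
from typing import List


def parse_ldd_output(text: str) -> List[str]:
    """Extract absolute library paths from ldd-style output, in order."""
    paths: List[str] = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if "=>" in parts:
            # "libc.so.6 => /lib/libc.so.6 (0x...)" or "libx.so => not found"
            idx = parts.index("=>")
            candidate = parts[idx + 1] if idx + 1 < len(parts) else ""
        else:
            # "/lib64/ld-linux-x86-64.so.2 (0x...)" or "linux-vdso.so.1 (0x...)"
            candidate = parts[0]
        # Only real files have absolute paths; this skips the vdso,
        # "not found" entries, and "statically linked" messages.
        if candidate.startswith("/") and candidate not in paths:
            paths.append(candidate)
    return paths


def shared_libraries(binary: str) -> List[str]:
    """Run ldd on a binary and return the files it needs at runtime."""
    result = subprocess.run(
        ["ldd", binary], capture_output=True, text=True, check=True
    )
    return parse_ldd_output(result.stdout)


if __name__ == "__main__":
    for path in shared_libraries("/bin/sh"):
        print(path)
```

the list ldd prints already includes transitive dependencies and the interpreter, so the caller only has to copy each returned path into the archive.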
this all just seems overly complicated to me. my initramfs generator is 77 lines of code, including automatic ldd dependency generation, and init is 37 lines, including LUKS decryption, dropbear for remote password entry, and e2fsck with automatic reboot if requested by e2fsck, for a total of about 110 lines. I didn't count the lines of code for minitrd, but adding together the two files linked is already about 1000 lines of code. mkinitcpio has far more flexibility and is only about 1500 lines for the core code.