
Nice article. This part puzzled me though:

> The operating system reads data in page granularity, meaning it can only read at a minimum 4kB at a time. That means if you need to read 1kB split in two files, 512 bytes each, you are effectively reading 8kB to serve 1kB, wasting 87% of the data read.

SSDs (whether SATA or NVMe) all read and write whole sectors at a time, right? I'm not sure what the sector size is, but 4 KiB seems like a reasonable guess. So I think you're reading the 8 KiB no matter what; it may just be a question of what layer you drop it at (right when it gets to the kernel or not). Also, doesn't direct IO require sector size-aligned operations?
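
For what it's worth, here is a rough sketch of the arithmetic in that quote, plus the rounding a sector-aligned read would need. The function names are made up for illustration, and it assumes each request starts on a block boundary and lives in its own file, so it only models the rounding, not what any particular kernel or drive actually does.

```python
def bytes_transferred(request_sizes, granularity):
    """Bytes moved when every request is rounded up to whole blocks.

    Assumes each request starts on a block boundary and lives in its own
    file, so no two requests can share a block.
    """
    total = 0
    for size in request_sizes:
        blocks = -(-size // granularity)  # ceiling division
        total += blocks * granularity
    return total


def wasted_fraction(request_sizes, granularity):
    """Fraction of the transferred bytes that the caller did not ask for."""
    return 1 - sum(request_sizes) / bytes_transferred(request_sizes, granularity)


def align_range(offset, length, sector):
    """Widen [offset, offset + length) to sector boundaries.

    Returns (aligned_offset, aligned_length), the kind of range a
    sector-aligned read would have to request.
    """
    start = offset - offset % sector
    end = -(-(offset + length) // sector) * sector
    return start, end - start


two_small_reads = [512, 512]
print(bytes_transferred(two_small_reads, 4096), wasted_fraction(two_small_reads, 4096))
print(bytes_transferred(two_small_reads, 512), wasted_fraction(two_small_reads, 512))
print(align_range(1000, 512, 512))
print(align_range(1000, 512, 4096))
```

With 4096-byte granularity the two 512-byte reads move 8192 bytes and waste 0.875 of them (the article's "87%"); with 512-byte granularity nothing is wasted. The last two lines show the same 512-byte request at offset 1000 widening to (512, 1024) with 512-byte sectors and to (0, 4096) with 4096-byte sectors.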




All SATA SSDs and almost all NVMe SSDs use 512-byte LBAs by default, but many NVMe SSDs can be reconfigured to use 4096-byte LBAs instead. The underlying NAND flash memory these days typically has a native page size of 16kB.
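
To put those sizes side by side, a quick sketch that only divides the numbers above. The 16 KiB figure is the "typical" page size just mentioned, not a value read from any specific drive, and the helper name is made up.

```python
NAND_PAGE_BYTES = 16 * 1024  # assumed: the typical native page size mentioned above


def lbas_per_nand_page(lba_bytes, nand_page_bytes=NAND_PAGE_BYTES):
    """How many logical blocks of a given size fit in one NAND page."""
    if nand_page_bytes % lba_bytes != 0:
        raise ValueError("LBA size does not evenly divide the NAND page size")
    return nand_page_bytes // lba_bytes


for lba_bytes in (512, 4096):
    print(lba_bytes, lbas_per_nand_page(lba_bytes))
```

That gives 32 LBAs per NAND page at 512 bytes and 4 at 4096 bytes. It says nothing about how the firmware actually maps or batches writes, which depends on the drive.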


Thanks. What are the tradeoffs with NVMe sector size? There must be some reason to want 4096 if they added that option. Is there less write amplification as you get closer to the native page size, or does the firmware avoid that anyway with its wear leveling?



