
"for a magnetic drive, reading data sequentially will be significantly faster than random access (due to increased overhead of page faults),"

...err... no.

Magnetic drives have slow random access due to seek time, i.e. the time taken for the head and disk to physically change position.

In comparison, SSDs have effectively zero seek latency, but they still have read-ahead/buffering/caching latencies to deal with.




The quoted text appears to be comparing the sequential and random-access speeds of magnetic disks, so the differences with respect to SSDs do not come into it. On the other hand, I do not understand what the author means in the following clause, where the slower random access is attributed to the overhead of page faults, unless the author has in mind a specific (and unmentioned) scenario involving memory-mapped access. And if page faults are the issue in that scenario when using magnetic disks, why would one not have the same issue when using SSDs? I would have thought the causality goes the other way: the overhead of page faults is higher when using magnetic disks because of their relatively slow random access.


The difference is that magnetic disks have a reading head that must physically move (slow) so it's much faster to access data under the current head position. I guess access time is roughly the same for any data in an SSD.
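
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The seek time and transfer rate are illustrative assumptions, not measurements of any particular drive, and the model ignores caching, read-ahead and rotational details. It only shows how a per-read head movement comes to dominate the total time.

```python
# Toy model: total time = (number of seeks * seek time) + transfer time.
# ASSUMED_* values are hypothetical, chosen only to illustrate the ratio.
ASSUMED_SEEK_MS = 10.0         # assumed cost of one head movement
ASSUMED_TRANSFER_MB_S = 100.0  # assumed sustained transfer rate


def read_time_ms(n_blocks, block_bytes, seek_ms, transfer_mb_s, sequential):
    """Estimate the time to read n_blocks blocks under the toy model."""
    transfer_ms = n_blocks * block_bytes / (transfer_mb_s * 1e6) * 1e3
    # Sequential: one seek to the start. Random: one seek per block.
    seeks = 1 if sequential else n_blocks
    return seeks * seek_ms + transfer_ms


seq_ms = read_time_ms(1000, 4096, ASSUMED_SEEK_MS, ASSUMED_TRANSFER_MB_S, True)
rnd_ms = read_time_ms(1000, 4096, ASSUMED_SEEK_MS, ASSUMED_TRANSFER_MB_S, False)
print(f"sequential: {seq_ms:.2f} ms, random: {rnd_ms:.2f} ms")
```

Setting the seek cost close to zero in the same model makes the two cases converge, which is roughly the SSD situation described above.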


Possibly... All I can surmise is that the OP may think disks are addressed in a memory-mapped fashion and hence may be subject to page-faults for some reason.

(Obviously, they're not).


Reading that generously and since we’re talking about abstractions... reading from disk via mmap does work via page faults! Except... it’s a layer up. Doing random reads on an mmap’d file will likely have terrible performance until those pages have been cached, but one layer down there’s no guarantee that sequential reads from an mmap’d file are going to be sequential reads from disk! (Because the file isn’t guaranteed to be laid out sequentially on disk)
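
A minimal sketch of what "reading via page faults" looks like, using Python's standard mmap module. The helper names (make_file, touch_pages) are made up for illustration. The code touches one byte per page of a mapped file, first in page order and then in shuffled order. Note that the file has just been written, so its pages are most likely still in the OS page cache and no disk seeks will actually happen here; the sketch shows the access pattern, not a benchmark.

```python
import mmap
import os
import random
import tempfile

PAGE = mmap.PAGESIZE


def make_file(n_pages):
    """Write a temp file of n_pages pages; page i is filled with byte i % 256."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for i in range(n_pages):
            f.write(bytes([i % 256]) * PAGE)
    return path


def touch_pages(path, order):
    """Read the first byte of each page of the mapped file, in the given order."""
    total = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            for page in order:
                # The first access to a page that is not resident triggers
                # a page fault, which the kernel services by loading it.
                total += m[page * PAGE]
    return total


n_pages = 64
path = make_file(n_pages)
sequential = list(range(n_pages))
shuffled = sequential[:]
random.Random(0).shuffle(shuffled)

seq_sum = touch_pages(path, sequential)
rand_sum = touch_pages(path, shuffled)
os.remove(path)
```

Both orders read the same bytes; only the order in which pages are faulted in differs. Whether the sequential order maps to sequential disk reads depends on how the filesystem laid the file out, which this code cannot see.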

Others in this discussion have talked about some abstractions being perfect and a consumer not needing to understand the layer beneath; I strongly disagree. Ultimately, the physical reality of the machine will come into play (disk, RAM, caches, network, CPU etc), and I am generally uncomfortable if I don’t have a solid feel for how the high-level operations in an abstraction are going to use those resources.



