
Hm, yeah, I think a key-value store would be easier to implement. I haven't looked at Redis for some time now, but last time I did, persistence was done through snapshotting, and everything would be loaded into memory at startup. So that wouldn't work for this use case, where all you can do is serve a static file.

But my question is about the assumption databases make that the disk they access is local, or at least fast network storage. I wonder if there are any databases optimized for slow, low-bandwidth storage, where the main thing you're trying to minimize is the amount of data read.

Well, every database already optimizes disk access (at least it did until recent years and the ‘just add SSDs’ attitude). However, they tend to assume that indexes are loaded into memory. For this use case, you'd want a database that can use the indexes themselves from disk and treat them like partitioned storage: e.g. when reading data for years 2019 to 2021, only request the parts corresponding to those years, not earlier ones. Dunno whether SQLite can keep indexes only partially in memory; given its niche of low-end devices and embedded apps, it's quite possible that it can.
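For the static-file case, "only request the parts you need" maps naturally onto HTTP Range requests. A minimal sketch in Python, assuming the `requests` package; the URL and byte offsets are made up for illustration, and a real client would first read the file's header/index pages to learn where each year's data actually lives:

    import requests  # assumed dependency: pip install requests

    DB_URL = "https://example.com/data/measurements.db"  # hypothetical static file

    def read_range(url, start, length):
        """Fetch `length` bytes starting at `start` via an HTTP Range request."""
        headers = {"Range": f"bytes={start}-{start + length - 1}"}
        resp = requests.get(url, headers=headers)
        resp.raise_for_status()
        return resp.content  # body of the 206 Partial Content response

    # First read a small header region to locate the index,
    # then fetch only the pages covering 2019-2021.
    header = read_range(DB_URL, 0, 4096)
    pages_2019_2021 = read_range(DB_URL, 1_048_576, 65_536)  # offsets are placeholders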

Actually, this sort of partial access (i.e. partitioning) is rather easy to implement by addressing separate data files by name, instead of using byte ranges into a single database file. Basically, put the data into files named by year (in my example), or bucket it into chunks of arbitrary size and use the chunks as files. It's elementary to extend this to multiple fields in the index. In short, partitioning based on actual field values can be much easier in the static-HTTP approach than using opaque byte ranges. It's probably also more effective if something like HTTP/2 lets you multiplex several file requests over one connection, since you can avoid requesting too little or too much.
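A rough sketch of that naming-based partitioning, assuming a hypothetical layout of one JSON file per year hosted as static files; the reader fetches only the years in the queried range:

    import json
    import requests  # assumed dependency: pip install requests

    BASE_URL = "https://example.com/data"  # hypothetical static hosting

    # Build side: bucket records into one file per year.
    def write_partitions(records, out_dir):
        by_year = {}
        for rec in records:
            by_year.setdefault(rec["year"], []).append(rec)
        for year, recs in by_year.items():
            with open(f"{out_dir}/{year}.json", "w") as f:
                json.dump(recs, f)

    # Read side: request only the partitions covering the queried range.
    def read_years(first, last):
        rows = []
        for year in range(first, last + 1):
            resp = requests.get(f"{BASE_URL}/{year}.json")
            if resp.status_code == 404:
                continue  # no data for that year
            resp.raise_for_status()
            rows.extend(resp.json())
        return rows

    rows = read_years(2019, 2021)  # fetches 2019.json, 2020.json, 2021.json only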
