
All great, then you need to port it to something other than that one specific SSD and it's too much work.

We have abstraction boundaries for a reason: we give up a small percentage of performance, and in return we can write code once (say, SQLite) and use it in many scenarios. For something like SQLite, that means it has been around for a long time and has had a lot of optimization work done on it (which probably outweighs the few percent you might gain from such tight integration).

You'd probably get a bigger performance gain from just not using SQL (e.g., using a DBM-style key-value store instead).
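To make the comparison concrete, here is a minimal sketch of the same key lookup done through Python's standard `dbm` module and through `sqlite3`. The file names, the `kv` table, and the helper function names are made up for illustration, and this makes no performance claim; it only shows how much less machinery the DBM path involves.

```python
import dbm
import os
import sqlite3
import tempfile


def kv_roundtrip_dbm(path, key, value):
    # DBM maps bytes keys to bytes values; there is no SQL parsing or planning.
    with dbm.open(path, "c") as db:
        db[key] = value
        return db[key]


def kv_roundtrip_sqlite(path, key, value):
    # The same store/fetch through SQL, using a hypothetical two-column table.
    conn = sqlite3.connect(path)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS kv (k BLOB PRIMARY KEY, v BLOB)")
        conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))
        conn.commit()
        row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0]
    finally:
        conn.close()


workdir = tempfile.mkdtemp()
dbm_result = kv_roundtrip_dbm(os.path.join(workdir, "store_dbm"), b"user:1", b"alice")
sqlite_result = kv_roundtrip_sqlite(os.path.join(workdir, "store.sqlite"), b"user:1", b"alice")
```

Whether the DBM path is actually faster depends on the workload, so it would need to be measured rather than assumed.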




There are really only two FPGA vendors that can compete in this space, and Xilinx is the one that's clearly ahead for computational storage applications. They already provide a platform for storage accelerator IP to be shared between this Smart SSD and their pre-existing PCIe FPGA cards that connect to standard SSDs over PCIe or networks. So porting to another accelerator platform probably isn't as big an issue as you expect.

The bigger challenge I see for implementing something like SQLite on a SmartSSD is that you really don't want your database to exist on just one drive, so you need to figure out how to do HA across multiple SSDs while still offloading most of the computationally expensive database operations to the FPGAs instead of leaving them on the CPU. I think this will condemn SmartSSDs to always working at a slightly lower abstraction layer than what the application really wants.


Treating a theoretical SQLite database running on an SSD like this as an HA database for server-based systems is a poor design. It would be a poor design even without the FPGA; SQLite simply isn't an HA replicated database.

Something like SQLite might make a decent alternative API to the flash storage layer of the SSD, though. Imagine if the storage controller of your SSD exposed a built-in "filesystem" that featured robust indexes, transactions, sorting, column families, etc. You could skip talking to the Linux block layer or any POSIX filesystem at all, and your optimized userspace software could instead talk directly to the storage controller in the SSD through a high-level software API. This isn't far-fetched; Samsung also has a "Key-Value SSD" on the way that exposes the underlying flash storage using a (surprise!) high-level get/set KV API, for similar reasons.

A design where the controller is this powerful would also allow features like predicate pushdown in the query planner to be implemented. That is, a `WHERE x > 7` can get pushed into the storage controller, and tuples that don't satisfy the predicate can get filtered out before they are pushed onto the memory bus. That will save significant processing time and memory bus traffic in aggregate, and it scales with the number of drives (since each drive has its own controller). Not to mention tricks like inline hardware for sorting, compression, etc.
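As a rough illustration of the pushdown idea, here is a small host-side simulation. `SimulatedDrive`, `read_all`, and `read_filtered` are hypothetical names, not any real SmartSSD or Xilinx API, and the "controller" is just a Python method. The sketch only counts how many rows would cross the bus with and without the filter running on the drive side.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


@dataclass
class SimulatedDrive:
    """Hypothetical stand-in for an SSD whose controller can run a filter."""
    rows: List[Row]

    def read_all(self) -> List[Row]:
        # No pushdown: every stored row is transferred to the host.
        return list(self.rows)

    def read_filtered(self, predicate: Predicate) -> List[Row]:
        # Pushdown: the "controller" drops non-matching rows before transfer.
        return [row for row in self.rows if predicate(row)]


def scan_without_pushdown(drives: List[SimulatedDrive], predicate: Predicate) -> Tuple[List[Row], int]:
    matches, transferred = [], 0
    for drive in drives:
        rows = drive.read_all()
        transferred += len(rows)  # all rows cross the bus
        matches.extend(row for row in rows if predicate(row))  # host filters
    return matches, transferred


def scan_with_pushdown(drives: List[SimulatedDrive], predicate: Predicate) -> Tuple[List[Row], int]:
    matches, transferred = [], 0
    for drive in drives:
        rows = drive.read_filtered(predicate)  # each drive filters its own data
        transferred += len(rows)  # only matching rows cross the bus
        matches.extend(rows)
    return matches, transferred


# Two drives holding x = 0..9 and x = 5..14; the query is WHERE x > 7.
drives = [
    SimulatedDrive([{"x": x} for x in range(0, 10)]),
    SimulatedDrive([{"x": x} for x in range(5, 15)]),
]
where_x_gt_7 = lambda row: row["x"] > 7

host_matches, host_transferred = scan_without_pushdown(drives, where_x_gt_7)
pushed_matches, pushed_transferred = scan_with_pushdown(drives, where_x_gt_7)
```

Both scans return the same rows; the difference is that the pushdown version moves only the matching rows to the host, and each additional drive brings its own filter along with it.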

Outside of fancy SQLite-as-a-filesystem tricks, I suspect optimizations like predicate pushdown and inline sorting will be very attractive for OLAP systems. Time will tell if these things will stick around, but Xilinx at least seems sure as hell determined to make their way into the datacenter.


SSDs are so tiny these days you could probably bundle several of them into a 5.25" drive bay along with a controller PCB that is an FPGA interfacing with a striped and mirrored SQLite. (Does "striping" even mean anything anymore in the age of SSDs?)



