All great, then you need to port it to something other than that one specific SS...

wtallis · on Nov 11, 2020

There are really only two FPGA vendors that can compete in this space, and Xilinx is the one that's clearly ahead for computational storage applications. They already provide a platform for storage accelerator IP to be shared between this SmartSSD and their pre-existing PCIe FPGA cards that connect to standard SSDs over PCIe or networks. So porting to another accelerator platform probably isn't as big an issue as you expect.

The bigger challenge I see for implementing something like SQLite on a SmartSSD is that you really don't want your database to exist on just one drive, so you need to figure out how to do HA across multiple SSDs while still offloading most of the computationally expensive database operations to the FPGAs instead of leaving them on the CPU. I think this will condemn SmartSSDs to always working at a slightly lower abstraction layer than what the application really wants.

aseipp · on Nov 12, 2020

Treating a theoretical SQLite database running on an SSD like this as an HA database for server-based systems is a mistake. It would be a poor design even without the FPGA; SQLite simply isn't an HA replicated database.

Something like SQLite might make a decent alternative API to the flash storage layer of the SSD, though. Imagine if the storage controller of your SSD exposed a built-in "filesystem" that featured robust indexes, transactions, sorting, column families, etc. You could skip talking to the Linux block layer or any POSIX filesystem at all, and your optimized userspace software could directly talk to the storage controller in the SSD instead with a high-level software API. This isn't far-fetched; Samsung also has a "Key-Value SSD" on the way that exposes the underlying flash storage using a (surprise!) high-level get/set KV API, for similar reasons.

A design where the controller is this powerful would also allow features like predicate pushdown in the query planner to be implemented. For example, a `WHERE x > 7` can get pushed into the storage controller, and tuples that don't satisfy the predicate can get filtered out before being pushed onto the memory bus. That will save significant processing time and memory bus traffic in aggregate, and it scales with the number of drives (since each drive has its own controller). Not to mention tricks like inline hardware for sorting, compression, etc.
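
To make the pushdown idea concrete, here is a toy Python simulation. Nothing in it is a real SmartSSD or Xilinx API: `SimulatedDrive`, `scan`, and the row counter are all made up for illustration, and the "controller" is just a method that filters before handing rows back. It only counts how many rows would cross the bus with and without the `x > 7` filter running on the drive side.

```python
from dataclasses import dataclass


@dataclass
class SimulatedDrive:
    """Hypothetical stand-in for one SSD plus its controller (not a real API)."""
    rows: list
    rows_sent_to_host: int = 0  # rows that "crossed the bus" to the host

    def scan(self, predicate=None):
        """Yield rows to the host. If a predicate is given, the simulated
        controller drops non-matching rows before they are sent."""
        for row in self.rows:
            if predicate is None or predicate(row):
                self.rows_sent_to_host += 1
                yield row


def query_without_pushdown(drives, predicate):
    # Every row is transferred; the host CPU does the filtering.
    return [row for d in drives for row in d.scan() if predicate(row)]


def query_with_pushdown(drives, predicate):
    # The predicate travels to each drive; only matching rows are transferred.
    return [row for d in drives for row in d.scan(predicate)]


def make_drives():
    # Two drives, each holding rows with x = 0..9 (made-up data).
    return [SimulatedDrive([{"x": x} for x in range(10)]) for _ in range(2)]


def where_x_gt_7(row):
    return row["x"] > 7


baseline = make_drives()
result_a = query_without_pushdown(baseline, where_x_gt_7)
transferred_a = sum(d.rows_sent_to_host for d in baseline)

pushed = make_drives()
result_b = query_with_pushdown(pushed, where_x_gt_7)
transferred_b = sum(d.rows_sent_to_host for d in pushed)

print(f"without pushdown: {transferred_a} rows transferred")
print(f"with pushdown:    {transferred_b} rows transferred")
```

Both queries return the same rows; the only difference is where the filter runs, and therefore how much data moves. Since each simulated drive filters its own rows, adding drives adds filtering capacity, which is the scaling argument above in miniature.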

Outside of fancy SQLite-as-a-filesystem tricks, I suspect the allure of optimizations like predicate pushdown and inline sorting will be very attractive for OLAP systems. Time will tell if these things will stick around, but Xilinx at least seems sure as hell determined to make their way into the datacenter.

javajosh · on Nov 12, 2020

SSDs are so tiny these days you could probably bundle several of them into a 5.25" drive bay along with a controller PCB that is an FPGA interfacing with a striped and mirrored SQLite. (Does "striping" even mean anything anymore in the age of SSDs?)