In genomics land it's less an object store and more a DB built to house giant but very sparse columnar data (think the positions of genetic variants in hundreds of thousands of people), with support for interval operations and some other genomics-specific things that traditional DBs don't support.
That said, we evaluated TileDB for genomics recently and found it lacking for our use case.
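For anyone unfamiliar with what "sparse data plus interval operations" means in practice, here is a minimal Python sketch. It is not TileDB's API or storage layout, just an illustration: only the cells where a sample actually carries a variant are stored, and a range query is answered by binary search over sorted positions. The record layout and the `query_interval` helper are made up for the example.

```python
from bisect import bisect_left, bisect_right

# Sparse representation: store only (position, sample, alt allele) cells
# that actually contain a variant. All values here are illustrative.
records = sorted([
    (1050, "S1", "A"),
    (1050, "S3", "A"),
    (2300, "S2", "T"),
    (9100, "S1", "G"),
])
positions = [r[0] for r in records]  # sorted, parallel to `records`


def query_interval(start, end):
    """Return all records with start <= position <= end (closed interval)."""
    lo = bisect_left(positions, start)
    hi = bisect_right(positions, end)
    return records[lo:hi]


hits = query_interval(1000, 2500)
```

A real variant store also has to deal with multiple chromosomes, variants that span a range rather than a single position, and data far too large for memory, which is where a purpose-built engine earns its keep.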
Take a look at Redis. With the modules API it's possible to load pre-existing C libraries and consume them through simple Redis commands, and there are Redis clients for pretty much every language out there.
I did a GSoC project years ago and I hated having to use MongoDB for manipulating VCF files (it was the time when "nosql" was all the rage and all project ideas had to have mongo somewhere). Redis Modules would have been the one thing that made sense, but that API was introduced much later.
Stavros from TileDB here. Great description of the genomics use case for TileDB. We'd be interested in learning what limitations you've found. Happy to discuss over email as well (stavros@tiledb.com).
Hey Stavros. We were looking for a data store to integrate into a clinical genomics LIMS that supports in-system analysis. We deal with de novo sequenced clinical samples (and not genotyped samples, which seems to be what TileDB-VCF had in mind?). There are some cases that TileDB-VCF explicitly disallows (updates/reinserts to the same sample, overlapping variants) that are not edge cases for us but rather common occurrences.
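To make those two cases concrete, here is a rough Python sketch of the behaviour we need. It is a toy in-memory model, not TileDB-VCF's API; `Variant`, `SampleVariantStore`, `upsert_sample`, and `overlapping` are hypothetical names invented for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    ref: str
    alt: str


class SampleVariantStore:
    """Toy store that allows re-inserting a sample and overlapping variants."""

    def __init__(self):
        self._by_sample = {}

    def upsert_sample(self, sample_id, variants):
        # Re-inserting a sample replaces its previous call set,
        # e.g. after the sample has been re-analysed.
        self._by_sample[sample_id] = list(variants)

    def overlapping(self, sample_id, chrom, start, end):
        # Two closed intervals overlap when each starts before the other ends.
        return [
            v for v in self._by_sample.get(sample_id, [])
            if v.chrom == chrom and v.start <= end and v.end >= start
        ]


store = SampleVariantStore()
# A deletion spanning 100-105 and a SNV at 103 inside it: overlapping variants.
store.upsert_sample("S1", [
    Variant("chr1", 100, 105, "ACGTAC", "A"),
    Variant("chr1", 103, 103, "T", "C"),
])
both = store.overlapping("S1", "chr1", 103, 103)

# Re-analysis drops the SNV; the same sample is simply inserted again.
store.upsert_sample("S1", [Variant("chr1", 100, 105, "ACGTAC", "A")])
after_reinsert = store.overlapping("S1", "chr1", 103, 103)
```

In our pipeline both of these happen routinely: callers emit overlapping records for the same sample, and samples get re-called whenever the analysis is rerun.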
This is an API issue with TileDB-VCF. The core TileDB library supports inserts/appends/overwrites without issues and we just need to expose those operations in the TileDB-VCF APIs. Added to our backlog, thanks!
TileDB and Hail are rather complementary. We have customers that use TileDB to store and manage their variants, and Hail to perform GWAS (by exporting from TileDB to Hail format). We are currently designing a tighter integration with Hail. This expands on our vision for a universal data engine that integrates with pretty much everything out there and does not lock you into a single framework (e.g., Spark).
That was our feeling about the two products as well; the limitations of TileDB-VCF sort of forced our hand, though. I was (and still am) of the opinion that TileDB would be a good variant store, since it does so many of the things we want and does them well.
Sure, but that’s a very specific use case and not a “universal database”. An object store, after all, is just cloud speak for a KV store, which is the foundation of data retrieval. So I’m still unclear what this actually does. Has it invented something new, or is it tying together mature tech in a really powerful way?