In genomics land it's less an object store and more a DB built to house giant but v...

kristoff_it · on July 20, 2020

Take a look at Redis. With the modules API it's possible to load pre-existing C libraries and consume them through simple Redis commands, and there are Redis clients for each and every language out there.

I did a GSoC project years ago and I hated having to use MongoDB for manipulating VCF files (it was the time when "nosql" was all the rage and all project ideas had to have mongo somewhere). Redis Modules would have been the one thing that made sense, but that API was introduced much later.

aroch · on July 20, 2020

Oh, hmm, that's very interesting and not something I would have thought about! :)

stavrospap · on July 20, 2020

Stavros from TileDB here. Great description of the genomics use case for TileDB. We'd be interested in learning what limitations you've found. Happy to discuss over email as well (stavros@tiledb.com).

aroch · on July 20, 2020

Hey Stavros. We were looking for a data store to integrate into a clinical genomics LIMS that supports in-system analysis. We deal with de novo sequenced clinical samples (and not genotyped samples, which seems to be what TileDB-VCF had in mind?). There are some cases that TileDB-VCF explicitly disallows (updates/reinserts to the same sample, overlapping variants) that are not edge cases for us but rather common occurrences.
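
To make those two cases concrete, here is a minimal Python sketch of the behavior we need. It is not TileDB-VCF or Hail code: the `Variant` and `VariantStore` names and their methods are hypothetical, and the store is just an in-memory dict. It only illustrates that re-inserting a sample replaces its earlier calls, and that overlapping variants are kept side by side.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    ref: str
    alt: str

    @property
    def end(self) -> int:
        # Last reference base covered by this variant (inclusive).
        return self.start + len(self.ref) - 1


class VariantStore:
    """Hypothetical in-memory store, for illustration only."""

    def __init__(self) -> None:
        self._by_sample: dict[str, list[Variant]] = defaultdict(list)

    def upsert_sample(self, sample: str, variants: list[Variant]) -> None:
        # Re-inserting a sample replaces its previous calls instead of failing.
        self._by_sample[sample] = sorted(variants)

    def query(self, sample: str, chrom: str, start: int, end: int) -> list[Variant]:
        # Return every variant overlapping [start, end],
        # including variants that overlap each other.
        return [
            v
            for v in self._by_sample.get(sample, [])
            if v.chrom == chrom and v.start <= end and v.end >= start
        ]


store = VariantStore()
store.upsert_sample("S1", [Variant("chr1", 100, "ACGT", "A")])
# Re-analysis of the same sample: the deletion plus an SNV inside its span.
store.upsert_sample(
    "S1",
    [Variant("chr1", 100, "ACGT", "A"), Variant("chr1", 102, "G", "T")],
)
hits = store.query("S1", "chr1", 101, 103)
```

Here `hits` contains both the deletion and the SNV that falls inside it, and the second `upsert_sample` call simply supersedes the first.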

stavrospap · on July 20, 2020

This is an API issue with TileDB-VCF. The core TileDB library supports inserts/appends/overwrites without issues and we just need to expose those operations in the TileDB-VCF APIs. Added to our backlog, thanks!

hobofan · on July 20, 2020

> That said, we evaluated TileDB for genomics recently and found it lacking for our use case.

Could you share what solution you landed on instead?

aroch · on July 20, 2020

Hail.is is what we landed on. There are some things we don't like about it too, but they weren't deal breakers.

stavrospap · on July 20, 2020

TileDB and Hail are rather complementary. We have customers that use TileDB to store and manage their variants, and Hail to perform GWAS (by exporting from TileDB to Hail format). We are currently designing a tighter integration with Hail. This expands on our vision for a universal data engine that integrates with pretty much everything out there and does not lock you into a single framework (e.g., Spark).

aroch · on July 20, 2020

That was our feeling about the two products as well, though the limitations w/ TileDB-VCF sort of forced our hand. I was (and still am) of the opinion that TileDB would be a good variant store, since it does so many of the things we want and does them well.

johnc1231 · on July 20, 2020

Hail team member here. I'd be curious to hear what the things you don't like about Hail are. Hopefully they're things that are already on our roadmap.

qeternity · on July 20, 2020

Sure, but that’s a very specific use case and not a “universal database”. Object store, after all, is just cloud speak for KV store, which is the foundation of data retrieval. So I’m still unclear what this actually does. Has it invented something new, or is it tying together mature tech in a really powerful way?