It really isn't clear to me from your description why I might find TileDB useful...

Shelnutt2 · on July 20, 2020

Seth from TileDB here. There are several differences from Redis or S3.

Redis is primarily an in-memory key-value store. There is an option for persistence, but it's not primarily designed for persisting data. TileDB is designed first and foremost for persisting data to disk or to a cloud object store. Redis does not natively support persisting to cloud object stores, as that's not really among its design goals. Another difference is that TileDB is a column store designed around dimensions (~primary index) and attributes (non-indexed columns), whereas Redis is a key-value store with several data types but effectively a single value per key (that is, one key equals one value; it might be a list type, but you can't have one key equal to a structure or multiple datatypes). You must serialize your data structures to fit into the key-value model, or you need to keep track of multiple keys and indexes if you want to approximate columnar storage using lists. Depending on your application and use case, Redis and TileDB are more likely to complement each other than to be an either/or choice.
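To make the data-model difference concrete, here is a minimal sketch in plain Python. It uses neither the Redis client nor the TileDB API: `kv_store`, `kv_put`, `kv_get` and `ColumnStore` are hypothetical names invented for illustration, and the toy column store only mimics the idea of one dimension plus per-attribute columns.

```python
import json

# Key-value style (Redis-like, modeled here with a plain dict):
# one key maps to one value, so a multi-field record is serialized to a string.
kv_store = {}

def kv_put(row_id, record):
    kv_store[f"row:{row_id}"] = json.dumps(record)

def kv_get(row_id):
    return json.loads(kv_store[f"row:{row_id}"])

# Columnar style (hypothetical toy class, not TileDB's API): the dimension
# acts as the index, and each attribute lives in its own column.
class ColumnStore:
    def __init__(self, attributes):
        self.dim = []                                  # dimension coordinates
        self.attrs = {name: [] for name in attributes}  # one list per attribute

    def write(self, coord, **values):
        self.dim.append(coord)
        for name, column in self.attrs.items():
            column.append(values[name])

    def read(self, attribute, lo, hi):
        """Return one attribute for coordinates in the inclusive range [lo, hi]."""
        column = self.attrs[attribute]
        return [v for c, v in zip(self.dim, column) if lo <= c <= hi]

records = [
    (1, {"temp": 20.5, "city": "Oslo"}),
    (2, {"temp": 31.0, "city": "Cairo"}),
    (3, {"temp": 25.2, "city": "Lima"}),
]

store = ColumnStore(["temp", "city"])
for row_id, rec in records:
    kv_put(row_id, rec)
    store.write(row_id, **rec)

# Key-value: fetch and deserialize each whole record just to get one field.
kv_temps = [kv_get(i)["temp"] for i in (2, 3)]

# Columnar: slice on the dimension and touch only the "temp" column.
col_temps = store.read("temp", 2, 3)
```

Both paths return the same values; the difference is that the key-value path has to deserialize every full record, while the columnar path never reads the `city` column at all.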

Using S3 directly, either with Parquet or flat files, is an option and one that is widely used. The problem we see is that neither Parquet nor flat files were designed for cloud object stores, and they leave a lot up to the application to implement, since there is no single storage engine designed around any of these formats. This is why, over the last few years, we've seen additional frameworks emerge in the community, such as Delta Lake, Iceberg, Hudi and others. These systems are built to help deal with the eventual consistency of cloud object stores, the multi-writer/multi-reader problem, and things like updates and deletes. By contrast, TileDB has many of the needed features built directly into the format design and into the storage engine. TileDB's MVCC design, rather than a single-file format, allows it to natively handle updates and insertions on cloud object stores. We designed it with eventual consistency in mind, and it is safe from corrupt reads or writes without the need for a central locking or transaction system.
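As a rough illustration of why a multi-version, multi-file layout sidesteps central locking, here is a toy model in plain Python. It is not TileDB's actual on-disk format or API: the `fragment_<timestamp>.json` naming and the `write_fragment` and `read_array` helpers are assumptions made up for this sketch, and a local temporary directory stands in for an object store.

```python
import json
import tempfile
from pathlib import Path

def write_fragment(array_dir, timestamp, cells):
    """Write a batch of cells as a new, uniquely named file.

    Existing files are never modified, so writers do not need to lock
    each other out (hypothetical layout, for illustration only).
    """
    path = Path(array_dir) / f"fragment_{timestamp:020d}.json"
    if path.exists():
        raise FileExistsError(path)
    path.write_text(json.dumps(cells))
    return path

def read_array(array_dir, as_of=None):
    """Merge fragments in timestamp order; later writes win.

    Passing as_of ignores fragments newer than that timestamp,
    which yields a consistent view of the array at that point.
    """
    result = {}
    # Zero-padded timestamps make lexicographic order match numeric order.
    for path in sorted(Path(array_dir).glob("fragment_*.json")):
        timestamp = int(path.stem.split("_")[1])
        if as_of is not None and timestamp > as_of:
            continue
        result.update(json.loads(path.read_text()))
    return result

array_dir = tempfile.mkdtemp()
write_fragment(array_dir, 1, {"a": 1, "b": 2})
write_fragment(array_dir, 2, {"b": 20, "c": 30})  # update "b", insert "c"

latest = read_array(array_dir)             # sees both fragments
snapshot = read_array(array_dir, as_of=1)  # sees only the first write
```

Because an update is just another file, a reader that lists the directory mid-write sees either the old set of fragments or the new one, never a half-rewritten file. A real engine has more to handle (partially uploaded objects, deletes, consolidation), which this sketch ignores.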

In addition to the advantages we believe the TileDB storage engine has with S3, I also want to mention that one of the main issues we wanted to solve with TileDB Cloud was sharing and access control of S3 data sets. S3 access policies do exist and can be used to share objects with other users or accounts, or with the public. However, anyone who has dealt with S3 policies has seen that they can grow quickly and become unwieldy as you try to limit different prefixes. It seems that companies often end up making a bucket public to share the data instead of managing the access policies, and we all see the various data leaks that happen as a result. With TileDB Cloud we offer simple sharing capabilities. From our web console, we aim to make it trivial to select an array and share it with another user or organization, or even to make it public. [1]

For some quick examples, we have Jupyter notebooks with runnable Python examples [2]. These are also available on TileDB Cloud; if you sign up, we are giving $10 of free credit, and you can launch a JupyterLab instance and see these examples preloaded. We also have quickstart examples in different languages if you prefer something other than Python [3]. The quickstarts are not full use cases, but they do give you an overview of basic API usage. I'd love to get some feedback from you on what type of lightweight examples you are looking for. We are always aiming to improve our documentation and make it easier to learn about TileDB.

[1] https://docs.tiledb.com/cloud/console/arrays/sharing-arrays

[2] https://github.com/TileDB-Inc/TileDB-Cloud-Example-Notebooks...

[3] https://docs.tiledb.com/main/quickstart

solidasparagus · on July 21, 2020

Thank you for that, it was very helpful. Those notebooks and quickstart examples are exactly what I was thinking of; I just hadn't found them.

To be honest, the splash page doesn't help me understand which of my problems the technology would solve and so I didn't dig deep enough to find those examples. I would say the splash page is a little heavy on what TileDB is and a little light on why I should care. But thank you for taking the time to explain it to me!