
I wanted to mention a few things that are still pain points with our current platform at Simple.

Maintaining our Redshift schema can be time-consuming, especially since upstream changes can alter the schema of the messages we consume, and we often aren't notified. When that happens, we either silently drop the new columns or our loads into Redshift start to fail. Postgres 9.5 is supposed to include DDL messages in logical decoding, which will help our pipeline from Postgres databases, but we'd love to develop a more robust system for automatically updating the Redshift schema to accommodate changes in the incoming data format. Naive solutions would likely be vulnerable to spurious messages; in some cases we could end up with a large number of mostly empty columns.
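One way to blunt the spurious-message problem is to propose a new column only once an unseen key shows up in enough messages. Here is a minimal Python sketch of that idea. It assumes incoming messages are flat JSON objects already decoded to dicts. The function names, the thresholds (`min_count`, `min_fraction`) and the default `VARCHAR(256)` type are hypothetical choices, not part of Simple's pipeline. It only generates `ALTER TABLE ... ADD COLUMN` statements for a human to review; it does not run them.

```python
from __future__ import annotations

from collections import Counter
from typing import Any, Iterable, Mapping


def propose_new_columns(
    messages: Iterable[Mapping[str, Any]],
    known_columns: set[str],
    min_count: int = 10,        # hypothetical: absolute floor on occurrences
    min_fraction: float = 0.05,  # hypothetical: share of messages carrying the key
) -> list[str]:
    """Return unseen keys that appear often enough to be worth a column."""
    counts: Counter[str] = Counter()
    total = 0
    for msg in messages:
        total += 1
        # Count each unseen key at most once per message.
        for key in set(msg) - known_columns:
            counts[key] += 1
    if total == 0:
        return []
    return sorted(
        key
        for key, n in counts.items()
        if n >= min_count and n / total >= min_fraction
    )


def alter_statements(
    table: str, columns: list[str], col_type: str = "VARCHAR(256)"
) -> list[str]:
    """Build one ADD COLUMN statement per proposed column (for review only)."""
    return [f'ALTER TABLE {table} ADD COLUMN "{c}" {col_type};' for c in columns]
```

Requiring both an absolute count and a fraction means a one-off typo'd field is ignored, while a field that a new producer version sends on every message gets picked up. In practice you would probably also want to infer a type from the observed values rather than defaulting every new column to text.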

We've also had trouble maintaining knowledge about the schema. We'd like to have some sort of "data dictionary" that would let folks understand where the data in Redshift comes from and what each column means. This might end up being a standalone document, but we're concerned about keeping it up to date as the schema changes. We've toyed with the idea of maintaining this documentation as Javadoc-style comments directly in the repository where we define our Redshift schema and migrations, but we haven't found any existing tooling to render documentation from comments in SQL source files.
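A small script may be enough to get started on rendering documentation from SQL comments. The sketch below is hypothetical: it assumes a convention where a `/** ... */` block directly above `CREATE TABLE` documents the table and one directly above a column definition documents that column, and it only handles simple DDL with one column per line. It collects those comments and renders a Markdown data dictionary; `extract_docs` and `render_markdown` are illustrative names, not an existing tool.

```python
from __future__ import annotations

import re

# Hypothetical conventions: a /** ... */ block directly above CREATE TABLE
# documents the table; one directly above a column line documents that column.
TABLE_RE = re.compile(r"create\s+table\s+(?:if\s+not\s+exists\s+)?([\w.]+)", re.I)
COLUMN_RE = re.compile(r'^"?(\w+)"?\s+\w+')
CONSTRAINT_WORDS = {"PRIMARY", "FOREIGN", "UNIQUE", "CONSTRAINT", "CHECK"}


def _clean(lines: list[str]) -> str:
    """Join comment lines, dropping the leading '*' of Javadoc-style lines."""
    parts = [ln.strip().lstrip("*").strip() for ln in lines]
    return " ".join(p for p in parts if p)


def extract_docs(sql: str) -> dict[str, dict]:
    """Map table name -> {"doc": str, "columns": {column: doc}}."""
    tables: dict[str, dict] = {}
    pending: str | None = None    # most recently closed doc comment
    current: str | None = None    # table whose column list we are inside
    buf: list[str] | None = None  # lines of an unfinished /** ... */ block
    for line in sql.splitlines():
        stripped = line.strip()
        if buf is not None:  # continuing a multi-line doc comment
            if "*/" in stripped:
                buf.append(stripped.split("*/", 1)[0])
                pending, buf = _clean(buf), None
            else:
                buf.append(stripped)
            continue
        if stripped.startswith("/**"):
            body = stripped[3:]
            if "*/" in body:  # single-line doc comment
                pending = _clean([body.split("*/", 1)[0]])
            else:
                buf = [body]
            continue
        table_match = TABLE_RE.search(stripped)
        if table_match:
            current = table_match.group(1)
            tables[current] = {"doc": pending or "", "columns": {}}
            pending = None
            continue
        if current is None:
            continue
        if stripped.startswith(")"):  # end of the column list
            current, pending = None, None
            continue
        col_match = COLUMN_RE.match(stripped)
        if col_match and col_match.group(1).upper() not in CONSTRAINT_WORDS:
            tables[current]["columns"][col_match.group(1)] = pending or ""
            pending = None
    return tables


def render_markdown(tables: dict[str, dict]) -> str:
    """Render one Markdown section, with a column table, per SQL table."""
    out: list[str] = []
    for name, info in tables.items():
        out.append(f"## {name}")
        if info["doc"]:
            out.append(info["doc"])
        out += ["", "| column | description |", "|---|---|"]
        for col, doc in info["columns"].items():
            out.append(f"| {col} | {doc} |")
        out.append("")
    return "\n".join(out)
```

Because the comments live next to the migrations, a CI step could regenerate the dictionary on every schema change, which would go some way toward the staleness concern. Undocumented columns still show up with an empty description, which makes gaps easy to spot.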



