> All raw data lives in S3... I'm curious to hear if others use a similar pattern, or if there are better options.
You may still need to keep track of what is actually in S3. If you're already using Snowflake you can probably do this with external tables. Otherwise you could run your own Hive metastore or use external tables on the AWS side (e.g. through the Glue Data Catalog with Athena or Redshift Spectrum).
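To make "information about what is in S3" a bit more concrete, here is a rough sketch of the kind of bookkeeping a metastore does for you. It assumes Hive-style partitioned keys (`table/col=value/file`), and the key names are made up for illustration; a real setup would get the keys from an S3 listing and let Snowflake, Hive or Glue hold the catalog.

```python
from collections import defaultdict

def partition_of(key):
    """Extract Hive-style partition values (col=value path segments) from an S3 key."""
    parts = {}
    for segment in key.split("/")[:-1]:  # skip the file name itself
        if "=" in segment:
            col, _, value = segment.partition("=")
            parts[col] = value
    return parts

def build_catalog(keys):
    """Group object keys by table (first path segment) and by partition values."""
    catalog = defaultdict(lambda: defaultdict(list))
    for key in keys:
        table = key.split("/", 1)[0]
        partition = tuple(sorted(partition_of(key).items()))
        catalog[table][partition].append(key)
    return catalog

# Hypothetical keys, as you might get back from an S3 listing.
keys = [
    "events/dt=2023-01-01/part-0000.parquet",
    "events/dt=2023-01-01/part-0001.parquet",
    "events/dt=2023-01-02/part-0000.parquet",
]
catalog = build_catalog(keys)
```

An external table or metastore gives you the same table-to-partition-to-file mapping without having to maintain it by hand.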
We're also trying to decide whether it's cheaper/easier to store things in S3 or just keep everything internal in Snowflake. For some of our data the compression is good enough that keeping it in a table works out better than leaving it in internal or external staging. Obviously this is bad if we ever have to move away from Snowflake, but we haven't committed to either approach and lots of data is still backed up in S3. Our total data warehouse is about 200 TB at the moment and we're projecting significant growth over the next couple of years.
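The comparison we keep redoing is basically back-of-the-envelope arithmetic like the sketch below. Every number in it is a placeholder I made up for illustration (the compression ratios, the per-TB prices, and treating 200 TB as the uncompressed size), so plug in your own measured ratios and your actual contract rates.

```python
def monthly_storage_cost(raw_tb, compression_ratio, price_per_tb_month):
    """Monthly cost of storing raw_tb of data once compressed by compression_ratio."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return raw_tb / compression_ratio * price_per_tb_month

# Placeholder inputs, NOT real prices or measured ratios.
raw_tb = 200
s3_cost = monthly_storage_cost(raw_tb, compression_ratio=2.0, price_per_tb_month=25.0)
table_cost = monthly_storage_cost(raw_tb, compression_ratio=5.0, price_per_tb_month=40.0)

print(f"staged files in S3: {s3_cost:.0f} per month")
print(f"Snowflake table:    {table_cost:.0f} per month")
```

With these made-up inputs the table wins despite the higher per-TB price, which is the effect I mean: a good enough compression ratio can outweigh a more expensive storage tier. It only covers storage, not compute or the cost of being locked in.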