
Generalized data architecture: understanding the roles and challenges of turning data lakes into data warehouses, etc., including the user personas that use each and their needs and limitations. The different types of databases and data storage, and what each excels at. How to save money on storage. Considerations of idempotency, replayability, and the ability to roll back.
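To make the idempotency and replay point concrete, here is a minimal sketch. It assumes each record carries a stable unique id; the `Event` shape and the `apply_events` helper are made up for illustration and don't come from any particular tool. The load is a keyed overwrite rather than an append, so replaying the same batch after a retry leaves the store unchanged.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record shape: (event_id, amount).
Event = Tuple[str, int]


def apply_events(store: Dict[str, int], events: Iterable[Event]) -> Dict[str, int]:
    """Upsert each event by its id, so re-applying a batch changes nothing."""
    for event_id, amount in events:
        # Keyed overwrite, not an append or an increment.
        store[event_id] = amount
    return store


batch = [("e1", 10), ("e2", 5)]
store: Dict[str, int] = {}
apply_events(store, batch)
apply_events(store, batch)  # replay, e.g. after a retried job
total = sum(store.values())
```

If the load had appended rows or incremented a counter instead, the replay would have double counted, which is the usual reason retries and backfills get scary.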

You say all workflow engines are the same, but even just reading the Pachyderm docs will give you an idea of modern data engineering best practices: data versioning and data lineage, incremental computation, etc.
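As a rough illustration of incremental computation (this is not Pachyderm's API, just the general idea, and `incremental_run` and its cache layout are assumptions for the sketch): hash each input's content and only reprocess the inputs whose hash changed since the last run.

```python
import hashlib
from typing import Callable, Dict, List, Tuple


def content_hash(data: bytes) -> str:
    """Identify an input by its content rather than by its name or mtime."""
    return hashlib.sha256(data).hexdigest()


def incremental_run(
    inputs: Dict[str, bytes],
    cache: Dict[str, Tuple[str, str]],
    process: Callable[[bytes], str],
) -> Tuple[Dict[str, str], List[str]]:
    """Return all outputs plus the names that actually had to be recomputed."""
    outputs: Dict[str, str] = {}
    recomputed: List[str] = []
    for name, data in inputs.items():
        digest = content_hash(data)
        cached = cache.get(name)
        if cached is not None and cached[0] == digest:
            outputs[name] = cached[1]  # unchanged input: reuse the old result
        else:
            result = process(data)
            cache[name] = (digest, result)
            outputs[name] = result
            recomputed.append(name)
    return outputs, recomputed


def upper(data: bytes) -> str:
    return data.decode().upper()


cache: Dict[str, Tuple[str, str]] = {}
_, first = incremental_run({"a": b"x", "b": b"y"}, cache, upper)
outputs, second = incremental_run({"a": b"x", "b": b"z"}, cache, upper)
```

On the second run only `b` is reprocessed. Keeping the hash next to each result also gives you a crude form of lineage: you can tell which version of the input produced which output.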

Temporal also has a very cool, modern approach (distributed, robust, event-driven) to generalized workflow management (not specific to big data). If you’re used to stuff like Airflow, Temporal is a jump 10 years forward.



