
Yeah, I wasn't familiar with Airbyte before writing that comment, so now I'm seeing the value in it. We have tons of teams asking "how do I get this data into BigQuery" and the answer is usually "use this Airflow operator to dump it into GCS and then use this Airflow operator to load it into BigQuery", which isn't super useful for a non-technical person, or really for any technical person not familiar with Airflow.

A mesh is certainly something in between a lake and a warehouse... Something super simple that I've gotten good feedback on so far from DSs is just documenting the transformed data in place. It was really difficult to do this in our old ETL stack (data pulled from HBase, transformed to Parquet + Hive in HDFS), but we've moved a lot of it over to Avro files loaded into BigQuery, where we can just put decorators on the Scala transformation code that writes the Avro files, and that updates the schema with descriptions in BigQuery. It gives a nice bit of ownership to the engineering team and lets the DS using the data be a lot more autonomous. That boundary has to exist somewhere (or I guess in many places for a "mesh"), so having it distinctly at the point where data gets loaded in feels right to me.
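To make the "descriptions live next to the code" idea concrete, here is a rough Python analog. It is not the Scala/Avro setup described above: the `described` helper, the `Order` record, and the type mapping are all hypothetical, and the output is just a list in the shape of a BigQuery JSON schema (name, type, mode, description).

```python
from dataclasses import dataclass, field, fields
from typing import get_type_hints

# Assumed mapping from Python types to BigQuery column types (illustrative only).
BQ_TYPES = {str: "STRING", int: "INTEGER", float: "FLOAT", bool: "BOOLEAN"}


def described(description):
    """Hypothetical helper: attach a column description to a dataclass field."""
    return field(metadata={"description": description})


@dataclass
class Order:
    # Hypothetical record; the descriptions sit right next to the field definitions.
    order_id: str = described("Unique ID assigned at checkout.")
    total_cents: int = described("Order total in cents, tax included.")


def bq_schema(record_cls):
    """Build a BigQuery-style schema list from the dataclass fields."""
    hints = get_type_hints(record_cls)
    return [
        {
            "name": f.name,
            "type": BQ_TYPES[hints[f.name]],
            "mode": "REQUIRED",
            "description": f.metadata.get("description", ""),
        }
        for f in fields(record_cls)
    ]
```

Dumping `bq_schema(Order)` with `json.dumps` gives a schema document that can be reviewed alongside the code, which is the ownership boundary the comment is describing.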




>the answer is usually "use this airflow operator to dump it into GCS and then use this airflow operator to load it into BigQuery"

Why not hide these steps behind a YAML file and construct DAGs from that? Devs are already used to writing YAML files for CI, Kubernetes, etc.
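A rough sketch of that idea, under some loud assumptions: the config keys (`pipelines`, `name`, `source`, `gcs_uri`, `bq_table`) are made up, the config is shown as the dict a YAML parser would return, and `Step` is a stand-in for a real Airflow operator so the example runs without Airflow installed.

```python
from dataclasses import dataclass

# Hypothetical config, written as the dict a YAML parser would produce.
CONFIG = {
    "pipelines": [
        {
            "name": "orders",
            "source": "postgres://orders_db/public.orders",
            "gcs_uri": "gs://example-bucket/orders/*.avro",
            "bq_table": "analytics.orders",
        },
    ]
}

REQUIRED_KEYS = {"name", "source", "gcs_uri", "bq_table"}


@dataclass
class Step:
    """Stand-in for an Airflow task; a real version would create an operator."""
    task_id: str
    kind: str  # "extract_to_gcs" or "load_to_bq"
    params: dict
    upstream: tuple = ()


def build_steps(config):
    """Expand each pipeline entry into the usual two steps: dump to GCS, load to BigQuery."""
    steps = []
    for pipeline in config["pipelines"]:
        missing = REQUIRED_KEYS - pipeline.keys()
        if missing:
            raise ValueError(f"pipeline is missing keys: {sorted(missing)}")
        extract = Step(
            task_id=f"{pipeline['name']}_to_gcs",
            kind="extract_to_gcs",
            params={"source": pipeline["source"], "destination": pipeline["gcs_uri"]},
        )
        load = Step(
            task_id=f"{pipeline['name']}_to_bq",
            kind="load_to_bq",
            params={"source": pipeline["gcs_uri"], "destination": pipeline["bq_table"]},
            upstream=(extract.task_id,),  # load runs only after the extract step
        )
        steps.extend([extract, load])
    return steps
```

The point of the design is that a team only edits the config entry; the two-operator pattern and the dependency between the steps stay in one place.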


Nice work there! I also think that the next challenge for data teams is all this data documentation and discovery work.

I still think that Airflow is great for power data engineers. Airbyte and dbt are positioned to empower data analysts (or lazy data engineers like me) to own the ELTs.


Agreed. I see a lot of folks coming up with one-off solutions for pulling data out of third-party sources like Kustomer or Lever. Providing a centralized UI for setting that up would be a great service.

Seems like I have a fun weekend project.


What about Meltano/Singer?



