
I would strongly suggest not using Airflow if your company doesn't already use it... and that's coming from someone working at the company that made it.

1) It has no knowledge of data dependencies. Task success is therefore very primitive, and you end up with a bunch of "polling" tasks waiting, e.g., for a table partition to land.

2) The UI constantly hangs and/or crashes.

3) Airflow "workers" using Celery are rarely given the right number of tasks. OOMs and the like are commonplace, even when running under Docker.

And many, many more.
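To make point (1) concrete: the "polling" pattern boils down to a task that blocks a worker slot while re-checking a condition. This is a minimal plain-Python sketch of the idea, not real Airflow API; the function name and parameters are illustrative:

```python
import time

def wait_for_partition(partition_exists, poke_interval=60.0, timeout=3600.0):
    """Poll a predicate until it's true or we time out.

    Because the scheduler has no notion of data dependencies, a task like
    this just occupies a worker, periodically re-checking (for example)
    whether a table partition has landed yet.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if partition_exists():
            return True
        time.sleep(poke_interval)
    return False
```

Every one of these sensors holds a slot for its whole wait, which is part of why worker capacity is so hard to size correctly.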

I strongly suggest using Apache Beam or Argo w/ Kubernetes instead. They scale quite a bit better and handle long-running tasks well.




I'm actually finding Airflow + Beam + Kubernetes to be a really powerful combination. Especially if you're on GCP, where Airflow is managed by Cloud Composer, Kubernetes by GKE, and Beam by Dataflow.

Our Airflow cluster itself does almost nothing. Our operators just tell Kubernetes to run containers with commands, or Dataflow to run some Beam template, and wait for the results from afar.
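That delegate-and-wait role can be sketched in a few lines. These are illustrative stand-ins only; `RemoteJob` and `submit_and_wait` are hypothetical names, not the GKE or Dataflow client APIs:

```python
import time

class RemoteJob:
    """Hypothetical handle for a job running on a managed service
    (think: a pod on GKE, or a Dataflow job launched from a template)."""

    def __init__(self, polls_until_done=3):
        self._polls_remaining = polls_until_done

    def status(self):
        # Simulate a remote job that finishes after a few status checks.
        self._polls_remaining -= 1
        return "DONE" if self._polls_remaining <= 0 else "RUNNING"

def submit_and_wait(job, poll_interval=0.01):
    """The entire job of a 'thin' orchestrator operator: kick off work on
    a managed service, then poll its status from afar until it finishes."""
    while job.status() != "DONE":
        time.sleep(poll_interval)
    return "DONE"
```

The design win is that the heavy lifting (and the memory pressure) lives on the managed service, not on the Airflow workers.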

I love this setup, but man, do I agree with your points about the UI. For a product so powerful, I can't believe how problematic it is.



