Use it when
- You want to visualize pipelines running in production.
- You want to generate pipelines dynamically as Python code.
- You want ETL pipelines to extract batch data from multiple sources and run data transformations.
- You want to automate machine learning model training.
- You want to create workflows as DAGs (Directed Acyclic Graphs), where each node is a task.
- You want to configure pipelines as Python code.
- You want to run a task on multiple workers managed by Celery, Dask, or Kubernetes.
- You want to define time limits for tasks or workflows to highlight anomalies or inefficiencies.
- You want a developer-friendly environment and UI.
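The points above — pipelines configured as Python code and structured as DAGs of tasks — come together in a DAG file like the minimal sketch below (the `dag_id`, task names, and schedule are illustrative; the file is parsed by the Airflow scheduler rather than run directly):

```python
# Minimal Airflow DAG sketch: two Python tasks that run in order.
# All names (dag_id, task_ids, schedule) are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("extract batch data from sources")


def transform():
    print("run data transformations")


with DAG(
    dag_id="example_etl",             # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                # time-based scheduling
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    # DAG edge: extract must finish before transform starts.
    extract_task >> transform_task
```

Because the DAG is ordinary Python, it can also be generated dynamically, e.g. by creating tasks in a loop.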
Watch out
- Airflow only orchestrates batch workflows that rely on time-based scheduling.
- Airflow doesn't manage event-based or streaming jobs.
- Airflow does not offer versioning for pipelines.
- Requires significant configuration and hardening by the user before it is safe for production workloads.
- By default, Airflow uses a SQLite metadata database, which does not support concurrent access and risks data loss in production.
- By default, Airflow runs tasks one at a time with the SequentialExecutor.
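Both defaults can be overridden, for instance through Airflow's `AIRFLOW__SECTION__KEY` environment variables; a sketch of moving to Postgres and a parallel executor (the connection string below is illustrative):

```shell
# Override Airflow's unsafe defaults (connection string is illustrative).
# Use a Postgres metadata database instead of SQLite:
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN="postgresql+psycopg2://airflow:airflow@localhost:5432/airflow"
# Run tasks in parallel instead of the SequentialExecutor:
export AIRFLOW__CORE__EXECUTOR="LocalExecutor"
```

The same keys can also be set in airflow.cfg; environment variables take precedence over the config file.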
Example stacks
Airflow + MLflow stack
Installation
pip install apache-airflow
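A bare pip install can resolve incompatible dependency versions; the Airflow project recommends pinning against its published constraints file for your Airflow and Python versions (the version number below is illustrative):

```shell
# Pin the install to Airflow's official constraints file (version illustrative).
AIRFLOW_VERSION=2.9.3
PYTHON_VERSION="$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')"
pip install "apache-airflow==${AIRFLOW_VERSION}" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
```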