Use it when
- You want to visualize pipelines running in production.
- You want to generate pipelines as Python code.
- You want to run long-running batch jobs, such as dumping data to and from databases or running ML algorithms.
- You want target-based workflows: linear pipelines of tasks in which each task's output is the next task's input.
- You want to configure pipelines as Python code.
- You want failure recovery that lets you re-run only the failed tasks instead of the whole pipeline.
- You want a visualizer that gives insight into how a pipeline is structured.
- You want a GUI that shows the status of the tasks.
Watch out
- Hard to test.
- The central scheduler makes it challenging to parallelize tasks.
- Works best with linear pipelines, where one task's output is the next task's input. Many branches and forks can slow execution considerably.
- There is no built-in trigger: a pipeline will not start on its own when all input files are in place. You need an external process, such as a cron job, to check that the files are there and start the pipeline.
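The external check described in the last point could look roughly like the sketch below. It uses only the Python standard library. The function names `missing_inputs` and `start_if_ready` are hypothetical, and `start_pipeline` is a placeholder for whatever launches your pipeline (for example, a function that calls `luigi.build`).

```python
import os


def missing_inputs(paths):
    """Return the input paths that do not exist yet."""
    return [p for p in paths if not os.path.exists(p)]


def start_if_ready(paths, start_pipeline):
    """Call start_pipeline() only if every input file exists.

    Returns True if the pipeline was started, False otherwise.
    """
    if missing_inputs(paths):
        return False
    start_pipeline()
    return True
```

A script that calls `start_if_ready` can be scheduled with cron at a fixed interval. Because Luigi skips tasks whose outputs already exist, repeated invocations after a successful run should be cheap.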
Example stacks
Airflow + MLflow stack
Installation
pip install luigi