Use it when
- You want one framework for building pipelines that span both data engineering and data science tasks.
- You need a data science framework that supports collaboration in a single code base.
- You want to define pipelines as Python code (see the sketch after this list).
- You want to visualize data pipelines.
- You want to run independent pipeline tasks in parallel.
- You want to manage datasets through a data catalog.
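A minimal sketch of the pieces those bullets refer to, assuming Kedro >= 0.19 and pandas; the dataset names and node functions here are hypothetical and not tied to any particular project:

```python
import pandas as pd

from kedro.io import DataCatalog, MemoryDataset
from kedro.pipeline import node, pipeline
from kedro.runner import SequentialRunner


def clean(raw_orders: pd.DataFrame) -> pd.DataFrame:
    """Data engineering step: drop incomplete rows."""
    return raw_orders.dropna()


def summarize(clean_orders: pd.DataFrame) -> pd.DataFrame:
    """Data science step: total revenue per customer."""
    return clean_orders.groupby("customer", as_index=False)["revenue"].sum()


# The pipeline itself is plain Python; node inputs/outputs refer to catalog entries.
orders_pipeline = pipeline(
    [
        node(clean, inputs="raw_orders", outputs="clean_orders", name="clean"),
        node(summarize, inputs="clean_orders", outputs="revenue_per_customer", name="summarize"),
    ]
)

# In-memory catalog for the sketch; datasets the pipeline needs but does not
# find in the catalog default to memory datasets.
catalog = DataCatalog(
    {
        "raw_orders": MemoryDataset(
            pd.DataFrame({"customer": ["a", "a", "b"], "revenue": [10.0, 5.0, None]})
        )
    }
)

# Swap SequentialRunner for kedro.runner.ParallelRunner to execute
# independent nodes in parallel.
results = SequentialRunner().run(orders_pipeline, catalog)
```

In a real project the catalog is declared in conf/base/catalog.yml and the pipeline is executed with `kedro run`; the Kedro-Viz plugin renders the same pipeline definition as an interactive graph.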
Watch out
- A data catalog is hard to introduce when the existing workflow is unstructured, with flat-file data and manual file movement; the sketch below shows what each such hand-off would become.
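To make that adoption effort concrete, here is a hedged sketch of a single catalog entry, assuming the kedro-datasets package (>= 2.0; earlier releases spell the class CSVDataSet) and a hypothetical orders.csv that used to be copied around by hand:

```python
from kedro.io import DataCatalog
from kedro_datasets.pandas import CSVDataset

catalog = DataCatalog(
    {
        # Each manual "copy the CSV into the right folder" step becomes an
        # explicit, versionable entry like this one (or its YAML equivalent
        # in conf/base/catalog.yml).
        "raw_orders": CSVDataset(filepath="data/01_raw/orders.csv"),
    }
)

orders = catalog.load("raw_orders")  # reads the CSV as a pandas DataFrame
```

Migrating an unstructured workflow means writing one such entry per file hand-off, which is where most of the effort goes.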
Example stacks
- Airflow + MLflow stack
Installation
pip install kedro