Luigi

Pipeline orchestration

Luigi is a Python package that helps you build complex pipelines of batch jobs.

Use it when

  • You want to define and configure pipelines as Python code.
  • You want to run long-running batch processes, such as dumping data to and from databases or running ML algorithms.
  • You want target-based workflows: pipelines of tasks, often linear, in which each task's output is the next task's input.
  • You want failure recovery that lets you re-run failed tasks without re-running the whole pipeline.
  • You want a web visualizer that shows pipelines running in production and the status of each task.

Watch out

  • Hard to test.
  • The central scheduler makes it challenging to parallelize tasks.
  • Works best with linear pipelines, where one task's output is the next task's input. Many branches and forks can slow execution considerably.
  • There is no built-in trigger: pipelines won't start automatically when all input files are in place. You need an external process (for example, a cron job) to check that the files exist and then start the pipeline.
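
The external check mentioned in the last point could look something like the sketch below: a small script, meant to be run periodically by cron, that starts the pipeline only once every expected input file exists. The function names (missing_inputs, start_if_ready) and the start_pipeline callback are hypothetical; in a real setup the callback might wrap a call to luigi.build.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def missing_inputs(paths: Iterable[str]) -> List[str]:
    """Return the expected input paths that do not exist as files yet."""
    return [p for p in paths if not Path(p).is_file()]


def start_if_ready(paths: Iterable[str], start_pipeline: Callable[[], None]) -> bool:
    """Call start_pipeline only when every input file is in place.

    Returns True if the pipeline was started, False otherwise.
    """
    missing = missing_inputs(paths)
    if missing:
        print("Not starting; missing inputs:", ", ".join(missing))
        return False
    start_pipeline()
    return True
```

Because Luigi skips tasks whose targets already exist, running such a check repeatedly is usually harmless: once the pipeline has completed, later invocations find nothing left to do.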

Example stacks

Airflow + MLflow stack

Installation

pip install luigi
