Apache Beam

Pipeline orchestration

Apache Beam is a batch and streaming data processing tool for mission-critical production workloads.

Use it when

  • You want to create parallel-processing pipelines for either batch or streaming data.
  • You want to use Java, Python, or Go SDKs for programming.
  • You want to use runners such as Spark, Flink, Samza, Google Cloud Dataflow, Hazelcast Jet, or Twister2 as execution backends.
  • You want portability: the programming layer is cleanly separated from the runtime layer, so pipelines are data-processing-engine agnostic.
  • You want autoscaling of resources.
  • You want an easy-to-maintain codebase.

Watch out

  • Beam's capabilities are limited by the runner chosen for execution; not every runner supports every Beam feature.

Example stacks

Airflow + MLflow stack


pip install apache-beam