Use it when
- You want to create parallel-processing pipelines for either batch or streaming data.
- You want to write pipelines using the Java, Python, or Go SDKs.
- You want to execute pipelines on a range of backends via runners for Apache Spark, Apache Flink, Apache Samza, Google Cloud Dataflow, Hazelcast Jet, or Twister2.
- You want portability: Beam cleanly separates the programming layer from the runtime layer, so pipelines are data-processing-engine agnostic (see the sketch after this list).
- You want autoscaling of resources.
- You want an easy-to-maintain codebase.
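As a minimal sketch of that runner-agnostic design, the same Python pipeline below can target different backends purely through pipeline options; DirectRunner is the local default, and swapping in another runner name (for example, DataflowRunner or FlinkRunner) requires no changes to the pipeline code itself.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# The runner is configuration, not code: change this one option to move
# the identical pipeline from local execution (DirectRunner) to another
# backend such as DataflowRunner or FlinkRunner.
options = PipelineOptions(runner="DirectRunner")

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "Create" >> beam.Create([1, 2, 3])
        | "Square" >> beam.Map(lambda x: x * x)
        | "Print" >> beam.Map(print)
    )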
Watch out
- Beam's capabilities are limited by the runner chosen for execution; not every runner supports the full set of Beam features.
Example stacks
- Airflow + MLflow stack
Installation
pip install apache-beam
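To verify the installation, a minimal batch word-count pipeline such as the sketch below runs on the local DirectRunner by default; the input strings are illustrative.

import apache_beam as beam

# Executes on the local DirectRunner; no extra configuration is needed.
with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Read" >> beam.Create(["to be or not to be"])
        | "Split" >> beam.FlatMap(str.split)
        | "Count" >> beam.combiners.Count.PerElement()
        | "Print" >> beam.Map(print)
    )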