Use it when
- You want to transform your object storage into a Git-like repository.
- You want to manage your data lake like you manage your code.
- You want a data-format-agnostic tool.
- You want to store metadata in a relational database to avoid duplicating data.
- You want to isolate data in a data lake without copying it.
- You want to manage CI/CD of your data.
- You want a tool that works seamlessly with data frameworks like Airflow, Spark, Kafka, Presto, Delta Lake, Databricks, etc.
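
To make the "isolate data without copying it" point concrete, here is a minimal toy sketch of how Git-like branching over object storage can avoid duplicating data. It is an illustration under stated assumptions, not the lakeFS API: `ToyRepo`, its methods, and the `obj-N` addresses are all hypothetical names invented for this example. The idea is that a branch holds only pointers to stored objects, so creating a branch copies the pointers and never the data.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Toy in-memory model of zero-copy branching (hypothetical, not the lakeFS API)."""

    # Physical storage: object address -> bytes. Each payload is stored once.
    objects: dict = field(default_factory=dict)
    # Branch name -> {logical path: object address}. Only pointers live here.
    branches: dict = field(default_factory=lambda: {"main": {}})

    def put(self, branch: str, path: str, data: bytes) -> None:
        # Write a new object and point the branch's path at it.
        address = f"obj-{len(self.objects)}"
        self.objects[address] = data
        self.branches[branch][path] = address

    def create_branch(self, name: str, source: str) -> None:
        # Copy the path -> address mapping only; no object bytes are duplicated.
        self.branches[name] = dict(self.branches[source])

    def get(self, branch: str, path: str) -> bytes:
        return self.objects[self.branches[branch][path]]

    def merge(self, source: str, target: str) -> None:
        # Naive merge: source pointers overwrite target pointers (no conflict handling).
        self.branches[target].update(self.branches[source])


repo = ToyRepo()
repo.put("main", "events/day1.csv", b"a,b\n1,2\n")

# Branching adds no new objects to storage.
repo.create_branch("experiment", source="main")
objects_after_branch = len(repo.objects)

# A write on the branch is invisible to main until it is merged.
repo.put("experiment", "events/day1.csv", b"a,b\n1,3\n")
main_view = repo.get("main", "events/day1.csv")

repo.merge("experiment", "main")
merged_view = repo.get("main", "events/day1.csv")
```

A real system also needs commits, conflict detection, and durable metadata. The sketch only shows why a branch can be cheap regardless of how much data it references.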
Watch out
- There is no straightforward way to delete files. Actual file deletion happens only through retention policies.
- lakeFS has no support for federated identities.
- Data usage auditing may be problematic because the underlying object storage sees all connections as coming from the lakeFS gateway user.
Example stacks
Airflow + MLflow stack