

Data versioning

lakeFS brings Git-style version control to object storage buckets and works with data of any size.

Use it when

  • You want to transform your object storage into a Git-like repository.
  • You want to manage your data lake like you manage your code.
  • You want a data-format-agnostic tool.
  • You want to store metadata in a relational database to avoid duplicating data.
  • You want to isolate data in a data lake without copying it.
  • You want to manage CI/CD of your data.
  • You want a tool that works with data frameworks such as Airflow, Spark, Kafka, Presto, Delta Lake, and Databricks.
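Isolation without copying is possible because a branch is only metadata: a pointer to a commit, and a commit maps logical paths to physical objects that are written once. The sketch below is a hypothetical toy model of that idea. The `ToyRepo` class, its methods and its address scheme are invented for illustration and are neither lakeFS's API nor its internals. It shows that creating a branch adds no objects and that a write on one branch does not affect another.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Hypothetical toy model of zero-copy branching (not lakeFS internals)."""
    objects: dict[str, bytes] = field(default_factory=dict)            # physical address -> bytes, stored once
    commits: dict[str, dict[str, str]] = field(default_factory=dict)   # commit id -> {logical path: address}
    branches: dict[str, str] = field(default_factory=dict)             # branch name -> head commit id
    staging: dict[str, dict[str, str]] = field(default_factory=dict)   # branch name -> uncommitted {path: address}

    def __post_init__(self) -> None:
        # Start with a "main" branch pointing at an empty root commit.
        self.commits["c0"] = {}
        self.branches["main"] = "c0"
        self.staging["main"] = {}

    def create_branch(self, name: str, source: str) -> None:
        # Branching copies only metadata (a commit pointer and a path map)
        # taken from the source branch's latest commit; no object bytes are duplicated.
        self.branches[name] = self.branches[source]
        self.staging[name] = dict(self.commits[self.branches[source]])

    def put(self, branch: str, path: str, data: bytes) -> None:
        # Write a new physical object and point this branch's path at it.
        address = f"obj-{len(self.objects)}"
        self.objects[address] = data
        self.staging[branch][path] = address

    def commit(self, branch: str) -> str:
        # Freeze the branch's current path map as an immutable commit.
        commit_id = f"c{len(self.commits)}"
        self.commits[commit_id] = dict(self.staging[branch])
        self.branches[branch] = commit_id
        return commit_id

    def read(self, branch: str, path: str) -> bytes:
        return self.objects[self.staging[branch][path]]


repo = ToyRepo()
repo.put("main", "tables/events.parquet", b"v1")
repo.commit("main")
repo.create_branch("experiment", "main")                  # no new objects written
repo.put("experiment", "tables/events.parquet", b"v2")    # change isolated to "experiment"
print(repo.read("main", "tables/events.parquet"))         # b'v1'
print(repo.read("experiment", "tables/events.parquet"))   # b'v2'
```

The only data written twice is the file that actually changed; everything else is shared by reference between branches.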

Watch out

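  • There is no straightforward way to delete files. Removing a file from a branch does not free storage, because older commits may still reference it; objects are physically deleted only through retention policies (garbage collection).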
  • lakeFS does not support federated identities.
  • Auditing data usage may be difficult, because the underlying object storage sees all requests as coming from the lakeFS gateway user rather than from individual users.
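Deletion works this way because a commit keeps pointing at an object even after a later commit removes the path. A minimal sketch of a retention-based cleanup pass, under simplifying assumptions: the `collectable_objects` function, the commit layout and the single global retention window are hypothetical, and lakeFS's actual garbage collection is configured through its own retention rules.

```python
from datetime import datetime, timedelta


def collectable_objects(commits, branch_heads, retention_days, now):
    """Return physical addresses a retention-based cleanup pass could delete.

    Hypothetical toy model, not lakeFS's garbage collector.
    commits: commit id -> (commit time, {logical path: physical address})
    branch_heads: branch name -> head commit id
    """
    cutoff = now - timedelta(days=retention_days)
    heads = set(branch_heads.values())
    keep = set()
    for commit_id, (created, paths) in commits.items():
        # Objects stay if a branch head or any commit inside the retention window references them.
        if commit_id in heads or created >= cutoff:
            keep.update(paths.values())
    all_objects = {addr for _, paths in commits.values() for addr in paths.values()}
    return all_objects - keep


now = datetime(2024, 6, 1)
commits = {
    "c1": (datetime(2024, 1, 1), {"raw/users.csv": "obj-a"}),
    "c2": (datetime(2024, 5, 30), {}),  # users.csv removed from main in this commit
}
heads = {"main": "c2"}
print(collectable_objects(commits, heads, retention_days=7, now=now))    # {'obj-a'}
print(collectable_objects(commits, heads, retention_days=365, now=now))  # set()
```

With a short retention window the removed file's object becomes eligible for deletion; with a long one, the old commit still protects it, so the bytes remain in the bucket.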

Example stacks

Airflow + MLflow stack