

Model registry
Model serving

BentoML is an open platform that simplifies ML model deployment and lets you serve your models at production scale in minutes.

Use it when

  • You want a serving framework that supports a wide range of ML frameworks.
  • You want an end-to-end model serving solution that provides a model API server, model packaging, management, deployment automation, and offline batch serving features.
  • You want to do preprocessing and post-processing in serving endpoints.
  • You want built-in model monitoring features.
  • You want support for adaptive micro-batching.
  • You want model registry features through integration with Yatai.
  • You want to run on Google Colab.
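To make the pre- and post-processing point above concrete, here is a minimal sketch of the logic a serving endpoint typically wraps. This is plain Python with a stubbed model, not the actual BentoML service API; in a real BentoML service, a function like `predict_endpoint` would be the body of an API-decorated handler and `fake_model` would be a loaded model.

```python
import math

def preprocess(raw: dict) -> list[float]:
    # Turn the incoming JSON payload into the normalized vector the model expects.
    feats = [float(v) for v in raw["features"]]
    mean = sum(feats) / len(feats)
    std = (sum((v - mean) ** 2 for v in feats) / len(feats)) ** 0.5
    return [(v - mean) / (std + 1e-8) for v in feats]

def fake_model(x: list[float]) -> float:
    # Stand-in for a real model call; returns a sigmoid score.
    return 1.0 / (1.0 + math.exp(-sum(x)))

def postprocess(score: float) -> dict:
    # Map the raw score back to a JSON-friendly response.
    return {"label": "positive" if score > 0.5 else "negative",
            "score": round(score, 4)}

def predict_endpoint(raw: dict) -> dict:
    # The whole request path lives in one place: decode, infer, encode.
    return postprocess(fake_model(preprocess(raw)))
```

Keeping both transformation steps inside the endpoint means the client sends raw features and receives a final answer, with no duplicated feature logic between training and serving code.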

Watch out

  • Currently, there is no multi-language support. Only Python is supported.
  • BentoML does not handle horizontal scaling. Users have to build Kubernetes-based solutions separately, or use cloud platforms such as AWS Lambda, AWS ECS, or Google Cloud Run, to scale served models.
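Because horizontal scaling is left to you, per-replica throughput matters, which is where the adaptive micro-batching mentioned above helps. The core idea can be sketched in a few lines: collect requests until a batch fills up or a latency deadline passes, then run the model once over the whole batch. This is a toy illustration of the technique, not BentoML's internal batcher.

```python
import queue
import time

def collect_batch(q: queue.Queue, max_batch: int = 4, max_wait: float = 0.01) -> list:
    """Pull up to max_batch items from q, waiting at most max_wait seconds
    in total, so a partially filled batch still ships on time."""
    batch = []
    deadline = time.monotonic() + max_wait
    while len(batch) < max_batch:
        timeout = deadline - time.monotonic()
        if timeout <= 0:
            break  # latency budget exhausted; serve what we have
        try:
            batch.append(q.get(timeout=timeout))
        except queue.Empty:
            break  # producer went quiet; don't hold requests hostage
    return batch

def batched_predict(inputs: list) -> list:
    # One model call amortizes per-request overhead across the batch;
    # a real model would run vectorized inference here.
    return [x * 2 for x in inputs]
```

The "adaptive" part is the interplay of `max_batch` and `max_wait`: under heavy load batches fill instantly, while under light load requests only pay the small `max_wait` latency tax.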

Example stacks

Airflow + MLflow stack


Install with pip:

pip install bentoml