Use it when
- You want a serving framework that supports a wide range of ML frameworks.
- You want an end-to-end model serving solution that provides a model API server, model packaging, management, deployment automation, and offline batch serving.
- You want to do preprocessing and post-processing in serving endpoints.
- You want built-in model monitoring features.
- You want support for adaptive micro-batching.
- You want model registry features through integration with Yatai.
- You want to run on Google Colab.
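The pre- and post-processing support mentioned above means an endpoint can accept a raw payload, transform it into model features, run inference, and shape the response in one call. A minimal sketch of that pattern in plain Python (all function names and the threshold model here are illustrative stand-ins, not BentoML's actual API):

```python
def preprocess(raw: dict) -> list:
    # Normalize a raw JSON-like payload into the feature vector the model expects.
    return [float(raw["sepal_len"]), float(raw["sepal_wid"])]

def model_predict(features: list) -> int:
    # Stand-in for a real model call; returns a class index.
    return int(sum(features) > 10.0)

def postprocess(class_index: int) -> dict:
    # Map the raw model output to a human-readable response.
    labels = {0: "setosa", 1: "versicolor"}
    return {"prediction": labels[class_index]}

def predict_endpoint(raw: dict) -> dict:
    # The endpoint chains preprocess -> predict -> postprocess in a single request handler.
    return postprocess(model_predict(preprocess(raw)))

print(predict_endpoint({"sepal_len": 6.1, "sepal_wid": 2.8}))  # {'prediction': 'setosa'}
```

In a real BentoML service the same three steps live inside the API function of the service definition, so clients send raw data rather than preprocessed tensors.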
Watch out
- No multi-language support yet: only Python is supported.
- BentoML does not handle horizontal scaling. Users have to separately build Kubernetes-based solutions or use cloud platforms such as AWS Lambda, AWS ECS, or Google Cloud Run to scale served models.
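The adaptive micro-batching listed under "Use it when" is, in essence, buffering concurrent single requests and flushing them as one model call when the buffer fills or a deadline passes. A toy sketch of the idea (class and parameter names are illustrative, not BentoML's internals):

```python
import time

class MicroBatcher:
    """Toy micro-batcher: buffer single requests, flush them as one batch
    when the buffer is full or the oldest request has waited too long."""

    def __init__(self, batch_predict, max_batch_size=4, max_wait_s=0.01):
        self.batch_predict = batch_predict  # callable: list of inputs -> list of outputs
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.buffer = []
        self.deadline = None

    def submit(self, request):
        if not self.buffer:
            # Start the wait-time clock when the first request arrives.
            self.deadline = time.monotonic() + self.max_wait_s
        self.buffer.append(request)
        if len(self.buffer) >= self.max_batch_size or time.monotonic() >= self.deadline:
            return self.flush()
        return None  # still waiting for more requests to amortize the model call

    def flush(self):
        batch, self.buffer = self.buffer, []
        # One model call serves the whole batch.
        return self.batch_predict(batch)

batcher = MicroBatcher(lambda xs: [x * 2 for x in xs], max_batch_size=3)
batcher.submit(1)          # buffered
batcher.submit(2)          # buffered
print(batcher.submit(3))   # buffer full: one batched call returns [2, 4, 6]
```

BentoML's real implementation additionally adapts the batch size and wait time to observed traffic, which this sketch omits.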
Example stacks
Airflow + MLflow stack
Installation
pip install bentoml