Use it when
- You want serverless inferencing on Kubernetes.
- You want support for TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX.
- You want to autoscale GPU-backed inference services, including scaling down to zero.
- You want pre-built model-server images for the supported frameworks so you can get models into production quickly (see the sketch after this list).
- You want to do preprocessing and post-processing of data.
- You want built-in model monitoring features with Prometheus.
- You want out-of-the-box Istio integration.
- You want built-in canary deployments.
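To make the list concrete, here is a minimal sketch of deploying a scikit-learn model with the KServe Python SDK (pip install kserve). The service name and namespace are illustrative, and the storage URI points at the public example model from the KServe documentation.

# Sketch: create an InferenceService for a scikit-learn model.
# Assumes a cluster with KServe installed and a valid kubeconfig context.
from kubernetes import client
from kserve import (KServeClient, V1beta1InferenceService,
                    V1beta1InferenceServiceSpec, V1beta1PredictorSpec,
                    V1beta1SKLearnSpec, constants)

isvc = V1beta1InferenceService(
    api_version=constants.KSERVE_GROUP + "/v1beta1",
    kind=constants.KSERVE_KIND,
    metadata=client.V1ObjectMeta(name="sklearn-iris", namespace="default"),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            # The pre-built sklearn server image serves the model pulled
            # from storage_uri; no custom Dockerfile is needed.
            sklearn=V1beta1SKLearnSpec(
                storage_uri="gs://kfserving-examples/models/sklearn/1.0/model"
            )
        )
    ),
)

KServeClient().create(isvc)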
Watch out
- Serving is HTTP/JSON-based by default; non-JSON inputs and outputs require writing a custom transformer (see the sketch after this list).
- It does not support A/B tests or multi-armed bandits (MAB) out of the box.
- KServe deploys one model per InferenceService, so scalability is bounded by the CPUs and GPUs available in the cluster.
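For pre-/post-processing and non-JSON payloads, the usual pattern is a custom transformer that sits in front of the predictor. The sketch below builds on the kserve.Model base class and decodes base64-encoded bytes shipped inside the JSON request; the service name, the "b64" field, and the byte-to-list conversion are assumptions for illustration.

# Sketch: a custom transformer for base64-encoded (non-JSON) payloads.
import base64
from typing import Dict

import kserve


class BytesTransformer(kserve.Model):
    def __init__(self, name: str, predictor_host: str):
        super().__init__(name)
        self.predictor_host = predictor_host  # predictor this transformer fronts
        self.ready = True

    def preprocess(self, inputs: Dict) -> Dict:
        # Decode base64 strings into plain numeric lists; a real transformer
        # would reshape/normalize them to whatever the model expects.
        instances = [list(base64.b64decode(inst["b64"]))
                     for inst in inputs["instances"]]
        return {"instances": instances}

    def postprocess(self, outputs: Dict) -> Dict:
        # Hook for mapping raw predictions to labels, filtering fields, etc.
        return outputs


if __name__ == "__main__":
    model = BytesTransformer("sklearn-iris", predictor_host="localhost:8080")
    kserve.ModelServer().start(models=[model])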
Example stacks
- Airflow + MLflow stack
Installation
curl -s "https://raw.githubusercontent.com/kserve/kserve/release-0.8/hack/quick_install.sh" | bash
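The quick-install script sets up KServe together with its Istio and Knative Serving dependencies on the cluster your current kubectl context points to. Once a model such as the sklearn-iris example above is deployed, a prediction is a JSON POST against the v1 protocol endpoint; in the sketch below, the ingress address and Host header are placeholders that depend on your cluster's networking.

# Sketch: smoke-test a deployed InferenceService over the v1 REST protocol.
import requests

url = "http://localhost:8080/v1/models/sklearn-iris:predict"
headers = {"Host": "sklearn-iris.default.example.com"}  # Knative virtual host
payload = {"instances": [[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]]}

resp = requests.post(url, json=payload, headers=headers)
print(resp.json())  # e.g. {"predictions": [1, 1]}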