

Model serving

KServe is a highly scalable and standards-based model inference platform on Kubernetes.

Use it when

  • You want serverless inferencing on Kubernetes.
  • You want support for TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX.
  • You want to autoscale GPU instances, including scale to zero.
  • You want pre-built Docker images for popular frameworks to get models into production quickly.
  • You want to do preprocessing and post-processing of data.
  • You want built-in model monitoring features with Prometheus.
  • You want out-of-the-box Istio integration.
  • You want built-in canary deployments.
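
Deploying a model typically means applying an InferenceService resource. A minimal sketch for a scikit-learn model (the name and storage URI are illustrative; the exact schema may vary across KServe releases):

```yaml
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    sklearn:
      # Illustrative path; point this at your own model artifact store.
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
```

Applying this with kubectl creates a served endpoint; KServe handles revisioning, autoscaling, and routing behind it.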

Watch out

  • The default serving method is HTTP-based. Non-JSON inputs and outputs require a custom transformer implementation.
  • It does not support A/B tests and multi-armed bandits out-of-the-box.
  • KServe deploys one model per InferenceService, so scalability is limited by the available CPUs and GPUs.
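
On the first point above: KServe's default V1 HTTP protocol expects request bodies of the form `{"instances": [...]}`, so any non-JSON input has to be converted into that shape before it reaches the predictor. A minimal sketch of that preprocessing step (the function name is hypothetical, not part of KServe):

```python
import json

def build_v1_payload(rows):
    """Wrap raw feature rows in KServe's V1 inference request format.

    `rows` is a list of feature vectors; the server replies with a
    JSON body of the form {"predictions": [...]}.
    """
    return json.dumps({"instances": rows})

body = build_v1_payload([[6.8, 2.8, 4.8, 1.4]])
```

In a real deployment this conversion would live in a custom transformer service that sits in front of the predictor.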

Example stacks

Airflow + MLflow stack


Quick-install script for KServe 0.8 and its dependencies:

curl -s "https://raw.githubusercontent.com/kserve/kserve/release-0.8/hack/quick_install.sh" | bash