
TensorFlow Serving

Model serving

TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments.

Use it when

  • You want a serving framework dedicated to TensorFlow models.
  • You want to deploy a trained model as an endpoint.
  • You want an efficient server architecture that can serve a model to many users simultaneously.
  • You want built-in model monitoring features.
  • You want built-in model versioning features.
  • You want to optimize hardware utilization by batching requests to a served model.
  • You want a REST and gRPC API endpoint to the served model.
  • You want support for exporting metrics to Prometheus.
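A served model's REST endpoint can be queried with a plain HTTP request. A minimal sketch, assuming a model named `my_model` is already being served with the REST API on the default port 8501 (the model name and input shape are illustrative assumptions):

```shell
# POST a batch of input instances to TensorFlow Serving's REST predict API.
# "my_model" and the instance values are placeholders for illustration.
curl -X POST http://localhost:8501/v1/models/my_model:predict \
  -H "Content-Type: application/json" \
  -d '{"instances": [[1.0, 2.0, 5.0]]}'
```

The same model is simultaneously reachable over gRPC (default port 8500) through the `PredictionService` API, which is the lower-latency option for high-throughput clients.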

Watch out

  • There is no built-in way to guarantee zero downtime when deploying new models or updating existing ones.
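Versioning is typically driven by a model config file: the server watches the configured base path and hot-reloads new version directories, but requests in flight during the swap are not guaranteed a seamless handover. A minimal `models.config` sketch (the model name and paths are assumptions):

```
model_config_list {
  config {
    name: "my_model"
    base_path: "/models/my_model"
    model_platform: "tensorflow"
    # Keep the two most recent versions loaded side by side.
    model_version_policy {
      latest { num_versions: 2 }
    }
  }
}
```

Pass the file to the server with `--model_config_file=/models/models.config`.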

Example stacks

Airflow + MLflow stack

Installation

echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -
sudo apt-get update && sudo apt-get install tensorflow-model-server
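With the binary installed, a SavedModel directory can be served directly. A minimal sketch, assuming the exported model lives under `/models/my_model/1` (the numeric version subdirectory is required; the name and paths are illustrative):

```shell
# Serve the SavedModel over gRPC (port 8500) and REST (port 8501),
# with server-side request batching enabled for better hardware utilization.
tensorflow_model_server \
  --port=8500 \
  --rest_api_port=8501 \
  --model_name=my_model \
  --model_base_path=/models/my_model \
  --enable_batching=true
```

`--model_base_path` must be an absolute path; the server picks up new numeric version subdirectories automatically as they appear.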