Use it when
- You want a serving framework dedicated to TensorFlow models.
- You want to deploy a trained model as an endpoint.
- You want an efficient server architecture for serving a model to many users concurrently.
- You want built-in model monitoring features.
- You want built-in model versioning features.
- You want to optimize hardware utilization by batching requests to a served model.
- You want REST and gRPC API endpoints for the served model.
- You want support for exporting metrics to Prometheus.
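Once a model is served, its REST endpoint accepts predict requests at `POST /v1/models/<name>:predict` with a JSON body containing an `instances` list. A minimal sketch of building such a request body with the Python standard library (the model name `my_model`, port 8501, and the input values are placeholder assumptions):

```python
import json

# TensorFlow Serving's REST predict API expects a JSON body with an
# "instances" list: one entry per input example.
# "my_model" and the feature values below are placeholders.
MODEL_NAME = "my_model"
url = f"http://localhost:8501/v1/models/{MODEL_NAME}:predict"

payload = {"instances": [[1.0, 2.0, 5.0]]}  # one example with three features
body = json.dumps(payload)

# With the server running, POST `body` to `url` (e.g. via urllib.request);
# the response JSON contains a "predictions" list aligned with "instances".
print(body)
```

The same endpoint pattern also accepts a `:classify` or `:regress` verb for models exported with those signatures.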
Watch out
- There is no built-in way to guarantee zero downtime when deploying new models or updating existing ones.
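Version rollouts are controlled through the model server's config file, passed via `--model_config_file`. A sketch of a config (the model name and base path are placeholders) that pins two specific versions, so an older version keeps serving while a newer one loads:

```
model_config_list {
  config {
    name: "my_model"
    base_path: "/models/my_model"
    model_platform: "tensorflow"
    model_version_policy {
      specific {
        versions: 1
        versions: 2
      }
    }
  }
}
```

By default the server loads only the latest version; pinning versions like this is one way to keep a fallback loaded during an update.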
Example stacks
Airflow + MLflow stack
Installation
echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -
sudo apt-get update && sudo apt-get install tensorflow-model-server
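After installation, the server binary can be launched directly. A sketch, assuming a model named `my_model` exported under `/models/my_model` (both names are placeholders):

```shell
# gRPC on 8500, REST on 8501.
# The base path must contain numeric version subdirectories,
# e.g. /models/my_model/1/ holding a SavedModel.
tensorflow_model_server \
  --port=8500 \
  --rest_api_port=8501 \
  --model_name=my_model \
  --model_base_path=/models/my_model
```

The server watches the base path and picks up new version subdirectories as they appear.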