In my earlier post, I showed you how to bring your own custom Docker image for training on Azure Machine Learning (AML).
In this post, I'll show you how to bring your own custom Docker image for model deployment and inferencing (online prediction).
AML Base Container Images
To make an entry script work, the container image used in Azure Machine Learning (AML) deployment should include the AML inferencing assets, such as Nginx, the Flask server, the Application Insights module, etc. (These can be configured with the azureml-inference-server-http module.)
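For reference, the entry script that this server invokes just needs to define an init() function (called once at container startup) and a run() function (called for each scoring request). Below is a minimal sketch of such a script; the file name score.py and the load_model() helper are illustrative placeholders, and the actual loading and prediction code depends on your framework.
score.py (example)
# Minimal entry script sketch (illustrative only)
import os
import json

model = None

def init():
    # Called once when the container starts.
    # AZUREML_MODEL_DIR points to the folder where the registered model is mounted.
    global model
    model_dir = os.getenv("AZUREML_MODEL_DIR", ".")
    model = load_model(os.path.join(model_dir, "model"))  # hypothetical helper; replace with your framework's loader

def run(raw_data):
    # Called for each scoring request; raw_data is the request body (a JSON string).
    data = json.loads(raw_data)["data"]
    result = model.predict(data)  # depends on your model object
    return {"result": result}
You can also test such a script locally by installing the azureml-inference-server-http package and starting the server with azmlinfsrv --entry_script score.py.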
When you want to bring your own custom Docker image for inferencing (online prediction), the easiest way is to build from an AML base image in your Dockerfile. (See here for the list of these maintained AML images.)
Note : When you don't need an entry script in an AML managed online endpoint, you can also use an image without the AML assets (i.e., without building from an AML base image). You can host your own custom containers in AML, such as TF Serving, Triton Inference Server, a pre-built MLeap serving image, and so on. See here for details. (See also the sketch after these notes.)
Note : For popular frameworks, you can also use pre-built Docker images for deployment (inferencing) in AML. See here for pre-built AML container images.
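For instance, with the AML Python SDK v2, such a custom container can be described by an environment whose inference_config tells the managed online endpoint which ports and paths to use for probes and scoring. The following is only a sketch using a TF Serving image; the environment name, the model name half_plus_two, and port 8501 are TF Serving example defaults, and you should adjust them for your own container.
Environment definition (example)
# Sketch: environment for a fully custom (non-AML-base) serving container
from azure.ai.ml.entities import Environment

tfserving_env = Environment(
    name="tfserving-env",  # illustrative name
    image="docker.io/tensorflow/serving:latest",
    inference_config={
        # Routes the managed online endpoint uses to probe and score the container
        "liveness_route": {"port": 8501, "path": "/v1/models/half_plus_two"},
        "readiness_route": {"port": 8501, "path": "/v1/models/half_plus_two"},
        "scoring_route": {"port": 8501, "path": "/v1/models/half_plus_two:predict"},
    },
)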
Example
For instance, when you want to run TensorRT inferencing (online prediction) on AML, you can build your own image from the AML minimal inference GPU base image (mcr.microsoft.com/azureml/minimal-ubuntu18.04-py37-cuda11.0.3-gpu-inference:latest) as follows.
In this example, I set up the TensorRT runtime to speed up inferencing on an NVIDIA GPU.
(The NVIDIA package repository has already been added in this image, since the image is built from nvidia/cuda:11.0.3-cudnn8-devel-ubuntu18.04. See here for setting up the NVIDIA repository and TensorRT manually.)
Dockerfile
FROM mcr.microsoft.com/azureml/minimal-ubuntu18.04-py37-cuda11.0.3-gpu-inference:latest
USER root:root
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    libnvinfer8 \
    python3-libnvinfer-dev
RUN pip install nvidia-pyindex
RUN pip install nvidia-tensorrt
This image is built and registered as tsmatz/azureml-tensorrt:8.4.3 on Docker Hub, and you can then use this image for AML inferencing right away.
See here (notebook) for an AML deployment example with this image.
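For reference, the following is a minimal sketch of such a deployment with the AML Python SDK v2 (azure-ai-ml). The endpoint name, registered model, source folder ./src, and GPU instance type are placeholders that you should replace with your own values.
Deployment (example)
# Sketch: managed online deployment using the custom TensorRT image (SDK v2)
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    Environment,
    CodeConfiguration,
)
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",       # placeholders
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<AML_WORKSPACE>",
)

# Environment built from the custom image (based on the AML GPU inference base image)
env = Environment(image="tsmatz/azureml-tensorrt:8.4.3")

# Create the endpoint
endpoint = ManagedOnlineEndpoint(name="my-tensorrt-endpoint", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Create the deployment with the custom image and the entry script
deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="my-tensorrt-endpoint",
    model="azureml:my-model:1",                # placeholder registered model
    environment=env,
    code_configuration=CodeConfiguration(code="./src", scoring_script="score.py"),
    instance_type="Standard_NC4as_T4_v3",      # GPU SKU, placeholder
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()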
Note : TensorRT is also included within the Triton Inference Server container in NGC. You can use no-code deployment to run Triton Inference Server on AML. (See here.)
[Change Logs]
Aug 2022 Updated to meet AML CLI and Python SDK v2