Introduction
Azure AI Foundry, combined with Prompt Flow, provides a robust framework for building and deploying AI-driven applications. By following this guide, you can efficiently develop AI workflows, deploy them to a managed online endpoint using either the Azure ML CLI or the Python SDK, and integrate them with various services for real-time inference.
Key Benefits of Azure AI Foundry
- Fully managed: Azure handles the infrastructure and management of your AI deployment, reducing operational overhead
- Optimized AI workflows: Streamlined processes for developing, deploying, and managing AI models, with built-in automation features
- Built-in inferencing: Native capabilities for running trained models and generating predictions from new data
- Seamless AI model integration: Easily incorporate your own models or pre-trained ones into the Foundry environment
Once your Prompt Flow is tested and validated, you can deploy it to Azure AI Foundry for real-time inferencing.
Inferencing is the process of applying new input data to a machine learning model to generate outputs.
Deployment Approaches
As mentioned above, there are two primary methods for deploying your Prompt Flow to Azure AI Foundry:
1. Using the Azure ML CLI
2. Using the Python SDK
In this blog, I will explain approach #1: using the Azure ML CLI.
Step-by-Step Process for Azure ML CLI Deployment
The following steps will guide you through deploying a flow as a model in Azure ML, creating an online endpoint, and configuring deployments. This assumes you have tested your flow locally and set up all necessary dependencies, including the Azure ML workspace and required connections.
Azure ML CLI Approach
Prerequisites
1. Install the Azure CLI and the ML extension:
>az extension add --name ml --yes
Run the following command to verify that the ML extension is installed correctly:
>az extension show --name ml
2. Make sure the connections used in your flow have been created in your Azure ML workspace.
Deployment Steps
1. Registering a Machine Learning Flow as a Model in Azure ML
This command registers a defined machine learning flow as a managed model within Azure ML. This enables versioning, deployment, and MLOps capabilities for the flow.
Define the model metadata in a YAML file (here, honda-prod-model.yaml). This file describes the model's name, version, and location.
> az ml model create --file honda-prod-model.yaml
Sample YAML for reference:
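A minimal model definition might look like the following. The flow folder path, version, and description are illustrative assumptions; adjust them to match your project.

```yaml
# honda-prod-model.yaml (illustrative sketch; path and version are assumptions)
$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
name: honda-prod-model
version: 1
# Local folder containing the flow definition (flow.dag.yaml and related files)
path: ./honda-chat-flow
description: Prompt Flow model for the Honda chat assistant
```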
2. Creating an Online Endpoint for Real-time Inference
> az ml online-endpoint create --file honda-prod-endpoint.yaml
This command registers the endpoint with Azure ML and provisions the necessary infrastructure to handle incoming requests.
The honda-prod-endpoint.yaml file contains all the configuration details for your endpoint, including the name, authentication mode, and compute specifications.
After the endpoint is successfully created, you'll receive a response with the endpoint details, including its scoring URI. You can then proceed with deploying your model to this endpoint and configuring traffic distribution to optimize performance.
Sample YAML for reference:
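A minimal endpoint definition might look like the following; the endpoint name is chosen to match the invoke command used later in this post, and key-based authentication is an assumption.

```yaml
# honda-prod-endpoint.yaml (illustrative sketch)
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: honda-chat-endpoint
# key = static API keys; aml_token = Azure ML token-based auth
auth_mode: key
```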
3. Creating an Online Deployment in Azure ML
An online deployment in Azure Machine Learning (Azure ML) is a containerized environment where your model runs and serves real-time predictions. Each deployment is associated with an online endpoint, and multiple deployments can be managed under the same endpoint to support A/B testing, versioning, or gradual rollouts.
Deploying with 0% Traffic
To create a deployment without initially routing any traffic to it, use the following command:
>az ml online-deployment create --file honda-deployment.yaml
Deploying with 100% Traffic
To create the deployment and route all traffic to it in a single step, add the --all-traffic flag:
>az ml online-deployment create --file honda-deployment.yaml --all-traffic
If the deployment already exists and has been verified, you can shift traffic to it with an endpoint update instead (assuming the deployment is named honda-deployment):
>az ml online-endpoint update --name honda-chat-endpoint --traffic "honda-deployment=100"
Sample YAML for reference:
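A deployment definition might look like the following sketch. The deployment name, instance type, and instance count are assumptions; the model reference points at the registered model from step 1.

```yaml
# honda-deployment.yaml (illustrative sketch; instance type/count are assumptions)
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: honda-deployment
endpoint_name: honda-chat-endpoint
# Reference to the model registered in step 1 (name:version)
model: azureml:honda-prod-model:1
instance_type: Standard_DS3_v2
instance_count: 1
environment_variables:
  # Tells the Prompt Flow runtime to run in serving mode
  PROMPTFLOW_RUN_MODE: serving
```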
Test the Deployed Model by Invoking the Endpoint
Use the following command to test whether the deployment is working:
>az ml online-endpoint invoke --name honda-chat-endpoint --request-file sample-request.json
Here's an example of what your sample-request.json might look like:
{
  "input_data": {
    "input_string": "What are the maintenance intervals for a 2024 Honda Civic?"
  }
}
You can also test the endpoint using other tools such as Postman or curl. When using these tools, you'll need:
- The scoring URI (available from the endpoint details)
- An authentication key or token
- A properly formatted request payload
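As a sketch, a curl request might look like the following. The scoring URI and API key are placeholders; you can retrieve the real values with `az ml online-endpoint show` and `az ml online-endpoint get-credentials`.

```shell
# Placeholders: replace <SCORING_URI> and <API_KEY> with your endpoint's values
curl -X POST "<SCORING_URI>" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d @sample-request.json
```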
Conclusion
Using the Azure ML CLI provides a straightforward way to deploy your Prompt Flow to Azure AI Foundry. The command-line approach offers flexibility and can be easily incorporated into CI/CD pipelines for automated deployments.
Reference: https://github.com/Azure/azureml-examples