Azure OpenAI Safety through Content Filtering
- Theory
- Actual deployment
Theory:
Azure OpenAI Service includes a content filtering system that works alongside the core models. It detects and takes action on specific categories of potentially harmful content in both input prompts and output completions by running them through an ensemble of classification models aimed at detecting and preventing the output of harmful content.
In addition to the content filtering system, the Azure OpenAI Service performs monitoring to detect content and/or behaviors that suggest use of the service in a manner that may violate applicable product terms.
The content filtering system integrated into the Azure OpenAI Service contains neural multi-class classification models aimed at detecting and filtering harmful content; the models cover four categories (hate, sexual, violence, and self-harm) across four severity levels (safe, low, medium, and high). Content detected at the ‘safe’ severity level is labeled in annotations but isn’t subject to filtering and isn’t configurable.
Categories: hate, sexual, violence, self-harm
Severity levels: safe, low, medium, high
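These per-category filter results are also returned as annotations in API responses. Below is a minimal sketch of inspecting them with the openai Python package; the endpoint, API key, and deployment name are placeholders, and the annotation field names (prompt_filter_results, content_filter_results) are the Azure-specific extras the service attaches on top of the standard OpenAI response shape.

```python
# Minimal sketch: inspect content-filter annotations on an Azure OpenAI
# response. Endpoint, API key, and deployment name are placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",  # placeholder
    api_key="YOUR-API-KEY",                                   # placeholder
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="YOUR-DEPLOYMENT-NAME",  # your deployment name, not the base model
    messages=[{"role": "user", "content": "Write a one-line greeting."}],
)

choice = response.choices[0]
print(choice.message.content)

# Azure attaches per-category filter results to the response. Each of the
# four categories (hate, self_harm, sexual, violence) reports whether it was
# filtered and at what severity. getattr() is used because these fields are
# Azure-specific extras on the standard OpenAI response objects.
print("prompt annotations:", getattr(response, "prompt_filter_results", None))
print("completion annotations:", getattr(choice, "content_filter_results", None))
```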
Actual deployment:
Go to Azure OpenAI Studio and open the Content filters section.
Click on Create customized content filter.
As shown in Figure 1, enter a name for the configuration. Click the sections that are green to switch them to “Filter” mode, then save the configuration.
It will now look as shown in Figure 2.
Now it is time to apply it to your Azure OpenAI model deployment.
Go to Deployments under the Management section and choose your deployment, as shown below. The Content Filtering column shows Default, which means no custom configuration has been applied yet.
Click on Edit deployment for your model and expand Advanced options, as shown below:
Choose your custom configuration from the drop-down menu, then save and close. Validate that it has been applied: the Content Filtering tab now shows the name of the configuration we created. You can also verify the behavior from client code, as shown below.
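With the configuration applied, it is worth handling both filtering outcomes in your application. The sketch below, again using the openai Python package with placeholder names, assumes the documented behavior: a filtered prompt is rejected with an HTTP 400 error whose code is content_filter, while a filtered completion comes back with finish_reason set to content_filter.

```python
# Minimal sketch: handle both content-filter outcomes at runtime.
# Endpoint, API key, and deployment name are placeholders.
import openai
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",  # placeholder
    api_key="YOUR-API-KEY",                                   # placeholder
    api_version="2024-02-01",
)

try:
    response = client.chat.completions.create(
        model="YOUR-DEPLOYMENT-NAME",  # placeholder deployment name
        messages=[{"role": "user", "content": "Summarize this article."}],
    )
    choice = response.choices[0]
    if choice.finish_reason == "content_filter":
        # The completion was cut off because it tripped the filter.
        print("Output blocked by the content filter.")
    else:
        print(choice.message.content)
except openai.BadRequestError as err:
    # A filtered prompt never reaches the model; the service rejects it
    # with a 400 whose error code is "content_filter".
    if err.code == "content_filter":
        print("Prompt blocked by the content filter.")
    else:
        raise
```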
That’s it. Your Azure OpenAI deployment is now “Safe”.
Thanks.
Vishal Anand.