Multi-modal Multi-agentic Copilot Framework on Azure

Vishal Anand
5 min read · Jun 14, 2024

Structure:

  • Introduction
  • Agentic Copilot
  • Architecture Diagram
  • Technologies Used
  • LLMs Used
  • Practical Hands-on
  • Integration
  • Conclusion

Introduction:

The Multi-modal Multi-agentic Copilot Framework on Azure is an advanced system that leverages generative AI, large language models, assistants, and cloud computing to enhance productivity and decision-making. The framework integrates multiple input modalities, such as text, voice, and images, for a seamless and intuitive user experience. By incorporating multiple AI agents, each specialized in different tasks and domains, it can efficiently manage and process diverse data types, providing users with comprehensive insights and solutions. Azure’s cloud infrastructure ensures scalability, security, and real-time processing, making this multi-modal, multi-agent, assistant-based copilot framework a versatile and powerful tool for enterprises looking to optimize their operations and drive innovation. Through this integrated system, users benefit from a cohesive, dynamic environment where AI-driven assistance is tailored to complex, multi-faceted needs, ultimately fostering a more intelligent and responsive digital ecosystem.

Agentic Copilot:

The agentic copilot is an advanced AI assistant that combines functions, code interpretation, and data ingestion to provide a highly customized user experience, built on a large language model capable of processing multiple modalities. The function component defines the specific tasks and operations the assistant can perform, ensuring it meets the precise needs of the user. The code interpreter executes dynamic code snippets, enabling real-time problem-solving and automation of complex workflows. The bring-your-own-data option enhances the assistant’s utility by letting users supply their own datasets, which the copilot can analyze to generate insights and recommendations. Instructions that dictate the assistant’s behaviour are equally pivotal, allowing users to tailor its responses and actions to their unique requirements and preferences. Together, these capabilities make the agentic copilot not just a reactive tool but a proactive partner in productivity, able to adapt to and evolve with the user’s needs.
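
To make this concrete, here is a minimal Python sketch of how such an assistant could be wired together with the Azure OpenAI Assistants API (in preview at the time of writing). The endpoint, API version, file name, and function name are illustrative assumptions, not the exact values used later in this article:

    from openai import AzureOpenAI

    # Placeholder endpoint, key, and (assumed) preview API version.
    client = AzureOpenAI(
        azure_endpoint="https://<your-resource>.openai.azure.com/",
        api_key="<your-key>",
        api_version="2024-05-01-preview",
    )

    # Bring-your-own-data: upload the file the assistant should reason over.
    report = client.files.create(
        file=open("crma_code_analysis.pdf", "rb"), purpose="assistants"
    )

    assistant = client.beta.assistants.create(
        name="AppMod Engineer",        # illustrative name
        model="gpt-4o",                # your Azure deployment name
        instructions="You are an AppMod analyzer ...",  # behaviour-shaping
        tools=[
            {"type": "code_interpreter"},  # dynamic code execution
            {                              # custom function (schema sketched later)
                "type": "function",
                "function": {
                    "name": "call_appmod_engineer",
                    "description": "Connect the user to an AppMod engineer.",
                    "parameters": {"type": "object", "properties": {}},
                },
            },
        ],
        # Attach the uploaded file to the code interpreter (Assistants v2 style).
        tool_resources={"code_interpreter": {"file_ids": [report.id]}},
    )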

Architecture Diagram:

Figure: Architecture diagram

Technologies Used:

Azure OpenAI Studio and Assistants (in preview) with functions, the code interpreter, instructions, and data.

LLMs Used:

Azure OpenAI GPT-4o (a multi-modal model) and GPT-4.

Practical Hands-on:

Step 1: Deployed the Azure OpenAI GPT-4o model (multi-modal) with the required quota in AI Studio. Then set up the agentic copilot using the Assistants (agent-based) features.

As shown in figure 1.1, this setup included creating a function with Python code to call an AppMod (application modernization) engineer based on a text request, enabling the code interpreter, ingesting a PDF file containing the code analysis report for CRMA (a sample application), and setting up instructions for the assistant to behave like an AppMod analyzer. A sketch of the function definition follows figure 1.1.

Figure 1.1: Assistant setup
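
For reference, a function tool is described to the model as a JSON schema. A hedged sketch of what the AppMod-engineer definition might look like (the name and parameters are assumptions for illustration, not the article's exact code):

    # Illustrative JSON schema for the function tool.
    call_appmod_engineer = {
        "name": "call_appmod_engineer",
        "description": "Route the user's request to an application "
                       "modernization (AppMod) engineer.",
        "parameters": {
            "type": "object",
            "properties": {
                "request_summary": {
                    "type": "string",
                    "description": "Short summary of what the user needs.",
                },
            },
            "required": ["request_summary"],
        },
    }

    # Registered on the assistant as:
    #   tools=[{"type": "function", "function": call_appmod_engineer}]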

Step 2: Saved this assistant-based agentic copilot, as shown below.

Figure 1.2: AppMod Engineer assistant

Step 3: Started working with the agentic copilot, as shown below:

Figure 1.3
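
In Assistants terms, a conversation like the one in figure 1.3 maps to a thread, a user message, and a run. A minimal sketch, reusing the client and assistant from the earlier snippet:

    # Start a conversation thread and post a user message to it.
    thread = client.beta.threads.create()
    client.beta.threads.messages.create(
        thread_id=thread.id,
        role="user",
        content="Summarize the CRMA application code analysis report.",
    )

    # Run the assistant against the thread.
    run = client.beta.threads.runs.create(
        thread_id=thread.id, assistant_id=assistant.id
    )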

Step 4: As shown in figure 1.4, it returned a rate-limit-exceeded error.

Figure 1.4
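
Raising the quota (next step) is the real fix, but such 429 errors can also be absorbed in client code with exponential backoff. A minimal sketch using the SDK's RateLimitError:

    import time

    from openai import RateLimitError

    # Retry the run a few times, backing off exponentially on 429s.
    for attempt in range(5):
        try:
            run = client.beta.threads.runs.create(
                thread_id=thread.id, assistant_id=assistant.id
            )
            break
        except RateLimitError:
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...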

Step 5: Edited the deployment and increased the rate limit from 10K to 198K.

Figure 1.5: Rate-limit increase

Step 6: Updated the deployment with the new rate limit, as shown in figure 1.6.

Figure 1.6

This fixed the problem, and the assistant started working as shown in figure 1.7.

Step 7: Using the enabled code interpreter, it executed code internally on its own to identify the file type, analyze it, and decide on alternative ways to read the content.

Figure 1.7: Code interpreter execution
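
The exact code the interpreter generated is internal to the run, but identifying a PDF usually comes down to checking the file's magic bytes. Something along these lines (file name assumed):

    # PDF files begin with the magic bytes "%PDF-".
    with open("crma_code_analysis.pdf", "rb") as f:
        header = f.read(5)

    if header == b"%PDF-":
        print("PDF detected; use a PDF text-extraction library to read it.")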

Step 8: The code interpreter created its own function to extract text from each page of the PDF file, as shown in figure 1.8.

Figure 1.8: Code interpreter decision making
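
A per-page extraction function of the kind the interpreter produced would look roughly like this with the pypdf library (the file name is an assumption):

    from pypdf import PdfReader

    def extract_pages(path: str) -> list[str]:
        """Return the extracted text of each page of a PDF."""
        reader = PdfReader(path)
        return [page.extract_text() or "" for page in reader.pages]

    pages = extract_pages("crma_code_analysis.pdf")
    print(f"Extracted {len(pages)} pages")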

Step 9: The completed output was a summary of the CRMA (sample) application, as shown in figures 1.9 through 1.13.

Figure 1.9: CRMA application summary snippet 1
Figure 1.10: CRMA application summary snippet 2
Figure 1.11: CRMA application summary snippet 3
Figure 1.12: CRMA application summary snippet 4
Figure 1.13: CRMA application summary snippet 5

Step 10: Function execution

When asked through natural-language chat to connect to an AppMod engineer, the copilot called the function that was created with Python code as part of the setup, as we see in figure 1.14.

Figure 1.14: Function execution through chat
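
On the API side, a function call surfaces as a run entering the requires_action state: the application executes the function locally and submits the result back. A hedged sketch of that loop (connect_to_engineer stands in for the Python function created during setup):

    import json
    import time

    # Poll the run; execute our function whenever the assistant asks for it.
    while run.status in ("queued", "in_progress", "requires_action"):
        if run.status == "requires_action":
            outputs = []
            for call in run.required_action.submit_tool_outputs.tool_calls:
                if call.function.name == "call_appmod_engineer":
                    args = json.loads(call.function.arguments)
                    result = connect_to_engineer(args)  # hypothetical local impl
                    outputs.append({"tool_call_id": call.id, "output": result})
            run = client.beta.threads.runs.submit_tool_outputs(
                thread_id=thread.id, run_id=run.id, tool_outputs=outputs
            )
        else:
            time.sleep(1)
            run = client.beta.threads.runs.retrieve(
                thread_id=thread.id, run_id=run.id
            )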

Step 11: Further, another agentic copilot was created using an assistant and a deployment of a different LLM (GPT-4). Here is the list of the two assistants with different functions and data.

Figure 1.15: Two agentic copilots using assistant with different functions
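
The list in figure 1.15 can also be retrieved programmatically:

    # Enumerate the assistants registered against this resource.
    for a in client.beta.assistants.list().data:
        print(a.id, a.name, a.model)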

Integration:

From the portal, the studio for this deployment provides two keys, an endpoint, and a deployment name, which can be used to integrate it with external applications.
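
A minimal integration sketch using those values from an external Python application (endpoint, key, deployment name, and API version are all placeholders):

    from openai import AzureOpenAI

    # Values copied from the studio/portal; placeholders here.
    client = AzureOpenAI(
        azure_endpoint="https://<your-resource>.openai.azure.com/",
        api_key="<key-1-or-key-2>",
        api_version="2024-05-01-preview",  # assumed preview version
    )

    response = client.chat.completions.create(
        model="<deployment-name>",  # the deployment name, not the model family
        messages=[{"role": "user", "content": "Hello from an external app"}],
    )
    print(response.choices[0].message.content)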

Conclusion:

This agentic copilot framework combines multiple advanced AI assistants, each pairing specific functions, code interpretation, and data-handling capabilities with large language models for multi-modal processing. Its ability to execute dynamic tasks, analyze user-provided data, and adapt its behavior through tailored instructions makes it a versatile, proactive tool that enhances productivity and effectively meets diverse user needs.

Disclaimer: Views are my own and personal ones.

Feel free to clap, comment and share.

Regards,

Vishal Anand.
