We’re excited to announce several updates to the Azure AI toolchain that will help developers quickly build custom AI solutions with more choice and flexibility.
AI is transforming every industry and creating new opportunities for innovation and growth. However, developing and deploying AI applications at scale requires a powerful and flexible platform that can handle the complex and diverse needs of modern businesses and let them create solutions grounded in their organizational data. That’s why we’re excited to announce several updates to help developers leverage the Azure AI toolchain to quickly create custom AI solutions with greater choice and flexibility.
- Serverless fine-tuning for Phi-3-mini and Phi-3-medium models enables developers to quickly and easily customize models for cloud and edge scenarios without having to provision compute.
- Updates to Phi-3-mini include significant improvements in core quality, instruction following, and structured output. This allows developers to build better-performing models without additional cost.
- Earlier this month, the latest models from OpenAI (GPT-4o mini), Meta (Llama 3.1 405B), and Mistral (Large 2) shipped to Azure AI on the same day they were released, giving our customers more choice and flexibility.
Creating value through model innovation and customization
In April, we introduced the Phi-3 family of small, open-source models developed by Microsoft. The Phi-3 models are the most capable and cost-effective small language models (SLMs) available, outperforming models of the same size and larger. As developers look to tailor AI solutions to specific business requirements and improve response quality, fine-tuning a small model is a great alternative that does not sacrifice performance. Starting today, developers can fine-tune Phi-3-mini and Phi-3-medium with their own data to build AI experiences that are more relevant, safer, and more cost-effective for their users.
Given their small compute footprint and their cloud and edge compatibility, Phi-3 models are well suited to fine-tuning that improves baseline model performance in a variety of scenarios, including learning a new skill or task (e.g., tutoring) or improving the consistency and quality of responses (e.g., the tone or style of responses in chat/Q&A). We are already seeing Phi-3 applied to new use cases.
Microsoft and Khan Academy are working together to improve solutions for teachers and students around the world. As part of the collaboration, Khan Academy is using Azure OpenAI Service to power Khanmigo for Teachers, a pilot AI-powered teaching assistant for educators in 44 countries, and is experimenting with Phi-3 to improve math tutoring. Khan Academy recently published a research paper highlighting how different AI models perform when assessing mathematical accuracy in tutoring scenarios, including benchmarks from a fine-tuned version of Phi-3. Early data shows that when a student makes a math error, Phi-3 outperforms most other leading generative AI models at identifying and correcting the mistake.
We have also fine-tuned Phi-3 for the device. In June, we launched Phi Silica to give developers a robust and reliable model for building apps that deliver safe and secure AI experiences. Phi Silica builds on the Phi family of models and is designed specifically for the NPUs in Copilot+ PCs. Microsoft Windows is the first platform to have a state-of-the-art small language model (SLM) custom built for the neural processing unit (NPU) and shipping inbox.
Today, you can fine-tune a Phi-3 model on Azure AI.
We are also excited to share that Azure AI’s Models-as-a-Service (serverless endpoint) capability is now generally available. Additionally, Phi-3-small is now available via a serverless endpoint, so developers can quickly and easily get started with AI development without having to manage the underlying infrastructure. Phi-3-vision, the multimodal model in the Phi-3 family, was announced at Microsoft Build and is available through the Azure AI model catalog; it will soon be available via a serverless endpoint as well. Phi-3-small (7B parameters) is available in two context lengths, 128K and 8K, while Phi-3-vision (4.2B parameters) is optimized for chart and diagram understanding and can be used to generate insights and answer questions.
We are seeing a tremendous response from the community to Phi-3. Last month, we released an update for Phi-3-mini that brings significant improvements in core quality and instruction following. The model was retrained, which substantially improved instruction following and support for structured output. We also improved multi-turn conversation quality, introduced support for <|system|> prompts, and significantly improved reasoning capability.
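To make the `<|system|>` support concrete, here is a minimal sketch of how a system prompt might be rendered into a Phi-3-style chat prompt. The tag layout (`<|role|>`, a newline, the content, then `<|end|>`) is an assumption based on the publicly documented Phi-3 chat format, and `build_phi3_prompt` is a hypothetical helper, not part of any SDK. In practice the tokenizer's own chat template should be treated as the source of truth.

```python
def build_phi3_prompt(messages):
    """Render chat messages into a Phi-3-style prompt string.

    Each message is a dict with "role" ("system", "user", or "assistant")
    and "content". The tag layout is an assumption; verify it against the
    model card or the tokenizer's chat template before relying on it.
    """
    allowed_roles = {"system", "user", "assistant"}
    parts = []
    for message in messages:
        role = message["role"]
        if role not in allowed_roles:
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<|{role}|>\n{message['content']}<|end|>\n")
    # A trailing assistant tag asks the model to produce the next turn.
    parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = build_phi3_prompt([
    {"role": "system", "content": "You are a concise math tutor."},
    {"role": "user", "content": "What is 7 * 8?"},
])
print(prompt)
```

Keeping the system message as the first turn lets an application set tone or policy once instead of repeating it in every user message.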
The table below highlights improvements in instruction following, structured output, and reasoning.
Benchmark | Phi-3-mini-4k (April ’24 release) | Phi-3-mini-4k (June ’24 update) | Phi-3-mini-128k (April ’24 release) | Phi-3-mini-128k (June ’24 update) |
Instruction Extra Hard | 5.7 | 6.0 | 5.7 | 5.9 |
Instruction Hard | 4.9 | 5.1 | 5.0 | 5.2 |
JSON structure output | 11.5 | 52.3 | 1.9 | 60.1 |
XML structure output | 14.4 | 49.8 | 47.8 | 52.9 |
GPQA | 23.7 | 30.6 | 25.9 | 29.7 |
MMLU | 68.8 | 70.9 | 68.1 | 69.7 |
Average | 21.7 | 35.8 | 25.7 | 37.6 |
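To see where the update moved the needle most, the point gains can be computed directly from the table. The sketch below copies a subset of the rows (the instruction, structured output, and GPQA rows) and reports the largest April-to-June gain per model. The dictionary layout and the function names are illustrative choices, not part of any published tooling.

```python
# Scores copied from the table above as (April '24 release, June '24 update).
SCORES = {
    "Phi-3-mini-4k": {
        "Instruction Extra Hard": (5.7, 6.0),
        "Instruction Hard": (4.9, 5.1),
        "JSON structure output": (11.5, 52.3),
        "XML structure output": (14.4, 49.8),
        "GPQA": (23.7, 30.6),
    },
    "Phi-3-mini-128k": {
        "Instruction Extra Hard": (5.7, 5.9),
        "Instruction Hard": (5.0, 5.2),
        "JSON structure output": (1.9, 60.1),
        "XML structure output": (47.8, 52.9),
        "GPQA": (25.9, 29.7),
    },
}


def gains(rows):
    """Return the June-minus-April difference per benchmark, in points."""
    return {name: round(june - april, 1) for name, (april, june) in rows.items()}


def largest_gain(rows):
    """Return (benchmark name, gain) for the biggest improvement."""
    deltas = gains(rows)
    name = max(deltas, key=deltas.get)
    return name, deltas[name]


for model, rows in SCORES.items():
    name, delta = largest_gain(rows)
    print(f"{model}: largest gain is {name} (+{delta} points)")
```

For both context lengths, the largest gain among these rows is in JSON structure output, which matches the update's emphasis on structured output.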
We are also continuously improving Phi-3 safety. A recent research paper highlighted Microsoft’s iterative “break-fix” approach to improving the safety of the Phi-3 models, which involved multiple rounds of testing and refinement, red teaming, and vulnerability identification. This approach reduced harmful content by 75% and improved the models’ performance on responsible AI benchmarks.
Expand your model selection with over 1,600 models available in Azure AI.
With Azure AI, we are committed to providing our customers with the most comprehensive set of open and frontier models and cutting-edge tooling to help them meet their unique cost, latency, and design requirements. Last year, we launched the Azure AI model catalog, which now offers the broadest selection of models, with over 1,600 models from providers including AI21, Cohere, Databricks, Hugging Face, Meta, Mistral, Microsoft Research, OpenAI, Snowflake, Stability AI, and more. This month, we added OpenAI’s GPT-4o mini through Azure OpenAI Service, along with Meta Llama 3.1 405B and Mistral Large 2.
Building on today’s momentum, we are excited to announce that Cohere Rerank is now available on Azure. Access to Cohere’s enterprise-ready language models on the powerful infrastructure of Azure AI enables enterprises to seamlessly, reliably, and securely integrate cutting-edge semantic search technologies into their applications. This integration allows users to combine the flexibility and scalability of Azure with Cohere’s high-performance and efficient language models to deliver superior search results in production.
TD Bank Group, one of the largest banks in North America, recently signed an agreement with Cohere to explore its full suite of large language models (LLMs), including Cohere Rerank.
“At TD, we see the transformative potential of AI to deliver more personalized and intuitive experiences for our customers, colleagues and communities, and we’re excited to work with Cohere to learn how language models perform on Microsoft Azure and support our transformation journey across the bank.”
Kirsti Racine, VP, AI Technology Lead, TD
Atomicwork, a digital workplace experience platform and long-time Azure customer, has significantly enhanced its IT service management platform with Cohere Rerank. Atomicwork has integrated the model into its AI digital assistant, Atom AI, to improve search accuracy and relevance and to provide faster, more accurate answers to complex IT support queries. This integration streamlines IT operations and improves productivity across the enterprise.
“Atomicwork’s digital workplace experience solution is powered by Cohere’s Rerank model and Azure AI Studio, which gives our digital assistant, Atom AI, the precision and performance it needs to deliver real-world results. This strategic collaboration underscores our commitment to providing enterprises with advanced, secure, and trusted enterprise AI capabilities.”
Vijay Rayapati, CEO of Atomicwork
Cohere’s flagship generative model, Command R+, which is also available in Azure AI, is purpose-built to work well with Cohere Rerank in a retrieval-augmented generation (RAG) system. Together, the two can handle some of the most demanding enterprise workloads in production.
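The retrieve-then-rerank pattern behind a RAG system can be sketched in a few lines. The example below is a toy illustration only: `retrieve`, `rerank`, `toy_relevance`, and `build_grounded_prompt` are hypothetical names, and the Jaccard-overlap scorer is a stand-in for where a hosted rerank model would be called. It does not use or imitate the Cohere API.

```python
def tokenize(text):
    """Lowercase whitespace tokenization, good enough for a toy example."""
    return set(text.lower().split())


def retrieve(query, documents, top_k):
    """First stage: cheap keyword-overlap recall over all documents."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def rerank(query, candidates, score_fn, top_n):
    """Second stage: reorder the candidates with a stronger relevance scorer."""
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_n]


def toy_relevance(query, doc):
    """Stand-in scorer (Jaccard overlap); a rerank model call would go here."""
    q, d = tokenize(query), tokenize(doc)
    return len(q & d) / len(q | d)


def build_grounded_prompt(query, passages):
    """Assemble the prompt that the generative model would receive."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return f"Answer using only the context.\n\nContext:\n{context}\n\nQuestion: {query}"


docs = [
    "reset your vpn password from the self-service portal",
    "the cafeteria menu changes every monday",
    "vpn connection drops are usually fixed by updating the vpn client",
    "to reset a laptop password contact the help desk",
]
query = "how do I reset my vpn password"
candidates = retrieve(query, docs, top_k=3)
top_passages = rerank(query, candidates, toy_relevance, top_n=2)
print(build_grounded_prompt(query, top_passages))
```

Splitting the work into two stages keeps the expensive scorer off the full corpus: the first stage narrows the field cheaply, and the reranker only sees a short candidate list before the generative model is prompted.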
Earlier this week, we announced that Meta Llama 3.1 405B, along with the latest fine-tuned Llama 3.1 models, including 8B and 70B, is now available via a serverless endpoint in Azure AI. Llama 3.1 405B can be used for advanced synthetic data generation and distillation, with 405B-Instruct acting as the teacher model and the 8B-Instruct and 70B-Instruct models acting as student models. You can read more about this announcement here.
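The teacher-student setup can be illustrated with a small data-preparation sketch: a teacher model answers a list of prompts, and the prompt-answer pairs are written out as training examples for a student model. Everything here is an assumption for illustration: `fake_teacher` stands in for a real call to the teacher model, and the chat-style `messages` JSONL layout is a common convention rather than a confirmed requirement of any particular fine-tuning service.

```python
import json


def build_distillation_records(prompts, teacher):
    """Pair each prompt with the teacher's answer as a chat-style example."""
    records = []
    for prompt in prompts:
        answer = teacher(prompt)
        if not answer or not answer.strip():
            continue  # skip prompts the teacher could not answer
        records.append({
            "messages": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": answer.strip()},
            ]
        })
    return records


def to_jsonl(records):
    """Serialize the records as one JSON object per line."""
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)


def fake_teacher(prompt):
    """Hypothetical stand-in for the teacher model; returns canned answers."""
    canned = {
        "What is 2 + 2?": "2 + 2 = 4.",
        "Name a prime number.": "7 is prime.",
    }
    return canned.get(prompt, "")


prompts = ["What is 2 + 2?", "Name a prime number.", "Unanswered prompt"]
records = build_distillation_records(prompts, fake_teacher)
jsonl = to_jsonl(records)
print(jsonl)
```

Filtering out empty teacher outputs before writing the file is a simple quality gate; a real pipeline would likely add deduplication and safety checks at the same point.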
With Mistral Large 2 now available on Azure, Azure is the first leading cloud provider to offer this next-generation model. Mistral Large 2 outperforms its predecessors in coding, reasoning, and agentic behavior, and is on par with other leading models. Additionally, Mistral Nemo, developed in collaboration with NVIDIA, is a powerful 12B model that pushes the boundaries of language understanding and generation. Learn more.
And last week, we brought GPT-4o mini to Azure AI, along with other updates to Azure OpenAI Service, enabling customers to expand the range of their AI applications at lower cost and latency, with improved safety and data deployment options. We will be announcing more capabilities for GPT-4o mini in the coming weeks. We are also excited to introduce a new capability to deploy chatbots built with Azure OpenAI Service to Microsoft Teams.
Realizing AI innovation safely and responsibly
At Microsoft, building AI solutions responsibly is at the heart of AI development. Microsoft has a robust set of capabilities to help organizations measure, mitigate, and manage AI risks across the entire AI development lifecycle, for both traditional machine learning and generative AI applications. Azure AI evaluations allow developers to iteratively assess the quality and safety of their models and applications, using built-in and custom metrics to inform mitigations. Additional Azure AI Content Safety features, including prompt shields and protected material detection, are now “on by default” in Azure OpenAI Service. These features can be leveraged as content filters with any foundation model in the model catalog, including Phi-3, Llama, and Mistral. Developers can also easily integrate these features into their applications through a single API. Once in production, developers can monitor their applications for quality and safety, adversarial prompt attacks, and data integrity, and intervene promptly with the help of real-time alerts.
Azure AI uses HiddenLayer Model Scanner to scan third-party and open models for emerging threats such as cybersecurity vulnerabilities, malware, and other signs of tampering before onboarding them into the Azure AI model catalog. The resulting verifications from Model Scanner, provided within each model card, give developer teams greater confidence when selecting, fine-tuning, and deploying open models for their applications.
We continue to invest across the Azure AI stack to deliver cutting-edge innovations to our customers, enabling them to build, deploy, and scale AI solutions safely and confidently. We’re excited to see what you build next.
Get the latest news on Azure AI
- Watch this video to learn more about the Azure AI Model Catalog.
- Listen to a podcast about Phi-3 with Sebastien Bubeck, Principal Scientist at Microsoft.