AI observability in practice
Many organizations start with good intentions and build promising AI solutions, but these early applications often end up disconnected and unobservable. For example, a predictive maintenance system and a GenAI docsbot may proliferate independently in different domains. AI observability refers to the ability to monitor and understand the behavior of generative and predictive AI models throughout their lifecycle within an ecosystem. It is critical in areas such as machine learning operations (MLOps) and, especially, large language model operations (LLMOps).
AI observability aligns with DevOps and IT operations practices to ensure that generative and predictive AI models integrate seamlessly and function properly. It lets you track metrics, performance issues, and the outputs generated by AI models, providing a comprehensive view through your organization’s observability platform. It also helps teams build better AI solutions over time by storing and labeling production data for retraining predictive models or fine-tuning generative ones. This continuous retraining process helps maintain and improve the accuracy and efficiency of AI models.
However, this is not without difficulties. As architectures, user bases, databases, and models scale up, setup takes longer, multiple pieces of infrastructure and modeling must be linked together, operations teams come under pressure, and ongoing maintenance and updates demand more effort. Addressing this proliferation is impossible without an open and flexible platform that acts as the organization’s centralized command and control center, managing, monitoring, and governing the entire AI environment at scale.
Most companies don’t stick to just one infrastructure stack and may want to switch things up in the future. What really matters to them is that AI production, governance, and monitoring remain consistent.
DataRobot is committed to observability across environments: cloud, hybrid, and on-premises. For AI workflows, this means you can choose where and how to develop and deploy your AI projects while maintaining full insight into and control over them, even at the edge, in effect a 360-degree view of everything.
DataRobot provides 10 key, ready-to-use components for achieving a successful AI observability implementation:
- Metric monitoring: Track performance metrics and troubleshoot issues in real time.
- Model management: Use tools to monitor and manage models throughout their lifecycle.
- Visualization: Provide dashboards for insight into and analysis of model performance.
- Automation: Automate the building, governance, deployment, monitoring, and retraining phases of the AI lifecycle for a seamless workflow.
- Data quality and explainability: Ensure data quality and explain model decisions.
- Advanced algorithms: Use out-of-the-box metrics and guards to improve model functionality.
- User experience: Improve the user experience through both GUI and API flows.
- AIOps and integrations: Integrate with AIOps and other solutions for unified management.
- APIs and telemetry: Use APIs for seamless integration and telemetry data collection.
- Practice and workflows: Build a supporting ecosystem around AI observability, then act on what is observed.
AI Observability in Action
Organizations in every industry are implementing GenAI chatbots across a variety of functions for different purposes, including increased efficiency, improved quality of service, and faster response times.
Let’s take a look at deploying a GenAI chatbot within your organization and discuss how to achieve AI observability using an AI platform like DataRobot.
Step 1: Gather relevant tracking and metrics
DataRobot and its MLOps capabilities provide world-class scalability for model deployment. Oversee and manage models across your organization from a single platform, regardless of where they are deployed. In addition to DataRobot models, open source models deployed outside of DataRobot MLOps can also be managed and monitored on the DataRobot platform.
AI observability within the DataRobot AI platform ensures organizations know when problems occur, understand why they occur, and can intervene to continuously optimize the performance of their AI models. By tracking service health, drift, prediction data, training data, and custom metrics, companies can keep their models and predictions relevant in a rapidly changing world.
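As a concrete starting point, here is a minimal sketch that pulls service health and target drift statistics for a deployment with the DataRobot Python client. The endpoint, token, and deployment ID are placeholders, and the method names reflect the public client at the time of writing, so check them against your client version:

```python
import datarobot as dr

# Connect to the platform; endpoint, token, and deployment ID are placeholders.
dr.Client(endpoint="https://app.datarobot.com/api/v2", token="<YOUR_API_TOKEN>")

# Works for models deployed on DataRobot as well as external models
# registered with DataRobot MLOps.
deployment = dr.Deployment.get("<DEPLOYMENT_ID>")

# Service health: request volume, latency, error rates, and similar stats.
service_stats = deployment.get_service_stats()
print(service_stats.metrics)

# Drift: how far recent predictions have moved from the training baseline.
target_drift = deployment.get_target_drift()
print(target_drift.drift_score)
```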
Step 2: Analyze the data
DataRobot lets you leverage pre-built dashboards to monitor existing data science metrics or tailor custom metrics to address specific aspects of your business.
These custom metrics can be developed from scratch or using DataRobot templates. Use these metrics for models built or hosted inside or outside of DataRobot.
The ‘prompt refusal’ metric, for example, indicates the percentage of chatbot responses the LLM declined to answer. Metrics like this provide valuable insight, but what businesses really need are actionable steps to minimize refusals.
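As a sketch of what such a custom metric might look like, the snippet below computes a prompt refusal rate from a toy log of chatbot exchanges using a simple regex heuristic. In production the rows would come from the deployment’s stored prediction data, and refusal detection might itself be a trained classifier:

```python
import re
import pandas as pd

# Toy log of chatbot exchanges; in production these rows would come from
# the deployment's stored prediction data.
logs = pd.DataFrame({
    "question": [
        "How do I reset my password?",
        "What is our Q3 revenue forecast?",
        "Summarize the onboarding guide.",
    ],
    "response": [
        "Go to Settings > Security and click 'Reset password'.",
        "I'm sorry, I don't have enough information to answer that.",
        "The onboarding guide covers accounts, tooling, and policies.",
    ],
})

# Crude regex heuristic for spotting refusals; a real system might use a
# trained classifier or the guard models discussed in Step 3 instead.
refusal_pattern = re.compile(
    r"i'?m sorry|i can(?:no|')t|don'?t have enough information|unable to answer",
    re.IGNORECASE,
)

logs["refused"] = logs["response"].str.contains(refusal_pattern)
print(f"Prompt refusal rate: {logs['refused'].mean():.1%}")
```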
Guiding questions: Answering the following questions helps build a more comprehensive understanding of the factors that drive prompt refusals.
- Does the LLM have the appropriate structure and data to answer the questions?
- Are there patterns in the types of questions, keywords, or topics that the LLM cannot address or struggles with?
- Is there a feedback mechanism to collect user input on the chatbot’s responses?
Use a feedback loop: These questions can be answered by implementing a user feedback loop and building applications that surface the hidden insights in that data.
Below is an example of a Streamlit application that provides insight into a sample of user questions and the topic clusters of questions the LLM could not answer.
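A minimal sketch of such an app, assuming the refused questions have already been exported to a refused_questions.csv file with a question column (both the filename and the column name are illustrative):

```python
# streamlit_app.py -- run with: streamlit run streamlit_app.py
import pandas as pd
import streamlit as st
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

st.title("Questions the chatbot could not answer")

# Assumes refused questions were exported to a CSV with a 'question' column.
questions = pd.read_csv("refused_questions.csv")["question"].dropna().reset_index(drop=True)

# Embed the questions with TF-IDF and group them into coarse topics.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(questions)

n_clusters = st.slider("Number of topic clusters", 2, 10, 4)
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Use each cluster's highest-weight terms as a rough topic label.
terms = vectorizer.get_feature_names_out()
for cluster_id in range(n_clusters):
    top = kmeans.cluster_centers_[cluster_id].argsort()[::-1][:5]
    st.subheader(f"Cluster {cluster_id}: " + ", ".join(terms[i] for i in top))
    st.dataframe(questions[labels == cluster_id].reset_index(drop=True))
```

Each cluster’s highest-weight TF-IDF terms serve as a rough topic label, pointing at the knowledge-base gaps that are worth filling first.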
Step 3: Take action based on analysis
Now that you understand your data, you can take the following steps to significantly improve your chatbot’s performance:
- Modify the prompt: Try different system prompts to get better, more accurate results.
- Improve your vector database: Identify the questions your LLM fails to answer, add this information to your knowledge base, and then retrain your LLM.
- Fine-tune or replace the LLM: Experiment with different configurations to fine-tune your existing LLM for optimal performance. Alternatively, evaluate different LLM strategies and compare their performance to determine whether a replacement is needed.
- Moderate in real time with the right guard models: Pair each generative model with a predictive AI guard model that evaluates output quality and filters out inappropriate or irrelevant questions.
This framework is broadly applicable across use cases where accuracy and truthfulness are paramount. DataRobot provides a control layer that takes data from external applications, protects it with predictive models hosted inside or outside DataRobot or with NeMo Guardrails, and calls external LLMs for predictions.
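Here is a minimal sketch of that guard-model pattern, with toy stand-ins for the topic guard, the quality guard, and the LLM client; in a real deployment each of these would be a model hosted inside or outside DataRobot:

```python
QUALITY_THRESHOLD = 0.7
REFUSAL_MESSAGE = "I can only answer questions about our product documentation."

class TopicGuard:
    """Stand-in for a predictive guard model (e.g., a topic classifier)."""
    def is_on_topic(self, question: str) -> bool:
        return "product" in question.lower()  # toy rule; a real guard is a trained model

class QualityGuard:
    """Stand-in for a guard that scores answer quality and safety."""
    def score(self, question: str, answer: str) -> float:
        return 0.9 if answer else 0.0  # toy rule

class LLMClient:
    """Stand-in for a hosted LLM, inside or outside DataRobot."""
    def generate(self, question: str) -> str:
        return f"(generated answer to: {question})"

topic_guard, quality_guard, llm = TopicGuard(), QualityGuard(), LLMClient()

def answer(question: str) -> str:
    # Pre-guard: screen out off-topic or inappropriate questions before
    # they ever reach the generative model.
    if not topic_guard.is_on_topic(question):
        return REFUSAL_MESSAGE
    draft = llm.generate(question)
    # Post-guard: score the draft and block low-quality or unsafe output.
    if quality_guard.score(question, draft) < QUALITY_THRESHOLD:
        return REFUSAL_MESSAGE
    return draft

print(answer("How do I rotate the product's API keys?"))  # passes both guards
print(answer("Tell me a joke about my coworker."))         # pre-guard refuses
```

The same wrapper shape works whether the guards are predictive models or NeMo Guardrails checks; only the two guard calls change.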
Following these steps gives you a 360° view of all AI assets in production and keeps your chatbot effective and reliable.
Summary
AI observability is essential to ensuring the effective and stable performance of AI models across your organization’s ecosystem. By leveraging the DataRobot platform, enterprises maintain comprehensive oversight and control over their AI workflows, ensuring consistency and scalability.
Implementing strong observability practices not only helps you identify and prevent issues in real time, but also enables continuous optimization and improvement of your AI models, ultimately producing useful and secure applications.
With the right tools and strategies, organizations can navigate the complexities of AI operations and realize the full potential of their AI infrastructure investments.
About the authors
Atalia Horenshtien is Director of Global Technical Product Support at DataRobot. She plays a key role as the lead developer of DataRobot’s technical market story, working closely with product, marketing, and sales. As a former customer-facing data scientist at DataRobot, Atalia served as a trusted AI advisor to clients across a variety of industries, solving complex data science problems and helping them realize business value across their organizations.
Whether talking with customers and partners or presenting at industry events, she champions the DataRobot story and shows organizations how to adopt AI/ML with the DataRobot platform. She has delivered sessions on topics such as MLOps, time series forecasting, sports projects, and cross-industry use cases at events including AI Summit NY, AI Summit Silicon Valley, and the Marketing AI Conference (MAICON), as well as at partner events such as Snowflake Summit and Google Next, plus masterclasses and joint webinars.
Atalia holds a bachelor’s degree in Industrial Engineering and Management and two master’s degrees: an MBA and a master’s in Business Analytics.
Aslihan Buner is Senior Product Marketing Manager for AI Observability at DataRobot, where she builds and executes go-to-market strategies for LLMOps and MLOps products. She partners with product management and development teams to identify key customer needs and to shape messaging and positioning strategically. Her passion is bridging market gaps, addressing pain points across all industries, and connecting them to solutions.
Kateryna Bozhenko is an AI Production Product Manager at DataRobot with extensive experience building AI solutions. With degrees in International Business and Healthcare Management, she is passionate about helping users put AI models to work effectively to maximize ROI and experience the true magic of innovation.