
LLM AI Agent Evaluations and Observability with Galileo AI

Course Description

Important note: Please watch the video for more information. This course is hands-on and practical, designed for developers, AI engineers, founders, and teams building real LLM systems and AI agents. It is also ideal for anyone interested in LLM observability and AI evaluations who wants to apply these skills to future agentic apps. You should have some knowledge of AI agents and how they are built. Note that this is the complete guide to AI Observability and Evaluations: we cover both theory and practice, using Galileo AI as the AI agent / LLM monitoring platform. Learners also get access to all resources and the GitHub code / notebooks used in the course.

Why do LLM Observability and Evaluations Matter?

LLMs are powerful, but they are unpredictable. They hallucinate, they fail silently, and they behave differently across prompts and versions. There is a big difference between building an AI agentic / LLM system and actually "productionalizing" it. What if the LLM starts producing offensive content? What if tools embedded within agents fail silently? How do you measure model quality degradation? Traditional monitoring and building methods don't work here. You need to run experiments, build custom evaluations, and set up alerts that assess subjective measures. Dashboards built to track classification accuracy are not designed for open-ended text generation. Log pipelines created for predictable APIs cannot capture reasoning steps, tool usage, or why an agent failed. As a result, most teams fall back on manual spot checks, gut feel, and endless prompt tweaking. That approach might work in the beginning, but it does not scale. What we need instead is a systematic way to measure, monitor, evaluate, and continuously improve LLM and agent systems. That is where observability and structured evaluation come in.

What is this course?

This course will make you more confident when you build and deploy AI agents or other LLM-based systems. It teaches the tools and techniques needed to build robust AI agents with structured, personalized evaluations and experiments, and to monitor your agents in production with observability and logging. We start with the basics: the theory of what makes AI agents / LLM systems particularly difficult to build and track. Then we get into the practical part, where we build our own evaluations and instrument our own apps with Galileo AI.

What is Galileo AI?

Galileo is a platform designed specifically for evaluating and monitoring LLM and agent systems. It is built for AI agents / LLM-based systems and includes the following features:

Observability: log LLM interactions, track spans and metadata, visualize agent flows, and monitor safety and compliance signals.
Evaluations: design experiments, create evaluation datasets, define and register metrics, use LLMs as judges, and version and compare results.

In short, it gives you a structured way to understand how your AI systems behave and helps you build them. In this course, we do a masterclass in Galileo AI and how to use it to monitor and evaluate your AI app.
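To make the observability side concrete, here is a minimal sketch of the span / trace structure that platforms like this record for an agent run. This is a generic illustration written for this description, not the Galileo SDK; the Span and Trace classes and their fields are assumptions chosen for clarity.

```python
# Generic sketch of span-style trace logging for an agent run.
# NOT the Galileo SDK: Span/Trace and their fields are illustrative assumptions.
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str                                   # e.g. "llm-call", "tool:search"
    start: float = field(default_factory=time.time)
    end: float | None = None
    metadata: dict = field(default_factory=dict)

    def finish(self, **metadata) -> None:
        """Mark the step as done and attach outputs, token counts, errors, etc."""
        self.end = time.time()
        self.metadata.update(metadata)


@dataclass
class Trace:
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    spans: list[Span] = field(default_factory=list)

    def span(self, name: str) -> Span:
        """Open a new span for one agent step and keep it in the trace."""
        s = Span(name=name)
        self.spans.append(s)
        return s


# Usage: wrap each agent step in a span, then ship the trace to your platform.
trace = Trace()
llm_span = trace.span("llm-call")
# ... call the model here ...
llm_span.finish(model="gpt-4o-mini", prompt_tokens=42, output="...")
print(trace.trace_id, [s.name for s in trace.spans])
```

When you instrument your app with a platform like Galileo, structures of this kind are captured for you, which is what the logging practice section of the course walks through.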
Course Overview

Introduction - We start by explaining why LLM evaluations and observability matter, covering the risks of deploying generative AI without structured monitoring, setting expectations, and reviewing the course roadmap.

Theory: LLM / Agent Observability - This section introduces traditional monitoring concepts, explains why they fall short for generative systems, and outlines the key components of LLM observability.

Theory: LLM / Agent Evaluations - You'll explore evaluation theory, understand why evaluations are critical for production AI, learn the main evaluation approaches, and see the common challenges teams face with LLMs.

Theory: Observability and Evaluations for LLMs vs Traditional ML - We contrast generative AI with classical machine learning, highlighting the unique risks, costs, and iteration loops.

Theory: Tools and Approaches for LLM Observability and Evaluations - This section surveys the landscape of observability and evaluation tools available for LLM systems and explains why dedicated platforms are necessary.

Practice: Galileo Platform Deep-Dive Overview and Setup - This section walks you through Galileo's architecture, integrations, pricing, account creation, repository cloning, and local development setup to prepare you for instrumentation.

Practice: Logging LLM Interactions with Galileo - You'll learn practical logging with Galileo, including terminology, manual and SDK-based methods, simulating LLM applications, inspecting agent graphs, detecting errors, and setting up alerts and signals.

Practice: Evaluating LLM Performance with Galileo - We shift from observation to evaluation, showing how to design experiments, manage datasets and metadata, implement evaluation code, define metrics, and perform agent-specific and LLM-as-judge assessments (a small LLM-as-judge sketch follows this overview).

Conclusion: Earn your certificate.
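As a small taste of the evaluation side referenced in the final practice section, here is a minimal LLM-as-a-judge sketch. It calls the OpenAI Python client directly; the judge prompt, the 1-5 scale, and the model choice are illustrative assumptions, not Galileo's registered metrics.

```python
# Minimal LLM-as-a-judge sketch (illustrative only; not the Galileo SDK).
# Assumes the OpenAI Python client (`pip install openai`) and OPENAI_API_KEY set.
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are grading an AI assistant's answer.
Question: {question}
Answer: {answer}
Rate the answer's factual correctness from 1 (wrong) to 5 (fully correct).
Reply with only the number."""


def judge_correctness(question: str, answer: str) -> int:
    """Ask a judge model to score an answer; returns a 1-5 score."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # judge model; any capable chat model works
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(question=question, answer=answer)}],
        temperature=0,  # deterministic grading
    )
    # Assumes the judge follows the "only the number" instruction.
    return int(response.choices[0].message.content.strip())


if __name__ == "__main__":
    score = judge_correctness(
        question="What is the capital of France?",
        answer="The capital of France is Paris.",
    )
    print(f"Judge score: {score}/5")
```

In the course, the same idea is applied through Galileo's experiments and metric definitions rather than hand-rolled scripts like this one.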