This talk addresses key challenges in evaluating LLM-powered AI agents: behavioral instability, comprehensive testing, and cascading failures.
We’ll explore advanced techniques, including multi-dimensional metrics and automated scenario generation. Drawing on insights from hundreds of AI engineering teams running agents in production, you’ll learn how to implement agent-specific observability systems and design robust evaluation pipelines that can handle the complexities of modern AI agents, from detecting subtle regressions to quantifying performance across diverse, dynamically generated test cases.
Jason Lopatecki is the co-founder and CEO of Arize AI, an AI observability company. He is a garage-to-IPO executive with an extensive background in building market-leading products and businesses that heavily leverage analytics.
Prior to Arize, Jason was the co-founder and Chief Innovation Officer at TubeMogul, where he scaled the business into a public company that was eventually acquired by Adobe. He has hands-on knowledge of big data architectures, programmatic advertising systems, distributed systems, and machine learning and data processing architectures.
In his free time, Jason tinkers with personal machine learning projects, with a special interest in unsupervised learning and deep neural networks. He holds an electrical engineering and computer science degree from UC Berkeley - Go Bears!