Generative AI promises to unlock unprecedented efficiency and enhance human expertise in collaborative workflows. However, this technology differs from predictive AI in significant ways, and those differences create new challenges for safety and reliability. One important difference is that, unlike their predictive predecessors, LLMs are trained on general-purpose tasks and subsequently applied to different, specific problem spaces. This means that gathering performance feedback and root-causing problems is not as simple as comparing a prediction against a ground-truth label.
In this talk, we'll explore closing the feedback loop for generative applications with model-based metrics. At Fiddler AI, we call them "Trust Models," and they're used both for offline diagnostics and for synchronous runtime-path use cases where unsafe or underperforming output is intercepted in real time.
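As a rough illustration of the runtime-path pattern described above (not Fiddler's actual Trust Models), a small BERT-scale classifier can score each candidate response before it reaches the user; the model name, labels, and threshold below are placeholder assumptions.

```python
# Minimal sketch of a runtime-path "trust model" check: score an LLM response
# with a small classifier and intercept it if it looks unsafe.
# The model name, label set, and threshold are illustrative placeholders.
from transformers import pipeline

# Hypothetical BERT-scale safety classifier (substitute a real model path).
safety_scorer = pipeline("text-classification", model="your-org/trust-model-safety")

UNSAFE_LABELS = {"unsafe", "toxic"}   # assumed label names
UNSAFE_THRESHOLD = 0.5                # assumed score cutoff


def guarded_response(llm_generate, prompt: str) -> str:
    """Generate a response, then intercept it if the trust model flags it."""
    candidate = llm_generate(prompt)
    result = safety_scorer(candidate)[0]  # e.g. {"label": "unsafe", "score": 0.91}
    if result["label"] in UNSAFE_LABELS and result["score"] > UNSAFE_THRESHOLD:
        # Intercept unsafe or underperforming output in the runtime path.
        return "I'm sorry, I can't help with that request."
    return candidate
```

The same scorer can also be run asynchronously over logged traffic for offline diagnostics, which is the other use case the talk covers.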
Josh Rubin is Principal AI Scientist and a five-year veteran at Fiddler AI, an enterprise AI observability company. Fiddler's platform provides comprehensive and customizable workflows for monitoring, alerting on, and root-causing issues with predictive and generative models in production.
After a first career in experimental particle physics, Josh developed deep-learning models to address complex hardware calibration and signal processing problems for Labcyte Inc., a biotech hardware company. At Fiddler, Josh built and led a data science team that developed novel explainability tools for computer vision and multimodal deep-learning models, and techniques for measuring model robustness and drift in unstructured data, key components of Fiddler's LLM observability product.
Most recently, he's been developing small BERT-scale models to close the feedback loop on measuring large language model performance, serving customers including cloud-native travel platforms, large financial services firms, ad-tech companies, and cryptocurrency exchanges.