Training large AI language models is a challenging task that requires a deep understanding of natural language processing, machine learning, and distributed computing.
In this talk, we will go over lessons learned from training models with billions of parameters across hundreds of GPUs. We will discuss the challenges of handling massive amounts of data, designing effective model architectures, optimizing training procedures, and managing computational resources. This talk is suitable for ML researchers, practitioners, and anyone curious about the “sausage making” behind training large language models.
Hagay Lupesko is the VP of Engineering at MosaicML, where he focuses on making generative AI training and inference efficient, fast, and accessible. Prior to MosaicML, Hagay held AI engineering leadership roles at Meta, AWS, and GE Healthcare. He has shipped products across domains ranging from 3D medical imaging to global-scale web systems to deep learning systems that power apps and services used by billions of people worldwide.