Sandeep Krishnamurthy

Engineering Director,
Databricks MosaicAI
Model Training

Presentation Title:

Pre-training large language models:
Lessons from the trenches

Presentation Summary:

The AI Conference 2023, a groundbreaking two-day event on AGI, LLMs, Infrastructure, Alignment, AI Startups, and Neural Architectures.

Training large AI language models is a challenging task that requires a deep understanding of natural language processing, machine learning, and distributed computing. In this talk, we will go over lessons learned from training models with billions of parameters across hundreds of GPUs. We will discuss the challenges of handling massive amounts of data, designing effective model architectures, optimizing training procedures, and managing computational resources.

This talk is suitable for ML researchers, practitioners, and anyone curious about the “sausage making” behind training large language models.

About | Sandeep Krishnamurthy

Sandeep is an engineering director leading Databricks MosaicAI Model Training products. His key interests and areas of expertise are large language model training and fine-tuning for GenAI, applying neural networks to structured data for predictive AI tasks, and building reliable, performant accelerated computing infrastructure.

Sandeep and his team built the MosaicAI Training platform, on which the Mosaic Research team built the state-of-the-art MPT, DBRX, and ImageAI models. Currently, Sandeep is focused on building products that help every enterprise build “intelligence” on its secure enterprise data.

In the past, Sandeep led teams that built the Amazon SageMaker Model Training platform, which supported hundreds of enterprises on the journey from their data to an intelligent model in production.