Open source LLMs like Llama 2 and BLOOM have enabled widespread development of enterprise LLM applications. However, open source LLMs can sometimes lag behind proprietary models. In this talk, we explore the work done by SambaNova Systems to narrow the gap between open source LLMs and proprietary models. We examine the multilingual chat capabilities of BLOOM-176B and how they can be improved, how to reduce the gap in software API manipulation capabilities of popular open source models, and how to extend the long-sequence capabilities of existing models. This work and its artifacts (models, benchmarks, etc.) have been released on Hugging Face for the community to use, evaluate, and improve further.
In our journey of working with customers to make open source LLMs enterprise-ready, we noticed a trend: organizations need both general LLMs and specialized expert LLMs. These expert LLMs are useful in a range of scenarios, including specialized tasks, context distillation, domain adaptation, and incorporating private data. Depending on the complexity of the task, multiple such experts may need to work together. However, running them efficiently can be prohibitively expensive. Even with 15 experts at the 70B scale, the total number of parameters to hold during inference can easily exceed a trillion. We show how the SN40L, with its unique hierarchical memory architecture, enables faster serving of such a collection of experts, outperforming existing ML hardware providers.
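The parameter count above can be checked with a quick back-of-the-envelope calculation. This is an illustrative sketch only, using the abstract's 15-expert, 70B-parameter example and an assumed fp16 weight format, not SambaNova's actual deployment sizing:

```python
# Rough estimate of total parameters and weight memory required to keep a
# collection of expert LLMs resident for inference (illustrative numbers).

NUM_EXPERTS = 15            # number of expert models, from the abstract
PARAMS_PER_EXPERT = 70e9    # 70B parameters per expert
BYTES_PER_PARAM = 2         # assumed fp16/bf16 weights (2 bytes each)

total_params = NUM_EXPERTS * PARAMS_PER_EXPERT
total_bytes = total_params * BYTES_PER_PARAM

print(f"Total parameters: {total_params / 1e12:.2f} trillion")
print(f"Approx. fp16 weight memory: {total_bytes / 2**40:.1f} TiB")
```

At roughly 2 TiB of weights alone, such a collection exceeds the on-device memory of a single conventional accelerator, which is what motivates a hierarchical memory design.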
Urmish leads the NLP team at SambaNova Systems. The NLP team at SambaNova focuses on training and evaluating HHH-aligned (helpful, honest, harmless) large language models, adapting LLMs to enterprise use cases, and HW-SW co-design of LLMs to enable efficient training and inference. Before SambaNova, he held various engineering and research roles at Arm, AMD, and Texas Instruments. He also helped drive the TinyML Performance Working Group in MLPerf, contributing to the development of key benchmarks for IoT ML. Urmish has 35+ publications and patents focusing on efficient deep learning and LLMs and has given guest lectures at top universities and industry-academia summits. He completed his master's at the University of Wisconsin-Madison and his bachelor's at the Birla Institute of Technology and Science.