Open source LLMs like LLaMA 2 and BLOOM have enabled widespread development of enterprise LLM applications. However, open source LLMs can still lag behind proprietary models. In this talk, we explore the work done by SambaNova Systems to reduce the gap between open source LLMs and proprietary models. We look at how the multilingual chat capabilities of BLOOM-176B can be improved, how to close the gap in the software API manipulation capabilities of popular open source models, and how to extend the long-sequence capabilities of existing models. This work and its artifacts (models, benchmarks, etc.) have been released on Hugging Face for the community to use, evaluate, and improve further.
In our journey of working with our customers to make open source LLMs enterprise ready, we noticed a trend: organizations need both general LLMs and specialized expert LLMs. These expert LLMs are useful across a variety of scenarios, such as specialized tasks, context distillation, domain adaptation, and incorporating private data. Depending on the complexity of the task, multiple such experts may need to come together. However, running them efficiently can be prohibitively expensive: even with 15 experts at the 70B scale, the total number of parameters to hold during inference can easily exceed a trillion. We look at how the SN40L, with its unique hierarchical memory architecture, enables faster serving of such a collection of experts, outperforming existing ML hardware providers.
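The parameter count claimed above can be checked with a quick back-of-envelope calculation. The sketch below assumes 15 experts of 70B parameters each and, as an illustrative assumption not stated in the abstract, bf16 weights at 2 bytes per parameter:

```python
# Back-of-envelope cost of holding a collection of expert LLMs for inference.
# Assumptions (illustrative, not from the talk): 15 experts, 70B parameters
# each, weights stored in bf16 (2 bytes per parameter).
num_experts = 15
params_per_expert = 70e9  # 70B

total_params = num_experts * params_per_expert
print(f"Total parameters: {total_params:.2e}")  # 1.05e+12, i.e. over a trillion

bytes_per_param = 2  # bf16
total_bytes = total_params * bytes_per_param
print(f"Weight memory at bf16: {total_bytes / 1e12:.1f} TB")  # 2.1 TB
```

At roughly 2 TB of weights alone, such a collection cannot fit in the high-bandwidth memory of a single accelerator, which is what motivates a hierarchical memory architecture for serving it.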