Large language models (LLMs) have changed the AI landscape in recent months. The pace of innovation in AI has been unprecedented, unlike that of any other field of science. This rapid progress has been possible mainly because of open science and open access. However, research on benchmarking and evaluation tools has not kept up with the growing capabilities of these powerful models.
I will discuss the LLM landscape, where open- and closed-access models coexist, and the pros and cons of each. Next, I will focus on current tooling for evaluating LLMs’ capabilities and vulnerabilities. Finally, I will discuss open challenges and exciting research problems in taming the wild west of LLMs.