A key unlock for the performance and quality of LLM responses will be personalized memory (as BG2 has alluded to multiple times). At SingleStore, we believe performant inference over personalized memory requires two things: 1) unified storage and 2) a multi-method retrieval engine.
Over the last decade, we have built a real-time, petabyte-scale HTAP data warehouse. Now we are adding a graph query engine that runs complex graph queries directly on your existing tabular data, along with KV/prompt caching to deliver faster, cheaper inference. Please join us as we share our thesis on how SingleStore can uniquely enable secure, personalized, and cost-optimized LLM performance in enterprise settings.
Arthur is a Product Manager at SingleStore, where he leads growth for its AI-application development suite.