Training large AI language models is a challenging task that requires a deep understanding of natural language processing, machine learning, and distributed computing.
In this talk, we will go over lessons learned from training models with billions of parameters across hundreds of GPUs. We will discuss the challenges of handling massive amounts of data, designing effective model architectures, optimizing training procedures, and managing computational resources. This talk is suitable for ML researchers, practitioners, and anyone curious about the “sausage making” behind training large language models.
Hagay Lupesko is the VP of Engineering at MosaicML, where he focuses on making generative AI training and inference efficient, fast, and accessible. Prior to MosaicML, Hagay held AI engineering leadership roles at Meta, AWS, and GE Healthcare. He has shipped products across domains ranging from 3D medical imaging to global-scale web systems to deep learning systems that power apps and services used by billions of people worldwide.