Discover how Clockwork’s edge-based congestion control technology can turbocharge NCCL communication throughput by 2.6x on TCP/IP+Ethernet networks, which can significantly shorten the time needed for LLM training on such networks.
Based on foundational clock sync technology developed at Stanford University, Clockwork’s software can transform unpredictable data networks into deterministic, time-sensitive systems. In this talk, we’ll provide an in-depth look at Packet Rocket, an easy-to-deploy software solution that can enable TCP+Ethernet networks to deliver 100% utilization at near-zero packet loss. We will also present results from NCCL tests and discuss how the technology can be used to improve the performance of LLM workloads on TCP+Ethernet networks without hardware support or upgrades.
Vinay Sriram is a senior software engineer at Clockwork Systems. His recent work studies the benefit of Clockwork’s foundational congestion control technology on machine learning workloads in the public cloud. Vinay also works on Clockwork’s real-time network monitoring and clock synchronization solutions. He holds a B.S. and M.S. in computer science from Stanford University. Before joining Clockwork, he was a research assistant in the Stanford Platform Lab and worked on ML teams at Google and Amazon.