kwindla hultman kramer

My favorite part of the DeepSeek-V3 Technical Report is the stuff about the…

January 30, 2025

My favorite part of the DeepSeek-V3 Technical Report is the stuff about the all-to-all communication kernels. (Mostly in Section 3.2.2, "Efficient Implementation of Cross-Node All-to-All Communication.")

It would be fun to see this code, though the implementation is coupled tightly enough to the architecture of the H800 and their cluster design (the specifics of the NVLink and InfiniBand interconnects) that it wouldn't be useful as an open source building block.

Writing really, really optimized distributed systems code is very satisfying. I've written a lot of both GPU code and networking code over the years, so the overlap here makes me particularly happy!

But my favorite, favorite part is that they also wrote down a bunch of "Suggestions on Hardware Design."

Usually, when you work on a system like this, you never manage to write up all the lessons learned even for your own use, much less publish them in such an accessible paper. Kudos to the DeepSeek team.

https://arxiv.org/pdf/2412.19437v1