3/23/2026 Frontier RL Is Cheaper Than You Think
On this page The conventional wisdom on RL infrastructure is wrong, and it is costing teams that could be competing at the frontier. The entire mega-cluster narrative rests on a single assumption: that you have to ship 1 TB of weights every time you update your rollout fleet. You do not. Researchers have spent the last year writing about asynchronous RL and rollout-training disaggregation in systems like AReaL. Teams like Kimi and MiniMax have also published engineering notes on RL parameter updates and asynchronous scheduling. We have been running that pattern in production. That mega-cluster instinct comes from pretraining, where the main systems problem is keeping one huge synchronous training job saturated. RL is a different problem. The question is not just how to run the trainer. It is also how to keep a large rollout fleet generating data from…