ParaRNN: Large-Scale Nonlinear RNNs, Trainable in Parallel
Recurrent Neural Networks (RNNs) are naturally suited to efficient inference, requiring far less memory and compute than attention-based architectures, but the sequential nature of their computation has historically made it impractical to scale RNNs to billions of parameters. A new advance from Apple researchers makes RNN training dramatically more efficient, enabling large-scale training for the first time and widening the set of architecture choices available to practitioners designing LLMs, particularly for resource-constrained deployment.

In ParaRNN: Unlocking Parallel Training of Nonlinear RNNs for Large Language Models, a paper accepted to ICLR 2026 as an Oral, Apple researchers share a framework for parallelized RNN training that achieves a 665× speedup over the traditional sequential approach (see Figure 1). This efficiency gain enables the training of the first 7-billion-parameter classical RNNs…
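What makes this hard is that each hidden state depends nonlinearly on the one before it, so the obvious computation is a chain of T dependent steps. Below is a minimal, illustrative sketch in JAX of one way such a recurrence can nevertheless be solved jointly over all timesteps: Newton's method turns the nonlinear system into a sequence of affine recurrences, each of which an associative scan evaluates in logarithmic parallel depth. This is not the paper's code; the tanh cell, the names (W, U, T, d), and the exact Newton-plus-scan formulation here are assumptions for illustration of the general idea, not ParaRNN's specific algorithm.

```python
# Illustrative sketch (NOT the ParaRNN implementation): the recurrence
#     h_t = tanh(W h_{t-1} + U x_t),  t = 1..T
# is recast as one nonlinear system F(h) = 0 over all T steps and solved
# with Newton's method. Each Newton step is an affine recurrence
# h_t = J_t h_{t-1} + c_t, which an associative scan evaluates in
# O(log T) parallel depth instead of O(T) sequential steps.
import jax
import jax.numpy as jnp

T, d = 64, 8                                    # illustrative sizes
key = jax.random.PRNGKey(0)
kW, kU, kx = jax.random.split(key, 3)
W = 0.3 * jax.random.normal(kW, (d, d)) / jnp.sqrt(d)   # mildly contractive
U = jax.random.normal(kU, (d, d)) / jnp.sqrt(d)
x = jax.random.normal(kx, (T, d))
h0 = jnp.zeros(d)

def sequential(x, h0):
    # Classical O(T)-depth unroll: each step waits for the previous one.
    def step(h, xt):
        h = jnp.tanh(W @ h + U @ xt)
        return h, h
    _, hs = jax.lax.scan(step, h0, x)
    return hs

def affine_scan(J, c):
    # Evaluate h_t = J_t h_{t-1} + c_t (with the fixed h_0 already folded
    # into c_1) via an associative scan over affine-map compositions.
    def combine(a, b):
        A1, b1 = a                              # earlier map
        A2, b2 = b                              # later map
        return jnp.matmul(A2, A1), jnp.einsum('...ij,...j->...i', A2, b1) + b2
    _, hs = jax.lax.associative_scan(combine, (J, c))
    return hs                                   # prefix offsets = hidden states

def newton_parallel(x, h0, iters=30):
    h = jnp.zeros((T, d))                       # guess all hidden states at once
    for _ in range(iters):
        prev = jnp.vstack([h0[None], h[:-1]])   # h_{t-1} for every t
        f = jnp.tanh(prev @ W.T + x @ U.T)      # f_t = tanh(W h_{t-1} + U x_t)
        J = (1.0 - f**2)[:, :, None] * W        # Jacobian of f_t w.r.t. h_{t-1}
        # Newton step: h_t <- f_t + J_t (h_{t-1} - prev_t), affine in h.
        c = f - jnp.einsum('tij,tj->ti', J, prev)
        c = c.at[0].add(J[0] @ h0)              # fold fixed h_0 into step 1
        h = affine_scan(J, c)
    return h

# Both computations agree up to floating-point round-off.
print(jnp.max(jnp.abs(sequential(x, h0) - newton_parallel(x, h0))))
```

The point of the sketch is the shape of the computation: the O(T)-depth unroll becomes a handful of Newton iterations, each a parallel scan of O(log T) depth, which is the kind of restructuring that makes training nonlinear RNNs at scale tractable on modern accelerators.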