$ timeahead_
★ TOP STORY · [AMLR] · Research · 99d ago

ParaRNN: Large-Scale Nonlinear RNNs, Trainable in Parallel

Recurrent Neural Networks (RNNs) are naturally suited to efficient inference, requiring far less memory and compute than attention-based architectures, but the sequential nature of their computation has historically made it impractical to scale RNNs to billions of parameters. A new advance from Apple researchers makes RNN training dramatically more efficient, enabling large-scale training for the first time and widening the set of architecture choices available to practitioners designing LLMs, particularly for resource-constrained deployment. In ParaRNN: Unlocking Parallel Training of Nonlinear RNNs for Large Language Models, a paper accepted to ICLR 2026 as an Oral, Apple researchers present a framework for parallelized RNN training that achieves a 665× speedup over the traditional sequential approach (see Figure 1). This efficiency gain enables training of the first 7-billion-parameter classical RNNs…

Apple Machine Learning Research · read →
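The bottleneck ParaRNN attacks is that a recurrence h_t = f(h_{t-1}, x_t) looks inherently sequential. For linear (affine) recurrences there is a classical escape hatch: the update composes associatively, so every hidden state can be computed with a parallel prefix scan in O(log T) depth. The toy sketch below shows only that linear baseline, not the ParaRNN algorithm itself, which is precisely about extending parallel training to the nonlinear case:

```python
import numpy as np

def sequential_rnn(a, b):
    # Reference recurrence: h_t = a_t * h_{t-1} + b_t, with h_0 = 0.
    h, out = 0.0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return np.array(out)

def parallel_scan_rnn(a, b):
    # Affine updates compose associatively:
    #   (a2, b2) ∘ (a1, b1) = (a2 * a1, a2 * b1 + b2),
    # so all h_t follow from a parallel prefix scan over the maps.
    # Hillis-Steele inclusive scan: O(log T) stages; within a stage
    # every combine is independent, hence parallelizable.
    maps = [[float(a_t), float(b_t)] for a_t, b_t in zip(a, b)]
    n, step = len(maps), 1
    while step < n:
        nxt = [m[:] for m in maps]
        for i in range(step, n):
            a1, b1 = maps[i - step]   # prefix ending at i - step
            a2, b2 = maps[i]          # segment ending at i
            nxt[i] = [a2 * a1, a2 * b1 + b2]
        maps = nxt
        step *= 2
    # The b-component of the composed map applied to h_0 = 0 is h_t.
    return np.array([b_t for _, b_t in maps])
```

Both functions return identical hidden-state sequences; the scan version simply exposes the log-depth parallelism that sequential evaluation hides.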
▲ trending · last 48h · view all →
[AMLR] Apple Machine Learning Research · 13 articles · visit →
155d ago
International Conference on Learning Representations (ICLR) 2026
Apple is presenting new research at the annual International Conference on Learning Representations (ICLR), which takes place in person in Rio de Janeiro, Brazil, from April 23 to 27. We are proud to again sponsor the conference, which brings together the scientific and industrial research communities focused on deep learning. Below is an overview of Apple’s participation at ICLR 2026. Stop by the Apple booth #204 during exhibition hours: 9:30 AM - 5:30 PM (Thursday, April 23 - Saturday, April 25). All times in the schedule are BRT (local time). Schedule: Thursday, April 23 - Pretraining with Hierarchical Memories: Separating Long-Tail and Common Knowledge - 10:30 AM - 1:00 PM, Poster Session 1, Pavilion 3, #0309 - Hadi Pour Ansari, C Thomas, David Grangier, Michael Kirchhof, Oncel…
155d · Research
285d ago
SQUIRE: Interactive UI Authoring via Slot QUery Intermediate REpresentations
Authors: Alan Leung, Ruijia Cheng, Jason Wu, Jeffrey Nichols, Titus Barik. Frontend developers create UI prototypes to evaluate alternatives, a time-consuming process of repeated iteration and refinement. Generative AI code assistants enable rapid prototyping simply by prompting through a chat interface rather than writing code. However, while this interaction gives developers flexibility, since they can write any prompt they wish, it makes it challenging to control what is generated. First, natural language on its own can be ambiguous, making it difficult for developers to precisely communicate their intentions. Second, the model may respond unpredictably, requiring the developer to re-prompt through trial and error to repair any undesired changes. To address these weaknesses, we…
295d ago
MixAtlas: Uncertainty-aware Data Mixture Optimization for Multimodal LLM Midtraining
Authors: Bingbing Wen**, Sirajul Salekin, Feiyang Kang†, Lucy Lu Wang‡, Bill Howe‡, Javier Movellan, Manjot Bilkhu. This paper was accepted at the Workshop on Navigating and Addressing Data Problems for Foundation Models (NADPFM) at ICLR 2026. Principled domain reweighting can substantially improve sample efficiency and downstream generalization; however, data-mixture optimization for multimodal pretraining remains underexplored. Current multimodal training recipes tune mixtures from only a single perspective, such as data format or task type. We introduce MixAtlas, a principled framework for compute-efficient multimodal mixture optimization via systematic domain decomposition and smaller proxy models. MixAtlas factorizes the training data along two interpretable axes, image concepts and task supervision, …
505d ago
Apple Machine Learning Research at ICLR 2026
Apple is advancing AI and ML with fundamental research, much of which is shared through publications and engagement at conferences in order to accelerate progress in this important field and support the broader community. This week, the Fourteenth International Conference on Learning Representations (ICLR) will be held in Rio de Janeiro, Brazil, and Apple is proud to again participate in this important event for the research community and to support it with sponsorship. At the main conference and associated workshops, Apple researchers will present new research across a variety of topics, including work unlocking large-scale training for Recurrent Neural Networks, a technique for improving State Space Models, a new approach to unifying image understanding and generation, a method for generating 3D scenes from a single photo, and a new approach to protein folding. During exhibition hours, attendees will be able…
505d · Research
543d ago
LaCy: What Small Language Models Can and Should Learn is Not Just a Question of Loss
Authors: Szilvia Ujváry†**, Louis Béthune, Pierre Ablin, João Monteiro, Marco Cuturi, Michael Kirchhof. This paper was accepted at the Workshop on Memory for LLM-Based Agentic Systems at ICLR. Language models have consistently grown to compress more world knowledge into their parameters, but the knowledge that can be pretrained into them is upper-bounded by their parameter count. The capacity of Small Language Models (SLMs) in particular is limited, leading to factually incorrect generations. This problem is often mitigated by giving the SLM access to an outside source: the ability to query a larger model, documents, or a database. Under…
543d · Tutorial · #agents
890d ago
Can Large Language Models Understand Context?
Authors: Yilun Zhu†**, Joel Ruben Antony Moniz, Shruti Bhargava, Jiarui Lu, Dhivya Piraviperumal, Site Li, Yuan Zhang, Hong Yu, Bo-Hsiang Tseng. Understanding context is key to understanding human language, an ability that Large Language Models (LLMs) have increasingly been seen to demonstrate to an impressive extent. However, though the evaluation of LLMs spans various domains within Natural Language Processing, limited attention has been paid to probing their linguistic capability of understanding contextual features. This paper introduces a context-understanding benchmark by adapting existing datasets to suit the evaluation of generative models. The benchmark comprises four distinct tasks and nine datasets, all featuring prompts designed to…
890d · Research · #benchmark
1167d ago
Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts
Authors: Jiayuan Ye, Vitaly Feldman, Kunal Talwar. This paper was accepted at the Workshop on Navigating and Addressing Data Problems for Foundation Models at ICLR 2026. Large language models (LLMs) can struggle to memorize factual knowledge in their parameters, often leading to hallucinations and poor performance on knowledge-intensive tasks. In this paper, we formalize fact memorization from an information-theoretic perspective and study how training data distributions affect fact accuracy. We show that fact accuracy is suboptimal (below the capacity limit) whenever the amount of information contained in the training data facts exceeds model capacity. This is further exacerbated when the fact frequency distribution is skewed (e.g., a power law). We propose…
1167d · Research · #training
1244d ago
Personalized Group Relative Policy Optimization for Heterogenous Preference Alignment
Authors: Jialu Wang, Heinrich Peters, Asad A. Butt, Navid Hashemi, Alireza Hashemi, Pouya M. Ghari, Joseph Hoover, James Rae, Morteza Dehghani. Despite their sophisticated general-purpose capabilities, Large Language Models (LLMs) often fail to align with diverse individual preferences because standard post-training methods, like Reinforcement Learning from Human Feedback (RLHF), optimize for a single, global objective. While Group Relative Policy Optimization (GRPO) is a widely adopted on-policy reinforcement learning framework, its group-based normalization implicitly assumes that all samples are exchangeable, inheriting this limitation in personalized settings. This assumption conflates distinct user reward distributions and systematically biases learning toward dominant preferences while suppressing minority signals.…
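For context on the limitation described above: standard GRPO scores each sampled response by normalizing its reward against the mean and standard deviation of its sampling group, which is exactly where the exchangeability assumption enters. A minimal sketch (illustrative toy numbers, not the paper's personalized method) of how pooling two users' rewards into one group suppresses the minority signal:

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    # Standard GRPO advantage: normalize each reward against the
    # mean/std of its sampling group, treating samples as exchangeable.
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Two users rate the same 4 responses on very different scales.
user_a = [1.0, 0.0, 0.0, 0.0]   # minority preference: likes response 0
user_b = [0.0, 9.0, 8.0, 7.0]   # dominant preference: high, wide rewards
pooled = grpo_advantages(user_a + user_b)
per_user = np.concatenate([grpo_advantages(user_a), grpo_advantages(user_b)])
# In the pooled group, user A's preferred response falls below the
# pooled mean and receives a negative advantage, so training pushes
# against it; per-user normalization keeps its advantage positive.
```

The pooled statistics are dominated by user B's reward scale, which is the "conflates distinct user reward distributions" failure mode the abstract points at.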
1466d ago
ACM Human-Computer Interaction Conference (CHI) 2026
Apple is presenting new research at the annual ACM (Association for Computing Machinery) CHI Conference on Human Factors in Computing Systems, which takes place in person in Barcelona, Spain, from April 13 to 17. We are proud to again sponsor the conference, which brings together the scientific and industrial research communities focused on human-computer interaction. Below is the schedule of Apple-sponsored presentations, demos, and events at CHI 2026. Stop by the Apple booth during exhibition hours at the CHI 2026 venue in Barcelona, Spain. All times listed are CEST (local time): - Monday, April 13: 10:30 - 16:30; CHI Reception 18:00 - 20:00 - Tuesday, April 14: 10:00 - 18:00 - Wednesday, April 15: 10:00 - 18:00 - Thursday,…
1466d · Research
1548d ago
A Theoretical Framework for Acoustic Neighbor Embeddings
Authors: Woojay Jeon. This paper provides a theoretical framework for interpreting acoustic neighbor embeddings, which are representations of the phonetic content of variable-width audio or text in a fixed-dimensional embedding space. A probabilistic interpretation of the distances between embeddings is proposed, based on a general quantitative definition of phonetic similarity between words. This provides a framework for understanding and applying the embeddings in a principled manner. Theoretical and empirical evidence is shown to support an approximation of uniform cluster-wise isotropy, which allows us to reduce the distances to simple Euclidean distances. Four experiments that validate the framework and demonstrate how it can be applied to diverse problems are described. Nearest-neighbor search between audio and text embeddings can give isolated-word classification accuracy that is…
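Because the framework reduces embedding distances to plain Euclidean distances, the isolated-word classification experiment boils down to a 1-nearest-neighbor lookup. A toy sketch with invented embeddings (the actual audio and text embedding networks are not shown here):

```python
import numpy as np

def classify_word(audio_emb, text_embs, vocab):
    # Under (approximately) uniform cluster-wise isotropy, phonetic
    # similarity reduces to Euclidean distance, so classifying an
    # isolated spoken word is a nearest-neighbor search against the
    # text embeddings of the vocabulary.
    dists = np.linalg.norm(text_embs - audio_emb, axis=1)
    return vocab[int(np.argmin(dists))]

# Toy example: 3 vocabulary words in a 4-dim embedding space
# (embeddings are made up for illustration).
vocab = ["cat", "cot", "dog"]
text_embs = np.array([[1.0, 0.0, 0.0, 0.0],
                      [0.9, 0.3, 0.0, 0.0],
                      [0.0, 0.0, 1.0, 0.0]])
audio_emb = np.array([0.95, 0.05, 0.1, 0.0])  # noisy utterance of "cat"
# classify_word(audio_emb, text_embs, vocab) → "cat"
```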
1618d ago
Efficient Privacy Loss Accounting for Subsampling and Random Allocation
Authors: Vitaly Feldman, Moshe Shenfeld†. We consider the privacy amplification properties of a sampling scheme in which a user’s data is used in k steps chosen randomly and uniformly from a sequence (or set) of t steps. This sampling scheme has recently been applied in the context of differentially private optimization (Chua et al., 2024a; Choquette-Choo et al., 2025) and communication-efficient high-dimensional private aggregation (Asi et al., 2025), where it was shown to have utility advantages over standard Poisson sampling. Theoretical analyses of this sampling scheme (Feldman & Shenfeld, 2025; Dong et al., 2025) lead to bounds that are close to those of Poisson sampling, yet still have two significant shortcomings. First, in many practical settings, the resulting…
1618d · #local
1677d ago
What Do Your Logits Know? (The Answer May Surprise You!)
Authors: Masha Fedzechkina, Eleonora Gualdoni, Rita Ramos, Sinead Williamson. Recent work has shown that probing model internals can reveal a wealth of information not apparent from the model generations. This poses the risk of unintentional or malicious information leakage, where model users are able to learn information that the model owner assumed was inaccessible. Using vision-language models as a testbed, we present the first systematic comparison of information retained at different “representational levels” as it is compressed from the rich information encoded in the residual stream through two natural bottlenecks: low-dimensional projections of the residual stream obtained using tuned lens, and the final top-k logits most likely to impact the model’s answer. We show…
1677d · Tutorial · #multimodal
2644d ago
Governance-Aware Agent Telemetry for Closed-Loop Enforcement in Multi-Agent AI Systems
Authors: Anshul Pathak, Nishant Jain. Enterprise multi-agent AI systems produce thousands of inter-agent interactions per hour, yet existing observability tools capture these dependencies without enforcing anything. OpenTelemetry and Langfuse collect telemetry but treat governance as a downstream analytics concern, not a real-time enforcement target. The result is an “observe-but-do-not-act” gap where policy violations are detected only after damage is done. We present Governance-Aware Agent Telemetry (GAAT), a reference architecture that closes the loop between telemetry collection and automated policy enforcement for multi-agent systems. GAAT introduces (1) a Governance Telemetry Schema (GTS) extending OpenTelemetry with governance attributes; (2) a real-time policy-violation detection engine using OPA-compatible declarative rules with sub-200 ms latency; (3) a Governance Enforcement Bus (GEB)…
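The closed loop described above, governance attributes on each telemetry span evaluated against declarative rules at emission time rather than in downstream analytics, can be sketched in a few lines. The attribute names and rule below are hypothetical illustrations only, not GAAT's actual GTS schema or OPA rules:

```python
# Hypothetical sketch: check governance attributes on an agent
# telemetry span against a declarative rule when the span is emitted,
# instead of analyzing violations after the fact. All field names
# here are invented for illustration.

RULES = [
    {
        "name": "no-pii-to-external-tools",
        # Violated when a span marked as carrying PII targets a tool
        # outside the approved allow-list.
        "predicate": lambda span: span["gov.data_class"] == "pii"
        and span["gov.target_tool"] not in {"internal_db", "redactor"},
        "action": "block",
    },
]

def enforce(span):
    # Returns the enforcement actions triggered by this span; an empty
    # list means the interaction is allowed to proceed.
    return [rule["action"] for rule in RULES if rule["predicate"](span)]

span = {"gov.data_class": "pii", "gov.target_tool": "web_search"}
# enforce(span) → ["block"]
```

The point of the sketch is only the control flow: the rule check sits on the emission path, so a violation can block the interaction rather than merely log it.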