Enric Boix-Adsera
I am an assistant professor in the Department of Statistics and Data Science at the Wharton School of the University of Pennsylvania.
My research focuses on building a mathematical science of deep learning. My goal is to understand the fundamental mechanisms driving how neural networks learn, in order to enable more efficient and more trustworthy AI systems.
Previously (in reverse-chronological order), I completed a postdoc at MIT Mathematics and the Harvard CMSA, was a Simons Fellow in the Large Language Models program at UC Berkeley, and obtained a PhD in EECS at MIT, advised by Guy Bresler and Philippe Rigollet. During my PhD I was generously supported by an NSF Graduate Research Fellowship, a Siebel Fellowship, and an Apple AI/ML fellowship. I received my undergraduate degree in mathematics from Princeton University, where I was advised by Emmanuel Abbe.
Publications
* denotes equally-contributing first authors and (αβ) denotes alphabetical order
Sorted by year
2025
-
FACT: the Features At Convergence Theorem for neural networks
EB*, Neil Mallinar*, James B. Simon, Mikhail Belkin.
Preprint.
-
The power of fine-grained experts: Granularity boosts expressivity in Mixture of Experts
(αβ) EB, Philippe Rigollet.
Preprint.
-
Let Me Think! A long chain of thought can be worth exponentially many short ones
Parsa Mirtaheri*, Ezra Edelman*, Samy Jelassi, Eran Malach, EB.
Preprint.
-
Aggregate and conquer: detecting and steering LLM concepts by combining nonlinear predictors over multiple layers
Daniel Beaglehole, Adityanarayanan Radhakrishnan, EB, Mikhail Belkin.
Preprint.
-
On the inductive bias of infinite-depth ResNets and the bottleneck rank
EB.
Preprint.
2024
-
When can transformers reason with abstract symbols?
EB*, Omid Saremi, Emmanuel Abbe, Samy Bengio, Etai Littwin, Joshua Susskind.
International Conference on Learning Representations (ICLR'24).
-
Prompts have evil twins
Rimon Melamed, Lucas H. McCabe, Tanay Wakhare, Yejin Kim, H. Howie Huang, EB.
Conference on Empirical Methods in Natural Language Processing (EMNLP'24).
2023
-
Transformers learn through gradual rank increase
EB*, Etai Littwin*, Emmanuel Abbe, Samy Bengio, Joshua Susskind.
Conference on Neural Information Processing Systems (NeurIPS'23).
-
Tight conditions for when the NTK approximation is valid
(αβ) EB, Etai Littwin.
Transactions on Machine Learning Research (TMLR).
-
SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics
(αβ) Emmanuel Abbe, EB, Theodor Misiakiewicz.
Conference on Learning Theory (COLT'23).
2022
-
GULP: a prediction-based metric between representations
EB*, Hannah Lawrence*, George Stepaniants*, Philippe Rigollet.
Conference on Neural Information Processing Systems (NeurIPS'22).
Selected as oral (top 8% of accepted papers)
-
On the non-universality of deep learning: quantifying the cost of symmetry
(αβ) Emmanuel Abbe, EB.
Conference on Neural Information Processing Systems (NeurIPS'22).
-
The merged-staircase property: a necessary and nearly sufficient condition for SGD learning of sparse functions on two-layer neural networks
(αβ) Emmanuel Abbe, EB, Theodor Misiakiewicz.
Conference on Learning Theory (COLT'22).
2021
-
The staircase property: How hierarchical structure can guide deep learning
(αβ) Emmanuel Abbe, EB, Matthew Brennan, Guy Bresler, Dheeraj Nagaraj.
Conference on Neural Information Processing Systems (NeurIPS'21).
-
Chow-Liu++: Optimal Prediction-Centric Learning of Tree Ising Models
(αβ) EB, Guy Bresler, Frederic Koehler.
Foundations of Computer Science (FOCS'21).
2020
2019
-
The Average-Case Complexity of Counting Cliques in Erdős-Rényi Hypergraphs
(αβ) EB, Matthew Brennan, Guy Bresler.
Foundations of Computer Science (FOCS'19).
Invited to the SIAM Journal on Computing Special Issue for FOCS 2019
-
Sample-Efficient Active Learning of Causal Trees
Kristjan Greenewald*, Dmitriy Katz-Rogozhnikov*, Karthikeyan Shanmugam*, Sara Magliacane, Murat Kocaoglu, EB, Guy Bresler.
Conference on Neural Information Processing Systems (NeurIPS'19).
-
Subadditivity Beyond Trees and the Chi-Squared Mutual Information
(αβ) Emmanuel Abbe, EB.
IEEE International Symposium on Information Theory (ISIT'19).
-
Randomized Concurrent Set Union and Generalized Wake-Up
Siddhartha Jayanti*, Robert E. Tarjan*, EB.
Symposium on Principles of Distributed Computing (PODC'19).
2018
Sorted by topic
Learning
-
FACT: the Features At Convergence Theorem for neural networks
EB*, Neil Mallinar*, James B. Simon, Mikhail Belkin.
Preprint.
-
The power of fine-grained experts: Granularity boosts expressivity in Mixture of Experts
(αβ) EB, Philippe Rigollet.
Preprint.
-
Let Me Think! A long chain of thought can be worth exponentially many short ones
Parsa Mirtaheri*, Ezra Edelman*, Samy Jelassi, Eran Malach, EB.
Preprint.
-
Aggregate and conquer: detecting and steering LLM concepts by combining nonlinear predictors over multiple layers
Daniel Beaglehole, Adityanarayanan Radhakrishnan, EB, Mikhail Belkin.
Preprint.
-
On the inductive bias of infinite-depth ResNets and the bottleneck rank
EB.
Preprint.
-
Towards a theory of model distillation
EB.
Preprint.
-
When can transformers reason with abstract symbols?
EB*, Omid Saremi, Emmanuel Abbe, Samy Bengio, Etai Littwin, Joshua Susskind.
International Conference on Learning Representations (ICLR'24).
-
Prompts have evil twins
Rimon Melamed, Lucas H. McCabe, Tanay Wakhare, Yejin Kim, H. Howie Huang, EB.
Conference on Empirical Methods in Natural Language Processing (EMNLP'24).
-
Transformers learn through gradual rank increase
EB*, Etai Littwin*, Emmanuel Abbe, Samy Bengio, Joshua Susskind.
Conference on Neural Information Processing Systems (NeurIPS'23).
-
Tight conditions for when the NTK approximation is valid
(αβ) EB, Etai Littwin.
Transactions on Machine Learning Research (TMLR).
-
SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics
(αβ) Emmanuel Abbe, EB, Theodor Misiakiewicz.
Conference on Learning Theory (COLT'23).
-
GULP: a prediction-based metric between representations
EB*, Hannah Lawrence*, George Stepaniants*, Philippe Rigollet.
Conference on Neural Information Processing Systems (NeurIPS'22).
Selected as oral (top 8% of accepted papers)
-
On the non-universality of deep learning: quantifying the cost of symmetry
(αβ) Emmanuel Abbe, EB.
Conference on Neural Information Processing Systems (NeurIPS'22).
-
The merged-staircase property: a necessary and nearly sufficient condition for SGD learning of sparse functions on two-layer neural networks
(αβ) Emmanuel Abbe, EB, Theodor Misiakiewicz.
Conference on Learning Theory (COLT'22).
-
The staircase property: How hierarchical structure can guide deep learning
(αβ) Emmanuel Abbe, EB, Matthew Brennan, Guy Bresler, Dheeraj Nagaraj.
Conference on Neural Information Processing Systems (NeurIPS'21).
-
Chow-Liu++: Optimal Prediction-Centric Learning of Tree Ising Models
(αβ) EB, Guy Bresler, Frederic Koehler.
Foundations of Computer Science (FOCS'21).
-
Sample-Efficient Active Learning of Causal Trees
Kristjan Greenewald*, Dmitriy Katz-Rogozhnikov*, Karthikeyan Shanmugam*, Sara Magliacane, Murat Kocaoglu, EB, Guy Bresler.
Conference on Neural Information Processing Systems (NeurIPS'19).
-
Subadditivity Beyond Trees and the Chi-Squared Mutual Information
(αβ) Emmanuel Abbe, EB.
IEEE International Symposium on Information Theory (ISIT'19).
-
An Information-Percolation Bound for Spin Synchronization on General Graphs
(αβ) Emmanuel Abbe, EB.
Annals of Applied Probability (AAP).
-
Graph powering and spectral robustness
(αβ) Emmanuel Abbe, EB, Peter Ralli, Colin Sandon.
SIAM Journal on Mathematics of Data Science (SIMODS).
Optimal Transport
Miscellaneous
-
The Average-Case Complexity of Counting Cliques in Erdős-Rényi Hypergraphs
(αβ) EB, Matthew Brennan, Guy Bresler.
Foundations of Computer Science (FOCS'19).
Invited to the SIAM Journal on Computing Special Issue for FOCS 2019
-
Randomized Concurrent Set Union and Generalized Wake-Up
Siddhartha Jayanti*, Robert E. Tarjan*, EB.
Symposium on Principles of Distributed Computing (PODC'19).