A multiscale analysis of mean-field transformers in the moderate interaction regime

Giuseppe Bruno, Federico Pasqualotto, Andrea Agazzi

29/9/25 Published in: arXiv:2509.25040

In this paper, we study the evolution of tokens through the depth of encoder-only transformer models at inference time by modeling them as a system of particles interacting in a mean-field way and studying the corresponding dynamics. More specifically, we consider this problem in the moderate interaction regime, where the number N of tokens is large and the inverse temperature parameter β of the model scales together with N. In this regime, the dynamics of the system displays multiscale behavior: a fast phase, where the token empirical measure collapses onto a low-dimensional space; an intermediate phase, where the measure further collapses into clusters; and a slow phase, where these clusters sequentially merge into a single one. We provide a rigorous characterization of the limiting dynamics in each of these phases and prove convergence in the above-mentioned limit, exemplifying our results with some simulations.
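To make the particle picture in the abstract concrete, here is a minimal simulation sketch. It is not the authors' code, and the model in it is an assumption: tokens live on the unit sphere, the query, key and value matrices are the identity, and depth is discretized with an explicit Euler step. The names `attention_step`, `simulate` and `min_pairwise_inner_product`, and all default parameter values, are hypothetical and chosen only for illustration.

```python
import math
import random


def normalize(v):
    """Rescale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def attention_step(tokens, beta, dt):
    """One explicit Euler step of softmax-attention particle dynamics.

    Assumed model (not taken from the paper): identity query, key and
    value matrices, with tokens constrained to the unit sphere.
    """
    updated = []
    for x in tokens:
        # Softmax weights proportional to exp(beta * <x, y>) over all tokens y.
        scores = [beta * sum(a * b for a, b in zip(x, y)) for y in tokens]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Attention-weighted average of the tokens.
        drift = [
            sum(w * y[k] for w, y in zip(weights, tokens)) / total
            for k in range(len(x))
        ]
        # Keep only the component of the drift tangent to the sphere at x.
        radial = sum(a * d for a, d in zip(x, drift))
        tangent = [d - radial * a for a, d in zip(x, drift)]
        # Euler step, then renormalize back onto the sphere.
        updated.append(normalize([a + dt * t for a, t in zip(x, tangent)]))
    return updated


def simulate(n_tokens=32, dim=3, beta=4.0, dt=0.1, n_steps=200, seed=0):
    """Evolve random initial tokens and return the final configuration."""
    rng = random.Random(seed)
    tokens = [
        normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
        for _ in range(n_tokens)
    ]
    for _ in range(n_steps):
        tokens = attention_step(tokens, beta, dt)
    return tokens


def min_pairwise_inner_product(tokens):
    """Smallest inner product between two tokens (1.0 means full collapse)."""
    return min(
        sum(a * b for a, b in zip(x, y)) for x in tokens for y in tokens
    )
```

One way to use the sketch is to call `simulate` with increasing `n_steps` and track `min_pairwise_inner_product` as a rough indicator of how far the tokens have clustered. To explore a regime where the inverse temperature grows with the number of tokens, pass a larger `beta` together with a larger `n_tokens`; the precise scaling studied in the paper is not reproduced here.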

Phase III direction(s)

  • Statistical Mechanics and Random Structures
  • Differential equations of Mathematical Physics
