Instytut Podstawowych Problemów Techniki
Polskiej Akademii Nauk

Partners

Francisco Meza


Conference papers
1.  Kozachinskiy A., Urrutia F., Jimenez H., Steifer T., Pizarro G., Fuentes M., Meza F., Calderon C.B., Rojas C., Strassen Attention, Split VC Dimension and Compositionality in Transformers, NeurIPS 2025, 39th Conference on Neural Information Processing Systems, 2025-11-30/12-07, San Diego (US), pp. 1-32, 2025

Abstract:
We propose the first method for proving theoretical limitations of one-layer softmax Transformers with arbitrarily many (even infinitely many) bits of precision. We establish those limitations for three tasks that require advanced reasoning. The first task, Match 3 (Sanford et al., 2023), requires looking at all possible token triplets in an input sequence. The second and third tasks address compositionality-based reasoning: function composition (Peng et al., 2024) and binary relation composition, respectively. We formally prove that one-layer softmax Transformers cannot solve any of these tasks. To overcome these limitations, we introduce Strassen attention and prove that, equipped with this mechanism, a one-layer Transformer can in principle solve all of these tasks. Importantly, we show that it enjoys sub-cubic running-time complexity, making it more scalable than similar previously proposed mechanisms, such as higher-order attention (Sanford et al., 2023). To complement our theoretical findings, we experimentally studied Strassen attention and compared it against standard attention (Vaswani et al., 2017), higher-order attention (Sanford et al., 2023), and triangular attention (Bergen et al., 2021). Our results help to disentangle all these attention mechanisms, highlighting their strengths and limitations. In particular, Strassen attention significantly outperforms standard attention on all the tasks. Altogether, understanding the theoretical limitations can guide research towards scalable attention mechanisms that improve the reasoning abilities of Transformers.
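To make the "all possible token triplets" requirement of Match 3 concrete, here is a minimal brute-force sketch. The abstract does not spell out the task, so the definition used here is an assumption based on the usual formulation attributed to Sanford et al. (2023): for each position i, output 1 if some positions j and k (not necessarily distinct from i) satisfy x_i + x_j + x_k ≡ 0 (mod M). The function name `match3` and its parameters are hypothetical and are not taken from the paper.

```python
from itertools import product


def match3(tokens, modulus):
    """Brute-force Match 3 under the assumed definition.

    Entry i of the result is 1 if there exist positions j, k
    (possibly equal to i or to each other) such that
    tokens[i] + tokens[j] + tokens[k] is divisible by `modulus`,
    and 0 otherwise.
    """
    n = len(tokens)
    result = []
    for i in range(n):
        # Scan every (j, k) pair: this is the triplet-wise comparison
        # the abstract refers to, costing O(n^3) over all positions.
        found = any(
            (tokens[i] + tokens[j] + tokens[k]) % modulus == 0
            for j, k in product(range(n), repeat=2)
        )
        result.append(int(found))
    return result


if __name__ == "__main__":
    print(match3([1, 2, 4], 7))  # 1 + 2 + 4 = 7, so every position matches
    print(match3([1, 1, 1], 7))  # no triple sums to a multiple of 7
```

This reference implementation only illustrates the input/output behaviour of the task; it says nothing about how Strassen attention or any other mechanism in the paper computes it.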

Author affiliations:
Kozachinskiy A. - other affiliation
Urrutia F. - other affiliation
Jimenez H. - other affiliation
Steifer T. - IPPT PAN
Pizarro G. - other affiliation
Fuentes M. - other affiliation
Meza F. - other affiliation
Calderon C.B. - other affiliation
Rojas C. - other affiliation
200 points

Category A Plus
