| 1. |
Kozachinskiy A.♦, Urrutia F.♦, Jimenez H.♦, Steifer T., Pizarro G.♦, Fuentes M.♦, Meza F.♦, Calderon C.B.♦, Rojas C.♦, Strassen Attention, Split VC Dimension and Compositionality in Transformers,
NeurIPS 2025, 39th Conference on Neural Information Processing Systems, 2025-11-30/12-07, San Diego (US), pp.1-32, 2025

Abstract: We propose the first method to show theoretical limitations for one-layer softmax transformers with arbitrarily many precision bits (even infinitely many). We establish these limitations for three tasks that require advanced reasoning. The first task, Match 3 (Sanford et al., 2023), requires looking at all possible token triplets in an input sequence. The second and third tasks address compositionality-based reasoning: function composition (Peng et al., 2024) and binary relation composition, respectively. We formally prove that one-layer softmax Transformers cannot solve any of these tasks. To overcome these limitations, we introduce Strassen attention and prove that, equipped with this mechanism, a one-layer transformer can in principle solve all of them. Importantly, we show that it enjoys sub-cubic running-time complexity, making it more scalable than similar previously proposed mechanisms, such as higher-order attention (Sanford et al., 2023). To complement our theoretical findings, we experimentally studied Strassen attention and compared it against standard (Vaswani et al., 2017), higher-order (Sanford et al., 2023), and triangular (Bergen et al., 2021) attention. Our results help to disentangle these attention mechanisms, highlighting their strengths and limitations. In particular, Strassen attention significantly outperforms standard attention on all the tasks. Altogether, understanding these theoretical limitations can guide research towards scalable attention mechanisms that improve the reasoning abilities of Transformers.

Author affiliations:
| Kozachinskiy A. | - | other affiliation |
| Urrutia F. | - | other affiliation |
| Jimenez H. | - | other affiliation |
| Steifer T. | - | IPPT PAN |
| Pizarro G. | - | other affiliation |
| Fuentes M. | - | other affiliation |
| Meza F. | - | other affiliation |
| Calderon C.B. | - | other affiliation |
| Rojas C. | - | other affiliation |
| 200 p. |
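The Match 3 task from the abstract above can be stated concretely. Below is a minimal brute-force reference checker — our own sketch of the task as described by Sanford et al. (2023), not code from the paper; the modulus `M` and the global yes/no formulation are illustrative simplifications:

```python
from itertools import combinations

def match3(xs, M):
    """Brute-force check of a Match 3-style task: does any triple
    of tokens sum to 0 modulo M?  The naive algorithm inspects all
    O(n^3) triples -- the kind of triple-wise interaction the paper
    proves one-layer softmax attention cannot capture, and which
    Strassen attention handles in sub-cubic time."""
    return any((a + b + c) % M == 0 for a, b, c in combinations(xs, 3))

print(match3([1, 2, 4], 7))   # 1 + 2 + 4 = 7 ≡ 0 (mod 7) → True
print(match3([1, 2, 5], 7))   # no triple sums to 0 mod 7 → False
```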
| 2. |
Barceló P.♦, Duarte M.♦, Rojas C.♦, Steifer T., No Agreement Without Loss: Learning and Social Choice in Peer Review,
ECAI 2023, 26th European Conference on Artificial Intelligence, 2023-09-30/10-04, Kraków (PL), DOI: 10.3233/FAIA230270, pp.190-197, 2023

Abstract: In peer review systems, reviewers are often asked to evaluate various features of submissions, such as technical quality or novelty. A score is given to each of the predefined features, and based on these the reviewer has to provide an overall quantitative recommendation. It may be assumed that each reviewer has her own mapping from the set of features to a recommendation, and that different reviewers have different mappings in mind. This introduces an element of arbitrariness known as commensuration bias. In this paper we discuss a framework introduced by Noothigattu, Shah and Procaccia, and then applied by the organizers of the AAAI 2022 conference. Noothigattu, Shah and Procaccia proposed to aggregate the reviewers' mappings by minimizing certain loss functions, and studied axiomatic properties of this approach in the sense of social choice theory. We challenge several of the results and assumptions used in their work and report a number of negative results. On the one hand, we study a trade-off between some of the proposed axioms and the ability of the method to properly capture agreements of the majority of reviewers. On the other hand, we show that dropping a certain unrealistic assumption has dramatic effects, including causing the method to be discontinuous.

Author affiliations:
| Barceló P. | - | other affiliation |
| Duarte M. | - | other affiliation |
| Rojas C. | - | other affiliation |
| Steifer T. | - | IPPT PAN |
| 140 p. |
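The loss-minimization aggregation discussed in the abstract can be sketched in a toy form. Assuming linear reviewer mappings and squared loss — both illustrative choices of ours; the framework of Noothigattu, Shah and Procaccia is more general — fitting one shared mapping to all reviewers' (features, recommendation) examples reduces to a least-squares problem:

```python
import numpy as np

# Hypothetical data: 4 submissions scored on 2 features
# (e.g. technical quality and novelty).
X = np.array([[3., 4.], [2., 5.], [5., 1.], [4., 4.]])

# Each reviewer's overall recommendations for the same 4 submissions;
# the numbers are invented for illustration.
Y = np.array([3.6, 3.4, 3.0, 4.0,    # reviewer 1
              3.4, 3.6, 2.8, 3.9,    # reviewer 2
              3.5, 3.5, 3.1, 4.1])   # reviewer 3

# Aggregate by minimizing the total squared loss over a single
# linear map w: stack every reviewer's examples and solve.
X_all = np.vstack([X, X, X])
w, *_ = np.linalg.lstsq(X_all, Y, rcond=None)

print(w)                    # aggregated feature weights
print(X @ w)                # aggregated recommendations per submission
```

The ECAI paper's negative results concern exactly this kind of procedure: which social-choice axioms such a loss-minimizing aggregate can and cannot satisfy.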
| 3. |
Delle Rose V.♦, Kozachinskiy A.♦, Rojas C.♦, Steifer T., Find a witness or shatter: the landscape of computable PAC learning,
COLT 2023, The Thirty Sixth Annual Conference on Learning Theory, 2023-07-12/07-15, Bangalore (IN), No.195, pp.1-14, 2023

Abstract: This paper contributes to the study of CPAC learnability — a computable version of PAC learning — by solving three open questions from recent papers. Firstly, we prove that every improperly CPAC learnable class is contained in a class which is properly CPAC learnable with polynomial sample complexity. This confirms a conjecture by Agarwal et al. (COLT 2021). Secondly, we show that there exists a decidable class of hypotheses which is properly CPAC learnable, but only with uncomputably fast-growing sample complexity. This solves a question from Sterkenburg (COLT 2022). Finally, we construct a decidable class of finite Littlestone dimension which is not improperly CPAC learnable, strengthening a recent result of Sterkenburg (2022) and answering a question posed by Hasrati and Ben-David (ALT 2023). Together with previous work, our results provide a complete landscape for the learnability problem in the CPAC setting.

Keywords: PAC learnability, CPAC learnability, VC dimension, Littlestone dimension, computability, foundations of machine learning

Author affiliations:
| Delle Rose V. | - | University of Siena (IT) |
| Kozachinskiy A. | - | other affiliation |
| Rojas C. | - | other affiliation |
| Steifer T. | - | IPPT PAN |
| 200 p. |
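The notion of shattering behind the VC dimension, which the keywords above refer to, can be checked by brute force for finite hypothesis classes. A small self-contained sketch (our own textbook-style example, not code or a class from the paper):

```python
from itertools import product  # noqa: F401  (handy for enumerating labelings)

def shatters(hypotheses, points):
    """A finite class of binary classifiers shatters `points` iff it
    realizes every one of the 2^|points| possible labelings."""
    labelings = {tuple(h(p) for p in points) for h in hypotheses}
    return len(labelings) == 2 ** len(points)

# Threshold classifiers on the integers: h_t(x) = 1 iff x >= t.
# This class has VC dimension 1: it shatters any single point
# but no pair (the labeling "left=1, right=0" is unrealizable).
thresholds = [lambda x, t=t: int(x >= t) for t in range(5)]

print(shatters(thresholds, [2]))      # True
print(shatters(thresholds, [1, 3]))   # False
```

CPAC learnability refines this picture by additionally demanding that the learner be computable, which is where the paper's separations arise.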