Session Led by Lucia Mocz
The advancement of large language models (LLMs) for real-world applications hinges critically on enhancing their reasoning capabilities. In this paper reading, we explore the reasoning abilities of LLMs through a geometric framework, which establishes a connection between the expressive power of LLMs and the density of their self-attention graphs.
The paper's analysis demonstrates that the density of these self-attention graphs defines the intrinsic dimension of the inputs to the MLP blocks. Through theoretical analysis and toy examples, it shows that a higher intrinsic dimension implies a greater expressive capacity of the LLM. It also provides empirical evidence linking this geometric framework to recent methods aimed at enhancing the reasoning capabilities of LLMs.
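To make the two quantities in that summary concrete, here is a minimal, hypothetical sketch (not the paper's actual procedure): it measures the density of a toy self-attention graph as the fraction of attention weights above a small threshold, and estimates the intrinsic dimension of MLP-block inputs via a simple PCA variance cutoff. All names, thresholds, and the synthetic data are assumptions made purely for illustration.

```python
import numpy as np


def attention_graph_density(attn, eps=1e-3):
    """Fraction of attention edges whose weight exceeds eps.

    attn: (seq_len, seq_len) row-stochastic self-attention matrix.
    A denser graph means more tokens contribute to each output position.
    NOTE: the threshold eps is an illustrative choice, not from the paper.
    """
    n = attn.shape[0]
    return float((attn > eps).sum()) / (n * n)


def intrinsic_dimension(x, var_threshold=0.99):
    """Rough PCA-based estimate: number of principal components needed
    to explain var_threshold of the variance of the MLP-block inputs.

    x: (num_tokens, hidden_dim) matrix of inputs to an MLP block.
    NOTE: this is a generic proxy for intrinsic dimension, assumed here
    for illustration; the paper's own definition may differ.
    """
    x = x - x.mean(axis=0, keepdims=True)
    # Singular values of the centered data give per-component variance.
    s = np.linalg.svd(x, compute_uv=False)
    var = s ** 2
    cum = np.cumsum(var) / var.sum()
    return int(np.searchsorted(cum, var_threshold) + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len, hidden = 64, 256

    # Toy self-attention matrix: row-wise softmax of random scores.
    scores = rng.normal(size=(seq_len, seq_len))
    attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

    # Toy MLP-block inputs: attention-weighted mixture of token embeddings.
    tokens = rng.normal(size=(seq_len, hidden))
    mlp_inputs = attn @ tokens

    print("attention graph density:", attention_graph_density(attn))
    print("estimated intrinsic dimension:", intrinsic_dimension(mlp_inputs))
```

The intuition the sketch is meant to convey: sparser attention graphs mix fewer tokens into each MLP input, which tends to confine those inputs to a lower-dimensional subspace, whereas denser graphs push the intrinsic dimension, and with it the expressive capacity, higher.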
Don't miss this opportunity to deepen your understanding of how geometric insights can drive improvements in AI reasoning. Whether you're an AI researcher, practitioner, or enthusiast, this talk promises valuable perspectives and thought-provoking ideas.
Paper: https://arxiv.org/abs/2407.02678
(Accompanying paper: https://arxiv.org/abs/2312.01648)