A research team from Google and EPFL presents an analysis that sheds light on the operation and inductive biases of self-attention networks, finding that the output of pure attention networks (those stripped of skip connections and MLPs) converges toward a rank-1 matrix doubly exponentially with respect to network depth.
Here is a quick read: Google & EPFL Study Reveals Huge Inductive Biases in Self-Attention Architectures
The paper Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth is on arXiv.
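For intuition, here is a minimal numerical sketch (not the authors' code) of the phenomenon the title describes: stacking randomly initialized single-head self-attention layers with skip connections and MLPs removed, and tracking how far each layer's output is from its best rank-1 approximation. The width, depth, and the Frobenius-norm residual used here are illustrative assumptions; the paper's formal bound is stated with a different residual norm.

```python
# Illustrative sketch (assumed setup, not the paper's experiments): stack pure
# single-head self-attention layers with random weights and no skip connections,
# no MLPs, no LayerNorm, and watch the output collapse toward rank 1.
import numpy as np

rng = np.random.default_rng(0)
n, d = 32, 64   # sequence length and model width (assumed values)
depth = 8       # number of stacked attention-only layers (assumed)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def relative_rank1_residual(X):
    """Fraction of X's Frobenius norm not captured by its best rank-1 approximation."""
    s = np.linalg.svd(X, compute_uv=False)
    return np.sqrt(np.sum(s[1:] ** 2)) / np.sqrt(np.sum(s ** 2))

X = rng.standard_normal((n, d))
print(f"layer  0: relative rank-1 residual = {relative_rank1_residual(X):.3e}")
for layer in range(1, depth + 1):
    # Random projections for a single attention head.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    A = softmax((X @ Wq) @ (X @ Wk).T / np.sqrt(d), axis=-1)  # row-stochastic attention
    X = A @ (X @ Wv)                                          # pure attention: no residual path, no MLP
    print(f"layer {layer:2d}: relative rank-1 residual = {relative_rank1_residual(X):.3e}")
```

Changing the update line to `X = X + A @ (X @ Wv)` restores the skip connection and is a quick way to see, in this toy setting, the counteracting effect the paper attributes to residual paths.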
submitted by /u/Yuqing7