markisus
The authors hypothesize at the bottom of page 3 that linear layers can combine to form nonlinear functions. This is wrong, since a composition of linear maps is itself linear, but maybe I’m misunderstanding what they are trying to say.
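A quick sketch of why, assuming "linear layer" means an affine map y = Wx + b with no activation in between (that reading is my assumption, not necessarily what the paper means). The weights and helper names below are made up for illustration. Stacking two such layers gives W2(W1 x + b1) + b2 = (W2 W1)x + (W2 b1 + b2), which is just one more layer of the same form:

```python
# Two stacked affine layers with no activation collapse into one affine layer.
# All weights and helper names here are hypothetical illustration values.

def matvec(W, x):
    # Matrix-vector product for a list-of-rows matrix.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    # Matrix-matrix product A @ B for list-of-rows matrices.
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def affine(W, b, x):
    # One "linear layer": W x + b.
    return [y + bi for y, bi in zip(matvec(W, x), b)]

W1, b1 = [[1.0, 2.0], [3.0, 4.0]], [1.0, -1.0]
W2, b2 = [[0.0, 1.0], [2.0, 0.0]], [0.0, 2.0]

# The single equivalent layer: W = W2 W1, b = W2 b1 + b2.
W = matmul(W2, W1)
b = [y + bi for y, bi in zip(matvec(W2, b1), b2)]

x = [3.0, -2.0]
stacked = affine(W2, b2, affine(W1, b1, x))   # layer 2 applied to layer 1
collapsed = affine(W, b, x)                   # the single merged layer

print(stacked, collapsed)
```

Both calls give the same output for any x, so depth alone adds nothing here; the nonlinearity has to come from somewhere else, such as an activation function.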
nwoli
Scaling linear algebra is probably all we’ll need in the end. We’re only missing the data and compute to get there.