Right, but part of the argument in the paper, as I understand it, is that the self-attention layers can increase the intrinsic dimension of the input data if you feed it additional, relevant context.
I guess you could also use this result to find that a smaller network might be sufficient for your particular problem.
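For concreteness, here is a minimal sketch of one common intrinsic-dimension estimator, TwoNN (Facco et al., 2017). I'm not claiming this is the estimator the paper uses; it's just an illustration of how you might measure the intrinsic dimension of your data (or of a layer's activations) to inform a decision about model size. The function and example data below are my own.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def twonn_intrinsic_dimension(X):
    """TwoNN intrinsic-dimension estimate (Facco et al., 2017).

    X: (n_samples, n_features) array of data points or layer activations.
    Returns the maximum-likelihood estimate of the intrinsic dimension.
    """
    # Distances to the two nearest neighbours (index 0 is the point itself).
    dists, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
    r1, r2 = dists[:, 1], dists[:, 2]

    # Drop duplicate points so the ratio is well defined.
    mask = r1 > 0
    mu = r2[mask] / r1[mask]

    # The ratios follow a Pareto law whose shape parameter is the intrinsic
    # dimension; this is its maximum-likelihood estimate.
    return len(mu) / np.sum(np.log(mu))

# Example: a noisy 2-D manifold embedded in a 10-D ambient space.
rng = np.random.default_rng(0)
u = rng.uniform(size=(2000, 2))                                  # 2 latent coordinates
embed = np.hstack([u, np.sin(u[:, :1]), u[:, :1] * u[:, 1:]])    # nonlinear 4-D embedding
X = np.hstack([embed, np.zeros((2000, 6))]) + 0.01 * rng.normal(size=(2000, 10))
print(twonn_intrinsic_dimension(X))  # should come out close to 2
```

If the estimated dimension is much lower than the ambient dimension, that is the kind of evidence you might use to argue a narrower network could be sufficient for the problem.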
If you have additional context that is relevant, feed it to the network. Why wouldn't you? As for the size of the network, that is not a straightforward benefit, because you have to account for the trade-off between model size and training efficiency.