
the interesting advance in the anthropic/mats research program is applying dictionary learning to the superposed latent representations of transformers to recover more "interpretable" features. however, "interpretability" is generally scored with the explainer/interpreter paradigm, which is somewhat ad hoc, and true automated circuit discovery (as opposed to finding representations of individual concepts) is still some way off afaik.
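
for anyone who wants the dictionary learning idea in concrete form, here is a rough numpy sketch of a sparse autoencoder trained on toy "superposed" activations. everything in it is an assumption made for illustration: the synthetic data stands in for real transformer activations, and the function names (`make_superposed_activations`, `train_sparse_autoencoder`, `encode`), the hyperparameters, and the plain gradient descent loop are hypothetical choices, not the setup used in the anthropic/mats work.

```python
import numpy as np


def make_superposed_activations(n_samples, n_features, d_model, sparsity, rng):
    """Toy stand-in for activations: many sparse features packed into fewer dimensions."""
    # One random unit direction per feature; n_features > d_model forces overlap.
    directions = rng.normal(size=(n_features, d_model))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Each feature is active with probability `sparsity`, with a random strength in [0, 1).
    mask = rng.random((n_samples, n_features)) < sparsity
    coeffs = mask * rng.random((n_samples, n_features))
    return coeffs @ directions, directions


def encode(x, params):
    """Sparse, non-negative feature activations for inputs x."""
    return np.maximum(x @ params["w_enc"] + params["b_enc"], 0.0)


def train_sparse_autoencoder(x, n_latents, l1_coeff=1e-3, lr=0.1, steps=500, seed=0):
    """Fit x ~ relu(x W_enc + b_enc) W_dec + b_dec with an L1 penalty on the latents."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    params = {
        "w_enc": rng.normal(scale=0.1, size=(d, n_latents)),
        "b_enc": np.zeros(n_latents),
        "w_dec": rng.normal(scale=0.1, size=(n_latents, d)),
        "b_dec": np.zeros(d),
    }
    losses = []
    for _ in range(steps):
        # Forward pass.
        pre = x @ params["w_enc"] + params["b_enc"]
        f = np.maximum(pre, 0.0)
        err = f @ params["w_dec"] + params["b_dec"] - x
        # Loss: mean squared reconstruction error plus mean L1 norm of the latents.
        loss = (err ** 2).sum(axis=1).mean() + l1_coeff * np.abs(f).sum(axis=1).mean()
        losses.append(float(loss))
        # Backward pass (gradients derived by hand for this small model).
        g_xhat = 2.0 * err / n
        g_f = g_xhat @ params["w_dec"].T + l1_coeff * np.sign(f) / n
        g_pre = g_f * (pre > 0)
        grads = {
            "w_enc": x.T @ g_pre,
            "b_enc": g_pre.sum(axis=0),
            "w_dec": f.T @ g_xhat,
            "b_dec": g_xhat.sum(axis=0),
        }
        # Plain full-batch gradient descent step.
        for name in params:
            params[name] -= lr * grads[name]
    return params, losses


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 64 hypothetical features squeezed into 16 dimensions.
    acts, true_directions = make_superposed_activations(2000, 64, 16, 0.05, rng)
    sae, history = train_sparse_autoencoder(acts, n_latents=64)
    print("first loss:", history[0], "last loss:", history[-1])
    print("mean active latents per sample:", (encode(acts, sae) > 0).sum(axis=1).mean())
```

the rows of `w_dec` play the role of the learned dictionary: each one is a candidate feature direction, and `encode` gives the sparse code over them. whether those directions are actually "interpretable" is exactly the part the scoring question above is about; this sketch does not attempt to measure it.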


