I don't follow it too closely, but I've seen papers on mechanistic interpretability that look promising.