Actually sounds pretty cool, but the graph on expert-level tasks is throwing off my expectations. A pass rate of less than 20% sounds a lot like saying this thing is wrong most of the time.
Granted, these strike me as difficult tasks and I’d likely ask it to do far simpler things, but I’m not really sure what to expect from looking at these graphs.
Ah, but the fact that it bothers to cite its sources is a huge plus. Between that and its search abilities, it sounds valuable to me.
I think that's mostly because of the information it has access to. Much of the most useful information isn't on the public internet or doesn't show up in search engines; only domain experts know about it. The websites may also be paywalled or gated behind a login. So a better comparison would give the models the same level of access as an expert.