
Pretty much all models, including today's models, already fall foul of the "Hazardous capability" clause. These models can be used to craft persuasive emails or blog posts, analyse code for security problems, and so forth. Whether such a thing is done as part of a process that leads to lots of damage depends on the context, not on the model.

So in practice, only the FLOPs criterion matters. Which means only giant companies with well-funded legal departments, or large states, can build these models, increasing centralization and control, and making full model access a scarce resource worth fighting over.




Really I feel the opposite way: none of today's models, or anything foreseeable, meets the hazardous capability criteria. Some may be able to provide automation, but I don't see any concrete examples where there's an actual step change in what's possible due to LLMs. The problem is it's all in the interpretation. I imagine some people will consider a 7B model "dangerous" because it can give a bullet-point list of how to make a bomb (step 1: research explosives) or write a phishing email that sounds like a person wrote it. In reality the bar should be a lot higher: uniquely making something possible that wouldn't otherwise be, with concrete examples of it working or being reasonably likely to work, not just the spectre of targeted emails.

I've actually been thinking there should be a bounty for identifying a real hazardous use of AI. The problem would be defining "hazardous" (which would hopefully itself spur conversation). On one end I imagine trivial "hazards" like what we test models with today (like asking how to build a bomb), and on the other it's easy to see a shifting-goalposts situation where we keep finding reasons why something that technically meets the hazard criteria isn't really hazardous.



