
A really important nuance here is that they are building on top of Llama-2, the pretrained model, and not Llama-2-chat.

I really think the entire field is doing more damage with chat fine tuning than might be expected, because that chat instruction regularly includes an emphasis on the model identifying itself as an LLM.

The problem with this is that nearly all of the training data it performs next token prediction on is text generated by humans, so pushing the model to respond as a self-identified LLM pulls it away from the distribution it actually learned.

So most of the fine tuning I've seen inherently narrows the model's scope. While pretrained models are harder to use, I regularly prefer them over chat models when both are available: even at similar temperatures, the quality and variety of language is much better in the pretrained model than in the chat model.

This fine tuning only introduced a bias towards logical step-by-step analysis and problem-solving techniques, and the results are great. But I'm willing to bet that an identical fine tuning on top of the chat model would have done much worse on the evaluations - not just the compounding of the typical few-percent loss from each fine tuning, but more like a double-digit relative difference.

It's quite frustrating that anxiety over model safety is likely throwing out tens of millions of dollars' worth of data in the pretrained model when only chat models are available at the SotA. I hope that in the future a lighter touch is taken with fine tuning the pretrained model: instead of making safety inherent to the model, set it behind a safety-oriented discriminator or 'editor' that filters or modifies responses accordingly.
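
Something like the following, just to sketch what I mean (purely my own illustration, not anything from the paper; the generation pipeline is the standard Hugging Face API, but the moderation model name is a placeholder):

    from transformers import pipeline

    # Raw pretrained model, left untouched by safety fine tuning.
    generator = pipeline("text-generation", model="meta-llama/Llama-2-7b-hf")
    # Hypothetical standalone safety classifier acting as the 'editor'.
    moderator = pipeline("text-classification", model="some-org/safety-moderator")

    def safe_generate(prompt, max_new_tokens=256):
        draft = generator(prompt, max_new_tokens=max_new_tokens)[0]["generated_text"]
        completion = draft[len(prompt):]    # score only what the model added
        verdict = moderator(completion)[0]  # {"label": ..., "score": ...}
        if verdict["label"] == "unsafe" and verdict["score"] > 0.9:
            return "[withheld by the safety filter]"
        return draft

The point being that the generator keeps the full breadth of the pretraining data, and all the safety behaviour lives in a separate, swappable component.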

I'd happily take a 2-3x increase in API cost for a much more broadly capable and performant model with similar safety characteristics but without the handicaps that safety fine tuning brings with it.

So while a lot of the gains here might be due to the fine tuning, I expect at least part comes from shrugging off the baggage of the chat/safety fine tuning as well. Even in the first detailed example, we can see that while Llama-2 goes off rambling later on, its statement of what John knows is much clearer and better connected between initial conditions and result than Llama-2-chat's, particularly regarding theory of mind (i.e. "he assumed" vs the latter's "it must be in").




Adding to this - the safety findings that *are* in this paper are really interesting. Such as:

> We probe some of the categories where we see a larger difference (e.g., violent) and observe that Orca 2 tends to counter the harmful positions more often (which is penalized by the metric), while models that have gone through RLHF safety training tend to decline to respond more often (which is rewarded by the metric).

Or the fact that Orca 2 is less likely to extend hate speech than Llama-2-chat, even though Llama-2-chat theoretically went through safety fine tuning and Orca 2 had no explicit safety fine tuning at all.

Research over the past year has really demonstrated (a) just how impactful fine tuning can be - to the point of transmitting capabilities from larger models to smaller ones, and (b) that we're still clumsily wading through that process with only partial clarity on best practices, even as the foundational pretrained models get better at astounding rates.
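
As a very rough sketch of that capability-transfer recipe (my own illustration, not Microsoft's code - model names, prompts, and hyperparameters are all placeholders): have a larger teacher write step-by-step answers, then fine-tune a smaller student on those traces.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    teacher_name = "meta-llama/Llama-2-70b-hf"  # stand-in for a stronger model
    student_name = "meta-llama/Llama-2-7b-hf"

    tok = AutoTokenizer.from_pretrained(student_name)
    tok.pad_token = tok.eos_token

    # 1. Collect step-by-step reasoning traces from the teacher.
    teacher = AutoModelForCausalLM.from_pretrained(teacher_name, device_map="auto")
    prompts = ["Q: <some problem>. Think through it step by step.\nA:"]  # hypothetical
    traces = []
    for p in prompts:
        ids = tok(p, return_tensors="pt").to(teacher.device)
        out = teacher.generate(**ids, max_new_tokens=512)
        traces.append(tok.decode(out[0], skip_special_tokens=True))

    # 2. Fine-tune the student on those traces (bare-bones loop, no batching/eval).
    student = AutoModelForCausalLM.from_pretrained(student_name).train()
    opt = torch.optim.AdamW(student.parameters(), lr=1e-5)
    for text in traces:
        batch = tok(text, return_tensors="pt", truncation=True, max_length=1024)
        loss = student(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        opt.step()
        opt.zero_grad()

The real pipelines obviously add data curation, prompt erasing, batching, and so on, but the core transfer mechanism is just supervised fine tuning on the teacher's outputs.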



