There is no way to create a censored model without losing performance.
PPO and RLHF (the technical basis for these censorship mechanisms) inevitably degrade some of the model's ability to "reason" (as measured by benchmarks) and to accurately gauge the relative probabilities of truthful answers.
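For context, the standard InstructGPT-style RLHF setup optimizes a KL-regularized reward objective, roughly of the form below (a sketch of the usual formulation, not any particular lab's exact recipe):

\max_\theta \; \mathbb{E}_{x \sim D,\ y \sim \pi_\theta(\cdot \mid x)} \big[ r_\phi(x, y) \big] \;-\; \beta \, D_{\mathrm{KL}}\!\big( \pi_\theta(\cdot \mid x) \,\|\, \pi_{\mathrm{base}}(\cdot \mid x) \big)

Pushing up the reward term (which encodes the refusal/safety preferences) necessarily pays some KL cost, i.e., it moves the tuned model's token probabilities away from the base model's, which is one plausible mechanism for the loss in calibration described above.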
Getting uncensored base models into the public's hands is really what's driving LLM research forward (aside from scale) at this point.