

KeplerBoy 3 months ago | on: Learning to Reason with LLMs

Who knows? Certainly not the public.

It might be a finetuned model that works better in such a setting.

OkGoDoIt 3 months ago

The linked blog post explains that it is fine-tuned with some reinforcement learning process. It doesn’t go into details, but they do claim it’s not just the base model with chain of thought; there’s some fine-tuning going on.
