Hacker News
KeplerBoy 3 months ago | on: Learning to Reason with LLMs
Who knows? Certainly not the public.
It might be a fine-tuned model that works better in such a setting.
OkGoDoIt 3 months ago
The linked blog post explains that it is fine-tuned with some reinforcement learning process. It doesn’t go into details, but they do claim it’s not just the base model with chain of thought; there’s some fine-tuning going on.