Hacker News

"No dude, the bribe you offered was too much so the LLM got spooked, you need to stay in a realistic range. We've fine-tuned a local model on realistic bribe amounts sourced via Mechanical Turk to get a good starting point and then used RLMF to dial in the optimal amount by measuring task performance relative to bribe."



RLMF: Reinforcement Learning, Mother Fucker!



