Do they really need “free RLHF”? As I understand it, RLHF needs relatively littl...

spi · 2024-04-19T09:00:19

Variety matters a lot. If you pay 1000 trained labellers, you get 1000 POVs for a good amount of money, and likely can't even think of 1000 good questions to have them ask. If you let 1000000 people give you feedback on random topics for free, and then pay 100 trained people to go through all of that and only retain the most useful 1%, you get ten times more variety for a tenth of the cost.

Of course the numbers are pretty arbitrary, but it's just to give an idea of how these things scale. This is my experience from my company's own internal models (deep learning, but not LLMs), for which we had to buy training data instead of collecting it. If you can't tap into data "from the wild" (in our case, for legal reasons), you can still get enough data if measured in GB, but it's depressingly more repetitive, and that's not quite the same thing when you want to generalize.

mvkel · 2024-04-19T17:43:18

Absolutely.

Modern captchas are object labelers for self-driving cars; you just need a few people to "agree" to know what the right answer is.
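
To make the "agree" part concrete, here is a rough sketch of majority-vote label aggregation. The function name and both thresholds (`min_votes`, `min_agreement`) are made up for illustration; this is not a description of how any real captcha service works.

```python
from collections import Counter

def consensus_label(votes, min_votes=3, min_agreement=0.6):
    """Return the majority label, or None when there is no clear consensus.

    min_votes and min_agreement are hypothetical thresholds chosen
    for this example only.
    """
    # Too few answers to trust any majority.
    if len(votes) < min_votes:
        return None
    # Most frequent label and how many people chose it.
    label, count = Counter(votes).most_common(1)[0]
    # Accept the label only if a large enough share of people agree.
    if count / len(votes) >= min_agreement:
        return label
    return None

print(consensus_label(["crosswalk", "crosswalk", "traffic light"]))
```

With these thresholds, two out of three matching answers are enough to accept a label, while an image with split votes stays unlabeled until more people answer.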

dizhn · 2024-04-20T10:03:49

We should agree on a different answer for crosswalk and traffic light and mess it up for them.