
For most of the evaluations you reported, you looked at engagingness (except Figure 16). Did you also look at humanness? I would be especially interested in how human your chatbot is compared to real humans (Figure 17). This would be similar to a Turing test.



The bot is really optimized for engagingness rather than humanness - in particular this is how we chose the hyperparameters. We did evaluate Meena vs. Human (31 vs. 69) and Meena vs. BlenderBot (35 vs. 65), but didn't do BlenderBot vs. Human. Good suggestion though.



