Understanding Emergent Abilities of Language Models from the Loss Perspective | Hacker News

		Understanding Emergent Abilities of Language Models from the Loss Perspective (arxiv.org)
		6 points by maccaw 5 months ago | 1 comment

cosmojg 5 months ago

Does this mean that "overtraining" a midsize LLM for many more epochs on a small, representative subset of the dataset used by a larger, more performant LLM might be sufficient for matching the performance of the larger model?
