cosmojg 5 months ago | on: Understanding Emergent Abilities of Language Model...

Does this mean that "overtraining" a midsize LLM for many more epochs on a small, representative subset of the dataset used by a larger, more performant LLM might be sufficient for matching the performance of the larger model?