To handle low-entropy text, the “adding a smaller constant to the logits” approach has little chance of changing the parts that need to be exactly a particular thing.
Though in this case it needs longer texts to reach high significance (and when the entropy is low, the text needs to be especially long).
But for most text (with typical amounts of entropy per token), apparently it doesn’t need to be that long? Something like 25 words, I think I heard?
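A rough sketch of why length and entropy trade off like this. I'm assuming the scheme meant here is the "green list" style one, where a constant `delta` is added to the logits of a pseudorandom half of the vocabulary and the detector runs a one-sided z-test on how many tokens landed in that half. All the function names, the toy logits, and the z = 4 threshold are made up for illustration, and averaging over just one mask and its complement is a crude stand-in for averaging over all possible green lists.

```python
import math

def green_probability(logits, green_mask, delta):
    """Chance the sampled token is green after adding delta to the green logits."""
    boosted = [x + delta if g else x for x, g in zip(logits, green_mask)]
    top = max(boosted)  # subtract the max for numerical stability
    weights = [math.exp(b - top) for b in boosted]
    green = sum(w for w, g in zip(weights, green_mask) if g)
    return green / sum(weights)

def avg_green_probability(logits, green_mask, delta):
    """Average over a mask and its complement (simplifying assumption, see above)."""
    complement = [not g for g in green_mask]
    return 0.5 * (green_probability(logits, green_mask, delta)
                  + green_probability(logits, complement, delta))

def z_score(green_count, num_tokens, gamma=0.5):
    """Standard deviations above the gamma * num_tokens expected without a watermark."""
    expected = gamma * num_tokens
    std = math.sqrt(gamma * (1 - gamma) * num_tokens)
    return (green_count - expected) / std

def tokens_needed(p_green, gamma=0.5, z_target=4.0):
    """Approximate length at which the expected z-score reaches z_target."""
    if p_green <= gamma:
        return math.inf
    return math.ceil(z_target ** 2 * gamma * (1 - gamma) / (p_green - gamma) ** 2)

mask = [True, False, True, False]        # hypothetical green list, gamma = 0.5
flat_logits = [0.0, 0.0, 0.0, 0.0]       # high entropy: any token is plausible
peaked_logits = [10.0, 0.0, 0.0, 0.0]    # low entropy: one token is nearly forced

flat_p = avg_green_probability(flat_logits, mask, delta=2.0)
peaked_p = avg_green_probability(peaked_logits, mask, delta=2.0)
print("flat:", flat_p, tokens_needed(flat_p))
print("peaked:", peaked_p, tokens_needed(peaked_p))
```

In the peaked case the nearly forced token stays nearly forced whether or not it is green, so the green fraction barely moves off one half and the required length blows up. In the flat case the boost shifts the green fraction a lot and a few dozen tokens are enough under these toy numbers.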