> This happens with the HTML contents just as easily.
On an individual scale, yes. But if you're a spammer and want to vacuum up text from 1000 sites, you'll skip writing scrapers for individual sites (which may change their formatting later anyway) and just use the reliably formatted RSS feeds.
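To illustrate how little effort that takes, here's a minimal sketch of that workflow, assuming Python's feedparser package and a hypothetical list of feed URLs:

```python
# Minimal sketch: harvesting text from many feeds with feedparser
# (assumed installed via `pip install feedparser`); FEED_URLS is hypothetical.
import feedparser

FEED_URLS = [
    "https://example.com/feed.xml",
    "https://example.org/rss",
    # ...hundreds more
]

for url in FEED_URLS:
    feed = feedparser.parse(url)          # same call handles RSS 1.0/2.0 and Atom
    for entry in feed.entries:
        title = entry.get("title", "")
        # full-text feeds carry the body in `content`; others only a `summary`
        body = entry.content[0].value if "content" in entry else entry.get("summary", "")
        print(title, body[:80])
```

The same loop works across every site because the feed format, not the site's markup, defines where the text lives.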
That was probably true back in 2004, but today there are numerous reader-view, full-text-extraction, and web-spider projects that anybody can plug in and get the full text from at least 90% of web sites with no extra effort.
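For example, a minimal sketch using one such project, the trafilatura package (assumed installed via `pip install trafilatura`; the URL is a placeholder):

```python
# Minimal sketch of "no extra effort" full-text extraction from an arbitrary page.
import trafilatura

url = "https://example.com/some-article"
html = trafilatura.fetch_url(url)   # plain HTTP fetch of the page
text = trafilatura.extract(html)    # boilerplate-free article text, or None on failure
print(text)
```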
This happens with the HTML contents just as easily. Perhaps just include a unique per-client identifier in the text, plus a link back to the source.
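A minimal sketch of that idea, assuming feed items are rendered server-side; the subscriber ID and URLs here are hypothetical placeholders:

```python
# Per-client watermark: append an attribution link plus an opaque token derived
# from the subscriber, so republished copies can be traced back to a client.
import hashlib

SOURCE_URL = "https://example.com/post/123"

def watermark(body_html: str, subscriber_id: str) -> str:
    token = hashlib.sha256(subscriber_id.encode()).hexdigest()[:12]
    attribution = (
        f'<p>Originally published at <a href="{SOURCE_URL}">{SOURCE_URL}</a> '
        f"<!-- ref:{token} --></p>"
    )
    return body_html + attribution

print(watermark("<p>Article text...</p>", "subscriber-42"))
```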