Additionally, using a Wiki page as the test case is misleading: LLMs have seen that format many times during training and can probably reproduce the original HTML from the stripped version fairly well.
Instead, some random, messy, spam-riddled site would make a much more realistic test environment.