If you let people see their toxicity rating, they'll just learn to game the system. Of course, more indirect or poetic insults might be an improvement.
"George Soros is influencing the media": 6% likely to be perceived as toxic.
"(((George Soros))) is influencing the media": 2% likely to be perceived as toxic.
This thing literally treats anti-Semitic coded messaging as making your statement a third as likely to be perceived as toxic. I mean, if it ignored punctuation I could at least understand that on a technical level (although it would be the wrong technical decision for exactly this reason), but scoring the coded version lower is actively wrong.
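To illustrate the punctuation point, here's a minimal sketch. I'm assuming a hypothetical preprocessing step (`strip_punctuation` is my own name, not anything from the actual system) that drops ASCII punctuation before scoring. Under that assumption the two sentences become identical, so the best such a model could do is give them the same score, and the triple-parentheses marker would be invisible to it.

```python
import string


def strip_punctuation(text: str) -> str:
    # Hypothetical preprocessing: remove every ASCII punctuation character.
    # This is an assumption for illustration, not the real system's pipeline.
    return text.translate(str.maketrans("", "", string.punctuation))


plain = "George Soros is influencing the media"
coded = "(((George Soros))) is influencing the media"

# After stripping, the coded marker is gone and the inputs are the same string.
same_after_stripping = strip_punctuation(plain) == strip_punctuation(coded)
print(same_after_stripping)
```

So ignoring punctuation would explain equal scores, not a lower score for the coded version.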