There's no need for the model weights and parameters. The probabilities output by a multilabel classification model would be entirely sufficient to show what the model thought the user had done: it was trained on a category, and it is claiming the user belongs to that category. Each category should have a human-readable description of what the corporate entity was looking for, and that should be enough for some kind of arbiter to determine whether the 'accused' was actually doing that or not. The question isn't what the model is doing internally; the question is whether the person is a false positive, and that should be answerable without any reference to the model at all. If the model produces too many false positives, then figuring out why it is broken becomes the corporation's job (ideally with some punitive financial damages as incentive).
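To make the point concrete, here's a minimal sketch of what I mean. Everything in it (the category names, descriptions, and threshold) is invented for illustration; the idea is just that raw per-category probabilities plus human-readable descriptions are all an arbiter needs, no weights required:

```python
# Hypothetical categories and descriptions -- assumptions, not any real system.
CATEGORY_DESCRIPTIONS = {
    "ticket_fraud": "User resold tickets above face value",
    "account_sharing": "User shared login credentials with others",
    "bot_activity": "Account actions were automated, not human",
}

THRESHOLD = 0.8  # assumed decision threshold the operator uses to act

def explain_accusation(probabilities: dict) -> list:
    """Turn raw multilabel model outputs into human-readable claims
    an arbiter can check against the user's actual behavior."""
    return [
        f"{CATEGORY_DESCRIPTIONS[label]} (model confidence: {p:.2f})"
        for label, p in probabilities.items()
        if p >= THRESHOLD
    ]

# Example output from the model for one user:
accusations = explain_accusation(
    {"ticket_fraud": 0.93, "account_sharing": 0.41, "bot_activity": 0.85}
)
for claim in accusations:
    print(claim)
```

The arbiter sees only the two above-threshold claims with their confidences, and can go verify whether the user actually did those things.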
Although I do think that the model weights and parameters should in some way be subpoenable so that the model can be 'compelled to testify' if someone has enough legal, technical and fiscal resources to sue the company.