I don't think the exact number is as important as the question of how someone would be expected to magically know which 10% is wrong and needs to be corrected.
This is a good point! (Hopefully) obviously, if we knew a particular claim was fishy, we wouldn't make it in the app in the first place.
However, we do a couple of things that go some way toward addressing your concern:
1. We can be more or less confident in the answers we're giving in the app, and if that confidence dips below a threshold we mark the cell in the results table with a red warning icon, which encourages caution and user verification. This confidence level isn't perfectly calibrated, of course, but we are trying to engender a healthy, active wariness in our users so that they don't take Elicit results as gospel.
2. We provide sources for all of the claims made in the app. You can see these by clicking on any cell in the results table. We encourage users to check—or at least spot-check—the results they are updating on. This verification is generally much faster than generating the answer in the first place.
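The flagging logic in point 1 amounts to a simple threshold check per cell. Here's a minimal sketch of that idea; the threshold value, field names, and `needs_warning` function are all assumptions for illustration, not Elicit's actual implementation:

```python
# Hypothetical sketch of per-cell low-confidence flagging.
# The 0.8 cutoff and the cell schema are assumed, not Elicit's real values.
CONFIDENCE_THRESHOLD = 0.8

def needs_warning(cell: dict) -> bool:
    """Return True if a result cell should display the red warning icon."""
    return cell["confidence"] < CONFIDENCE_THRESHOLD

low = {"answer": "12 mg/day", "confidence": 0.65, "source": "Smith 2021"}
high = {"answer": "n = 240", "confidence": 0.93, "source": "Lee 2019"}

print(needs_warning(low))   # True: 0.65 < 0.8, so flag for verification
print(needs_warning(high))  # False: confident enough to show unflagged
```

The point isn't the mechanics, it's the product choice: surface the model's own uncertainty to the user rather than presenting every answer with equal authority.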
This is true, but if the error rate were 1/1000 I could see the risk-management argument for using this thing. 1/100 is pushing it. 1/10 seems unconscionably reckless and lazy.