Could we crowdsource a speech benchmark?

We start with one individual with a trust level of 1 (a probability, in the Bayesian sense, that their work is correct). All other contributors start with a trust level of 0.

Anyone with a trust level above MIN_TRUST (say, 0.6), called a trustee, can validate others' work. This status is dynamic: a trustee can stop being one, which invalidates all of their verifications.

Valid work is work that has a score above MIN_TRUST. Such work is included in the benchmark (with a possible additional check, such as a lower bound on the number of votes received).

The score of a work is the lower bound of the Wilson score confidence interval for a Bernoulli parameter, at a 95% confidence level. Given `total`, the number of votes from trustees, and `valid`, the number of those votes that claimed the work is valid:

    from math import sqrt

    def score(valid, total):
        # Lower bound of the 95% Wilson score interval
        z = 1.96
        z2 = z * z
        positive = valid / total
        return (positive + z2 / (2*total)
                - z * sqrt((positive*(1-positive) + z2/(4*total)) / total)) / (1 + z2/total)
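To get a feel for how conservative this lower bound is against MIN_TRUST = 0.6, here is a quick check (values rounded):

    score(8, 10)    # ≈ 0.49: 8 positive votes out of 10 is not yet valid
    score(10, 10)   # ≈ 0.72: 10 out of 10 clears the threshold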
The trust level of each contributor is computed as the proportion of their work that trustees have validated, minus the proportion they have invalidated. In math:

    trust(contributor) = max(0, (valid_work - invalid_work) / total_work)
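For example, a contributor with 10 pieces of work, 7 of them validated and 2 invalidated by trustees, has trust = max(0, (7 - 2) / 10) = 0.5, still below the trustee threshold of 0.6.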
Each trustee may verify as many pieces of work as they have produced themselves. They receive verification tasks at random from among the work that is not yet valid.

A piece of work can also be discarded if it has received a certain number of votes and its score remains below a certain threshold.
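Putting these rules together, here is a minimal sketch of the validity, discard, and assignment logic, reusing the `score` function above. The `Work` data model and every threshold except MIN_TRUST (MIN_VOTES, DISCARD_VOTES, DISCARD_SCORE) are illustrative assumptions, not part of the proposal:

    import random
    from dataclasses import dataclass

    MIN_TRUST = 0.6       # from the proposal
    MIN_VOTES = 5         # assumed lower bound on votes before accepting work
    DISCARD_VOTES = 20    # assumed vote count at which low-scoring work is dropped
    DISCARD_SCORE = 0.2   # assumed score below which such work is dropped

    @dataclass
    class Work:
        author: str
        valid: int = 0    # trustee votes claiming the work is valid
        total: int = 0    # total trustee votes received

    def is_valid(work):
        # Valid work: enough votes and a Wilson lower bound above MIN_TRUST.
        return work.total >= MIN_VOTES and score(work.valid, work.total) > MIN_TRUST

    def should_discard(work):
        # Drop work that has accumulated many votes yet stays low-scoring.
        return (work.total >= DISCARD_VOTES
                and score(work.valid, work.total) < DISCARD_SCORE)

    def assign_verifications(trustee, quota, works):
        # Tasks are drawn at random from not-yet-valid work by other
        # contributors; the quota equals the number of pieces the trustee
        # has produced.
        pending = [w for w in works if not is_valid(w) and w.author != trustee]
        return random.sample(pending, min(quota, len(pending)))

Note that if a trustee loses their status, every work they voted on would need re-scoring, since removing their votes can flip `is_valid`.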

In this case, each piece of work would be some text read out loud by the contributor.




There is this: http://www.voxforge.org/home

"VoxForge was set up to collect transcribed speech for use with Free and Open Source Speech Recognition Engines (on Linux, Windows and Mac)."

I also wondered about using LibriVox audiobooks + text for training, maybe.



