I wish they had clarified whether the claimed 5.1% human error rate comes from "listen to this sentence once and transcribe it" or "study this recording however you like and transcribe it."
edit: They talk about this in the arxiv paper:
>The transcription protocol that was agreed upon was to have three independent transcribers provide transcripts which were quality checked by a fourth senior transcriber. All four transcribers are native US English speakers and were selected based on the quality of their work on past transcription projects.
>...The transcription time was estimated at 12-14 times realtime (xRT) for the first pass for Transcribers 1-3 and an additional 1.7-2xRT for the second quality checking pass (by Transcriber 4). Both passes involved listening to the audio multiple times: around 3-4 times for the first pass and 1-2 times for the second.