Developer of murph.live here again - after reading this thread I have some ideas I'd like to vet with everyone here.
1. We need to post links to the source of the stream. I neglected to do that for fear of cease and desist, but now I realize we need to create accountability on our own platform. I will be contacting broadcastify.com to ensure we can direct users to a source.
2. We need a disclaimer on the site directly in your face. I agree with everyone here - this could potentially spread misinformation and do more harm than intended. These transcripts should be read with caution. Additional messaging from us is a must.
3. We need a better acoustic model. Google is too much $$$ and although I'm an engineer, I'm not a linguistics machine learning expert. Can anyone help me with this please?
Our mission was to create transparency into our government - not cause harm. There is a lot of responsibility in creating a tool like this and we want to get it right.
With that being said, this site blew up in a few hours. I'm overwhelmed. Please let me know if you'd like to help. Thank you to everyone for the feedback so far - it all helps immensely.
I have a generic English ASR model for ESPnet (https://github.com/espnet/espnet) trained on a variety of datasets and would be happy to provide it. If you send me a few audio samples, I can give it a try. You can contact me at pavel.denisov@gmx.de.
It might be worth checking out https://www.assemblyai.com - they let you build a custom audio model. One challenge with the audio from these radios is that it goes through some heavy compression, and traditional models will struggle with that. Give a system that uses analog audio a try. The quality of the audio is a lot better.
I doubt that the Google speech model was meant to deal with AMBE-compressed voice. I think you will need to create your own speech-to-text model to solve the quality issues.