I have to agree - I like tinkering and hacking as much as the next dev, and I don't like to be negative, but... what is the point of AI/ML here, where a simple sound trigger works just fine? What problem is it actually solving? I only see this approach causing problems.
Which makes me think that a simple trigger based on ambient sound level probably does the job... I suspect that many baby monitors work that way.
This also makes sense because I'd likely want to be alerted to noise in general rather than only cries, just in case.
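
For what it's worth, here is roughly what I mean by a level-based trigger, as a minimal Python sketch. I don't know how any actual baby monitor implements this, so treat it as an illustration only. The function names (`rms_dbfs`, `level_trigger`), the -30 dBFS threshold and the 3-frame hold are all made up for the example, and I'm assuming the audio arrives as frames of float samples normalized to [-1.0, 1.0].

```python
import math
from typing import Iterable, List, Sequence


def rms_dbfs(frame: Sequence[float]) -> float:
    """Return the RMS level of one frame in dBFS.

    Assumes samples are floats normalized to [-1.0, 1.0].
    Empty or all-zero frames are reported as -infinity.
    """
    if not frame:
        return -math.inf
    mean_square = sum(s * s for s in frame) / len(frame)
    if mean_square == 0.0:
        return -math.inf
    return 10.0 * math.log10(mean_square)


def level_trigger(
    frames: Iterable[Sequence[float]],
    threshold_db: float = -30.0,  # hypothetical value, tune per room/mic
    min_frames: int = 3,          # hypothetical hold to ignore short clicks
) -> List[int]:
    """Return the frame indices at which an alert would fire.

    An alert fires once the level has stayed above threshold_db for
    min_frames consecutive frames. It re-arms after the level drops
    back to or below the threshold.
    """
    alerts: List[int] = []
    run = 0  # consecutive frames above the threshold so far
    for i, frame in enumerate(frames):
        if rms_dbfs(frame) > threshold_db:
            run += 1
            if run == min_frames:
                alerts.append(i)
        else:
            run = 0
    return alerts


if __name__ == "__main__":
    quiet = [0.001] * 100  # about -60 dBFS
    loud = [0.5] * 100     # about -6 dBFS
    stream = [quiet, quiet, loud, loud, loud, loud, quiet, loud]
    print(level_trigger(stream))
```

The hold over a few consecutive frames is just there so a single click or door slam doesn't set it off. Any sustained noise trips it, crying or not, which is the behaviour I'd want anyway.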