I'm seriously considering trying something like this just to see if it works. TCP might kill it, though (maybe?): one lost packet stalls everything behind it until the retransmission arrives. But maybe UDP plus some clever ffmpeg invocation...
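A rough sketch of the UDP idea, in plain Python rather than ffmpeg, just to show the shape of it. Everything here is an assumption for illustration: 8 kHz 16-bit mono PCM, 20 ms frames, and a made-up 4-byte sequence-number header. The receiver fills a missing frame with silence instead of waiting for it, which is the whole point of using UDP for live audio.

```python
import socket
import struct

# Assumed audio format (hypothetical): 8 kHz, 16-bit mono PCM, 20 ms frames.
SAMPLE_RATE = 8000
FRAME_MS = 20
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 160 samples * 2 bytes = 320

# Made-up header: a 32-bit big-endian sequence number in front of each frame.
HEADER = struct.Struct("!I")


def send_frames(sock, addr, frames):
    """Send each PCM frame as its own datagram, prefixed with its sequence number."""
    for seq, frame in enumerate(frames):
        sock.sendto(HEADER.pack(seq) + frame, addr)


def receive_frames(sock, count, timeout=0.5):
    """Collect up to `count` frames, then stop waiting.

    Frames that never arrive are replaced with silence rather than stalling
    playback, which is what a TCP retransmission would effectively do.
    """
    sock.settimeout(timeout)
    received = {}
    try:
        while len(received) < count:
            data, _ = sock.recvfrom(HEADER.size + FRAME_BYTES)
            (seq,) = HEADER.unpack_from(data)
            received[seq] = data[HEADER.size:]
    except socket.timeout:
        pass  # give up on anything still outstanding
    silence = bytes(FRAME_BYTES)
    return [received.get(seq, silence) for seq in range(count)]
```

A real version would play frames on a fixed 20 ms clock and reorder within a small jitter buffer, but even this toy shows the trade: a dropped packet costs 20 ms of silence instead of a stall.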
If you went with the heavier-weight Asterisk setup, you could have softphones authenticate to Asterisk but have Asterisk accept no inbound calls from them. Only when one user wanted to call another would Asterisk "dial" each softphone, which might be sufficient security-wise (you would need a working SSH account to be called).
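One way the "Asterisk dials out, never accepts inbound" part might look: a helper on the SSH side builds an Asterisk Manager Interface (AMI) Originate action. Asterisk rings the caller's softphone first, and once it's answered the call continues in the dialplan at the callee's extension. The `internal` context, `PJSIP/<name>` endpoint naming, and the allowed-users check are all hypothetical; this only builds the message text and doesn't log in or send it over an AMI connection.

```python
def originate_action(caller, callee, allowed_users, context="internal", action_id="1"):
    """Build an AMI Originate action that rings `caller`, then runs `callee` in `context`.

    `allowed_users` is a hypothetical stand-in for "has a working SSH account";
    anyone not in it can neither place nor receive a call.
    """
    if caller not in allowed_users or callee not in allowed_users:
        raise ValueError("both parties need an account")
    fields = [
        ("Action", "Originate"),
        ("ActionID", action_id),
        ("Channel", f"PJSIP/{caller}"),  # Asterisk rings this softphone first
        ("Context", context),            # hypothetical dialplan context
        ("Exten", callee),               # extension that would Dial the callee
        ("Priority", "1"),
        ("Async", "true"),
    ]
    # AMI messages are "Key: Value" lines separated by CRLF and ended by a blank line.
    return "".join(f"{key}: {value}\r\n" for key, value in fields) + "\r\n"
```

The refusal of inbound calls from the softphones would still live in the dialplan itself, which this sketch doesn't cover.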
I think the main issue would be the buffering existing tools are likely to assume. You don't need more than a few hundred ms of latency for a phone call to start feeling really weird.
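Some back-of-the-envelope arithmetic on why buffering matters. All the numbers below are illustrative assumptions, not measurements: one frame of packetization delay, a receive-side jitter buffer of a few frames, some network transit, plus whatever extra buffering a tool adds on its own. The 150 ms budget is the one-way delay ITU-T G.114 commonly cites as comfortable for conversation.

```python
# Commonly cited comfortable one-way delay from ITU-T G.114.
BUDGET_MS = 150


def mouth_to_ear_ms(frame_ms, jitter_frames, network_ms, extra_buffer_ms=0):
    """Rough one-way delay estimate in milliseconds.

    frame_ms: one frame must fill before it can be sent (packetization)
    jitter_frames: frames held on the receive side to smooth out jitter
    network_ms: transit time
    extra_buffer_ms: any additional buffering a tool adds by default
    """
    return frame_ms + jitter_frames * frame_ms + network_ms + extra_buffer_ms


# Hypothetical numbers: 20 ms frames, 3-frame jitter buffer, 40 ms network.
print(mouth_to_ear_ms(20, 3, 40))                       # 120, under budget
print(mouth_to_ear_ms(20, 3, 40, extra_buffer_ms=200))  # 320, well over budget
```

The audio path itself fits easily; it's a default buffer of a couple hundred ms in some tool along the way that pushes the call into "feels weird" territory.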