It's an accessibility feature for people who are losing their voice or are on the verge of losing it. And it is TTS only, not speech-to-speech as your mention of "can use it in phone calls and facetime" implies. Not being s2s means it doesn't retain vocal disfluencies, prosody, or the other signals that make a voice feel real.