Yes! You can upload a corpus of text from which the service learns new words (and their context), and/or you can give it specific words and their pronunciations. No audio samples are needed; the customization works on the existing language models.
If you use Kaldi, you can mix in any kind of domain-specific text. This usually improves accuracy significantly, particularly for technical domains. You do not need audio for that.
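Before mixing in domain text, it can help to see which of its words the recognizer does not know yet, since those are the ones that would need pronunciations. Below is a minimal sketch in plain Python, not part of Kaldi or any service. The function names are hypothetical, and it assumes a lexicon whose lines look like `word PH1 PH2 ...` and a simple lowercase tokenization; the sample data is made up for illustration.

```python
import re
from collections import Counter

def load_lexicon_words(lexicon_lines):
    """Collect the first token of each lexicon line (assumed format: 'word PH1 PH2 ...')."""
    words = set()
    for line in lexicon_lines:
        parts = line.split()
        if parts:
            words.add(parts[0].lower())
    return words

def find_oov_words(corpus_lines, known_words):
    """Count tokens in the domain text that are missing from the known vocabulary."""
    counts = Counter()
    for line in corpus_lines:
        # Assumption: lowercase letters and apostrophes are enough for tokenization.
        for token in re.findall(r"[a-z']+", line.lower()):
            if token not in known_words:
                counts[token] += 1
    return counts

# Made-up sample data, for illustration only.
lexicon = ["the DH AH", "patient P EY SH AH N T", "has HH AE Z"]
corpus = ["The patient has tachycardia.", "Tachycardia was treated."]

oov = find_oov_words(corpus, load_lexicon_words(lexicon))
print(oov.most_common())
```

The most frequent out-of-vocabulary words are usually the first candidates to add, together with their pronunciations.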
Is there a way to feed some kind of (text) dictionary to aid recognition? Or does it also need audio samples to learn from?