Data moats?
Data not on the web, both for the first pass of training [1] and for fine-tuning / RLHF [2].
Maybe compute could be a moat: can you run inference more cheaply than competitors? Think conditional computation, heavily quantized models. Training compute is probably not a moat, though.
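As a concrete illustration of the quantization point, here is a minimal sketch of symmetric int8 weight quantization, the basic trick behind cheaper inference. Function names are illustrative, not from any particular library; real systems quantize per-channel or per-group and fuse the dequantize into the matmul.

```python
def quantize_int8(weights):
    """Map float weights to int8 values plus one scale factor (symmetric)."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]  # each qi fits in [-127, 127]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats; rounding error is at most scale / 2 per weight."""
    return [qi * scale for qi in q]

weights = [0.5, -1.27, 0.031, 0.9]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
```

Storing `q` takes a quarter of the memory of float32 weights, and int8 matmuls are much cheaper on most hardware, at the cost of the small rounding error bounded above.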
Not sure what SOTA is on callouts to oracles (e.g. calling out to Mathematica or some database) during inference, but maybe there's an edge if you can use them well.
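The oracle-callout idea can be sketched as a simple harness loop: the model emits a structured request in its output, the harness executes it and splices the result back in. The tag format, oracle names, and toy database here are all made up for illustration.

```python
import re

# Hypothetical oracle registry: a calculator standing in for Mathematica,
# and a tiny dict standing in for a proprietary database.
ORACLES = {
    "calc": lambda expr: str(eval(expr, {"__builtins__": {}})),
    "db": lambda key: {"ticker:AAPL": "Apple Inc."}.get(key, "not found"),
}

def run_with_oracles(model_output):
    """Replace <oracle:name>query</oracle> spans with the oracle's answer."""
    def dispatch(match):
        name, query = match.group(1), match.group(2)
        return ORACLES[name](query)
    return re.sub(r"<oracle:(\w+)>(.*?)</oracle>", dispatch, model_output)

text = "2^10 is <oracle:calc>2**10</oracle>; <oracle:db>ticker:AAPL</oracle> agrees."
resolved = run_with_oracles(text)
```

A proprietary oracle (your database, your solver) is exactly the kind of thing a competitor can't replicate, which is what makes this moat-shaped.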
I would look at BloombergGPT for inspiration on a SOTA vertical GPT [3].
Speed to deployment, integration, and sales would probably be a moat too: once customers are using your product, competitors will have a harder time selling to the same folks.
[1] In a recent interview with Lex Fridman, Sam Altman referred to "data from partnerships".
[2] Probably heavily reviewed by human raters.
[3] https://arxiv.org/abs/2303.17564