If you want to go with a model under 1B parameters, you can use BERT, which is a bidirectional encoder, or T5, an encoder-decoder that is easier to fine-tune on other tasks.
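To make "bidirectional" a bit more concrete, here is a rough sketch in plain Python (no library involved; the helper name `attention_mask` is made up for illustration). It builds the attention pattern a BERT-style encoder uses, where every token can see every other token, next to the left-to-right (causal) pattern a decoder uses. T5's encoder is bidirectional in the same sense, while its decoder is causal.

```python
def attention_mask(seq_len, bidirectional):
    """Build a seq_len x seq_len mask (hypothetical helper, for illustration only).

    mask[i][j] == 1 means token i is allowed to attend to token j.
    """
    return [
        # Bidirectional: every position is visible.
        # Causal: only positions up to and including i are visible.
        [1 if bidirectional or j <= i else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]


bert_style = attention_mask(4, bidirectional=True)     # encoder-style mask
causal_style = attention_mask(4, bidirectional=False)  # decoder-style mask

for name, mask in (("bidirectional", bert_style), ("causal", causal_style)):
    print(name)
    for row in mask:
        print(" ", row)
```

In the bidirectional case the first token already sees the whole sentence, which is why encoder models like BERT tend to do well on classification and tagging even at small sizes.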