
Nice. I will definitely be taking a look at this. Have you looked at the xFormers library? They are tackling the same problem as you, but their focus is more on providing performant transformer modules using Triton. Using specific components from the library in isolation is not as simple, though; I kept running into runtime errors, so I've set it aside for now. I am building something based on the BERT architecture, so I will give this a look. Thanks for all the work!

I would've loved to look at xFormers, but I deliberately avoided studying other implementations to make sure that ours is a clean-room implementation.

Curated Transformers started as a very small library just for spaCy (spaCy 3.7 transformer pipelines use Curated Transformers), covering only the older encoder models (BERT, RoBERTa, etc.). spaCy previously used Hugging Face Transformers for its provided transformer models, but we wanted something where we could easily hook into different parts of the model (e.g. for distillation).
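To give a rough idea of what "hooking into different parts of the model" can look like, here is a minimal sketch using plain PyTorch forward hooks. The toy module, layer names, and hidden size are illustrative assumptions, not the Curated Transformers API:

  import torch
  from torch import nn

  # Toy stand-in for a transformer encoder; the two linear layers
  # and the 768-dim hidden size are illustrative only.
  teacher = nn.Sequential(
      nn.Linear(768, 768),
      nn.Linear(768, 768),
  )

  captured = {}

  def make_hook(name):
      def hook(module, inputs, output):
          # Stash the intermediate hidden state for a distillation loss.
          captured[name] = output
      return hook

  for name, layer in teacher.named_children():
      layer.register_forward_hook(make_hook(name))

  hidden = teacher(torch.randn(1, 768))
  # `captured` now maps layer name -> activation; a student model's
  # corresponding layers can be trained to match these, e.g. with MSE.

Exposing intermediate states like this is what makes layer-wise distillation straightforward, since the student can be supervised on more than just the final output.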

After the functionality needed for spaCy was done, Matt @ Explosion encouraged us to extend it into a more general PyTorch library that would also support decoder architectures, generation, etc.
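For readers unfamiliar with what "generation" involves, a bare-bones greedy decoding loop in plain PyTorch looks like this. It is a sketch under the assumption that the model maps token IDs to next-token logits; it is not the library's actual generation API, which adds caching, sampling strategies, etc.:

  import torch

  def greedy_generate(model, input_ids, max_new_tokens=20, eos_id=None):
      # Repeatedly feed the growing sequence back in and take the
      # most likely next token at each step.
      for _ in range(max_new_tokens):
          logits = model(input_ids)            # (batch, seq, vocab)
          next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
          input_ids = torch.cat([input_ids, next_id], dim=-1)
          if eos_id is not None and (next_id == eos_id).all():
              break
      return input_ids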
