One of the now-underdiscussed features of embeddings is that you can use any existing statistical modeling technique on them out of the box, and as a bonus sidestep the usual NLP preprocessing pitfalls (e.g. stemming) entirely.
This post is a good example of why going straight to LLM embeddings for NLP is a pragmatic first step, especially for long documents.
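For instance, here is a minimal sketch of that workflow, assuming a sentence-transformers encoder; the model name, texts, and labels are my own illustrative choices, not anything from the post:

```python
# Treat document embeddings as ordinary feature vectors and feed them
# straight into a standard classifier. No stemming, tokenization, or
# stop-word handling anywhere in the pipeline.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

docs = [
    "The quarterly earnings beat analyst expectations.",
    "The team clinched the championship in overtime.",
    "Shares fell after the company cut its guidance.",
    "The striker scored a hat trick in the final.",
]
labels = [0, 1, 0, 1]  # toy labels: 0 = finance, 1 = sports

# Example model choice; any text encoder that returns fixed-length
# vectors would do.
model = SentenceTransformer("all-MiniLM-L6-v2")
X = model.encode(docs)  # shape: (n_docs, embedding_dim)

# The embeddings drop into scikit-learn like any other numeric features.
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, labels, cv=2)
print(scores.mean())
```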
You can apply statistical techniques to anything you want. Embeddings are just vectors of numbers which capture some meaning, so statistical analysis of them will work fine.
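The same holds on the unsupervised side. A quick sketch with off-the-shelf PCA and k-means (again, the model and texts are illustrative assumptions):

```python
# Since embeddings are plain numeric vectors, standard statistical
# tooling like dimensionality reduction and clustering applies directly.
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

texts = [
    "Interest rates rose again this quarter.",
    "The midfielder signed a new contract.",
    "Bond yields climbed on inflation data.",
    "The goalkeeper saved two penalties.",
]

X = SentenceTransformer("all-MiniLM-L6-v2").encode(texts)  # example model

# Reduce dimensionality, then cluster; the finance and sports texts
# should land in separate clusters.
X2 = PCA(n_components=2).fit_transform(X)
cluster_ids = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X2)
print(cluster_ids)
```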