Yeah, the whole thing is probably easy to overoptimize. Any recommendations with regard to the software? My preference for this sort of thing is a maximally compatible, portable solution based on simple file formats (I keep a lot of personal tracking data in CSV, e.g.). Here I'm thinking of something along the lines of local markdown files for the cards, a database for the sampling info, and a web app that does the sampling.
I use Anki in a "straightforward" way, in the sense that I write the cards directly in Anki's interface, without using Excel or markdown files.
For example, if I want to memorize a word I found while reading, I highlight it in blue in Apple Books and it goes to Readwise. Then I take it from Readwise and copy and paste the translation or definition from Apple's Dictionary.
I am for the simplest things possible: the point is to learn and memorize, not to find the best pipeline. You may think that the optimal pipeline leads to optimal learning, but (1) what kind of difference are we talking about compared to the simplest method of building cards? And (2) if setting up the pipeline means the pipeline itself becomes the goal (as in ML, where the platform rather than the predictions becomes the goal), or means abandoning the study because the whole process becomes too cumbersome, is that something we should aspire to?
One of my favorite novelists, Arturo Pérez-Reverte, when asked what software he uses to write his novels, replied: "I only use Word, but it's true that I don't know any other [software]".