Arborio is a kind, just like Granny Smith, Bing, or Key. I wonder if it'd be easier just to keep a dictionary of ingredient kinds and treat anything else as a preparation.
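To make the dictionary idea concrete, here is a minimal sketch. The `KINDS` table, its entries, and the `split_kind` helper are all made up for illustration, and it assumes the base ingredient has already been identified somehow:

```python
from typing import Optional, Tuple

# Hypothetical lookup table: base ingredient -> known kinds (varieties).
# A real table would be far larger; these entries are only examples.
KINDS = {
    "rice": {"arborio", "basmati", "jasmine"},
    "apple": {"granny smith", "fuji"},
    "cherry": {"bing"},
    "lime": {"key"},
}


def split_kind(modifiers: str, ingredient: str) -> Tuple[Optional[str], str]:
    """Split modifier text into (kind, preparation).

    Anything that matches a known kind for the ingredient is the kind;
    whatever is left over is treated as a preparation note.
    """
    text = " ".join(modifiers.lower().split())
    padded = f" {text} "
    # Try longer kinds first so multi-word kinds win over shorter ones.
    for kind in sorted(KINDS.get(ingredient, ()), key=len, reverse=True):
        needle = f" {kind} "
        if needle in padded:
            rest = padded.replace(needle, " ", 1)
            return kind, " ".join(rest.split())
    return None, text


print(split_kind("peeled Granny Smith", "apple"))
print(split_kind("finely chopped", "apple"))
```

The catch is that the table has to know every kind up front, so anything it has never seen silently becomes a preparation.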
It's the kind of thing that seems like it's solvable with rules, but once you get into the weeds, there are too many rules and exceptions and exceptions to those exceptions.
In addition to prep notes, there's also text that most clients consider garbage. For example, in "2 cups sugar, I use Sweetums brand!", the brand preference is garbage.
And then there are tokens like "pound" that could mean different things in different contexts (e.g., "1 pound flour", "1 chicken breast - pound till flattened", "2 cups pound cake mixture").
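Here is roughly what a rule-based take on "pound" looks like. This is only a sketch: the `label_pound` function and its two rules are invented for this comment, not taken from any real parser.

```python
import re
from typing import Optional


def label_pound(text: str) -> Optional[str]:
    """Guess the role of the first 'pound' token using two hand-written rules."""
    tokens = re.findall(r"\d+(?:[./]\d+)?|[a-z]+", text.lower())
    if "pound" not in tokens:
        return None
    i = tokens.index("pound")
    prev_tok = tokens[i - 1] if i > 0 else ""
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else ""
    # Rule 1: "pound cake" is part of an ingredient name.
    if next_tok == "cake":
        return "ingredient"
    # Rule 2: a number right before "pound" makes it a unit.
    if prev_tok[:1].isdigit():
        return "unit"
    # Otherwise assume it is the verb, i.e. a preparation step.
    return "preparation"


for line in [
    "1 pound flour",
    "1 chicken breast - pound till flattened",
    "2 cups pound cake mixture",
    "a pound of butter",
]:
    print(line, "->", label_pound(line))
```

The two rules handle all three examples, but "a pound of butter" already comes out as a preparation, so you need a third rule, and then an exception to that one.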
Years ago I built a recipe app for a client who wanted it to have a "smart" shopping list feature that could dedupe ingredients from disparate recipes, normalize their quantities, and sum them up to friendly totals. I fell waaaay down the parsing rabbit hole battling those exceptions-to-exceptions. What a mess. Reading your examples gave me a flashback!
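For what it's worth, the summing part is the easy bit once the ingredients are already structured. A rough sketch, not the code from that app: the `combine` function and the unit tables are hypothetical, it only knows a few US units, and it assumes names match apart from letter case.

```python
from collections import defaultdict

# Unit -> (base unit, how many base units it contains). US measures only.
TO_BASE = {
    "tsp": ("tsp", 1),
    "tbsp": ("tsp", 3),
    "cup": ("tsp", 48),
    "oz": ("oz", 1),
    "pound": ("oz", 16),
}

# For each base unit, display units from largest to smallest.
FRIENDLY = {
    "tsp": [("cup", 48), ("tbsp", 3), ("tsp", 1)],
    "oz": [("pound", 16), ("oz", 1)],
}


def friendly(amount, base):
    """Express an amount of base units in the largest unit that fits."""
    for unit, factor in FRIENDLY[base]:
        if amount >= factor:
            return amount / factor, unit
    # Smaller than one of the smallest unit: keep the smallest unit.
    unit, factor = FRIENDLY[base][-1]
    return amount / factor, unit


def combine(items):
    """Merge (name, quantity, unit) rows into one total per ingredient."""
    totals = defaultdict(float)
    for name, qty, unit in items:
        base, factor = TO_BASE[unit]
        # Volume and weight are kept apart; converting between them
        # would need a density for every ingredient.
        totals[(name.lower(), base)] += qty * factor
    return sorted(
        (name, *friendly(amount, base)) for (name, base), amount in totals.items()
    )


shopping = [
    ("Sugar", 1, "cup"),
    ("sugar", 8, "tbsp"),
    ("flour", 8, "oz"),
    ("Flour", 0.5, "pound"),
]
print(combine(shopping))
```

Everything hard happens before this step: getting from free text to clean (name, quantity, unit) rows is where the exceptions live.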
The app only had a couple of hundred recipes so we wound up just having someone manually translate every ingredient for every recipe into a common structure in a spreadsheet.
Anyhow, thanks so much for these posts. I quit my job a few months ago to work on a few projects of my own, and I've really enjoyed reading your updates. Looking forward to more in the future.
Yeah, that's a little like how I fell into Zestful. Very early into my adventure, I made a keto recipe search engine and wanted to handle ingredients properly. It started with regexes then spiraled into machine learning.