>For example the string "3/4 cup risotto rice, Arborio or Carnaroli"
>It identifies the unit and quantity fair enough, and it's right when it says "Risotto rice" is the product. But then it says "Arborio or Carnaroli" is a preparation method. I can totally see WHY it says that. But I'm not sure that's correct.
Yeah, I'm constantly improving the model, but there will always be cases it gets wrong. For most of my customers, that's a mostly fine result, because very few clients actually use the preparation note field. Most care only about the quantity, unit, product, and USDA fields.
I'll fix the bad result on the risotto example, but it's definitely a game of whack-a-mole. There are so many variations and corner cases that it'll never be 100%.
>At your price point though, I might still use it. Ultimately, using it to parse everything I need to parse would be a lot cheaper than spending even an hour writing a parser myself. But those were my first thoughts after playing around with it.
Haha, maybe I should raise my prices. But seriously, I'd be happy to have you as a customer. If you'd like to chat more about your use case, shoot me an email at michael@zestfuldata.com.
Arborio is a variety, just like Granny Smith (apples), Bing (cherries), or Key (limes). I wonder if it'd be easier just to have a dictionary of ingredient varieties and treat anything else as a preparation.
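A rough sketch of that dictionary idea, just to make it concrete. `KNOWN_VARIETIES` and `classify_note` are hypothetical names, the table is a tiny stand-in for what would need to be a much larger list, and this is not how Zestful actually works:

```python
# Hypothetical lookup table of ingredient varieties; a real one would be far larger.
KNOWN_VARIETIES = {"arborio", "carnaroli", "granny smith", "bing", "key"}


def classify_note(note: str) -> dict:
    """Split the text after the product into variety options vs. preparation text."""
    # Treat " or " as a separator between alternatives, then split on commas.
    parts = [p.strip() for p in note.replace(" or ", ",").split(",") if p.strip()]
    # Anything found in the dictionary is a variety; everything else is preparation.
    varieties = [p for p in parts if p.lower() in KNOWN_VARIETIES]
    preparation = [p for p in parts if p.lower() not in KNOWN_VARIETIES]
    return {"varieties": varieties, "preparation": preparation}


print(classify_note("Arborio or Carnaroli"))
print(classify_note("finely chopped"))
```

It handles the risotto line, but it already chops "diced or sliced" into two separate preparation fragments, which is the kind of exception-to-an-exception the reply below is talking about.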
It's the kind of thing that seems like it's solvable with rules, but once you get into the weeds, there are too many rules and exceptions and exceptions to those exceptions.
In addition to prep notes, there's also text that most clients consider garbage. For example, in "2 cups sugar, I use Sweetums brand!" the brand preference is garbage.
And then there are tokens like "pound" that could mean different things in different contexts (e.g., "1 pound flour", "1 chicken breast - pound till flattened", "2 cups pound cake mixture").
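To illustrate how quickly hand-written rules pile up, here's a naive heuristic for the three "pound" examples. `classify_pound` is a hypothetical helper, and the rules are an assumption written only to cover those three lines:

```python
import re


def classify_pound(text: str) -> str:
    """Guess the role of the token 'pound' in an ingredient line (naive heuristic)."""
    lowered = text.lower()
    # A number (integer, decimal, or simple fraction) right before "pound(s)" suggests a unit.
    if re.search(r"\b\d+(?:[./]\d+)?\s+pounds?\b", lowered):
        return "unit"
    # A known compound like "pound cake" suggests part of the product name.
    if re.search(r"\bpound cake\b", lowered):
        return "product"
    # Otherwise assume it's the imperative verb ("pound till flattened").
    if re.search(r"\bpound\b", lowered):
        return "preparation"
    return "absent"


for line in ["1 pound flour", "1 chicken breast - pound till flattened", "2 cups pound cake mixture"]:
    print(line, "->", classify_pound(line))
```

It gets all three right, then immediately calls "2 pound cakes" a unit, so you add another rule, and so on.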
Years ago I built a recipe app for a client who wanted it to have a "smart" shopping list feature that could dedupe ingredients from disparate recipes, normalize their quantities and sum them up to friendly totals. I fell waaaay down the parsing rabbit hole battling those exceptions-to-exceptions, what a mess. Reading your examples gave me a flashback!
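For anyone curious what the "easy part" of that shopping-list feature looks like once ingredients are already parsed: a minimal sketch, assuming each line has been reduced to a `(product, quantity, unit)` tuple and only a few US volume units are involved. The conversion table and function names are hypothetical:

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical conversion table: each unit expressed in US teaspoons.
TSP_PER_UNIT = {"tsp": Fraction(1), "tbsp": Fraction(3), "cup": Fraction(48)}


def sum_ingredients(items):
    """Dedupe (product, quantity, unit) tuples by product name and total them in teaspoons."""
    totals = defaultdict(Fraction)
    for product, quantity, unit in items:
        # Normalize the name and convert the quantity to the common base unit.
        totals[product.strip().lower()] += Fraction(quantity) * TSP_PER_UNIT[unit]
    return dict(totals)


def friendly(tsp):
    """Express a teaspoon total in the largest unit that gives an amount of at least 1."""
    for unit in ("cup", "tbsp", "tsp"):
        amount = tsp / TSP_PER_UNIT[unit]
        if amount >= 1:
            return amount, unit
    return tsp, "tsp"


totals = sum_ingredients([("Sugar", "1/2", "cup"), ("sugar ", 2, "tbsp")])
print({name: friendly(tsp) for name, tsp in totals.items()})
```

The summing is trivial; the rabbit hole is getting reliable `(product, quantity, unit)` tuples in the first place, and deciding that "sugar" and "granulated sugar" are the same thing.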
The app only had a couple of hundred recipes so we wound up just having someone manually translate every ingredient for every recipe into a common structure in a spreadsheet.
Anyhow, thanks so much for these posts. I quit my job a few months ago to work on a few projects of my own, and I've really enjoyed reading your updates. Looking forward to more in the future.
Yeah, that's a little like how I fell into Zestful. Very early into my adventure, I made a keto recipe search engine and wanted to handle ingredients properly. It started with regexes then spiraled into machine learning.
> It identifies the unit and quantity fair enough, and it's right when it says "Risotto rice" is the product. But then it says "Arborio or Carnaroli" is a preparation method. I can totally see WHY it says that. But I'm not sure that's correct.
Simple solution: rename your 'Preparation' field to 'Preparation or additional options'. That field can then hold substitute products, which you can later identify by brand name or common marketing name.
So where it currently says "Risotto rice" is the product and "Arborio or Carnaroli" is a preparation method, changing the label to 'Preparation method and additional options' and then parsing that field for potential products would help a bit.