It looks like you're using the recipe-scrapers library to scrape recipes which only supports a set number of websites.
If you want to expand that, I recommend parsing JSON+LD and Microformats. Given your parsers folder [2], it looks like you've tried it, but only for specific websites. I would make that generic and check whether the metadata is available on any website. I wrote a blog post on this if you're interested [3].
Holy crap... I just created an account on your website and added one random recipe (cashew nut yoghurt) that did not work on the original post site, and it worked like a charm!
You've got a new paying customer :)
I'd been looking for something like your app for a long time.
Ouch, your PayPal flow is not working :( Fix that and you'll have a paying customer haha
I second this comment. While I am probably not going to pay for this yet (I don't have that many recipes), this site was able to scrape a recipe I cook often and put it into a format that is much better than the original blog post.
The scaling and editing recipe functionality is top notch.
I'll probably use this tool now.
Thanks for sharing! I also want to make the JSON+LD stuff generic, but I have found that there are sometimes different renditions of that format. Though, now that I've looked at it, I only have 1 example of something non-standard, which doesn't include the @graph directive.
So that just requires some more research and testing. Perhaps someone enterprising will read this and make a pull request...
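For anyone picking this up, here is a rough sketch of what a generic JSON+LD lookup could look like. It is not the code from either project: the names `JsonLdCollector` and `find_recipe` are made up for illustration, it uses only the Python standard library, and it assumes the page embeds a schema.org `Recipe` object either at the top level, inside a list, or inside `@graph`.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def _candidates(node):
    """Yield every dict in a JSON-LD document, descending into lists and @graph."""
    if isinstance(node, list):
        for item in node:
            yield from _candidates(item)
    elif isinstance(node, dict):
        yield node
        yield from _candidates(node.get("@graph", []))


def _is_recipe(node):
    """True if the node's @type is "Recipe" or a list containing "Recipe"."""
    types = node.get("@type", [])
    if isinstance(types, str):
        types = [types]
    return "Recipe" in types


def find_recipe(html):
    """Return the first schema.org Recipe object found in the page, or None."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    for raw in collector.blocks:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than failing the whole page
        for node in _candidates(doc):
            if _is_recipe(node):
                return node
    return None
```

Because `_candidates` treats `@graph` as optional, the same function covers both the rendition with `@graph` and the one without it. Other non-standard variants would still need their own handling.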
Saffron looks great, I had encountered it before building this for myself. Your blog post is quite illuminating - perhaps the first practical application of LCA that I've seen outside of an interview setting :)
It's a shame that Saffron provides neither on the published recipe pages. If I share a recipe with someone, they might want to import it into a different app.
I'm a paying customer with about 240 recipes imported. Saffron works very well overall and I recommend it, though I did have to do a lot of hand editing on some of the recipes.
Your app looks awesome. I've been thinking about a way of putting all my recipes in one place for a while, and this looks like the kind of place I've been looking for.
Years ago, I wrote something like you describe in the blog (regex to match ingredient lines, looking for imperative verbs, filling in the gaps). Recently I revisited the subject and learned that almost everybody has decent jsonld data now. Even paywalled stuff.
Now I've got tampermonkey watching over my shoulder and backing up everything I look at to a couchdb instance. (Still gotta write some UI and an agent to pull down images, but I've got other irons in the fire at the moment.)
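To give a flavour of the older regex approach mentioned above, here is a minimal sketch of matching ingredient lines. It is not the code described in the comment: the pattern, the unit list, and the name `parse_ingredient` are all assumptions for illustration, and a line only counts as an ingredient if it starts with a quantity.

```python
import re

# Hypothetical pattern: a quantity, an optional unit from a small
# hand-picked list, then the ingredient name.
INGREDIENT_RE = re.compile(
    r"""^\s*
    (?P<quantity>\d+/\d+|\d+(?:\.\d+)?(?:\s+\d+/\d+)?)?   # 3/4, 2, 1.5, 1 1/2
    \s*
    (?P<unit>cups?|tbsp|tsp|g|kg|ml|l|oz|lbs?)?\b          # illustrative unit list
    \s*
    (?P<name>.+?)\s*$
    """,
    re.IGNORECASE | re.VERBOSE,
)


def parse_ingredient(line):
    """Return (quantity, unit, name) if the line starts with a quantity, else None."""
    match = INGREDIENT_RE.match(line)
    if match is None or match.group("quantity") is None:
        return None
    return (match.group("quantity"), match.group("unit"), match.group("name"))


for line in ["1 1/2 cups flour", "2 eggs", "Mix together x and y"]:
    print(line, "->", parse_ingredient(line))
```

A heuristic like this misfires on lines such as "350 degrees for 15 minutes", which is part of why the structured jsonld data is the more comfortable source when it exists.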
Hi! I built this! (Surprised to see it on the front page as I didn’t get much traction when I first submitted it :)
Anyway, all of you have a lot of neat suggestions! Please do take a look at the “contributing” section of the repo and let me know if you’d like to pitch in!
Hi memset, your project made me create an account and comment for the first time on HN. :)
Very nice to see your repo and happy to see it being so minimalistic. I use paprikapp[1] to manage my recipes and you can import recipes into paprika using yaml[2].
I used to be a customer of HelloFresh (Germany) and had to manually copy and paste recipes into Paprika. I recently made a tool that takes a HelloFresh URL and outputs a YAML file which can be used to import the recipe into Paprika, with images [3]. Maybe I can open a PR and add support for HelloFresh recipes.
From what I've observed on other sites (mostly StackOverflow and Reddit) I think there are a few major components to this, namely: timing of post, luck / randomness regarding early upvotes, and other posts during that day.
The last one is a bit like when a film releases on the same weekend as some blockbuster, it is more likely to go under the radar. The middle condition is mostly luck, unless someone is prepared to manipulate the early upvotes somehow. But the first one is quite easy to time correctly - US-heavy sites tend to have a few time slots where a disproportionately large number of people check it. My guess would be morning, lunch and afternoon slots, and especially slots where different time zones overlap. E.g. an afternoon slot for Europe which overlaps with the Eastern seaboard doing a morning check of HN might work quite well, etc. This can help a fresh post break through the decaying, but highly upvoted older posts that are keeping it from more visibility.
I've always wondered why recipes have the ingredient list and quantity separate from the instructions. I often have to scroll up and down (even on a mostly-decent recipe site like allrecipes.com) first to see what gets added next and then to see how much of it to add. Why not tell me it's one teaspoon of salt in the same place as you tell me to add the salt? The only advantage to separating them is to make the shopping list, but there's no reason one can't duplicate the quantities.
It's a comparatively recent idea. It only appeared in the 19th century, with the Fannie Farmer Boston Cooking School cookbook. It introduced the whole idea of exact measures, rather than "a lump of butter the size of a walnut" and "enough to make a stiff dough". Before that, recipes were told like stories.
It was a scientific way of cooking: gather and measure all your ingredients before you start. In a commercial kitchen you still do that: you go to work hours before the doors open to put everything in place (mise en place). That's how you get reliable results.
Even if you don't gather stuff, a good home cook will still scan the list to ensure that they have what they need. Still, it wouldn't be a bad idea to also replicate the measurements in the recipe itself. Perhaps they have the coder's instinct to not duplicate information.
Great British Chefs [0] sort of does this - they have both a full list of ingredients with quantities and also at each recipe step they list off the ingredients with quantities used in that step.
We have our recipes in a Trello board for meal planning, grocery list making, etc. I wrote a Python script to extract the data and print out an oft-used subset of them in a nice readable format. I use a Jinja2 template to format the pages with HTML and CSS, then convert them to one PDF document for printing.
I use an 8.5x11 page with metadata on one side (name, description, time to make, tags, dates made, etc.) and the recipe on the other. On the recipe side I have ingredients at the top of the page in two columns, and then the directions at the bottom in two columns.
We put this page in a plastic sleeve and into a 3-ring binder. When we want to make the recipe, we take it out and tape it to the cupboard with masking tape. That way anyone in the kitchen can easily see it and prep their part.
Funny, after trying out all sorts of digital recipe organization methods, I've settled on plain old 4x6 index cards as the most useful.
I have all of my recipes plaintext in an org file with a very simple format...
* Recipe title
** Ingredients
- 1 tsp x
- 2 tbsp y
** Directions
1. Mix together x and y
2. Bake at 350 for 15 minutes
I set up a simple python script using the same package as the OP's website for scraping recipe sites to org format, then I export subsections to latex with a custom class and print to index cards.
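The rendering half of a script like that is short. This is only a sketch of the org output step, with the scraping left out; the function name `to_org` and its arguments are hypothetical, not taken from the actual script.

```python
def to_org(title, ingredients, directions):
    """Render a recipe in the org layout shown above: title, ingredients, directions."""
    lines = [f"* {title}", "** Ingredients"]
    lines += [f"- {item}" for item in ingredients]
    lines.append("** Directions")
    lines += [f"{n}. {step}" for n, step in enumerate(directions, start=1)]
    return "\n".join(lines) + "\n"


print(to_org(
    "Recipe title",
    ["1 tsp x", "2 tbsp y"],
    ["Mix together x and y", "Bake at 350 for 15 minutes"],
))
```

Feeding it the scraped title, ingredient list, and instruction list should give an entry that can be appended straight to the org file.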
My site (the OP) uses a separate print-specific CSS to try and format recipes to a 4x6 index card. It doesn't work as well as I'd like, as in, it's hard to get what shows up on the screen to print on an odd-sized piece of paper, but if you, or someone you know, has CSS expertise then I'd love for someone to help make this work more reliably!
I save my recipes in plain old text files. For a long time now I've been meaning to write them up in LaTeX and print them on A6 cards with a nice font. Then construct a nice wooden box to put them in.
That might be coming from the tradition of mise en place - “everything is put in place”. The idea is that you prepare all the ingredients and have them in front of you before you start cooking. Often the cooking process requires precise timing and parallel execution, and you will not have time to search for an ingredient, measure it, or prep it (dice, wash, etc.).
Unfortunately the recipes never include the steps required to prep the things. There's the ingredients, and then halfway through the recipe there's the "now add the chopped carrots" ..."WHAT?!?! you never told me to chop the carrots!"
Drives me nuts. I have to rewrite any recipe I want to keep making to actually have all the required instructions.
Usually though, FYI, they specify 'chopped' in the ingredients list. But that leads to the equally-infuriating "Only takes 5 minutes of prep!" but the ingredient list hides the fact that you have to make a pie crust or chop two pounds of veggies not included in that estimate.
I've found it useful for knowing if you have enough of each ingredient. If they weren't separate, one of the first things I would need to do would be to sit down and work out what ingredients I need to know if I need to buy more or not.
It annoys me greatly when the ingredients list is not organized logically.
When I write recipes I group the ingredients by step. So for example I might have a "Sauce Ingredients" block, and the instruction will be "add sauce ingredients to the pan". It makes mise less annoying, I don't have to keep scanning the instructions to figure out how to organize my ingredients.
The easiest way to cook is to have lots of small bowls for the ingredients. Each ingredient gets measured out into a bowl and put aside. Then when you need it, you dump the bowl into the dish being prepared.
This is assumed in all the time estimates, as well. Time spent getting the ingredients into the bowls isn't counted, so if something calls for 3 chopped onions you have to add the peeling & chopping time to get the real prep time.
I don't know, this seems like overkill. As a Mediterranean, of course you would prep every ingredient beforehand and then just follow the recipe. Mise en place or not.
Cooking for Engineers.com puts a compact table in the recipes, which shows the list of ingredients and the order of preparation and combination. I think he calls it "Tabular Recipe Notation" or something like that.
It took me a moment to grok the format, but once I did I found it to be really helpful.
And if it's on a computer instead of on paper, there's no reason you can't tag the ingredients inline, and automatically generate your shopping list from the tags.
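A minimal sketch of that idea, assuming a made-up inline tag syntax of `{quantity|ingredient}` inside each instruction step. The syntax and the names `shopping_list` and `render_step` are inventions for this example, not an existing format.

```python
import re
from collections import defaultdict

# Hypothetical tag syntax: {quantity|ingredient}, e.g. {1 tsp|salt}
TAG_RE = re.compile(r"\{\s*([^|{}]+?)\s*\|\s*([^|{}]+?)\s*\}")


def shopping_list(steps):
    """Collect the tagged quantities for each ingredient across all steps."""
    needed = defaultdict(list)
    for step in steps:
        for quantity, ingredient in TAG_RE.findall(step):
            needed[ingredient.lower()].append(quantity)
    return dict(needed)


def render_step(step):
    """Replace each tag with readable text, e.g. '{1 tsp|salt}' becomes '1 tsp salt'."""
    return TAG_RE.sub(r"\1 \2", step)


steps = [
    "Whisk {2 tbsp|olive oil} with {1 tsp|salt}.",
    "Finish with {1 pinch|Salt}.",
]
print(shopping_list(steps))
print([render_step(step) for step in steps])
```

The quantities stay in the instructions where you need them while cooking, and the list is derived from them instead of being maintained separately. This sketch only collects the quantities per ingredient; adding them up would need unit handling on top.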
This is great and works, for the supported sites, remarkably well.
One major flaw: it seems like the calories and macros aren’t captured. For bodybuilding and powerlifting types, and other athletes, these are the most important part of a meal.
What would make this a “killer app”, in my view, is if I could request a recipe in JSON instead of just formatted plain text as you do. Then I could use the recipe (and recipe search) in my own home-brewed meal planning program.
Very nice insight. I might even go a step further and archive the entire page[1]; hard drive space is cheap, and how many recipes is one person going to save, honestly? 1-2 LOCs worth? Then you can just parse the content you want, with the ability to drop down into the original page as you first saw it.
As a person with better visual memory for certain kinds of data, having the original page content may have as much meaning as the recipe, for entirely different reasons. Food can be very personal, and recipe books doubly so. A recipe archive can be as personal as we like, or all of that can be abstracted away when we don’t need it.
I meant Library of Congress, an anachronistic visual metaphor related to data storage from the 1990s, perhaps earlier? In 2012, 1 LoC was roughly equivalent to ~3 PB (petabytes).
I remember first seeing it on Slashdot way back in the day, before they had user moderation or user meta-moderation.
If you save the source document, the code needed to parse your recipe archive is likely to be pretty short. Then you have a corpus to do A/B testing of your recipe parsing code against.
Side note: I feel that the moderation and later meta-moderation system on Slashdot was the most transparent, fun, positive moderation system I’ve ever been part of. I wish HN had more than just up and downvotes, for instance. User meta-moderation would help reduce flamewars immensely IMO.
My quick and dirty trick for finding the actual recipe in a long blog post:
Ctrl+F "print".
Probably 90+% of websites use some tool to format the actual recipe, so the actual recipe pretty consistently includes a link for printing it. Search for the word "print", and it takes you to the part of the page that has the recipe.
(You don't actually have to click the link. It's just a landmark for navigation within the page.)
Neat! Now make a website that instantly electrocutes or at least painfully shocks people that publish recipes as Instagram stories! The accumulated anger I have after trying to follow those will mean a hefty donation from my side :)
As an aside: I live in a jurisdiction where recipes cannot be copyrighted, so I have collected all recipes I remotely liked on a web page with only text. All recipes except some of the very last ones under the "untested, potentially disgusting" headline are in Swedish though: https://koketteriet.se/skrivet/Recept/recept.html
We can still just copy the how and what, and not the accompanying epic novel :)
I think I managed to give credit where it is due (in the intro, at least), but about 60% of the recipes are things we veganised ourselves from whatever old family recipes we have.
Anyway, I find, as I rapidly approach the big four-oh, I don’t need recipes much anyways, so I have a sleeve-book with some recipes, some pages are just dish names, and others are flavours or ingredients that go together.
This is fantastic! I'd love it if I could pair this with an RSS aggregator to get new, plain-text recipes in my RSS feed every day/week, etc. Right now I get an RSS feed of my favourite recipe sites but have to get through so much junk to see the actual recipe.
I recently went down the rabbit hole of recipe websites while I built a little side project Shopify app for creating recipes on ecommerce stores[1]. Having a plaintext version has been on my backlog to-do list forever, but it seems the vast majority of store owners aren't keen on it. It's been a massive learning experience for me; and I never realized how... bad the recipe website experience was for so many people.
Question: how do you think we, as in us as the people building the current web, can improve the standard recipe display on the web? Obviously removing the 1300 word novel before recipes is a big plus, but what else do you think would improve your day-to-day recipe browsing?
As this only works on certain websites, that implies you're not pulling from something consistent e.g. structured data, which the vast majority of popular / visible websites should use
I recently wrote a tiny site to do exactly this! It converts ingredients from volumes to weight. Check it out at https://bready.io/. Would love to hear any feedback y'all might have!
I imagine you all have also greatly increased the amount of baking you're doing while staying safe at home.
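For the curious, a volume-to-weight conversion boils down to a per-ingredient density lookup. The sketch below is not how the site above works (its method isn't described here); the table holds rough, commonly quoted grams-per-US-cup figures that are assumptions for illustration and vary with brand and packing. The cup, tablespoon, and teaspoon ratios are the standard US ones.

```python
# Rough grams per US cup. Illustrative assumptions, not authoritative values.
GRAMS_PER_CUP = {
    "water": 237.0,
    "all-purpose flour": 120.0,
    "granulated sugar": 200.0,
}

# 1 US cup = 16 tablespoons = 48 teaspoons
CUPS_PER_UNIT = {"cup": 1.0, "tbsp": 1 / 16, "tsp": 1 / 48}


def to_grams(amount, unit, ingredient):
    """Convert a volume measurement to grams using the density table above."""
    return amount * CUPS_PER_UNIT[unit] * GRAMS_PER_CUP[ingredient]


print(to_grams(2, "cup", "all-purpose flour"))
print(to_grams(8, "tbsp", "granulated sugar"))
```

The hard part is the table, not the code: the same cup of flour can weigh noticeably more or less depending on how it was scooped.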
Until recently, it was a lot easier to lay your hands on a decent set of measuring cups than a good electronic balance. The spring ones weren't very precise, and weights are easy to lose and make a mess. Measuring cups are cheap and simple, and doubling and halving recipes (the most common operations) is elementary-school math.
Even European recipes still measure small amounts for flavors, leaveners, etc. in volumes. A scale that's precise to sub-gram levels is more expensive. So it's not as if you're not familiar with the concept.
Now that electronic ones are cheap it's clearly the better choice, especially for flour, but that's only in the last decade or two. People are switching over to scales, but the volume measurements are perfectly adequate and they're what our existing cookbooks call for.
I'm not American (Canadian living in Germany) and I have to say that I find using cups/spoons is much easier than pulling out a scale and weighing everything. Just grab a measuring cup / spoon (every house has a set of standard sized measuring cups and spoons) fill it up to the right amount and dump it in.
Cups work great for certain things, such as a cup of water, sugar, flour, or peanuts. It gets difficult in other situations though, like trying to measure a cup of spinach, strawberries, or ice, because the measurement greatly varies depending on how the items are packed or arranged in the space.
Using a scale is even easier, especially for things like bread, you can just pour ingredients directly into your bowl until you hit the right amount, zero it and move onto the next thing. No guesswork of trying to fill a cup, no separate measuring device needed, just pour and tare.
It also solves many issues with ingredients that will just never have equal density: flour variances, packing brown sugar by hand, etc. are things of the past.
Another great spot is stuff like butter, where you wind up cutting off different sizes all the time and always have a few assorted scraps kicking around. Using a scale solves that: just pile scraps up til you get the right amount.
Just use a german measuring cup which has like 30 different scales for everything you might want to measure in volume and weight, e.g. https://www.amazon.de/dp/B01BAU2N62/ ?
That’s what we commonly use which is easier to use than just weighing everything, but still just as accurate.
Not every house has a set of standard cups and spoons. We leave a scale on our counter and when we're mixing ingredients, we put the bowl on the scale, add and tare.
It's quite easy to fill a cup with apple slices. You won't get an accurate, reproducible measurement this way, but the procedure itself isn't difficult. If accuracy isn't that important, then it works okay.
Australians, Indonesians, Germans all measure recipes with two kinds of spoons (!). Australians also use cups (that are 10-12 mL bigger than American ones); other countries may too.
I wonder if the ideal country where no-one uses units of volume other than litres and millilitres actually exists.
The really bizarre units of volume are certainly the ones I've seen in Europe - where you get measures marked in "grams of flour" and "grams of sugar". I suppose no-one has ever asked for a hundred grams of sugar of flour, but I'm tempted to every time I see one of them.
"Grams of flour" isn't volume, it's mass. You stick your mixing bowl on the scale, hit "tare", and add flour until you hit the right mass. Then hit tare, add the next ingredient until it hits the right mass, etc. Much more accurate, which is needed for good consistent baking. Also much faster.
Because most Europeans use weighing scales rather than volumetric measurements. When I'm baking I put the bowl on the scales and pour in the ingredients. When recipes say 1tbsp or 2tsp it gets me annoyed because if they had converted to grams I could just pour those ingredients straight in too but instead I need to get out my volumetric measures (or in most cases just guess and hope it doesn't matter too much)
>When recipes say 1tbsp or 2tsp it gets me annoyed because if they had converted to grams I could just pour those ingredients straight in too but instead I need to get out my volumetric measures
Are kitchen scales accurate enough for such tiny measurements? 1 teaspoon is 5ml, so if your scale only has 1g of precision you're looking at up to a 10% variance (assuming you're measuring water), and that's assuming it's accurate at that range. It gets worse when you need to deal with fractional teaspoons.
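The arithmetic behind that estimate, as a small sketch. It assumes the scale rounds to the nearest display step and is otherwise accurate, and that the ingredient has the density of water (1 g per ml); the function name is made up for the example.

```python
def worst_case_relative_error(target_grams, resolution_grams):
    """Largest relative rounding error for a scale that rounds to the nearest step."""
    return (resolution_grams / 2) / target_grams


TSP_GRAMS = 5.0  # 1 teaspoon of water: 5 ml at 1 g per ml

for label, grams in [("1 tsp", TSP_GRAMS), ("1/4 tsp", TSP_GRAMS / 4)]:
    for resolution in (1.0, 0.1):
        error = worst_case_relative_error(grams, resolution)
        print(f"{label} on a {resolution} g scale: up to {error:.0%} off")
```

With 1 g steps the reading can be off by half a gram, which is 10% of a teaspoon of water and 40% of a quarter teaspoon. A scale with 0.1 g steps brings those down to 1% and 4%.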
> Are kitchen scales accurate enough for such tiny measurements
Yes, and there are lots of recipes that use mass rather than volumetric measurements. For solids, in fact, many sources prefer this because consistency of packing density and other user factors that affect consistency with volumetric measure are far worse than with mass measured with a kitchen scale.
Well, the conveniently indicated grams on the packet are not units of weight - they're units of volume. So you've got two kinds of units of volume in your recipes: milli/centi/decilitres, and grams of butter. But it's fairer than the grams of flour I've been ranting about elsewhere here because at least the manufacturer can have some responsibility for fine tuning it to their product!
They are the SI unit of mass. He's just saying that because it's indicated by lines on the package that the actual measurement is done in volume, and the conversion is implicit by the scaling. Where he goes wrong is in thinking that it's implying that "grams of butter" is a unit of volume.
Well, tell that to Europeans, who don't use millilitres to measure their flour or sugar by volume - they use bizarre units of volume called "grams of flour" or "grams of sugar". Check their cup measures! It's crazy.
Apparently it works perfectly fine for household cooking to use units of volume for flour and sugar. Close enough is good enough!
If you need a particular amount of flour, you can't express it in volume, since 250ml of flour may be too much or too little depending on how tightly the flour is packed. For reliable baking, measuring flour by volume is too inaccurate, you need to weigh it so you know how much flour actually went into that cup.
This is awesome. But perhaps this product is better packaged as a browser extension. So when you browsed a recipe page, the product would turn the page into a plain text recipe.
I'd rather not install an extension (trust) and not all browsers I use (e.g. on mobile) support extensions. You can use a bookmarklet pretty much anywhere though:
Thanks for the great idea. The code got broken in the comment, but was easy enough to reconstruct.
Another bookmarklet I use is PrintFriendly. It uses some of its own JS, and also works well on recipe sites - giving more options for what to keep/remove.
Recipe websites are the worst of web design. To make things even worse they are mostly used on mobile where bloat/advertisements/janky loading is even more of a pain.
>Not to mention the 1200 word partial life story preceding every recipe.
Everyone I know balks at this, yet it exists. Can someone explain why? Where is this tradition coming from? Does this drive page views? I know nothing about the cooking subculture. Is it some sort of normies vs "really into this" thing (and please excuse the phrasing, I could not come up with something better)? Because that would mean the page is just not targeted at me. But then again, that argument would only make sense if the monetization were not through advertisement and view volume.
I didn't realize how curious I was about this.
Edit: just had a thought. Can it be something to do with SEO? Does Google like bigger articles with lots of personal text? If this is the reason it would blow my mind. Without knowing it, Google's algorithm would have made the internet a shittier place; this is very amusing to me.
It's mostly SEO and "engagement" but also possibly copyright law (at least that's the case with cookbooks that include non-recipe fluff content)
From the US Copyright Office:
"Mere listings of ingredients as in recipes, formulas, compounds, or prescriptions are not subject to copyright protection. However, when a recipe or formula is accompanied by substantial literary expression in the form of an explanation or directions, or when there is a combination of recipes, as in a cookbook, there may be a basis for copyright protection.”
Adding your life story into your recipe might make it eligible for copyright, but what's the point? Someone can simply strip out your life story and then they'll be allowed to copy it.
With the amount of money an average recipe blogger can make from display ads inserted in those stories, they really and truly don't care if it annoys John Q. Public.
Double-edged sword here, but reviews can be very helpful: when the audience is positive they can provide good alternatives/substitutes, ideas to increase/reduce this or that ingredient to achieve a different flavour, etc.
gotta try and hide from the Google robot that you're just a content farm monetising other people's copyrighted content somehow. you'll note the recipes are always "adapted from" somewhere else with a minor change.
You might think that - but those pseudo news sites with random "one trick" type crap whose page elements keeps moving around are worse.
I have the opposite problem trying to get more text on our recipe pages - I am sure some designers would design a recipe page with no text if we let them.
I am eagerly waiting for something like that to happen in the coming days. There are lots of things like photos, video, ingredients, nutritional value, serving tips, recipe notes and of course structured data on my site, https://hassanchef.com
Very interesting. I'm now starting a project to organise my recipes. With https://github.com/domchristie/turndown it transforms the HTML from the clipboard into Markdown; the result is really good and it's generic to all recipes.
Of course you don't have structure of the recipe and it does not simply work with a url.
Neat idea! The fiance and I have taken to simply printing recipes on paper. It has the benefit of eliminating a lot of the cruft that the website has - condensing the printed material to just ingredients and instructions.
And makes it really easy to reference a week or two later if we want to remake it - just pull it out of the binder.
I've tried that. After the recipe binder got to be a couple of inches thick, it would take more discipline than I have to keep it organized in some kind of effective way. Plus I don't know what way that would be. So I pretty much end up either looking at the laptop while cooking or printing out a new copy each time.
I do often wonder if many recipe sites ever really think about the user experience. It's almost always going to be on mobile and there's either got to be a simplicity to the presentation (which is what this does brilliantly) OR a really simple way to flick between ingredients and method.
The one that stands out for me is Jamie Oliver (example: https://www.jamieoliver.com/recipes/vegetables-recipes/veggi...) - on mobile there's this super simple tab that flips you between ingredients / method. It's simple, but exactly what you want, especially when you're covered in flour / oil / etc... :-)
I've been using the Paprika app for years, and I would recommend it.
As well as importing from websites, you can enter your own recipes. And my wife and I can sync between our phones and laptops so we can follow and edit the same recipes. Very handy.
Next stop: the holy grail of recipe website cleanup, removing the mindless drivel about the author's.. whatever / whoever, their inane anecdotes, and generally any trace of their annoying personalities (here's an offending example: https://www.ibreatheimhungry.com/easy-roasted-pork-shoulder-...)
I was already excited when Apple showcased an Extension for the new Safari in their keynote that did something similar.
I wonder if any of these websites will find a way to prevent these recipe scrapers from working so that people read the damn ads in their blog texts. Instapaper’s article extraction gets rejected from time to time and Safari’s reader mode too, but it’s mostly on major news sites.
This is really cool, I've been building a site to help build templated recipes with markdown and javascript and I'll definitely be using this when I transcribe recipes!
I want a tool like this for any website, that intelligently extracts just the useful part as plain text. Reader view and so forth just don't work that well. Are there any tools around like this?
source: I've built a very similar tool for my cooking app: https://www.mysaffronapp.com/
[1] https://github.com/hhursev/recipe-scrapers
[2] https://github.com/poundifdef/plainoldrecipe/blob/master/par...
[3] https://www.benawad.com/scraping-recipe-websites/