Zapier Email Parser: Extract Text From Automated Emails (zapier.com)
228 points by bryanh on March 4, 2014 | 85 comments



This is a killer feature for Zapier and I'm really excited. Zapier has been a huge time saver for us at BetterVoice.com.

How many times do you get approached by customers or other vendors asking "when are you going to integrate with XYZ product?" If you integrate just ONE time with Zapier, you're immediately connected with over 200 other online services. Then your answer to those customers/vendors can be: "We're integrated with Zapier, so you guys should integrate with them too and you'll get all the other benefits as well." With that answer you accomplish two things: (1) you shift the burden to them, and (2) you can use Zapier as a "testing ground" to see how popular an integration with a particular service is. If one particular Zapier connection starts taking off, then you can consider a direct integration. If only a few people use a particular service connection, then you haven't lost anything.


For some reason, this comment really read like a marketing testimonial.


well they have a vested interest in discussing their product/company, and given their association with the OP, they chose to describe it in a positive way.

this is what a testimonial would be if testimonials were invented in 2014


As a former freelancer whose gross income last year came from doing Zapier integrations for vendors (some currently public catalogue services, others private), I can vouch for sunsu's comment as entirely accurate: the problems I was asked to solve, and was paid for, were of the nature described and more.


If you want infuriating marketing, try Googling for an integration between two obscure web services.

The Zapier "Zapbook" automatically makes a page for each possible combination of the services they hook into, giving useless bullshit like https://zapier.com/zapbook/ducksboard/aim/ (which, by the way, isn't the same as https://zapier.com/zapbook/aim/ducksboard/ either).


If you're Googling for an integration between two obscure web services, doesn't that imply you have an idea for how you want to use the integration between them? Finding out Zapier will allow you to seems like both useful and intelligent marketing.


Except that's not what's offered.

https://zapier.com/zapbook/drupal/zapier/

"Vote to connect Drupal and Zapier" brings up "Sorry! This App Isn't Available Yet". No integration is possible. It's a useless search result.


Fair enough, I see what you mean with that example. Does that only happen if one of the pairs is Zapier or are there others where it does too? Your other example seems to work, and in the comments they claimed it's used too.


Not just Zapier. For example, as far as I can tell, Zapier doesn't support either Drupal or Netsuite (neither is on their list of apps), but this page exists: https://zapier.com/zapbook/drupal/netsuite/


Zapier lets you connect apps together in interesting ways, whether or not it really makes sense to. Maybe someone out there will find it useful to be able to manage their dashboards with instant messages.


Indeed, we actually have users doing both of those examples! (None have opted to share their zaps, so the parents' links remain barren.)


Try this one on for size, then: https://zapier.com/zapbook/drupal/zapier/

"Sorry! This App Isn't Available Yet" spam is a frustrating thing to find when looking for converters between systems.


If you're looking for an open source alternative, please consider my Huginn project, which now has quite an active community.

https://github.com/cantino/huginn


Very cool. I have been wishing I had a system like this for a long time.


+1 for Huginn. It's amazing.


Happy to answer any questions! This tool has been in use for many months by some select Zapier users and we decided to finally release it. I definitely want to open source the core extractor bits and document the REST API that powers the Zapier integration.

We have more information on using it in Zapier (the main use case at the moment) here: https://zapier.com/zapbook/updates/308/introducing-zapier-em...


Very nice! This looks very useful and easy to use.

How much boilerplate text do you need on either side of a token in order to identify it? Put another way, how much can the template emails vary? If the template format changes, is there any sort of notification? Did you use the simplest implementation that could work, or is this much more complicated than it looks?


It can actually get pretty complex. The technique we're using is a wacky, hacky hodgepodge built on Google diff-match-patch that works surprisingly well! If you run into any that don't work, just let us know and we can add them to the test suite and figure it out.


We've got some particularly complicated html RegEx for Email parsing at our company. We manually write new ones for new email layouts as we get them. I'd be interested in any information on how you're solving the issue, as I love how you've tackled it at least on the UI end.


Sure! In very broad strokes:

First, download yourself a copy of Google's diff-match-patch.

Second, make a template for the email you have (think "Your shipment will be delivered {{date}}. Thank you!" vs. the original raw email "Your shipment will be delivered 2014-04-04. Thank you!").

Third, run it through diff-match-patch.

Fourth, walk over the change tree and record each insertion (1), deletion (-1), or equality (0) transformation (one as keys, the other as values).

(There are a lot of edge cases to handle between the fourth and fifth steps, but test cases make those pretty obvious, if also very frustrating.)

Fifth, collate the keys/values into a dictionary and do some last minute cleanups.

We will be documenting a REST API so you can use parser.zapier.com directly, and it is pretty easy to forward emails automatically to our robot (so you can conceivably avoid writing anything at all and just use the app).
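
The five steps above can be sketched roughly as follows. This is only an illustrative sketch, not Zapier's actual code: it swaps in Python's standard-library difflib for Google's diff-match-patch so it runs without extra installs, and the function name `extract_fields` and the trick of replacing each placeholder with a private-use character are assumptions made for the example.

```python
import difflib
import re

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def extract_fields(template, raw):
    """Diff a {{placeholder}} template against a raw email and return
    a dict mapping placeholder names to the text found in the raw email."""
    names = []

    def to_marker(match):
        # Swap each {{name}} for a single private-use character so the diff
        # can never partially match the placeholder's own letters.
        names.append(match.group(1))
        return chr(0xE000 + len(names) - 1)

    marked = PLACEHOLDER.sub(to_marker, template)
    matcher = difflib.SequenceMatcher(None, marked, raw, autojunk=False)

    fields = {}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # boilerplate shared by the template and the email
        removed = marked[i1:i2]   # text only in the template (the "key")
        inserted = raw[j1:j2]     # text only in the email (the "value")
        if len(removed) == 1 and 0 <= ord(removed) - 0xE000 < len(names):
            fields[names[ord(removed) - 0xE000]] = inserted.strip()
        # Anything else is one of the edge cases mentioned above
        # and is simply ignored in this sketch.
    return fields


print(extract_fields(
    "Your shipment will be delivered {{date}}. Thank you!",
    "Your shipment will be delivered 2014-04-04. Thank you!",
))
```

A sketch like this only copes with emails whose boilerplate matches the template closely; values that happen to share text with the neighbouring boilerplate are exactly the kind of edge case it skips.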


> If you run into any that don't work, just let us know and we can add it to the test suite and figure it out.

How would you prefer we contact you? There's no contact info on the site.


Any hints on how I can process datetimes with non-standard formatting properly? I've got mails where the date is formatted as dd/mm/yyyy, which causes Google Calendar to create an event on the 3rd of January instead of the 1st of March.


If you can, try setting each portion of the date as a different field for the parser to split out. Then in a zap you could re-assemble the date in the right order so GCal can understand it.
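
As a rough illustration of the re-assembly step, assuming the parser has been set up to emit the day, month, and year as three separate text fields (the function name `reassemble_date` and the ISO 8601 target format are assumptions here, not something Zapier prescribes):

```python
from datetime import date


def reassemble_date(day, month, year):
    """Combine separately parsed dd / mm / yyyy strings into an
    unambiguous ISO 8601 date string (yyyy-mm-dd)."""
    # date() raises ValueError for impossible dates such as month 13,
    # which catches some mix-ups, e.g. a day of 25 landing in the month field.
    return date(int(year), int(month), int(day)).isoformat()


# An email containing "01/03/2014" (1 March 2014), split into three fields:
print(reassemble_date("01", "03", "2014"))
```

Putting the year first removes the day/month ambiguity that causes the 3rd of January mix-up.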


Good job, mate! I love it!


Good luck Zapier. I've spent many an hour on this exact problem.

If the templates don't change at all (no ads, nothing contextual or optional) then this is possible, but in my experience emails have a surprising amount of variation when trying to do stuff like this. Paid with an online check this time versus a credit card? Billing address, last 4 numbers, etc might now be gone, which totally messes up the extraction.

Hopefully though people will be using this for more niche tools than Amazon receipts.


It's difficult, but not impossible. My university has a spin-off company, which does parsing of documents. One thing that they do is parsing of CVs. You give it a pdf file of any CV in any format and it will convert that into a machine readable format. Obviously that requires domain knowledge, but it's possible.

I don't know how Zapier works, but it is possible that they do some kind of fuzzy matching that is robust to those things.


Can someone share a use case?

It seems cool for... something.

edit: thanks all, very interesting.


Sure, so whenever someone cancels their Planscope account I send myself an email with their name, email address, LTV, plan, and cancelation reason (this is also stored in my database.)

With this, I could simply CC that email to Zapier, they'll yank out all that info and shove it into a Google Spreadsheet, which will let me do certain things that would be a pain to program myself.


I don't understand.. if you send automated emails to yourself in a human-friendly format, why not just use a Google API to push the same data into the spreadsheet at the same time?

Zapier may be really cool, but it still seems like you're taking raw data, converting it to human-readable form, then sending that to Zapier to attempt to extract the raw data again. Or am I misunderstanding?


The difference is that you have to do new work every time the destination of the data changes. Using an interface to say "here are the relevant parts" is a lot nicer than having to code it yourself.


Or just push your entire object to google spreadsheet: https://github.com/firmafon/to_google_spreadsheet


you will have to change the zapier parser template as well when the destination format changes.


That's clever! I had never thought to CC/BCC into our parser...


Brennan, thanks for sharing your use case. What kind of things would you do with the parsed info in Google spreadsheet?


Contacting users and offering a discount, maybe?


I am playing with it now. I want to have my EZPass (RFID auto tolls) statements sent to my inbox. Right now I get an email with a url, then I have to type a PIN code, just to see my monthly statement.

I am too lazy to do it every month, but it would be nice if the statements just appeared in my inbox.


Can it fetch data from URL and parse it? The landing page doesn't mention this.


you can use kimonolabs.com / import.io for HTML parsing


Please publish this if you create it. Or somehow launch this into a service. EZPass emails are the bane of my existence because of how little information they give me versus how much I believe I have a right to know.


Say you want LinkedIn messages to import to your CRM but they don't offer integration. Zapier the email notifications for your application. It's API by force: if companies don't let you in the front door, just use another entrance.


One way we've seen it used is to parse out info from receipt emails into a spreadsheet or expense tool.


awesome use case.

how about -

1. Creating your own support desk 2. Automatically adding tasks to a calendar 3. Web scraping 4. Basic data collection


This is exactly what I'll be doing.


I've been thinking of developing a simple idonethis & Plivo integration where I need to parse the daily digest email and push the selected text as input to a text-to-speech app built using the Plivo API.

This is more than awesome for my use case.

[edit] Incidentally has anyone found it rewarding to "hear" what you get done for the day? (another pleasure cream added on top of checking items off your to-do list.)


Congratulations Bryan for launching it!

We are doing the same at http://mailparser.io and I can confirm that there is a real need for a solution like that. A lot of our existing customers use mailparser.io in combination with Zapier. Those customers can now directly use the parser of Zapier. Which sucks for us but which is surely great for the customer and Zapier ... :-)

One caveat though: a lot of use cases are not the static "contact form" email where nothing moves except the values. We get a lot of requests for parsing lists, tables, etc. Curious how the chosen approach works on these kinds of parsing jobs.


How would you parse lists and tables from a website? e.g. parsing an online schedule in order to use the data in a responsive mobile or native app?

Random site example: peoplewhodance.net/events/NYE/schedule.html


You might look into http://www.kimonolabs.com/.


This seems really useful but I can't think of an example usage. Can someone share an example where this is useful?


"Set up receipts@example.com as Google hosted email, automatically parse all Amazon/SaaS subscription/etc emails and update the bookkeeping system" would already save me (or, well, my bookkeeper) 5+ hours a year.


Shouldn't these kinds of notifications be handled with the Paypal / Amazon API directly for more reliability?


These are business expenses rather than things I am selling. As such, they're on my corporate credit cards. Regrettably, those don't have API access, and the best way to get information from them is currently "Log in once a month, dump a PDF file, and email it to the bookkeeper."


As a tech guy, I always feel extremely guilty about it, but my Italian accountant is cool with me dumping a big pile of receipts and papers in his office every few months. Even US accountants are often more or less ok with this, even if it makes me a bit sick to my stomach.


> As a tech guy, I always feel extremely guilty about it

As a tech guy, business people routinely dump big piles of raw data in my office every few days. I mutter a bit about how this is why we invented databases and then laugh all the way to the bank.

And I guarantee you the business people don't feel at all guilty about it.


We have been using the feature for a while, and it has become an integral part of our system. The Zapier team is great and always willing to help out with any issues we have, above and beyond what you would expect.


This is incredibly clever. Zapier has already built a fast-expanding platform to relate structured business data, and now is taking steps to grow it even further by structuring unstructured data. SaaS mainstays, watch out.


If you think this feature is cool, I would encourage you to take a look at my company: Taskflow.io

It's basically a visual drag-and-drop tool to automate business processes.

You can drag-and-drop many different actions together (including email) to automate and track many processes your organization might have.

I would love to hear your feedback as we are just getting off the ground.


I just noticed Google started doing something like this. I got Google search results about purchases I'd made from Apple. They parsed a receipt from my Gmail account and displayed it in a card in my search results.

I was a bit taken aback by that.



They do parsing as well.

I forward all my work flight itineraries to Gmail (so I get Google Now notifications) after cut & pasting them out of the Word document I get them in.

Gmail parses them and displays them using the "Flight details" format, even though there is none of the information required for the "Actions" thing linked above.


Gmail doesn't do any parsing, Gmail provides the SDK described here https://developers.google.com/schemas/tutorials/google-now-c...

When developers / brands create emails following Gmail quick action guidelines, and you have Google Now enabled, then you will see Google Now cards.


Did you read my post?

I cut and paste plain text out of a Word document and send it, and that triggers the Google Now cards.

It's parsing.


I remember, they also parsed my AirBnb reservation and rental car receipt from my email. They then showed it in Google Now on my mobile phone. However it wasn't very useful, they had some parsing error, iirc.


It seems like an awesome feature. I just don't like the landing page... If patio11 hadn't shared it on Twitter saying how awesome it is, I would have closed it without caring too much - just one more thing on The Internet I simply am not bothered with.

The explanation in the first paragraph doesn't really give a clear idea of what it does or what the benefit is for me, and, imho, a random user is not really motivated to try and figure it out.

I believe a simple nice diagram

[email] -> [zapier] -> [internet]

with some example integrated in it would do a much better job.


Could anyone more versed in being a sysadmin comment on using this for logging?

I know there are things like logstash, but this one seems like something that could be complementary. What about feeding it log files for known events and, I don't know, having a pretty dashboard with all the stats or something? What other uses for this feature plus logging can you think of?

Disclaimer: I'm currently thinking of a way of doing logging and security audits to some CA server I'm about to configure, so right now everything looks like a nail I guess...


I think you've got a typo of "chose" vs "choose" in the step that asks the user to "chose" where the data should be sent.


Ah, thanks! Fix committed and pushing.


Is it possible to extract repeating patterns? An example would be extracting the repo name, number of stars, language, and project description from GitHub Archive daily digests.

sample mail: http://us5.campaign-archive2.com/?u=439aa16a39e4b10e0b65ff2e...


It is not possible to extract something (today) that would be generated inside a for loop. I'd be curious if anyone has ideas on how you might do that.

If you need support for more complex emails and extraction rules, you might look into using http://mailparser.io/.
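
One idea for the repeating case, sketched entirely outside of Zapier's parser: when every repeated entry follows the same line layout, a regular expression applied with `re.finditer` yields one record per entry. The digest layout below is made up for illustration (it is not the real GitHub Archive format), and the names `ENTRY` and `extract_entries` are likewise hypothetical.

```python
import re

# Hypothetical layout: "owner/repo (N stars, Language): description"
ENTRY = re.compile(
    r"^(?P<repo>[\w.-]+/[\w.-]+) \((?P<stars>\d+) stars, (?P<language>[^)]+)\): "
    r"(?P<description>.+)$",
    re.MULTILINE,
)


def extract_entries(text):
    """Return one dict per line that matches the repeated entry layout."""
    entries = []
    for match in ENTRY.finditer(text):
        entry = match.groupdict()
        entry["stars"] = int(entry["stars"])  # convert the count to a number
        entries.append(entry)
    return entries


digest = """Top new repositories
alice/widgets (120 stars, Python): Small widget toolkit
bob/fastlog (87 stars, Go): Structured logging library
"""

for entry in extract_entries(digest):
    print(entry)
```

Lines that do not fit the layout, such as the heading, are skipped, so this only helps when the repeated block is regular enough to describe with a single pattern.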


Would love to see an open source tool like this for parsing general text / csv / fixedwidth / html in a similar way.

Very nifty concept.


I made an open source tool that does something like this using a modified version of the Levenshtein distance algorithm: https://github.com/nathanathan/fuzzyTemplateMatcher

Here's a demo you can play around with: http://nathanathan.com/fuzzyTemplateMatcher/


It has some problems e.g. take out the dog, call my mother (it is entirely possible I missed the point of the code)


I really want to open source the core parsing bit which is based on diff-match-patch but I've got some cleanup to do before... It is probably not hard to reverse engineer what we're doing, it is pretty simple!


I've been not quite needing this enough for years. Such is life.


We’ve been doing this for a couple of years over at http://getdispatch.com, with a particular focus on sales leads.


A host-it-yourself version of this would be nice.


Can someone here explain the difference between IFTTT and Zapier? At least from a macro perspective they seem the same.


Zapier has more business integrations. IFTTT more consumer.


Zapier makes money.


Man, that's some cool software! I love it when I see software that's simple yet high quality. Good job.


I would like this for PDFs. Do you know an app that can handle that?


If the text in your PDFs has enough structure (i.e. fixed field names or punctuation) then it should be possible to convert the PDF to text and then pass it to Zapier's parser.


Great to know!


Interesting. What's the use case you're thinking?


My bank just gives me a PDF with all my transactions. I want to analyze them.


Make an enterprise offering. Ops teams would kill for this.



