Just for those curious, it's not actually that cheap compared to Google's enterprise-level geocoding. Nor, I'm guessing, is it able to geocode internationally, in which case you might as well use MapQuest, as it's completely free.
Currently, the company I work for uses Google for geocoding; we do 1.1 million geocodes a day, which ends up costing around $5k (22%) per year more than these folks... but! It includes international geocoding, Google Maps, etc.
Simply using census data to geocode US addresses is easy, and there are directions on how to do it here in the comments... but setting up Nominatim (from OpenStreetMap) is a serious amount of effort (and not cheap for a 32GB server), though it /is/ capable of global-level geocoding.
One great use case for this service, though: using Mapbox, which is currently forbidden by Google's TOS...
While I'm stoked to see competition in this space, I wish the competition was a bit more robust (but everyone has gotta start somewhere, right?)
I hope you all continue forward with this, and hopefully add international capabilities as well as price drops. I, for one, would do away with your free offer altogether, since the free users' ROI will probably always be an expensive crap-fest, and allocate those resources to driving the price down for your paying customers.
If/when you all can do ~1.1mil international geocodes per day for less than $10k a year, LET ME KNOW! :)
Other founder of Geocodio here -- thanks for your feedback! You bring up a lot of good points.
Building off of what's above, Geocodio is intended to be accessible to developers who don't have $10k to drop on geocodes. We found that this is a big need in the community (and for ourselves for our other projects). All of the other non-major-mapping geocoding services we found, including those offering CSV upload instead of an API, were more expensive than $0.001 each (oftentimes much more -- $0.25+).
Also, we don't have limitations to how you use the data. No requirements that you use a specific brand of map with it, no attribution requirements, etc.
We priced it at this point, with a free tier, so that people can give it a try first. No, our data isn't quite as good as Google's -- we get about 90% of addresses within 1 mile, and most within a tenth of a mile -- and we want people to be able to play around with the service and get to know it before they have to give credit card info.
With that said, we definitely plan to continue improving the product and add international support.
PS. We are HUGE fans of Mapbox, so we're pretty excited that you listed that as a potential use case :)
Thanks for the feedback, it's really valuable! The idea behind Geocodio is definitely to save you from the hassle of building a dataset yourself and hosting it; it is indeed a very time-consuming process. Note that MapQuest still has a 5k requests per day limit [1], making this a viable alternative.
As mentioned in our FAQ [2], we do indeed provide special pricing and capacity for high-volume users, and we would definitely be able to beat Google's pricing by far.
Thanks! Yes, I haven't really seen a lot of other services that provide batch geocoding as an API endpoint.
We have mostly been running tests against the Google Maps API, and from a totally random sample of 100 addresses, 90 of them were within a mile of the location returned by the Google Maps API (most of them were actually within 0.01 miles).
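For anyone who wants to run a similar comparison themselves, the core of it is just the great-circle distance between the two services' coordinates. A minimal sketch (the sampling methodology is up to you; only the standard haversine formula is assumed here):

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lng points (WGS84-ish sphere)."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def within_a_mile(pairs):
    """Count how many (reference, candidate) coordinate pairs agree to within a mile."""
    return sum(1 for (a, b) in pairs if haversine_miles(*a, *b) <= 1.0)
```

Feed it pairs of (Google result, Geocodio result) for the same address and divide by the sample size to get the "within a mile" percentage.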
I'm not sure how we would compare to OpenStreetMap and Data Science Toolkit, since our data source is different (US Census Bureau). But the obvious reason we provide this as a SaaS is that you don't have to host anything yourself or juggle gigabytes of boundary data. We handle all the mess.
For those looking to roll your own, the Ruby implementation of a TIGER geocoder released by GeoIQ a while back is a pretty solid starting point: https://github.com/geocommons/geocoder/
We ended up using that as a base and then making some customizations for our US-based geocoding solution. As these guys are figuring out, there's no great int'l option. Google is bad from a licensing perspective (but their tech is fantastic). MapQuest is great but can get really expensive. We've had decent luck with TomTom I think, but if I remember correctly there are a lot of caveats.
I had the need to geocode 10s to 100s of thousands of US addresses weekly, with the ability to accept slightly-reduced accuracy vs. the parcel-level accuracy of Google Maps.
I rewrote the geocommons geocoder in Java to speed up the loading and geocoding process, and wrapped a REST api around it. I used a minimal perfect hash function to map zips/streets (metaphone3'd and ngramfingerprint'd) to data stored in a key-value structure. The key-value structure is small enough to fit in memory of a decent sized EC2 instance, but I haven't tested the throughput except from a slow disk--which got me about 100-150 results/sec.
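The n-gram fingerprint half of that key scheme can be sketched in a few lines (a minimal Python version of the OpenRefine-style fingerprint; the setup above also runs names through Metaphone 3, which is proprietary and omitted here):

```python
def ngram_fingerprint(s, n=2):
    """OpenRefine-style n-gram fingerprint: normalize the string, then keep
    the sorted set of unique n-grams, so variant spellings and reorderings
    of the same street name tend to collide on one lookup key."""
    s = "".join(c for c in s.lower() if c.isalnum())
    grams = sorted({s[i:i + n] for i in range(len(s) - n + 1)})
    return "".join(grams)
```

Keys like this (combined with a phonetic code) are what make a minimal-perfect-hash lookup forgiving of typos without any fuzzy search at query time.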
The results include parsed address, lat/lng in WGS84 datum, and associated US census region info (state, county, block group, block, msa, cbsa/csa, school district, legislative district, etc.).
I'd considered open sourcing it, and I was trying to architect it such that one could plug in various data sources beyond TIGER when higher-accuracy info is available (e.g., from SF's address parcels, Massachusetts has lots of E911 parcel data available, etc).
That's a very smart geocoder; I have contributed to it. It is a bit old now, though, and not that easy to get started with.
The state of the art in open-source geocoders would be TwoFishes: https://github.com/foursquare/twofishes, written in Scala and developed and used by Foursquare.
That's a pretty nice starting point, but unfortunately the code base hasn't been updated in years, and the data import process is extremely time consuming [1]. That said, rolling your own geocoding solution is the most restriction-free option you can get. Just be prepared for the maintenance and the time-consuming setup.
The biggest problem we've had is turning malformed or ambiguous addresses into canonical addresses with lat/lng. Google Maps wins on that front.
We obviously can't beat Google in that case :) That's also why it's priced to be way more affordable. It does however happen that Geocodio is more accurate than Google Maps - try for example "8895 Highway 29 South, 30646" (Address of a CVS store) on Google Maps and Geocodio.
I'm using the MapQuest geocoding API [1], which basically does what you do for free, without the rate limits.
Setting it up was quite a pain because they don't use semantic HTTP codes, and I had to play with it a lot to handle their undocumented error codes (they store them inside body.info.statuscode). Good to read that you return semantic HTTP codes.
If you want to differentiate from the competition, I would suggest that you improve the address parsing and support more patterns. Think of us having to geocode user-typed location fields from twitter. Enjoy it :)
Cherry-picking one example does not make you more accurate than Google Maps. TIGER has some giant holes in it, and is based on block faces, not building footprints like Google Maps. In most cases Google Maps will be much more accurate and comprehensive.
UI suggestion: 'street addresses' currently has a box around it, so I thought it was an <input type="text"> field, thought "how cute", tried to click on it to enter an address to geocode, and was disappointed to find out it was just some bolded text. It might be a fun little feature to have that actually be an entry point into a demo of the API (I thought I was supposed to enter an address to have it geocoded).
Maybe it's because I'm semi-technical, but "$0.001 each" is clearer to me than "1 cent for every 10 uses." I mean, I recognize that they're synonymous, but the former clicks in my mind faster than the latter.
So, if, as you admit, the service's audience is semi-technical, and if the average semi-technical person's brain works like mine (big assumption, I know), I would argue that they should stick with $0.001.
I work at SmartyStreets, where we've learned that geocoding is very, very difficult, so I definitely feel your pain! We started with basic Census Bureau stuff and it's definitely complicated, and accuracy can be spotty. (We've since worked with other data vendors to improve the accuracy.) It's too bad we don't all have little cars to roam the country with and manually collect rooftop-level data like Google does.
+1 on the versioned API endpoint... when we released ours nearly 8 years ago, versioning APIs wasn't really a thing yet. We're paying that technical debt off now as we vigorously rewrite and improve our service.
Quick feedback: Links on the FAQ page are hard to distinguish from regular text.
Thanks! Yes, it is definitely not easy; there are a lot of edge cases to take care of, too. Luckily we are not trying to compete directly with any of the big guys out there, which lets us keep the price low and the output high.
We were tired of dealing with the often steep pricing for geocoding once you exceed your daily free limit (e.g. Google Maps starts at $10k/year). So I built this service so I could use it myself, hoping it would be useful for others too.
Love it. I'll keep using my current service for now (SmartyStreets), but I'll let you know two things I noticed:
1) Most services will accept shortcuts for names, like "SF" for San Francisco or "NYC" for New York, but in both cases I got error messages instead of geocodes.
2) Addresses that aren't "properly" formatted (i.e., without commas or something) often return very incorrect information. Here's an example:
2680 NW 8th Pl, Fort Lauderdale, FL 33311 - returns correct info
2680 NW 8th Pl Fort Lauderdale FL 33311 - returns incorrect info (see suffix, formatted_address)
For what it's worth, SmartyStreets mangles even the first address that you got correct, but on the other hand, they're very good at correctly returning data for improperly formatted addresses like the second one.
Thanks for the feedback! We don't currently support shorthands for city names - only states. But this is definitely something that's on the todo list now.
Our address parser will try to pick up the address even if it isn't formatted correctly with commas, but it obviously won't work in all cases. Address parsing is indeed a very complex problem.
Address parsing is actually very easy. Knowing when you got it right (or wrong), that's the hard part, and that's where address validation comes in handy.
If you can start with a list of all the following, you've got a great start:
prefix abbreviations
street names
street types
suffixes
city names
state names
Add to that all the possible misspellings, then factor in Levenshtein and Soundex to account for misspellings you didn't know about, and you've got a pretty dang good address parser. Figure out how to do that lickety-split fast, and you've got gold.
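The Levenshtein-plus-Soundex step above can be sketched as follows. This is a minimal illustration, not anyone's production parser: classic Soundex and a plain dynamic-programming edit distance, used to match a misspelled street name against a known list. The `best_match` helper and its `max_dist` cutoff are hypothetical choices for the sketch:

```python
def soundex(word):
    """Classic Soundex: first letter plus up to three digits for consonant groups."""
    codes = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3", "l": "4", "mn": "5", "r": "6"}
    word = "".join(c for c in word.upper() if c.isalpha())
    if not word:
        return ""
    digits = [next((v for k, v in codes.items() if c.lower() in k), "") for c in word]
    out, prev = word[0], digits[0]
    for c, d in zip(word[1:], digits[1:]):
        if d and d != prev:          # skip repeats of the same code
            out += d
        if c not in "HW":            # H and W don't break up a repeated code
            prev = d
    return (out + "000")[:4]

def levenshtein(a, b):
    """Standard edit distance via a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def best_match(name, candidates, max_dist=2):
    """Prefer candidates that sound alike; break ties by edit distance."""
    same_sound = [c for c in candidates if soundex(c) == soundex(name)]
    pool = same_sound or candidates
    best = min(pool, key=lambda c: levenshtein(name.lower(), c.lower()))
    return best if levenshtein(name.lower(), best.lower()) <= max_dist else None
```

The "lickety-split fast" part is the real engineering: precomputing the phonetic keys into an index rather than scanning candidates per query.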
An admirably quick response, both to my question here, and to the email I sent.
People, this is a lesson. If you post a "Show HN", be ready to respond to people's questions and comments. Posting and then going silent for hours sends the wrong message to the people you want using your service. It says you haven't thought enough about your level of service.
If there's enough interest, we'll definitely be working on this next. It would just require a slight restructuring of our data to make the lookups as efficient as possible.
+1 to reverse geocoding support. Our app makes around 25k reverse geocode calls per day to OSM and MapQuest's Nominatim. We are projecting up to 4x growth within the year, so an accurate, bulk, and cheap service would help ease our pain. Oh, and we're based and operating in the Philippines (which hopefully you can add soon as well).
Cool project. Like others have said, not particularly convinced that it's cheaper than Google's enterprise geocoding, but I'm more than glad to see the competition.
Thanks! This looks great! Would you mind if we possibly mentioned this in our documentation?
As for the pricing, we are indeed much cheaper than Google's geocoding offerings (given the nature of our product). If you are looking to do a high amount of geocoding requests, just contact us[1] and we'll work out a pricing model for you.
I wonder how the "choose your own API key" policy is going to work in practice... given that people don't usually make very secure passwords, and that the example is "Real estate website", you're going to get some pretty easy-to-guess API keys.
That's actually just a name to identify the API key; the actual API key is an automatically generated 40-character string. The idea is that you'll be able to create an API key for each of your projects and revoke them individually as necessary.
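As a sketch of what such a label-plus-token scheme might look like (hypothetical code, not Geocodio's actual implementation; only the "40-character generated string" detail comes from the comment above):

```python
import secrets

def make_api_key(label):
    """Pair a human-readable label with a cryptographically random credential.

    The label is purely for identifying/revoking the key per project; the
    credential is 20 random bytes rendered as 40 hex characters.
    """
    return {"label": label, "key": secrets.token_hex(20)}
```

Because the secret is generated server-side, the user-supplied label can be as guessable as "Real estate website" without weakening the credential.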
I tested this website's API with 2,000 randomly selected home addresses, and it's not accurate enough: on average, results were 4,000 feet away from Google's lat/lng. That's less accurate than Bing's 1,000 feet and Data Science Toolkit's 2,200 feet.
Geocoda (http://geocoda.com) launched last year, does point storage as well as geocoding, and should be comparable for low amounts of geocoding, and cheaper for large amounts per month (> 250K).
TIGER (the dataset this is based on) has some giant holes in it, and is based on block faces, not building footprints like Google Maps. It's also US-only... why not base it on OSM, which should include TIGER as well as all the other contributions?
Gah! This is awesome. Where were you when I was trying to get an idea launched and the cost of geocoding was the wall I kept hitting??? Seriously this makes my week, maybe it's time to dust off some old projects...
IIRC Google's TOS prohibits saving geocoded points. "Caching" is allowed, but I think this has value/is different insofar as it would let you store points permanently without breach of contract.
Where the pricing says $0.001/ea for 2,501+ geocodes, are the first 2,500/day still free? Or am I paying $2.50 for the day as soon as I make one request above the free limit?
Street Address to Coordinates: calculates the latitude/longitude coordinates for a postal address.
Currently only the US and UK have street-level detail.
Google-style Geocoder: Are you currently using Google's geocoding API and want to switch? Replace maps.googleapis.com with the address of a DSTK server and your code should work without changes.
Free to use, also available as a (free) self-hostable VM.
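The drop-in swap described above amounts to changing the host in the request URL. A hedged sketch (the path and query parameters mirror Google's geocoding API shape; the DSTK host is the site linked elsewhere in this thread):

```python
from urllib.parse import urlencode

GOOGLE_HOST = "maps.googleapis.com"

def geocode_url(address, host=GOOGLE_HOST):
    """Build a Google-style geocoding request URL.

    Pointing `host` at a DSTK server (e.g. www.datasciencetoolkit.org,
    or your own self-hosted VM) is the only change needed; the path and
    parameters stay the same.
    """
    qs = urlencode({"address": address, "sensor": "false"})
    return f"http://{host}/maps/api/geocode/json?{qs}"
```

Your existing response-parsing code should keep working, since DSTK mimics Google's JSON output format as well.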
Neat! If you can get your address parsing up to Google's level or anywhere close, you should do quite well.
For others looking for a solution you can play with yourself, here's a VM image with a pretty good geocoder you can set up yourself (iffy address parsing, though):
http://www.datasciencetoolkit.org
Parsing isn't the hard part; it's the source data, which, if you do the math on what Google has done (driving around the world capturing 360° video and LIDAR of streets), is literally billions of dollars worth of work.
TIGER is a pretty bad starting point; geocoding based on block faces is really inaccurate if you want to zoom in to street level. And it's US-only.
OSM Nominatim should be a better place to start.
I'd love to see open sourced Street View data collection / processing as part of the OSM project. Then there is a chance to compete with Google.
What you're talking about (a massive ground-level driving effort to pinpoint where along streets specific addresses are located) would boost accuracy. Without a Google-level address parser, though, you don't get usability for a lot of use cases, which is frankly much more important for many companies. One of the best things about Google's geocoder is that you can throw various location names at it, as humans type them, and it will return something, and it's usually the right thing. For many applications, this is the desired behavior, rather than precision.
I oversaw a project like this elsewhere, where we had reams and reams of geo coordinates but needed text-searchable tags (like "Canada", "Toronto", etc.).
We had millions of them though, so maybe an API isn't really the way to go.
See my previous answer [1]; obviously it's impossible to compete directly with Google, especially not at this price point. Our goal is to return a geo coordinate that is at least on the correct block and as close to the street number as possible.
We actually don't have any rate limiting currently (we can handle a pretty high amount of concurrent requests and will hopefully be able to scale up hardware before we hit any performance issues).
Very cool. I'm in the telematics industry, and forward geocoding is something I'm always interested in, since it can be quite the bitch of a task. How did you go about assembling the shapefiles?
Small thing: I would drop the "bulk" as the tagline is too much of a mouthful and "bulk" is unnecessary. It's free at smaller volumes anyway, so certainly not deceptive to drop it.
Good point, we might want to add that as an optional parameter. Also note that our address parsing API endpoint is free and doesn't count toward usage statistics and billing :)
Our infrastructure is pretty efficient, making us able to keep our operating costs low. We wanted to have a pricing point that was below any other similar services we could find.
Yep! Unfortunately we will manually have to add support for each country (including getting data, normalizing it, etc.) which is quite some work. We're planning to add support for additional countries if demand is high enough.
Separating a street name into "street" and "suffix" is a baffling decision that probably has a few issues even in the US, and definitely won't work elsewhere.
Agreed, this is unusable for us until there's international support. We have a large percentage of international users in 130 different countries. Doesn't make sense to use this over, say, MapQuest.