> Get the following datapoints from any IP address;
Again :-( I just wonder if people will ever stop making assumptions about others given their IP address or hair colours and divide the Internet on these slippery grounds. The only legitimate and valid way to determine the client's location automatically is the browser or the OS geolocation API, and for the language it is the HTTP_ACCEPT_LANGUAGE header. The fact I have contacted your server from an IP address assigned to a Chinese ISP neither means I'm in China nor that I speak Chinese.
Believe it or not, you make up about 0.000001%[1] of internet users. There's no telling what this will be applied to; it doesn't have to be for language or anything like that. It doesn't even have to be anything customer-facing, and if it's just some stats you will probably just be a bit of noise.
Don't get me wrong. I totally agree that this thing is definitely not accurate (it thinks I'm 1,300 mi away), but it's good enough to get a general idea.
Yes, and you've made up a number that reinforces your point. A number that, if wrong, could negate your point.
Many more people are using VPNs these days. It's still not a large percentage, to be sure, but people are indeed doing it. Hell, I'm on my company VPN right now, and this site mis-identified my location because our VPN is set up to route most addresses in AWS's address space through the VPN (and ipdata.co appears to be hosted on AWS).
Then you have stuff like people on planes (usually this is all routed through a single PoP somewhere), people on mobile data who are roaming internationally (often proxied through one of the home carrier's PoPs), etc.
I think using IP geolocation is a decent first-order approximation, and is useful if there are no other ways to get that information, but it must always be treated as unreliable.
We're working on creating our own offering on this. We'll have Tor Exit node detection first and are planning to make that generally available to all our users within the week at no extra cost.
Indonesia -- 47%
Thailand -- 39%
Brazil -- 36%
Turkey -- 36%
UAE -- 36%
Viet Nam -- 35%
Saudi Arabia -- 34%
India -- 32%
Malaysia -- 32%
...
UK -- 14%
US -- 14%
Not knowing what a VPN is doesn't mean you aren't using one. Many workplaces use VPNs, and it's likely that public internet access points (such as libraries, lounges, restaurants and cafes) could be using VPNs too.
Wikipedia says that Indonesia has an internet penetration rate of 50%. It might be reasonable to assume that a low penetration rate means that a significant amount of the internet access is happening in a work place rather than in a residential place. If a significant amount of access is happening at a work place it might also be reasonable to assume that a lot of that access is happening over a VPN.
It's pretty clear that VPN usage has increased dramatically in recent years. And at least I cited something. I could point to other surveys with comparable results. What's your evidence that it's so much lower?
Believe it or not, my mobile carrier frequently hands out IPs that are geolocated to anywhere but my country. I usually get French or Dutch websites.
This comment reminds me of the couple of people on HN who complain every time someone shows off a new product and it doesn't work well in their text-only Lynx browser.
It's such an extreme edge case that it doesn't really matter (unless, of course, you are targeting that type of market).
In other words, IP address gives a general location in a huge number of cases.
In my case it gets the country and city but completely misses the ZIP and area, as do all similar services I've tried so far. If someone fired a non-nuclear missile at those coordinates, I would probably just hear a distant bang.
I'm curious to know if other people noticed the same level of accuracy.
IP geolocation might never be the silver bullet for determining user location. As I've mentioned before, if it can't pin you down exactly it'll give your coordinates as the nearest population center. Use it where you're okay with having a general idea of your users' locations, or as a fallback where other means of geolocation are not available, e.g. via the browser, or GPS via mobile apps.
Indeed. It's certainly most useful for getting a ballpark idea of where, in general, your users are located. I would guess anything that really depends on knowing an individual user's location ought to use something more specific. But you could, say, pre-select certain countries or languages as the default of a drop-down based on IP and be right a bunch of the time, while still allowing the user to make a different choice easily.
Ugh, you're absolutely right, and I can't say enough about how much this annoys me. I run up against issues caused by well-intentioned "get your location from your IP" lookups at least once a week.
Completely ignoring the aspect of VPNs (which is huge), I've also been fighting this mentality for the past few months trying to prove to Coinbase that I'm a US citizen with my US bank account (however I signed up from an IP in Croatia!), and being completely unable to do mundane tasks like browsing/editing my Google Photos photobooks since I wasn't on a USA IP.
Can we please, please, please stop tying IPs to location? It's obnoxious to have to set up proxies just to prove I am who (or where) I say I am.
It’s funny... Google says this same thing publicly on their various webmaster blogs, but you know how Google Analytics determines your location? Your IP.
Agreed. Using IP for things you present to the user is strange and presumptuous. But the geolocation API is kind of "creepy" for the user, I imagine. At least that's always how it feels to me.
To be honest, this is (usually) only used for geotargeting the nearest Anycast address (or Unicast address if you are too small to operate a GeoDNS->Anycast setup). Well, that, and anti-fraud measures to flag transactions for additional verification.
People who are using it for real-world-location and language detection are doing it wrong but being wrong on the internet is a common problem. :/
It's not so clean cut. For a web browser, I think it's better to ask the user to share location data so that you can use it to improve their experience. They would do this using the methods you suggest.
I think the risk here is the assumption that the only devices where you may care about GeoLocation are ones that have this capability. There are cases where using the less precise GeoIP option is not only legitimate, but it's the only option you have. What about IoT or embedded systems where you may not have the capacity to determine the location?
You can't treat the data obtained from GeoIP databases as being absolutely precise. However, in most cases the data will be precise enough for most people's needs.
I'd be interested in hearing how you handle devices that don't have the APIs you mention.
If the geolocation api doesn't give anything - that probably means the user has chosen to configure it this way because they don't want to share their location. You should probably ask them in case you actually need to know where they are.
If one is not even allowed to use the user's IP to guess which language this should be asked in, should the first question be some kind of language-selection form with just pictograms in it? :-)
You should ask them in whatever language their Accept-Language HTTP header specifies. If there isn't one, then ok, maybe look at their IP. But it makes no sense to treat the IP as a more valuable source of information than the HTTP header that's there for exactly this purpose.
If there isn't one (which is rather improbable - AFAIK all the browsers have it set by default), then it should be either English or both English and the language assumed given the IP address.
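That ordering (Accept-Language first, IP-based guess only as a last resort) can be sketched with a simplified header parser. This is a minimal illustration, not a full implementation of the matching rules in RFC 7231, and `pick_language` is a name of my own invention:

```python
def pick_language(accept_language, supported, default="en"):
    """Pick the best supported language from an Accept-Language header value.

    Falls back to `default` when the header is missing or nothing matches.
    """
    if not accept_language:
        return default
    weighted = []
    for part in accept_language.split(","):
        piece = part.strip()
        if not piece:
            continue
        lang, _, q = piece.partition(";q=")
        try:
            weight = float(q) if q else 1.0  # no q-value means q=1
        except ValueError:
            weight = 0.0
        weighted.append((weight, lang.strip().lower()))
    # Try languages in descending preference order, matching "fr-CH" to "fr".
    for _, lang in sorted(weighted, reverse=True):
        if lang in supported:
            return lang
        primary = lang.split("-")[0]
        if primary in supported:
            return primary
    return default
```

Only when this returns the default because the header was absent would it make sense to fall through to an IP-based guess.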
Of course IP geolocation is terrible for language selection, but that doesn't make passive rough location information useless in general. What if, for example, you want to show someone nearby flights? You could ask them to enter their ZIP, or you could save them a step and make suggestions based on their geolocation. They could update the location if they wanted information on somewhere else or if you got the location wrong. That's worst case 1 step and best case no steps: you improved your UX!
The problem is when people fail to distinguish "helpful assumptions" from "actual facts"... I have just today gutted some worse-than-useless GeoIP functionality from a customer-facing form at our organization that guessed at their country, language, and city without asking.
The BI guys put in the feature request some years ago so they could make pretty maps (but ultimately took no actionable steps based on these), and the call center was then taking these guesses as facts (think getting a call and having someone chummily ask "Hey, how's the weather in <city you don't live in>" in the wrong language...). What's worse, we then added some user-facing fields asking for the location information, and it was impossible to tell the user-supplied info apart from the IP-derived info!
It took fighting with BA and BI, but we finally decided it was stupid and were able to simply remove it. (BI will have to "fix their maps" by making their own assumptions, outside of the application.)
Tangential but I find it funny that such a simple service has libraries for so many different languages. Is one simple HTTP call and JSON deserialization that hard? Personally I'd be more concerned about bringing yet another 3rd party module (leftpad, security, maintenance) than doing GET + JSON.parse but maybe convenience is what's most important.
One of the first things people tend to look at for an API is whether or not there's a library that makes integration as easy as possible. One HTTP call and JSON deserialization is often more thought and effort than people want to put in.
Right, this is why we have multiple copy-pasteable examples in different languages. To make it as straight forward and quick as possible to get started.
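For anyone weighing a client library against rolling their own, the "one HTTP call and JSON deserialization" really is about this much code. A standard-library sketch; the exact response field names are whatever the API documents, and nothing below is the official client:

```python
import json
from urllib.request import urlopen

API_URL = "https://api.ipdata.co"  # the endpoint shown elsewhere in the thread

def build_url(ip=""):
    """The bare endpoint looks up the caller's own IP; append one to override."""
    return f"{API_URL}/{ip}" if ip else API_URL

def lookup(ip=""):
    # One HTTP call, one JSON parse: the whole "integration".
    with urlopen(build_url(ip), timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A library mostly buys you error handling, retries, and key management on top of this; whether that's worth an extra dependency is the judgment call being debated above.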
On 2, thanks for catching that; I'm changing it right now. On 1, at the moment we have a Python repo at https://github.com/ipdata/python which I know isn't exactly what you're looking for. Thanks for reporting this. The right avenue for something like this would be sending an email to support. Be assured we'll be quick to respond and work through whatever you need.
What are you thinking? Never do this in a library. It also looks a bit amateurish; as it's your only currently supported library, I'd fix it up a bit: don't add .pyc/dist files to git, make it PEP 8 compliant, remove sys.exit() from a library (!!!), etc.
You're absolutely right! Not sure how/why that got in there in the first place. I'm preparing a new release right now. I'm also going to have the repo/package cleaned up.
Calling 'print' from within a library is pretty bad form too... responses and errors should always be returned as data that can be inspected programmatically. In this case it's even worse, as the code will just return None on error, and you don't know why.
You're right. I guess it makes sense to let Requests' exceptions propagate in case there are network issues. I'll work on ensuring PEP 8 adherence and adding custom exceptions for the error codes in https://ipdata.co/docs.html#status-codes
^ This. Never call print from a library, what if I'm using a script to pipe stdout somewhere. Just throw a custom GeolocationException or something, with some useful data about what went wrong. In this specific case though, just don't bother catching the exception. It's not useful and it indicates something is terribly wrong.
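A minimal sketch of that advice: raise an inspectable exception instead of printing or exiting. The names `GeolocationError` and `parse_response` here are illustrative, not the actual library's API:

```python
import json

class GeolocationError(Exception):
    """Raised instead of printing or calling sys.exit() inside the library."""

    def __init__(self, status, message):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status    # numeric status code, inspectable by callers
        self.message = message  # human-readable detail from the API

def parse_response(status, body):
    """Return the decoded payload, or raise an error callers can handle."""
    if status != 200:
        raise GeolocationError(status, body)
    return json.loads(body)
```

Callers can then catch `GeolocationError`, log it, retry, or fall back, none of which is possible when the library prints to stdout and returns None.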
- The global infrastructure backing ipdata.co is pretty impressive: 4 data centers in the US, 1 in Canada, 2 in Europe, and 1 each in Mumbai, Sydney and Seoul.
- We offer more datapoints. I tried to build this around the most common use cases for geolocation, one of which is showing your users the right currency. This is something we provide, namely the currency ISO code and symbol.
- I think our API is pretty darn fast :)
Your pros;
You offer more data formats, CSV and XML, whereas we only offer JSON.
You offer a pretty high free tier, but I think there are definitely tonnes of people who derive a lot of value from ipdata :)
I don't mean to take anything away from freegeoip; this is basically the self-hosted vs. external API debate that I've already responded to here. In short, there are cases where each solution shines more than the other.
Thanks for your answer. To clarify: freegeoip is not my project, but it's pretty easy (for someone who knows how to run a Linux server) to host your own instance.
My bad! Sorry about that. That's definitely an option; however, there are definitely users who'd rather not do it themselves and would benefit from our global infrastructure and not having to bother with updates.
I wonder if you can change the pricing plans and language around them a little. What happens if I'm going along happily on the 2500/day plan and suddenly I get hit with a traffic spike and need 20,000/day for 3 days until things calm down? If you're going to use pricing buckets like this it probably needs to also come with an automated way to bump the plan if needed.
Someone else in the thread suggested Maxmind as an alternative. Their prices seem to be several times higher than yours. Can you double your prices? It seems really cheap. Too cheap?
Also, the location identified for me was off by 60 miles.
On your Documentation page you've used 8.8.8.8 as an example IP address. And 1.1.1.1, and 2.2.2.2... I'd rather see you use the IPv4 address blocks reserved for documentation by RFC 5737 [0]. That's what they're for! Unfortunately, they probably don't have any actual data associated with them so the examples would need to be fabricated...hmm.
And it looks like actually using one of those results in some output that does not conform to the format on your documentation page, so you'd best also add documentation about error handling.
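For reference, the RFC 5737 documentation blocks are easy to check programmatically with the standard library; a small sketch:

```python
import ipaddress

# The three IPv4 blocks RFC 5737 reserves for documentation and examples.
DOC_BLOCKS = [
    ipaddress.ip_network("192.0.2.0/24"),     # TEST-NET-1
    ipaddress.ip_network("198.51.100.0/24"),  # TEST-NET-2
    ipaddress.ip_network("203.0.113.0/24"),   # TEST-NET-3
]

def is_documentation_ip(ip):
    """True if `ip` falls inside one of the RFC 5737 documentation blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in block for block in DOC_BLOCKS)
```

Using addresses from these blocks in docs avoids pointing example traffic at real operators like Google's 8.8.8.8.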
You can download the Maxmind database locally. I don't see the option for that from IpData. That's worth the price increase for me.
Having a web API is great for low- or mid-volume applications, but adding 10ms per user for an HTTPS call would still be like 1000x slower than hitting a local DB.
Just cache the responses so that only the first time a new IP connects it needs to be looked up (until cache fills)
Integrating maxmind is a bit of a pain and it’s a massive database. Not well suited to use cases like single purpose or ephemeral containers.
You’ll probably end up storing the maxmind db in a separate node on your local network, anyway. So you still have a latency issue (although lower, obviously). More importantly you now need to maintain an extra box per replicated cluster. It could make sense to outsource that management if the latency between your data center and ipdata is acceptable. Such low latency is easily achievable if ipdata has BGP anycast on servers in the same cloud region as your app.
We already do something like that, marking up the user information with geoip information when we first see them, but we just get a lot of unique users. (Yes, we could do batch lookups, but that would be a complete rewrite of our workflow.)
I found Maxmind integration to be pretty straight-forward. They have APIs for most popular languages and most of the time, usage amounts to instantiating a reader object with the path to the database, then calling methods on that object with the IP in question. So, db.city(ip) returns all of the city information, etc.
Massive is a relative term I guess. But the databases are segmented, so you only take what you need and all the ones I've worked with have been tens of megabytes in size.
I consider 100 MB to be quite large. That could easily double the memory consumption of an application. Most HTTP services are light but widely replicated. Adding 100 MB of memory usage to each replica is certainly a tradeoff.
What exactly do you mean by web server? Generally you will have at least one server process wherever you do the maxmind lookup, which could be at the LB/proxy (Nginx) or app server (node, python, go, etc).
Nowadays these services are often heavily decoupled and therefore built to be as lightweight as possible. Think about small single-purpose containers. Adding 100 MB of memory requirements per instance could be quite expensive, depending on which existing process is calling Maxmind.
Say you have python/ruby/js processes. You usually build 1 box/container with multiple processes inside. If you have 100 services and you distribute 1 process of each in each box then THAT is your problem.
Are you passing the original IP addresses to all these containers? Why don't you just decode the IP to a location at the edge of your network (typically the load balancer) and add a header with the location?
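A sketch of consuming such an edge-stamped header in an app, falling back to a direct lookup when it's absent. The header names (`X-Geo-Country`, `X-Forwarded-For`) are assumptions about what your load balancer is configured to set; adjust to match:

```python
def user_country(headers, geoip_lookup):
    """Prefer a country code stamped at the edge; fall back to a local lookup.

    `headers` is a dict-like of request headers; `geoip_lookup` is any
    callable that maps an IP string to a country code (or None).
    """
    country = headers.get("X-Geo-Country")
    if country:
        return country
    # X-Forwarded-For may hold a chain; the client is the first entry.
    forwarded = headers.get("X-Forwarded-For", "")
    client_ip = forwarded.split(",")[0].strip()
    return geoip_lookup(client_ip) if client_ip else None
```

This keeps the heavyweight database at one place in the network while every replica stays thin.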
You're right, they're definitely many cases where incorporating any calls to external resources would be the wrong call and where hitting a local copy of a db would be the best solution.
An API such as this would be a great choice where you need to modify content on the frontend. In that case you'd benefit from the availability of 10 endpoints around the world, providing the lowest latency to your users.
I get what you mean. However from the use cases I've seen often whatever processing needs to be done eg. showing a banner ad to users from a particular state or redirecting to a country specific version of a site can be handled from the user's browser.
Even if you're not hosting the country specific site in that region your user will still save time on making the call to our endpoint local to them.
I've actually been seriously considering adding a pay-as-you-go plan, where you pay only for the number of requests you make. Is this something that would interest you? The current prices are an extended Cyber Monday offer; I figured I'd let it run into the holidays.
Unfortunately it's never going to be spot on all the time.
Awesome points on the example IPs and adding error-handling documentation. Thanks for pointing me to RFC 5737!
I'm adding the Error Code documentation right now.
Added a table with Error Codes and their accompanying messages.
It's more fun to sell metered services to people than "plans". You can get the best of both worlds by having a per unit price, and then giving people discounts for committing to certain usage levels.
Plans are kind of a pain because you have to have a sales process to get people to change. Mailchimp automatically upgrades/downgrades people between plans, which is a decent workaround, but still a little weird to customers.
Definitely not! :) We'd send you a unique link to a page where you can update your details. We'd never expose your credit card information in that manner.
Good to hear, but I'd still be a bit concerned that the wording above might lead some inexperienced/not-security-conscious person to send you their details via email outright.
Please see my reply to mywittyname: basically, there are certain situations where being able to take advantage of our globally distributed infrastructure is the best call, and I 100% agree that there are situations where using a local db is simply the best (only) call.
Unfortunately, it's never going to be 100% accurate. If it's not able to pin you down exactly, it'll place your location at the nearest population center. If you're okay with sending your IP and location to an internet stranger, my email is jonathan at ipdata dot co. I can keep tabs on this and see if accuracy improves.
I'm going to paste my reply to dewski above: unfortunately, it's never going to be 100% accurate. If it's not able to pin you down exactly, it'll place your location at the nearest population center. If you could send me your location and IP address at jonathan at ipdata dot co, I can keep tabs on this and see if accuracy improves.
It's precisely the same as Maxmind for me -- about 50 miles off. GeoLiteCity.dat.gz is about 11 megabytes, i.e. one penny on a cell connection. I'm not sure how this makes sense as a business, since the value in a geolocation database lies in having cars constantly driving around sniffing people's IPs.
Please see my reply to mywittyname above: basically, there are certain situations where being able to take advantage of our globally distributed infrastructure is the best call, and I 100% agree that there are situations where using a local db is simply the best (only) call.
It also started as a simple geolocation API when I first launched it over 3 years ago. Since then we've built many custom data sets and expanded to more products, including IP-to-company details, carrier detection, reverse IP hosting data and more. See https://ipinfo.io/products. We've been lucky enough to sign up customers like Dell, Tesla, eBay, TripAdvisor, Plesk and others. Happy to chat and share notes at some point!
Thanks coderholic! :). I took a lot of inspiration from your service but built around solving a number of pain points. Would love to chat and talk sometime :)
How accurate can you get with an IP address? It seems like it could act as a fallback if the geolocation web API is blocked.
I have a side project[0] that uses the geolocation api to post to Slack and seems to be pretty accurate, even on mobile. Maybe I'll try this as a fallback.
Geoslack looks really cool!! Using it as a fallback when location isn't allowed via the browser, or from GPS within an app, would be a valid and common use case!
I'd say you can pretty much trust it up to the country level and greater region; more specific than that, and the results will not always be highly accurate.
1. Currency Data - The user's country's Currency ISO code and symbol are returned in the API output.
2. Phone Code Data - The user's country's Calling code is returned as well.
3. Global Footprint - Endpoints in 10 locations globally: 4 in the US across both coasts, 1 in Canada, 2 in Europe, and 1 each in Mumbai, Seoul and Sydney. This ensures you get pretty low latencies wherever your app is hosted and wherever your end users are located.
4. Solid Infrastructure - The infrastructure backing Ipdata is pretty impressive, and built to scale to a substantial amount of traffic - in the hundreds of millions.
5. A free tier of 1500 requests daily (45 000/month) with no signup or credit card needed.
6. Examples in multiple languages on the Docs page at https://ipdata.co/docs.html, to make integrating with your site absolutely painless.
7. Regularly updated data - we check for changes daily and regularly update our data.
Too cool! Ping me at jonathan at ipdata dot co and I'll set you up with free API keys for this! :) We can also talk about adding some examples to the docs page.
Please explain your data source model. How do you integrate the RIR data, the NIR data, and BGP and RTT location/estimation data to arrive at the geoid of where any given IP is? How do you handle multi-national ASNs or cellular ASNs using IPs across borders? (Disclosure: I work at a regional internet registry.)
That's a very intriguing idea. I'm definitely going to look into the mechanics of this. I don't know if it'd make it possible to have them in different sizes, but thanks for this! I'm definitely going to look into adding a field with this.
The folks at http://getipintel.net/ can give you some decent information about whether an IP looks spammy or not.
> Given an IP address, the system will return a probabilistic value (between a value of 0 and 1) of how likely the IP is a VPN / proxy / hosting / bad IP. A value of 1 means that IP is explicitly banned (a web host, VPN, or TOR node) by our dynamic lists.
We've been using it for a while and are pretty happy.
If you want to know whether an IP is from a VPN, Tor or a compromised server you can do that for free using the Shodan API (https://developer.shodan.io).
Disclaimer: I'm the founder of Shodan and identifying the type of connection is something we're often used for.
You can see all the datapoints we provide by visiting our homepage https://ipdata.co or by making a call in your terminal via curl https://api.ipdata.co
Thank you for sharing your need for this. I'm going to make this a priority. If you send me an email at jonathan at ipdata dot co, just a quick hello, I can keep you up to date on our progress as we develop this.
I think the freemium model is pretty common in SaaS. And I think a free tier of 45 000 requests a month, 1500 per day, is pretty generous. Anything running as a hobby app/side project would run pretty well within that tier. Most users making more than 1500 requests a day have no issue paying.