OpenFlights – airport and airline data (openflights.org)
224 points by cyberlab on April 27, 2021 | 60 comments



You can get free enterprise access to a lot of the major flight tracker services by setting up an ADSB receiver and feeding the data to them. Basically they give you full access to everything as if you were paying for the top tier of their services, because you're helping increase the coverage of their data. A few such services:

https://www.flightradar24.com

https://flightaware.com/

https://www.radarbox.com/

https://skyscanworld.com/

There's also https://www.adsbexchange.com/ which doesn't filter their data (probably much to the chagrin of various businesses and governments). If you see/hear a weird plane above and you can't find it on the commercial services above, check ADSB Exchange.


> You can get free enterprise access to a lot of the major flight tracker services by setting up an ADSB receiver and feeding the data to them. Basically they give you full access to everything as if you were paying for the top tier of their services, because you're helping increase the coverage of their data.

"Top tier" may be overstating it, but setting up an RPi and a $20 USB ADS-B receiver will get you the $90/month Enterprise feed [1]. Still a great deal if this is a topic that interests you.

[1] https://flightaware.com/adsb/


Not the same thing. Scheduled future/historic flight data vs observed realtime/historic flight data.


They do give historic data, but yeah I don't know offhand which ones give comprehensive direct API access just by feeding ADS-B data. RadarBox seems to. FR24 and FA require you to contact them -- I've never done this so I don't know how much friction the process entails, or what kind of API limits you may be subject to. Probably depends on your intended application.


Briefly got interested, then hit "Warning: The third-party that OpenFlights uses for route data ceased providing updates in June 2014. The current data is of historical value only."


http://info.flightmapper.net/ is the gold standard for manual use, as far as I'm concerned. Would be lovely to have programmatic access to this data.

They say that they get their data from Cirium.


It's ridiculous that the only original source of this data, the IATA [0], charges $700+ for this list, so kudos to OpenFlights.

I can't stress just how important (and how hard) it is to get a great source of data for airports -- I've now built 3 travel-related projects (the latest, Wanderlog [https://wanderlog.com], keeps people's flight reservations, so uses it for an autocomplete), and it's been a key building block for all of them.

The main datasets we use are:

- OpenFlights [1]: mentioned in this post, but this dataset was great since it had timezone too.

- OurAirports [2]: no timezone here, but the "type" and "scheduled_service" columns in this dataset are essential. "Type" lets you distinguish between small/medium/large airports, and "scheduled_service" lets you easily filter out airports without real flights (which you often might not care about).

- Random other GitHub Gist [3]: I have no idea where this data comes from, but it was surprisingly complete and has a few golden nuggets like "num_flights" and "runway_length" in addition to "timezone". The presence of a "woeid" suggests Yahoo-related origins, but it's hard to be sure.

- We now supplement this with airports from autocomplete APIs like Skyscanner's, because they're still the most up-to-date.

Long story short, it'd be AWESOME to have one complete, updated database with all this data in one place. This kind of data really should be public and a public service, but until then it's unfortunately up to the community.

[0] https://www.iata.org/en/publications/store/airline-coding-di...

[1] https://github.com/jpatokal/openflights/

[2] http://ourairports.com/data/

[3] https://gist.github.com/tdreyno/4278655
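
As a rough illustration of the OurAirports filtering described above, here is a minimal Python sketch. It assumes a CSV shaped like OurAirports' airports.csv with "type" and "scheduled_service" columns; the inline sample rows are illustrative rather than copied from the real file, and the function name is made up for this example.

```python
import csv
import io

# Tiny inline sample in the shape of OurAirports' airports.csv.
# Only a subset of the real columns is shown; the rows are illustrative.
SAMPLE_CSV = """ident,type,name,iso_country,scheduled_service,iata_code
KSFO,large_airport,San Francisco International Airport,US,yes,SFO
KHAF,small_airport,Half Moon Bay Airport,US,no,HAF
EGLL,large_airport,London Heathrow Airport,GB,yes,LHR
XX01,heliport,Example Heliport,US,no,
"""

# Assumption: we only care about medium and large airports.
WANTED_TYPES = {"medium_airport", "large_airport"}


def airports_with_real_flights(csv_file):
    """Yield rows for medium/large airports that have scheduled service."""
    for row in csv.DictReader(csv_file):
        if row["type"] in WANTED_TYPES and row["scheduled_service"] == "yes":
            yield row


results = list(airports_with_real_flights(io.StringIO(SAMPLE_CSV)))
for row in results:
    print(row["iata_code"], row["name"])
```

To run this against the real download, pass an open file handle for airports.csv instead of the StringIO sample.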


I've done something similar for my current job. I've used all these same data sources, and even got access to the IATA stuff eventually. I also used GeoNames a lot; it's not specific to airports, but it has decent airport data and I need a lot of the surrounding features as well.

Every source was definitely useful, but I think ultimately crawling Wikipedia was the most useful and highest quality set of data for me (after some significant data cleaning). The List of Airports By IATA Code [0] is almost as comprehensive as the official list from IATA, and you can follow the links to crawl info about the airport and city served. Getting info about what city the airport is considered to "serve" is so useful, as most airports are technically not in the city people consider them to be the major airports of, and some "serve" multiple cities.

Of course the difficult part there is that Wikipedia data isn't really clean or standardized. The page HTML isn't standard, even things that look very standardized like the sidebar will have 30 variations when you crawl all the airport pages. There is WikiData, but I found it still wasn't simple to get the data from there, and it also didn't include most of the page content which I wanted. [1]

Nowadays we have direct relationships with the airlines/GDS/so on, and also a department of people to add and manage the data ourselves, because even the direct source gives you pretty poor quality data. The project was way more fun when I was wrangling data from a dozen places around the web :) Now it's more of an enterprise CRUD webapp with some fancy localization and GIS tooling.

[0] https://en.wikipedia.org/wiki/List_of_airports_by_IATA_airpo...

[1] This was a while ago, so maybe WikiData has changed


FlightStats / Cirium have an API for airport data[0] that I’ve found to be mostly complete (sans a few rural Australian airports). It includes historical records for airports that are no longer active, such as Hong Kong’s Kai Tak airport that previously went by the HKG IATA code.

FlightAware have a similar API[1].

These aren’t free or open mind you, but are at least readily accessible for those that need/want it.

[0] https://developer.flightstats.com/api-docs/airports/v1

[1] https://uk.flightaware.com/commercial/aeroapi/


Are you just looking for airports, routes, and schedules? FlightAware provides that: https://flightaware.com/

Not sure what you get with the commercial services, but even the free services are pretty good. It's what we used in 1st CAV to track the redeployment of the last units to leave Iraq in 2011.


If it’s important and hard to get this data, is it really ridiculous that a provider of the data charges $700 for it?


It's only hard to get because IATA doesn't easily provide it. IATA isn't "a" provider of the data, they are the data. It would be like if you had to purchase a list of the bus stops and schedule in your city from your transportation department.


Many (most?) standards bodies charge for their standards documents and data feeds. There are obviously costs associated with running IATA; I don't see why they should be obliged to provide their data for free, especially when the typical user of such data is likely to build a for-profit business on top of it.

I don't think IATA actually assigns the codes, but rather aggregates them. In the US, the FAA assigns the airport identifiers: https://www.faa.gov/nextgen/cip/airport_facility/


I think the FAA will only assign the four-letter ICAO code. The three-letter IATA code is assigned by IATA directly: https://www.iata.org/contentassets/1277d04d575843dc80a3f613d...


That’s a good point. The US has a somewhat unique system there. I’ve been googling this and I can’t find any definitive reference. All I can turn up is that the application for an IATA code is supposed to be made by an airline or CSR, not the airport itself. What I can’t find is an explanation for the “coincidence” of the K prefix.


If you tell me the FAA code for a US airport, I can tell you the IATA code without a database lookup and without checking with IATA.


C83

Good luck


I’d have returned that it has no IATA code assigned, which appears to be correct from spot-checking it on several online sites.

I’m guessing you know that’s not correct or you’d have not asked, so what is it?
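
For readers following this sub-thread, the rule of thumb being argued about can be sketched in a few lines of Python. This is only a guess at the heuristic the commenter has in mind, not an official rule: it assumes a contiguous-US airport, where an all-letter three-character FAA identifier is usually reused as the IATA code and prefixed with "K" to form the ICAO code. The function name is hypothetical, and exceptions exist.

```python
def guess_codes(faa_id):
    """Guess (IATA, ICAO) codes from an FAA location identifier.

    Heuristic only: assumes a contiguous-US airport, where an all-letter
    three-character FAA identifier is usually reused as the IATA code and
    prefixed with "K" to form the ICAO code. Exceptions exist.
    """
    faa_id = faa_id.strip().upper()
    if len(faa_id) == 3 and faa_id.isascii() and faa_id.isalpha():
        return faa_id, "K" + faa_id
    # Identifiers containing digits (such as C83) are treated as having neither.
    return None, None


print(guess_codes("SFO"))  # ('SFO', 'KSFO')
print(guess_codes("C83"))  # (None, None)
```

Anything that matters should still be checked against an actual airport database.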


I have to pay the bus company to ride the bus, it doesn't seem insane that the bus company may want to charge me money if I asked them for a full, comprehensive, organized list of stops and schedules.

Sure, there are reasons why they would want to make it available for free, but there are also reasons why they would want to charge me, and they aren't unreasonable. I don't have any fundamental, natural right to a transportation network curating and providing their data for my consumption. It might not care enough about my needs to spend money to do so. It might not care enough about my needs to spend money to do so for free.


Oh, they charge way more than $700. For a recent project, four months of flights in North America cost roughly $5000.


OpenStreetMap might happen to have a list of airports with decent metadata, but it'll have zero info about actual flights.


Yep. It is quite good for some data that volunteers can provide from open data. Things like runway numbers, surface material and length.


Downloadable airport data on OpenFlights tends to be quite dated, missing notably e.g. the Berlin Brandenburg Airport. OurAirports (https://ourairports.com/) has a slightly different format but the data there is significantly more recent.

source: started with OpenFlights but had to switch to OurAirports for my project https://flightnotebook.com


The US Bureau of Transportation Statistics provides historic flight schedule and actual flight performance data in CSV tables:

http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=23...

But it's cumbersome to work with.

I am working (on and off) on a DBMS benchmark based on this data. As part of that endeavor, I have a script which:

* Automates downloading the CSVs.

* Creates an appropriate SQL database schema.

* Performs a bit of rudimentary cleaning (e.g. invalid character codes; optional)

* Loads the CSV files into the database.

So that, from the command-line, you could get the flight on-time performance data by merely typing in something like:

  /path/to/usdt-ontime-tools/scripts/setup-usdt-ontime-db -r  -db-name ontime --first-year 2019 --last-year 2020

It's available within this repository:

https://github.com/eyalroz/usdt-ontime-tools

The caveat is that, for now, the only DBMS supported directly is MonetDB: https://www.monetdb.org/ , a FOSS analytics-oriented columnar DBMS.

An adaptation of the script for other systems (MySQL/Maria, PostgreSQL) should be straightforward, since the commands are SQL'ish after all. If you're interested in that, open an issue or write me.
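
To give a feel for the schema, cleaning, and loading steps listed above, here is a minimal Python sketch. It is not the repository's script: it swaps in SQLite (via the standard sqlite3 module) for MonetDB, skips the download step, and uses a hypothetical, heavily reduced column set with made-up sample rows.

```python
import csv
import io
import sqlite3

# Hypothetical, heavily reduced column set; the real BTS tables are much wider.
SCHEMA = """
CREATE TABLE ontime (
    fl_date   TEXT,
    carrier   TEXT,
    origin    TEXT,
    dest      TEXT,
    dep_delay REAL,
    cancelled INTEGER
)
"""

# Made-up rows standing in for a downloaded CSV file.
SAMPLE_CSV = """fl_date,carrier,origin,dest,dep_delay,cancelled
2019-01-01,AA,ORD,LAX,12.0,0
2019-01-01,AA,ORD,LAX,,1
2019-01-02,UA,SFO,EWR,-3.0,0
"""


def clean(row):
    """Rudimentary cleaning: an empty delay becomes NULL, numbers are typed."""
    delay = float(row["dep_delay"]) if row["dep_delay"] else None
    return (row["fl_date"], row["carrier"], row["origin"], row["dest"],
            delay, int(float(row["cancelled"])))


def load(conn, csv_file):
    """Create the table and bulk-insert the cleaned rows; return the row count."""
    conn.execute(SCHEMA)
    rows = [clean(r) for r in csv.DictReader(csv_file)]
    conn.executemany("INSERT INTO ontime VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load(conn, io.StringIO(SAMPLE_CSV))
print(loaded, "rows loaded")
```

An in-memory database keeps the sketch self-contained; a file path in sqlite3.connect would persist the data instead.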


Spent a lot of time with that dataset a couple of years ago, looking at historical flight delays and cancellation rates. If only this type of data was available outside of the US. The data is updated daily I believe and provides a lot of detail and is of pretty good quality.


For anyone interested, the nice guys at https://aviation-edge.com supplied me access to their flight API so I can track how many flights fly directly over my little community in Gary, Indiana: https://millerbeach.community

I wish I could poll more often than every 15 minutes (the free API tier's limit), because some aircraft pass overhead between polls. The count isn't the most accurate, but it gives a rough figure for traffic to/from O'Hare, Midway, and Gary.
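
For anyone wanting to try the same "is it over my community" check, one possible approach is a great-circle distance filter on each polled snapshot. The Python sketch below uses the haversine formula; the coordinates, the radius, and the shape of the position records are assumptions made for illustration and are not taken from any particular API.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def overhead(positions, home, radius_km):
    """Return the positions that fall within radius_km of the home point."""
    return [p for p in positions
            if haversine_km(home[0], home[1], p["lat"], p["lon"]) <= radius_km]


# Approximate coordinates, chosen for illustration only.
HOME = (41.62, -87.27)
SNAPSHOT = [
    {"flight": "TEST1", "lat": 41.63, "lon": -87.25},  # a couple of km away
    {"flight": "TEST2", "lat": 41.98, "lon": -87.90},  # tens of km away
]

nearby = overhead(SNAPSHOT, HOME, radius_km=5.0)
print([p["flight"] for p in nearby])
```

A small radius makes the polling-interval problem worse, since an aircraft crosses a few kilometres in well under a minute.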


I’m working on a project to compare historical seat availability between city pairs in Europe. Too bad their historical API doesn’t return the aircraft type (so the number of seats is unknown). I also couldn’t find how far back their data goes.

... for my project, I actually got some historical paper schedules of the Official Aviation Guide; basically they’re phone books. I hope to find a decent/affordable database for more recent data. (MIT students/alumni actually get access to a database going back to 1979, but alas no access for outsiders...)


The Official Aviation Guide became OAG, who will have what you need in digital form but at steep commercial rates, as will their competitor Innovata (Cirium).

The actual seats bit is surprisingly complex if you want accurate figures, as the same aircraft type can have wildly different numbers of seats depending on layout and class configuration. OAG/Innovata's standard schedule product shows the aircraft variant normally assigned to a route, and they survey the airlines on the seating configurations of their aircraft to calculate capacity and ASKs. I believe Cirium now cross-references this with flight tracking data to get data based on the actual aircraft used (which solves edge cases like substitutions or an airline operating differently configured A330-200s on different routes). Doing that was part of the master plan when I worked for them before they acquired FlightStats.
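
To make the capacity point concrete: ASKs (available seat kilometres) are seats offered multiplied by distance flown, so the seat count per configuration matters directly. The Python sketch below uses made-up seat counts, hypothetical configuration labels, and a made-up route distance purely to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Flight:
    aircraft_type: str
    config: str        # hypothetical label for a seating layout
    distance_km: float


# Made-up seat counts: the same type can differ widely between layouts.
SEATS_BY_CONFIG = {
    ("A330-200", "two-class"): 287,
    ("A330-200", "three-class"): 222,
}


def ask(flight):
    """Available seat kilometres = seats offered x distance flown."""
    seats = SEATS_BY_CONFIG[(flight.aircraft_type, flight.config)]
    return seats * flight.distance_km


flights = [
    Flight("A330-200", "two-class", 1000.0),
    Flight("A330-200", "three-class", 1000.0),
]
total_ask = sum(ask(f) for f in flights)
print(total_ask)
```

Keying the lookup on type alone would give both flights the same capacity, which is exactly the inaccuracy described above.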


Nice job on your community website!

You may already be aware of this, but if you want real-time ADS-B, check out PiAware (https://flightaware.com/adsb/piaware/) as a low-cost option to run your own ADS-B ground station via a Raspberry Pi.


A few years ago I tried writing a Python wrapper around SABRE's API to get pricing, route, and schedule data, which seemed to work reasonably well. It likely doesn't work anymore, but it was a fun exercise. https://github.com/Jamil/sabre_dev_studio

I wish I had access to the GDS data to get realtime seat/award availability, but I couldn't find any pricing information to get that information through Sabre's API.

Does anyone know how much that costs, or if there are any services which provide it as an API? I use ExpertFlyer for personal use, but ideally I'd want to get that information at the source…


I’d really like to know this too, but have been unable to get a price either. I also really want an API (ideally with historical data available) with fare pricing data, but have not been able to get a quote on that either.


You can both take a look at Amadeus for dev (Amadeus is the biggest GDS) https://developers.amadeus.com

More info on API here https://github.com/amadeus4dev/hackathon-starter/blob/master...

Disclaimer I work for Amadeus, but actually never used this API service, I'd be interested in your feedback


Thanks for those links! Didn't know Amadeus had something similar.

Just like with Sabre, it seems all the "juicy" stuff is in the Enterprise API (which is opaque on pricing and setup -- you have to contact the sales team to get access).

Do you have any insight into whether it's possible for individual developers to get access to that enterprise API, and what the fees look like?


Would you have an example of the juicy stuff you are looking for? It seems to me most needs (flight search, seat maps, flight status, airport/airline info) are covered by the self-service API.


I’ll look into this API, thanks! Everything I’ve tried previously I ran into limitations that blocked me from building the project I was working on.


Top, do not hesitate to share your feedback, if anything can be improved I can try to forward it internally


Try the "crowdsourced" ADS-B Exchange site, which shows unfiltered flight data. [0] For more info, check their FAQ.

Live data: https://globe.adsbexchange.com

[0] https://www.adsbexchange.com


And if the OP has a strong personal interest in tracking flights over his community, he should pay particular attention to the page about antenna and setting up his own tracker




For those who want to jump in and query this dataset, I uploaded it here: https://bit.io/boyd/airports

I'm still working on bit.io and would love feedback so hit me.


Thank you for discovering bit.io!


Interesting! In addition to the links already shared, I use OpenTravelData, available at https://github.com/opentraveldata/opentraveldata, which consolidates airport information from different sources, but also data on aircrafts, airlines, etc.


If you need airport or airspace data, OpenAIP seems to be decent: http://www.openaip.net/

For some reason they make you register in order to download the data, and the site is a bit confusing, but the data seems good.


Here's an app I wrote for FS2020 that uses OpenAIP airspace data: https://twitter.com/lemonodor/status/1384611707314606090


In the past I wanted to build a system that checks flight paths and will tell you what kind of plane noise you can expect in the area where you want to buy a house.

Has anyone built it yet, or does anyone need something like this?


Why don't airlines provide a good free API for flights and reservations? I would think they would want developers to help make accessing their offerings and buying them easier.


APIs to airline reservation systems already exist, they're just supported by the likes of Amadeus that provide the relevant IT infrastructure to the airlines as well as distribution services for travel agents and OTAs to book [nearly] all the airlines

Since finding and booking flights is actually trivially easy for the consumer already, it's actually in the interests of airlines (as well as the middlemen) to be cautious about who gets access to which API function, especially when it comes to actually selling tickets.

At the extreme end of the scale, some low cost carriers can only be booked on their website, because they make more in upsell from the booking flow than they could make with extra bookings from other channels


Skiplagged.com (and the legal issues around it) is a good example of why they may not want that data easily accessible.

But other than that, I assume there's a lot of money in partnerships with sites like Kayak and Priceline. But I'm not even sure which direction that money flows.


I also don't understand why they objected to skiplagged. If they make money on the flight if I'm in the seat, wouldn't they make slightly more (saving on fuel) if I'm not on it? Why would they care? Plus, the slightly less populated flight might be a bit more pleasant to the passengers sitting nearby where the missing person would've sat.


Because if you didn't use it you'd be paying more money for the flight you did take.

Also they don't care about giving coach passengers a pleasant experience. They'd remove the seats altogether if they could.


Why would they? They prefer to keep their pricing structure secret and earn booking fees.


My intuition is that if you are selling something (like a flight) then you would want to make it as easy as possible for people to buy. Getting the data out as widely as possible would seem to help with that.


They are selling cheaper and more expensive tickets. They have cheaper options for price-sensitive customers, but they don't want to make them easy to find for everybody.

They have their own ticket shop online. So they prefer if people buy from there.


Does anyone know of a current dataset I could query to check historical route status?

Since the pandemic I've found plenty of airlines selling tickets and then systematically cancelling the flight a few days before. I was looking to scrape some data to avoid these kinds of unreliable flights.


Check out my other comment...

The USDT on-time performance data goes back as far as October 1987 (and you can specify the period to the download script with the --first-year , --first-month , --last-year , --last-month command-line switches).

Once the data is loaded you can use spiffy SQL to print out routes the way you like them. Unfortunately the data is also a bit dirty (which is something I'm working on).
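
As a rough idea of what such a query could look like for the cancellation question above, here is a small Python sketch that computes the cancellation rate per carrier and route. It uses SQLite in memory rather than MonetDB, and the table name, column names, and rows are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced table; the real data has many more columns.
conn.execute(
    "CREATE TABLE ontime (carrier TEXT, origin TEXT, dest TEXT, cancelled INTEGER)"
)
conn.executemany(
    "INSERT INTO ontime VALUES (?, ?, ?, ?)",
    [
        ("AA", "ORD", "LAX", 0),
        ("AA", "ORD", "LAX", 1),
        ("AA", "ORD", "LAX", 1),
        ("UA", "SFO", "EWR", 0),
        ("UA", "SFO", "EWR", 0),
    ],
)

# Share of cancelled flights per carrier and route, worst first.
QUERY = """
SELECT carrier, origin, dest,
       COUNT(*)                       AS flights,
       ROUND(AVG(cancelled) * 100, 1) AS cancelled_pct
FROM ontime
GROUP BY carrier, origin, dest
ORDER BY cancelled_pct DESC
"""

rates = conn.execute(QUERY).fetchall()
for carrier, origin, dest, flights, pct in rates:
    print(f"{carrier} {origin}-{dest}: {pct}% of {flights} flights cancelled")
```

Adding a date filter to the WHERE clause would restrict this to the pandemic period.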


Typing in Toronto does not return anything, and neither do many of the smaller airport ICAO codes I tried.


I wish there was open train data.


Here's my chance to plug something I wrote long ago (back in 2012), and revived earlier this month. I used the train/bus/riverboat schedules from Transport for London (TfL) to create an animated view over 24 hours of every single vehicle as it journeys through London. The 2021 version is in 4K at 60 fps: https://www.youtube.com/watch?v=0rj60B7w59s – with details posted here: https://log.kv.io/post/2012/06/04/public-transports-in-londo...

"Open train data" is a bit vague without mentioning where these trains might be, but I did find the London Tube schedule[1] in GTFS[2] format, as well as the bus schedule[3] also in GTFS format. Look for your city or country name followed by "open data" and you might find interesting datasets. In the UK the National Public Transport Data Repository (NPTDR) publishes a database of every public transport journey in Great Britain for a selected week in October each year[4] (only goes until 2011 though).

[1] Tube, scheduled trips: https://hash.ai/@tfl/tfl-gtfs

[2] GTFS is a CSV-based transit data format: https://developers.google.com/transit/gtfs/reference

[3] Buses, scheduled trips: https://data.bus-data.dft.gov.uk/timetable/download/

[4] NPTDR database: https://data.gov.uk/dataset/d1f9e79f-d9db-44d0-b7b1-41c216fe...
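
Since GTFS is just a set of CSV files, a feed can be explored with nothing more than a CSV reader. The Python sketch below groups the stops of each trip from a stop_times.txt file; the column names follow the GTFS reference, while the sample rows and the function name are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Minimal made-up stop_times.txt; the column names follow the GTFS reference.
STOP_TIMES = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
T1,08:00:00,08:00:30,A,1
T1,08:05:00,08:05:30,B,2
T2,08:10:00,08:10:30,B,1
T1,08:11:00,08:11:30,C,3
T2,08:16:00,08:16:30,C,2
"""


def stops_by_trip(stop_times_file):
    """Group stop IDs per trip, ordered by stop_sequence."""
    trips = defaultdict(list)
    for row in csv.DictReader(stop_times_file):
        trips[row["trip_id"]].append((int(row["stop_sequence"]), row["stop_id"]))
    return {trip: [stop for _, stop in sorted(stops)]
            for trip, stops in trips.items()}


itineraries = stops_by_trip(io.StringIO(STOP_TIMES))
print(itineraries)
```

With a real feed, open stop_times.txt from the unzipped download and join stop_id against stops.txt to get names and coordinates.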


Talking about trains and data about trains, I really dig this visualisation - https://minitokyo3d.com/



