Fixing Google Map Transit Feed Mistakes in Taiwan (taipeiurbanism.com)
140 points by danso on Jan 23, 2020 | 37 comments



Something's up with transit data feeds in general...

Creating a transit data feed typically requires local knowledge, technical knowledge of data feed formats, and access to make changes to the data feed.

Actual transit providers typically lack the technical knowledge, especially for making day-to-day service updates. Transit enthusiasts lack the access to make changes to the feed. Google and mapping companies lack the local knowledge.

End result: For probably half the world's population, the local bus/train service isn't accurate/present on Google/Apple/OSM maps.

How can this be solved? Perhaps some kind of feed-wiki so the three interested parties can do human collaboration?


> Actual transit providers typically lack the technical knowledge

This is actually getting better, as most transit providers already rely on software companies that provide their own route planning software for passengers and/or administrative software, such as tools for creating the actual schedule. These companies are slowly getting better at providing standardized feeds.

I am skeptical that something like a feed-wiki would work, both because of vandalism and because schedules are usually incredibly complex. You may have noticed that your bus starting at 14:30 from station X was scheduled incorrectly, but does this service have this mistake every day? What about the 3 days a year where this bus has a slightly different schedule? If the bus travels every 5 minutes, and each trip has this mistake, you have to manually check hundreds of trips per day to see whether they contain this mistake. Automation is difficult here, because trips that occur frequently are very often not grouped by any mechanism (although GTFS, for example, provides such mechanisms).
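
To make the grouping problem concrete, here is a minimal sketch of one way to cluster trips that share the same stop sequence and timing offsets, so that a mistake found in one trip points at every trip with the same pattern. It uses only the standard `stop_times.txt` columns; the sample rows, trip IDs and the `group_by_pattern` helper are invented for illustration, and a real feed would also need `calendar.txt` / `calendar_dates.txt` to answer the "which days" question.

```python
import csv
import io
from collections import defaultdict

# Invented stop_times.txt excerpt; real feeds have far more rows and columns.
STOP_TIMES = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
t1,14:30:00,14:30:00,X,1
t1,14:42:00,14:42:00,Y,2
t2,14:35:00,14:35:00,X,1
t2,14:47:00,14:47:00,Y,2
t3,25:10:00,25:10:00,X,1
t3,25:19:00,25:19:00,Y,2
"""


def to_seconds(hms):
    """Parse a GTFS time; hours may exceed 23 for trips running past midnight."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def group_by_pattern(stop_times_csv):
    """Map each (stop_id, offset-from-first-departure) sequence to its trips."""
    rows_by_trip = defaultdict(list)
    for row in csv.DictReader(io.StringIO(stop_times_csv)):
        rows_by_trip[row["trip_id"]].append(row)

    patterns = defaultdict(list)
    for trip_id, rows in rows_by_trip.items():
        rows.sort(key=lambda r: int(r["stop_sequence"]))
        start = to_seconds(rows[0]["departure_time"])
        pattern = tuple(
            (r["stop_id"], to_seconds(r["arrival_time"]) - start) for r in rows
        )
        patterns[pattern].append(trip_id)
    return dict(patterns)


patterns = group_by_pattern(STOP_TIMES)
for pattern, trip_ids in patterns.items():
    print(pattern, trip_ids)
```

In this made-up data, t1 and t2 fall into one group (12 minutes from X to Y) and t3 into another (9 minutes), so a reviewer would only need to check one trip per group rather than every trip.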

I think the better way is a legislative change which requires all transit companies to submit their schedule data to a central authority, which then publishes the merged schedule.

There is now (since the beginning of 2020) a new EU regulation [0] which basically pushes for an EU-wide multimodal route planner and schedule dataset (published in a special format, NeTEx). This now makes it possible to rely on the data of existing national schedule aggregators, which have been checking and merging timetables of all national transit providers for decades. For example, in Switzerland, the national railway company SBB does this (GTFS feed published here [1]), in Austria it's also the national railway company ÖBB, and in Germany, it's Deutsche Bahn (GTFS feed published here [2]).

[0] https://eur-lex.europa.eu/eli/reg_del/2017/1926/oj

[1] https://gtfs.geops.ch/

[2] https://gtfs.de


>most of the transit providers already rely on software companies that provide their own route planning software

This doesn't seem to apply to mid-size and smaller networks.

For example, when a tree falls and blocks a road, in a mid-size network there'll probably be one admin person in the office who decides what to do about it. Usually they'll just print off a paper sign saying "All services to X won't go past Y today", then ask a driver of another service to put that sign at each stop. They'll log into the website and put a banner at the top saying the same, and they'll ring the local radio station to announce it.

At no point does that info ever go into a computer in a machine-parsable form. Even if it did, timetables are probably saved in a big MS Word document.


Indeed, but you are already one step ahead with real-time updates. This is an entirely different beast than static schedule data. In my experience (I occasionally talk to people in the industry), even providers of public transportation in cities with over 500,000 inhabitants have trouble providing reliable and machine-readable real-time information (like: this trip is cancelled today, this trip is 5 minutes delayed at station X, there is an additional trip Y today, expect delays this afternoon because of Z, ...). At least where I come from, the solution was that the state government decided to build a centralized infrastructure for all providers to use. Real-time information is displayed there on a website which also provides real-time route planning. The real-time data is also available in machine-readable form.

It is a lot easier to convince a small bus operator in a rural area to just enter cancellations / delays into a website than it is to convince him to buy and administer servers and software to build his own real-time service.


The Transit app crowd-sources this from users who are using the "Go" feature in select markets.

There's also the Tripshot model -- an enterprise product where drivers install an app on a cell phone to track ridership, and it also sends the gps coordinates along.

That said, the other arcane part of transit directions is that if you update or break your feed, you have to wait hours or days before the result shows up on Google. (Sigh, batch processing...)


You know you can request Google scrape your 'static' feed at any interval you like, down to 10 minutes IIRC? Contact the maps-transit team with your feed URL and they'll update it for you.

They only scrape daily by default because some transit providers host their multi-gigabyte GTFS feeds for all bus arrivals for the next century on an ISDN line...


Transit is also run by a bunch of transit geeks, who also invest a lot of time in getting local knowledge and fixing feeds. As a simple example, they will add colors to the transit lines in a feed, so that in the app they show up the same way as on maps etc.
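
For anyone curious what "adding colors" looks like at the feed level: GTFS stores a line's color in the `route_color` column of `routes.txt` as a six-digit hex value without a leading `#`. The sketch below fills in missing colors from a hand-maintained lookup table. The sample routes, the color values and the `fill_route_colors` helper are all made up for illustration; this is not how the Transit app actually does it, just one plausible way to patch a feed.

```python
import csv
import io

# Invented routes.txt excerpt; R1 has no color set in the feed.
ROUTES = """route_id,route_short_name,route_type,route_color
R1,Red,1,
R2,Blue,1,0000FF
"""

# Hypothetical colors collected by hand, e.g. from the printed network map.
LOCAL_COLORS = {"R1": "E3002C"}


def fill_route_colors(routes_csv, colors):
    """Return routes.txt content with empty route_color cells filled in."""
    reader = csv.DictReader(io.StringIO(routes_csv))
    fieldnames = list(reader.fieldnames)
    if "route_color" not in fieldnames:
        fieldnames.append("route_color")

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Only touch routes the agency left uncolored; keep existing values.
        if not row.get("route_color"):
            row["route_color"] = colors.get(row["route_id"], "")
        writer.writerow(row)
    return out.getvalue()


patched = fill_route_colors(ROUTES, LOCAL_COLORS)
print(patched)
```

Existing colors are left alone on purpose, so the agency's own choices win over the local table.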


A major issue with relying on drivers is that if the real-time position of a driver is known, you can see both what they're doing and what they're not doing. Which is bad if you want to get away with, say, taking longer breaks than you're supposed to.

Anecdotally, at a local system I once used there was an issue where drivers were intentionally not turning on the location tracking, or configuring it incorrectly, because of this.


As an aside, sometimes local data-providers prohibit use of transit data by Google, with the result that you have to use an inferior route planning tool.


Google cannot get it right in LA. Ghost buses are common: either in the app but not there in reality, or right in front of you but not in the app. The times are basically fixed; if it says the bus is 12 minutes away, it will still be 12 minutes away 12 minutes later. If there are two routes that can take you down the exact same road to the exact same destination, you will be randomly assigned one route due to all of the above problems and be unable to view any information at all about the other, as if it doesn't exist.


Boston's public transportation developers are great at what they do. Their v3 API is extremely good. And that's why Google Maps and all the other transportation maps for the Boston area are very good. The map and transit app developers can only do so much themselves.


This will come out as cynical, I know, but I have to say that when I see people so excited about freely contributing their time and efforts to the product of a global behemoth, something feels off. I know we all have a better free service as a result, but still this is basically free work. If anything, Google has been the most successful company ever in convincing people to work for free (and hand over their data). No judgement intended, just an observation.


GTFS feeds are open. The ‘G’ stands for general. This person is excited about freely fixing this issue for _everyone_, they’ve just chosen to frame their blogpost around the most popular surface for that feed.

huge disclaimer: I work at google, and on transit at that


well, the "G" originally stood for Google.


Somebody has to write the standards. Transit agencies don't care about making a data format that works for other transit agencies, as it doesn't affect them. Only companies that need data from multiple transit agencies will do this kind of work.

There are two approaches to aggregating data. You can either treat each agency as a unique snowflake, or you can convince all agencies to use the same format. The former approach means that all work you do is only for you, and you've built up an interesting base of intellectual property in an industry that doesn't matter. (There is no money in providing transit directions.) The latter means that anyone can show up and do neat things with transit data. (I have my own little webpage, jrock.us/mta.html that uses this data to provide me with exactly the interface I want. Do you think the MTA wanted to hire programmers to make this for people like me? Absolutely not. But they did make it for Google, and now I benefit.)


Are all the GTFS feeds that google maps consumes necessarily open? I've been searching for a GTFS datasource for TfL (Transport for London) data but couldn't find anything. Google maps of course has that data.


Totally agree. If you really want to help society, consider contributing to OpenStreetMap.


While it’s true that there will be a lot of visitors using Google Maps, I still think that it’s a shame that we’ve ended up reliant on a private company to direct us around. In the ideal longer term, it’d be better to use this effort to improve crowd-sourced or collaborative mapping solutions.


The blocker to open transit feeds is mostly transit providers.

Most transit providers don't see timetables and routes as something to be distributed for free to everyone. They see it as a revenue opportunity - they will only make schedules available in their own app (which only routes on transit from their company), or sold for money.

For smaller transit providers, they still want to be paid to produce the transit feeds, since they typically have to hire a programmer or buy software to make the transit feeds, and will only take on that cost if someone will pay for it - after all, they're happy with a paper printed timetable at the bus station.

In some cases, there are also bureaucratic hurdles. In one case I would really like to name but can't, a (very big) transit provider didn't own the copyright on the names of their own stations - they only had a license to use the names within their own country. They made a transit data feed, but since the feed contained station names, they needed to sign a contract with anyone who used it assuring them the names of the stations would never leave the country. Obviously no app can ever guarantee the names of the stations would never leave the country, so the city had no transit mapping for ~a decade.


Google Maps ingests public data feeds that anybody can use and many other transit aggregators do use: https://developers.google.com/transit/gtfs

This is much the same as AMP, which any link aggregator can use and many link aggregators do use, and dissimilar to something like Apple News API, which requires direct integration with Apple.


I'm only aware of one open-source public transport lookup app, which is Öffi https://oeffi.schildbach.de/index.html

It doesn't have data for Taiwan, unfortunately. Are there any that do?

Aside: Audrey Tang is also on HN, though not active much lately: https://news.ycombinator.com/user?id=audreyt It sometimes strikes me as surreal that powerful people actually hang out on the same websites as people like you and me.


There's also Transportr, though it uses the same library for accessing transit data. Transportr has some really good guides on how it works under the hood and how to help by contributing other networks:

https://transportr.app/contribute/


Yes, and in this case there is enough information in airports and train stations to get from one point to another without ever needing to use the internet. When in doubt (as I was just yesterday, trying to find a station reached only by local train), staff are there to help. Sad if people don’t consider this option, which can also be useful to save money.


This is true for the metro, but bus lines require Chinese for the most part. There are some really good apps for transport in Taipei.


> It’s unclear how long the feed was incorrect, but the first time I noticed the error was in 2018.

My first visit after the new Taoyuan line was completed was in July 2017, and I'm pretty sure it was incorrect then, too. A local told me about the new line and said it was so much better, but when I tried to use GMaps to plot the trip, it showed as much longer than the older, more complicated options (and yes, I did end up taking a taxi).

On my more recent trips I've ignored GMaps and taken the Taoyuan line to/from the airport, and it's so much nicer. If you're going somewhere near Taipei Main Station, the total trip time is roughly the same as that of a taxi; even if you're not, the cost savings is nice.


Love it, someone who cares and is willing to drive the issue all the way home.

I'm currently visiting San Francisco (SF) and experienced an opposite issue where GMaps reported cheaper and it ended up being more expensive. The BART train is a relatively fast, very cost effective way to get from SFO to downtown SF, Powell station in particular.

I was debating Uber at $30-$40, but looked at GMaps for public transit and it told me $2.50 to make that trip! Sure it takes 40 minutes with a bit more sketch, but I'm visiting, I'm willing and it's cheap.

Loaded up my Clipper card (BART's transit payment card) and went - I was ultra surprised when I exited the turnstile that my charge was $9.80.

The trip to/from the airport is significantly more expensive, which is understandable, but $9.80 vs $30 is much different than $2.50 vs $30.

Mildly frustrating, but in the end, not the end of the world.


Looks like this is a known issue due to a misconfiguration of BART's transit feed.

(Source: I work at Google and located the relevant bug, but I don't work on maps.)


Looks like it is still wrong too (says $2.60).

Also, SamTrans bus takes twice as long but is $2.25.


When I was in Taiwan this past summer we took Uber to the airport from Taipei. I'm pretty sure our decision was based in part on this misinformation. Well, now I'll know for our next visit!


My preferred method is a bus that gives you a great view of the mountains on your way into Taipei. Plus they're like five bucks or something and take an hour just like all the other ways.


Except that the train actually takes 36 minutes, which is most of the point of TFA.


Add on getting to and then waiting for the train to leave... The bus depot is right outside the terminal exit.

I mean yes trains good, just saying.


Exactly too late for me.

I was there in October and the data had not been updated yet. I took the bus (which was shown as being faster!) and got stuck in traffic for 2h. I arrived at my hotel exhausted and frustrated.


I've got a strategy that's worked well almost everywhere in the world: never take road-based transport if there's a rail option, even if the rail option supposedly takes a bit longer.

Buses are less reliable, prone to traffic, less comfortable (especially if crowded)...


What I don't understand is that Google sees exactly how long a trip with the train takes and could have automatically fixed the estimate to the correct one...


Google Maps entirely relies on the data given by transit agencies through their GTFS feeds, whether they be static or real-time. It doesn't do the same type of calculation that the driving navigation does where they average travel times based on other users' movement.

There are some other apps like Transit App that attempt to crowdsource the same data to reflect ETAs.
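
As a rough illustration of what "relying on the feed" means: the scheduled travel time between two stops is just the difference between two rows of `stop_times.txt`, so whatever the agency publishes is what the planner shows. The sketch below is not Google's code; the trip, stop IDs and times are invented, and `scheduled_minutes` is a hypothetical helper that ignores loop routes where a stop appears twice.

```python
import csv
import io

# Invented stop_times.txt rows for a single airport express trip.
STOP_TIMES = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
express_1,23:50:00,23:50:00,AIRPORT,1
express_1,24:05:00,24:06:00,MIDWAY,2
express_1,24:26:00,24:26:00,MAIN_STATION,3
"""


def to_seconds(hms):
    """Parse a GTFS time; hours may exceed 23 for trips running past midnight."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def scheduled_minutes(stop_times_csv, trip_id, origin, destination):
    """Minutes from departing `origin` to arriving at `destination` on one trip."""
    departure = arrival = None
    for row in csv.DictReader(io.StringIO(stop_times_csv)):
        if row["trip_id"] != trip_id:
            continue
        if row["stop_id"] == origin:
            departure = to_seconds(row["departure_time"])
        elif row["stop_id"] == destination:
            arrival = to_seconds(row["arrival_time"])
    if departure is None or arrival is None:
        raise ValueError("trip does not serve both stops")
    if arrival < departure:
        raise ValueError("destination comes before origin on this trip")
    return (arrival - departure) / 60


minutes = scheduled_minutes(STOP_TIMES, "express_1", "AIRPORT", "MAIN_STATION")
print(minutes)
```

If the times in those rows are wrong, the answer is wrong, and nothing in this calculation looks at how long riders actually took.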


I only use CityMapper now in Paris

1) I don't want to feed the Google behemoth

2) It's very, very accurate; it even tells you whether to get in the back/center/front to be closer to your next connection/exit

3) They joke about strikes to brighten those shitty days a bit

https://citymapper.com



