Hacker News
How We Built the World’s Prettiest Auto-Generated Transit Maps (medium.com/transit-app)
432 points by ant6n on Oct 8, 2016 | 60 comments



That’s a fascinating case study. Thanks for sharing.

Not so long ago, I was designing visualisations for a different type of underlying graph structure, but one that had some similar elements in terms of “close” edges that could be drawn together, a desire to minimise edge crossings, and the like. I spent quite a while studying Beck’s famous London Underground map and its modern derivatives, and then experimenting with some of the same ideas mentioned in the article here. I too found that rendering clean, practical diagrams to show messy, real-world underlying data can be surprisingly difficult! I have a lot of respect for the Transit App team if they’ve successfully implemented algorithms that can produce output as beautiful as those examples in their general case.


Looks lovely. But their comparison with Apple and Google (https://medium.com/transit-app/transit-maps-apple-vs-google-...) is missing something: "works on a desktop computer". I have a Surface Pro 3 with Chrome/Firefox/Edge installed, and I'd love it if there were some HTML5 web view onto this for planning trips in advance, where I quite like the larger screen size and the ability to screenshot/print just in case.


Yes, I scrolled until the end only to find out it's only available in app form. I wanted to play around with it, but this made me give up. I'm not going to install an app (worse: I'll have to free some space on my phone to do it) just to play around with it for one minute and then uninstall it (to get back that storage space).


FWIW, Transit on my iPhone, after using it in several large, dense cities (including San Francisco and the Bay Area), only uses up 42.5 MB.

So I doubt it would use very much on your device, especially if you only want to try it out once in your area to see how it works.


I would have installed it, but it's not even available on my platform :/


It's easy enough to run apps on a desktop using emulators like http://andyroid.net/

I doubt their business goals involve a desktop version.


Awesome material. I really love automated renderings that attempt to approach handmade quality.

A couple of years back I attempted to automate our production planning. The effort failed, alas (complexity and feature creep killed it), but I did make nice visualizations along the way.

Example: https://www.dropbox.com/s/jxiareohe0wi4z9/planning-example-1...


I would love to get more details on the integer-linear solve methodology, as it sounds impressive. Was the problem formulated to also work as a linear problem, where the binary or integer variables were first treated as non-negative continuous variables and then checked using branch-and-cut (that's how I would do it)? Or did you do something differently?

  Fortunately, we found a different plan of attack: one
  which allowed the integer-linear-programming solver to
  explore the problem space more efficiently and find 
  optimal solutions faster. What previously took an hour, 
  now took 0.2 seconds.


It was simply modelled using a bunch of binary variables. Basically for every segment, we want to find the position of every line. So first we tried the 'traditional' approach of modelling the positions as a set of binary variables that say

    Line_i_at_position_k
It just takes a long time because there are a lot of binary variables, and the penalties derive from the binary variables in an awkward way.
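To make the "awkward" part concrete, here is a minimal sketch of the position-based model, brute-forced in plain Python instead of being handed to an ILP solver. The names `x[i][k]` (1 if line i sits at position k), `feasible_position_assignments` and `left_of` are hypothetical illustrations, not Transit's code. The point it shows: a penalty such as "line i is left of line j" becomes a sum of products of binary variables, which is quadratic and would need extra auxiliary variables and constraints to linearize.

```python
from itertools import product

def feasible_position_assignments(n):
    """Enumerate 0/1 matrices x[i][k] (line i at position k) in which every
    line takes exactly one position and every position holds exactly one line."""
    result = []
    for bits in product((0, 1), repeat=n * n):
        x = [bits[i * n:(i + 1) * n] for i in range(n)]
        rows_ok = all(sum(row) == 1 for row in x)
        cols_ok = all(sum(x[i][k] for i in range(n)) == 1 for k in range(n))
        if rows_ok and cols_ok:
            result.append(x)
    return result

def left_of(x, i, j):
    """'Line i is left of line j', derived from the position variables.
    The product x[i][k] * x[j][l] is quadratic in the binaries, so an ILP
    cannot use this expression directly as a linear penalty."""
    n = len(x)
    return sum(x[i][k] * x[j][l] for k in range(n) for l in range(k + 1, n))

assignments = feasible_position_assignments(3)
print(len(assignments))  # prints 6: the 3! valid orderings out of 2**9 raw assignments
```

With n lines this model already needs n² binary variables per segment, and every ordering-based penalty has to be rebuilt from them indirectly.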

Later we started using a model using variables like

    smaller_i_j
Just denoting whether line i is at a smaller position (index) than line j. You can enforce total ordering using the triangle inequality as a constraint. And the penalties very directly derive from the binary variables, because the penalties are all about things like 'apply penalty if line i is left of line j'.
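A sketch of the pairwise formulation, again brute-forced in plain Python rather than solved with a MILP solver. The constraint forms are the standard linear-ordering ones and are an assumption about how the "triangle inequality" was written: antisymmetry `s[i,j] + s[j,i] == 1`, and transitivity `s[i,j] + s[j,k] - s[i,k] <= 1`, which forbids cycles such as i<j, j<k, k<i. Every assignment that passes both corresponds to exactly one total order, and a penalty like "line i is left of line j" is simply the variable `s[i,j]` itself. Function names are hypothetical.

```python
from itertools import permutations, product

def feasible_pairwise_orders(n):
    """Enumerate assignments s[(i, j)] = 1 meaning 'line i is at a smaller
    position than line j' that satisfy antisymmetry and transitivity."""
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    result = []
    for bits in product((0, 1), repeat=len(pairs)):
        s = dict(zip(pairs, bits))
        antisymmetric = all(s[i, j] + s[j, i] == 1 for i, j in pairs if i < j)
        transitive = all(
            s[i, j] + s[j, k] - s[i, k] <= 1
            for i, j, k in permutations(range(n), 3)
        )
        if antisymmetric and transitive:
            result.append(s)
    return result

def order_from(s, n):
    """Recover the left-to-right order: a line's position equals the number
    of lines that are smaller than it."""
    return sorted(range(n), key=lambda i: sum(s[j, i] for j in range(n) if j != i))

orders = feasible_pairwise_orders(3)
print(len(orders))  # prints 6: one assignment per total order of 3 lines
```

Of the 8 antisymmetric assignments for three lines, the transitivity constraint removes the 2 cyclic ones, leaving exactly the 6 orderings.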

Another thing to note is that the New York-Washington network is all connected. But some of the connections have only a single (commuter rail) line, so mathematically speaking, Washington and New York are actually independent. The MILP solver sometimes seemed to have trouble finding those independent components, so it helped to provide them as separate models.
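A minimal sketch of that decomposition, assuming a simplified input where each segment is an edge between two nodes carrying a set of lines. Following the comment's reasoning, a segment with a single line has nothing to order, so it is dropped before finding connected components with union-find; each remaining component could then be passed to the solver as its own model. The node and line names are made up for illustration.

```python
from collections import defaultdict

def independent_components(segments):
    """segments: list of (node_a, node_b, set_of_lines).
    Returns sorted groups of segment indices that can be ordered independently.
    Segments carrying a single line need no ordering and are left out."""
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    def union(u, v):
        parent[find(u)] = find(v)

    multi = [idx for idx, (_, _, lines) in enumerate(segments) if len(lines) > 1]
    for idx in multi:
        a, b, _ = segments[idx]
        union(a, b)

    groups = defaultdict(list)
    for idx in multi:
        groups[find(segments[idx][0])].append(idx)
    return sorted(groups.values())

segments = [
    ("A", "B", {"red", "blue"}),
    ("B", "C", {"red", "blue", "green"}),
    ("C", "D", {"red"}),              # single line: splits the problem here
    ("D", "E", {"red", "yellow"}),
    ("E", "F", {"red"}),              # and here
    ("F", "G", {"red", "purple"}),
]
print(independent_components(segments))  # prints [[0, 1], [3], [5]]
```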


totally naive and off the top of my head...

Could your process be applied as a plugin or core function of CAD programs for designing buildings and other systems?

Such that I can relatively roughly draft a layout of a floor-plan, and then have your ideas "think" out a layer for piping, electrical, lighting etc...

The idea would be to avoid physical interferences, and then take your logic to say "oh, this is a floor plan and the lighting layer elements should be within the boundary of the walls, and I am to lay them out on an NxN grid, so I'll propose this layout and the designer can just adjust as needed; but I am aware of the walls, so I know I can only place this many WRT the layout" and "ah, this is a wall and a ceiling, my electrical conduit must run up/in the wall and along the ceiling, and no conduit can intersect; but I need a junction box every N feet/[condition] and my radius for each turn must be within [spec]".

Basically ML/AI assisted CAD... I think you should explore that - electrical/plumbing conduit designs effectively adhere/require your design logic....

Just a thought.


To reply to my own comment: I know that this can already be ~accomplished with, say, Revit, but I think there are efficiencies to be gained by what they are doing (Autodesk should acquire these MOFOs...)

Basically, define an element's requirements:

BusStop; Range = X Schedule = Y Frequency = Z FuelReq = AA

etc..

Then you do something like a fixture:

2X4 Fixture; Power = X T24 = Y

Then you set up a standard and a repo for people to post objects to a library, let them select those things and just plop them on a drawing, and the requirements get calculated.
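A minimal sketch of the "library of elements with requirements" idea above, assuming each element type just carries a dictionary of numeric requirements that get summed over everything placed on a drawing. `ElementType`, `total_requirements`, the `power_w` key and the wattages are all hypothetical illustrations, not part of Revit or any other product.

```python
from dataclasses import dataclass, field

@dataclass
class ElementType:
    """A library entry: a named element type and its per-instance requirements.
    The requirement keys are hypothetical stand-ins for 'Power = X' etc."""
    name: str
    requirements: dict = field(default_factory=dict)

def total_requirements(placed):
    """Sum every requirement over all elements dropped on the drawing."""
    totals = {}
    for element in placed:
        for key, value in element.requirements.items():
            totals[key] = totals.get(key, 0) + value
    return totals

fixture = ElementType("2X4 fixture", {"power_w": 40})   # made-up wattage
exit_sign = ElementType("exit sign", {"power_w": 5})    # made-up wattage
drawing = [fixture] * 12 + [exit_sign] * 2
print(total_requirements(drawing))  # prints {'power_w': 490}
```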

Although, I was not able to know how they calculated the cost per year for any of their examples... where does that data come from???


Was it GLPK or something else used?


We used PuLP, which in turn uses cbc (part of the COIN-OR project). I used to use lp_solve, but cbc turned out to be able to solve more complicated problems more quickly.


This is a tour de force in data science style computational geometry. I really loved the story. And very nice maps. Wow. :)


It's an interesting problem, and it's cool that they've found a decent solution...

But I can't help wondering how often these kinds of complicated transit junctions actually occur in practice. There aren't really that many complicated metro systems in the world (to my continuing disappointment), and even they often still have fairly simple junctions.

Maybe it would be cheaper and more effective to just hire a graphic designer to hardcode solutions to the worst examples?


Maybe it would be cheaper and more effective to just hire a graphic designer to hardcode solutions to the worst examples?

That’s a pragmatic solution if you’re only designing static graphics, say to print on a leaflet. If you need something more dynamic, with various elements being selectively hidden as they show in the animation just before the article’s conclusion, then beyond a certain point there are too many combinations to prepare them all manually with a sensible amount of effort.
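A back-of-the-envelope sketch of the "too many combinations" point, assuming every on/off combination of toggleable layers would need its own hand-drawn map at every zoom level. The function name and the example counts (4 layers, 10 zoom levels) are hypothetical.

```python
def static_variants(toggleable_layers, zoom_levels):
    """Number of pre-drawn maps needed if every on/off combination of the
    toggleable layers must be drawn at every zoom level."""
    return (2 ** toggleable_layers) * zoom_levels

print(static_variants(4, 10))  # prints 160, for a single city
```

The count doubles with each additional toggleable layer, before multiplying by the number of cities.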


^ That.

Plus, there are actually a lot of systems.

https://en.wikipedia.org/wiki/List_of_metro_systems lists 160 systems. And that's only metro systems. You have to add light metro, light rail and commuter rail. And then you still only have rail. Start thinking about adding buses, and it is impossible to create all these maps by hand at the various zoom levels.


Sure, but outside maybe the top 20, most of those systems only have simple two line intersections (even if they do have a lot of them). And a lot of other rail systems don't make sense to depict as metro lines, because they run detailed timetables rather than linear services.

Buses do usually run linear services, but they're often sufficiently dense that metro style maps are difficult to read (they're often best represented as simplified roadmaps with each road labelled with route numbers).

Also, you don't need to get everything perfect everywhere right away to start acquiring users. It can be a continuing improvement process.


I think the missing detail in this argument is: where is this company trying to acquire users from? They are competing for mobile users on Android and iOS who are most likely quite happy with Google and Apple Maps.

So "good enough" may not actually be enough to get people to try something new.


If they're doing it for bus systems too, I'd imagine the complexity is much greater. Cities with two or three metro lines might have dozens of bus lines.


Buses do usually run linear services, but they're often sufficiently dense that metro style maps are difficult to read

Unless you can selectively hide/show bus routes, I guess. Which is something that this app allows you to do.


One advantage of their system over a static map is that it varies with zoom level, as lines that are near but not touching when zoomed in would end up on top of each other when zoomed out. The map can display several types of transit simultaneously, like subways and commuter rail, which can be toggled on or off. So you'd end up needing to hardcode many different versions with different settings and zoom levels to get similar results.
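A sketch of why nearby lines merge when zooming out, assuming a standard Web Mercator map with 256-pixel tiles (an assumption; the article doesn't say which projection or tile size Transit uses). Ground resolution at zoom z is the Earth's equatorial circumference times cos(latitude), divided by 256 · 2^z, so the on-screen gap between two parallel lines halves with every zoom level out. The 50 m gap and the Chicago latitude are illustrative values.

```python
import math

EARTH_CIRCUMFERENCE_M = 40_075_016.686  # WGS84 equatorial circumference
TILE_SIZE_PX = 256                      # assumed tile size

def meters_per_pixel(lat_deg, zoom):
    """Web Mercator ground resolution at a given latitude and zoom level."""
    return EARTH_CIRCUMFERENCE_M * math.cos(math.radians(lat_deg)) / (TILE_SIZE_PX * 2 ** zoom)

def separation_px(distance_m, lat_deg, zoom):
    """On-screen gap, in pixels, between two parallel lines distance_m apart."""
    return distance_m / meters_per_pixel(lat_deg, zoom)

for zoom in (16, 14, 12):
    print(zoom, round(separation_px(50, 41.88, zoom), 1))
```

Two lines that are clearly separate at zoom 16 end up within a couple of pixels of each other by zoom 12, which is when they need to be bundled and ordered like a shared segment.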


They are deriving a database from OpenStreetMap. Has anybody found where they share that data? It's likely that the OSM data license (the ODbL) requires them to publish the derived database.


Generated map tiles are a "Produced Work" and don't have to be licensed under the ODbL. If they made their (raw) data available as a download, it would have to be under the ODbL.

Explained in 3b and 3c on http://wiki.openstreetmap.org/wiki/Legal_FAQ


IANAL, but that's not how I read the FAQ or the license:

    4.6 Access to Derivative Databases. If You Publicly Use a Derivative
    Database or a Produced Work from a Derivative Database, You must also
    offer to recipients of the Derivative Database or Produced Work a copy
    in a machine readable form of:

          a. The entire Derivative Database; or

          b. A file containing all of the alterations made to the Database or
    the method of making the alterations to the Database (such as an
    algorithm), including any additional Contents, that make up all the
    differences between the Database and the Derivative Database.

    The Derivative Database (under a.) or alteration file (under b.) must be
    available at no more than a reasonable production cost for physical
    distributions and free of charge if distributed over the internet.


I worded my comment softly because I don't think I have a strong opinion about this stuff, but I'm not sure it is a work produced strictly from the OpenStreetMap database. They take GTFS databases from transit systems and use them to build routes out of OpenStreetMap data. They then render something based on this combined data. So the correct characterization may be that the final work is produced from a derivative database (rather than produced from OpenStreetMap). If that is correct, then the ODbL would apply to that database (as is explained in 3b, 3c, 3d...).


Where you see a derivative, I am seeing cartography applied to unchanged source data. I don't see any requirement in the ODbL to publish cartography styles under the same license.


The routes are a combination of transit system data and OpenStreetMap geometries.


This is fascinating stuff, and nicely polished work.

We would like to develop something similar, but a non map overlay version, using our transit data at Rome2rio. Basically code to auto generate something like this for all of our 4,000 transport operators: http://content.tfl.gov.uk/standard-tube-map.pdf

We've been talking about it for a while but don't have the constraint layout expertise to do it internally.

Anyone interested in working for us to tackle this problem?


Hey, might be interesting to solve. We are solving something like this for India, as a part of a larger initiative. My email is takenottie at google's mail.


Have you had any reports where the algorithmically snapped-to-OSM route maps diverge from the actual bus/train routing, and if so, how do you handle that?


Where there is divergence, it's usually due to bugs. The 'matching' is basically just a trip planner along OSM, so generally it finds some trips that travel along the shape points of the transit line.

If there's no trip found within a reasonable distance, then we generate errors and the data integrators will have to take a look.

We've fixed a couple of issues in OSM already, usually it was different rail stretches (ways) not being connected.
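A minimal sketch of the kind of sanity check described above, assuming the matched OSM path and the GTFS shape are both lists of (lat, lon) points. It flags shape points that lie farther than a threshold from every vertex of the matched path. The 50 m threshold, the function names and the coordinates are hypothetical, and measuring to vertices rather than to path segments is a simplification that can over-report when the matched path has sparse vertices.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius

def haversine_m(p, q):
    """Great-circle distance in meters between (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def unmatched_shape_points(shape, matched_path, max_gap_m=50.0):
    """Indices of GTFS shape points farther than max_gap_m from every vertex of
    the path matched along OSM; a non-empty result would be flagged for review."""
    return [
        i for i, point in enumerate(shape)
        if min(haversine_m(point, v) for v in matched_path) > max_gap_m
    ]

shape = [(45.5000, -73.5700), (45.5010, -73.5700), (45.5020, -73.5700)]
matched = [(45.5000, -73.5701), (45.5010, -73.5701), (45.5010, -73.5750)]
print(unmatched_shape_points(shape, matched))  # prints [2]: the match veers away
```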


Great write up! I just downloaded the app and found a nice Easter egg. You can say you are the first app to support Hyperloop? :)


In this example image: https://d262ilb51hltx0.cloudfront.net/max/2048/1*bSjX6T0OaMX...

Why doesn’t the integer linear ordering put the orange line completely inside the loop?

That would remove 3 crossing sections, and look better.


It wouldn't remove 3 crossing sections (see bottom right), and it probably decided to cross on the top left as opposed to the bottom right.


I think so as well. There are 3 crossings either way, so the solver just picks one solution.

As a human you can look at the solution and decide it's not so pretty, but the question is how to model this as a general set of penalties. These also have to be linear - and if you have too many complex constraints, the solver may not finish in time.
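A sketch of how crossings between two consecutive segments can be counted: every pair of lines whose left-to-right order differs on the two sides has to cross once. This is why a crossing penalty stays linear in pairwise order variables like `smaller_i_j`: the per-pair "order flipped" indicator is an XOR of two binaries, which is commonly linearized with constraints such as `c >= a - b` and `c >= b - a`. Two layouts with the same count tie, and the solver has no reason to prefer either. The function name and line names here are hypothetical.

```python
from itertools import combinations

def crossings(order_before, order_after):
    """Count pairs of lines whose left-to-right order differs between two
    consecutive segments; each such pair must cross once between them.
    Only lines present in both orderings are compared."""
    shared = [line for line in order_before if line in order_after]
    pos_after = {line: k for k, line in enumerate(order_after)}
    return sum(
        1
        for a, b in combinations(shared, 2)  # a is left of b before the junction
        if pos_after[a] > pos_after[b]
    )

print(crossings(["orange", "pink", "green"], ["green", "orange", "pink"]))  # prints 2
```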


Also, we see only a small part of the map. Outside this part there may be more crossings.


Yes, it would? One would just have to move the point where it splits in the bottom right a bit up.


Maybe it is the actual point where the track splits hence it can't be moved?


I thought about that later as well, but as the map isn’t exactly accurate, they could likely have moved it a tiny bit for that, too.


Why did you jump straight to MILP for ordering the lines of segments? It seems like it would be more obvious to first try ordering the routes by id for each segment. That would guarantee that routes are always ordered the same way relatively to each other and would eliminate the problem in the before/after image with much less effort.


Pretty neat. I like the idea of using pixel space. They could break it out to tiles to parallelize things instead of handling things globally if that's a bottleneck for them.


Technically yes. But you have to be careful: if you want the skeletons to match up at the seams, you have to make sure the overlap (overscan) is larger than the radius of any white area that's produced in the data. Basically, the 'skeletonization' thins out one pixel at every step, so with every iteration the 'effects' can travel one pixel farther.

Basically you have to make sure that the overlap is larger than the number of iterations of the data, so it's a bit messy. In the meantime, our little sparse image library turned out to be pretty effective.
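A minimal sketch of the tiling scheme described above, assuming the number of thinning iterations (or an upper bound on it) is known in advance. Each tile is processed on a window padded by an overscan margin larger than that count, and only the unpadded core would be written back, so the cores should see the same neighbourhood as a global pass. `tile_windows` and its parameters are hypothetical names.

```python
def tile_windows(width, height, tile, iterations):
    """Split a width x height raster into tile x tile cores, each read with an
    overscan margin so that an iterative thinning pass (which propagates at most
    one pixel per iteration) sees the same neighbourhood it would see globally.
    Returns (core_box, read_box) pairs as half-open (x0, y0, x1, y1) boxes."""
    margin = iterations + 1  # overlap must be larger than the iteration count
    windows = []
    for y0 in range(0, height, tile):
        for x0 in range(0, width, tile):
            x1, y1 = min(x0 + tile, width), min(y0 + tile, height)
            read = (max(x0 - margin, 0), max(y0 - margin, 0),
                    min(x1 + margin, width), min(y1 + margin, height))
            windows.append(((x0, y0, x1, y1), read))
    return windows

windows = tile_windows(1000, 600, 256, iterations=40)
print(len(windows))  # prints 12: 4 columns x 3 rows of tiles
```

The catch the comment mentions shows up directly in `margin`: if thin-out can run for many iterations (large white areas), every tile has to read a large border, and much of the work is duplicated.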


Beautiful, yes, but I'm a form follows function guy, and at least from looking at their screenshots of the Chicago Loop, I'm left thinking their version of the map is quite a bit less useful than the (admittedly unsightly and cluttered) official one[1].

Some things that are iffy or missing from a functional perspective:

• Any indication that the red and blue line stations are connected by tunnels at Lake/Washington and Jackson.

• For that matter, the fact that the Jackson blue line station exists in the first place - it's obscured by the B in the street label for Jackson Boulevard. Same goes for the LaSalle blue line and State red line stations, and the Washington blue line station is also iffy. That's over 1/4 of the stations in the map's area hidden under street labels.

• Which stations have elevators? Most of the ones downtown don't.

• Which directions are the trains traveling in? With the exception of the green line, all of the elevated trains go only one way around the loop.

• Color matters, especially on a system where all the trains are identified by color. Why did they use a dark mauve to indicate the pink line? In the real system it's indicated by a bright bubblegum pink. They've created a big opportunity for confusion with the purple line.

• A human touch might be able to make some better arrangement choices. The CTA map crosses the blue line over the green and pink lines a little outside of the loop, at the point where it diverges from the other two. That's a much better choice than trying to do it right in the middle of the already jumbled mess that is the confluence of all of the trains in Chicago's transit system (save one small spur line out in the suburbs) along a three block section of Lake street.

• Clearly indicating that the purple line operates differently from the others is also useful, and might save someone who's unfamiliar with the system from a lot of time spent waiting on the wrong side of the station while watching a bunch of brown lines pass by.

I'll grant the loop section of the L system may well be the most fiddly, nit-picky light rail mapping problem in the world, and the datasets they were working with might not have given all the detail they needed. (On the other hand, that thing with hiding stations under street names feels pretty egregious to me.) I guess what I'm really going for here is, when it comes to drawing maps, I still think involving a human hand in the process can make an enormous difference in the quality of the final product.

[1]: http://www.transitchicago.com/assets/1/clickable_system_map/...


Great points...

It would be great if the system could take other data inputs (from trusted sources) to say that point X on open map has [service]

And illustrate it - [escalator, elevator, tunnel, handicap, food, coffee, restroom] etc... and make it so that it would accept updates from users....

i.e., users travel along a line and report what services/things/shops/what-not are available... in the same way that Waze uses user input for traffic incidents, but for infrastructure and social stuff.


Here, use this to make a bulleted list: •. There's no markup support for it. Leading whitespace triggers the code formatting that breaks wrapping, so remove that too.


What's wrong with the way it is now? Wrapping looks fine to me.


I had to go through several rounds of edits to get the wrapping right - HN doesn't automatically wrap code blocks.

Still, I prefer using code blocks to do bulleted lists on HN. Not sure why. Maybe it's that thing I just confessed to having for laborious hand editing.

Probably does mean I'm being a jerk to mobile users, though. Eh, I can go change it.


Amazing process to build such an impressive result. Man, I wish I could work on problems like this. It would make coming to work fun again.


For what it's worth, it says that they are hiring :)


Are there any plans to make these programmatically-generated transit maps available as posters or dead-tree paper maps?


In the map for Boston, two of the four core T stations are connected by a pedestrian tunnel. That tunnel isn't shown.


ant6n, there's something weird going on with images on your homepage - http://imgur.com/a/CGse2 - they all appear shrunk horizontally. That's in recent Firefox on Windows.


Thx for the tip. I told the designer webby people.


Um, it's completely wrong: the purple line doesn't run west.


Is there a publicly available list of cities?



p.s. TransitApp guys - there's a broken link from blog article to: https://transitapp.com/regions


Awesome work. I use the app frequently.


Very nice



