Using Hilbert Curves to 100% Zelda (merovius.de)
164 points by Merovius on July 22, 2017 | 61 comments



For reference: the generic name for what OP is doing is "spatial indexing". There is a data structure for working with such data efficiently, the Hilbert R-tree (https://en.wikipedia.org/wiki/Hilbert_R-tree), but there are alternatives (see https://en.wikipedia.org/wiki/Spatial_database).

Many databases nowadays ship functions for these operations (e.g. https://dev.mysql.com/doc/refman/5.7/en/spatial-analysis-fun...)
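For the curious, the core of the trick is small. A rough Python sketch of the point-to-curve-index mapping, adapted from the iterative algorithm on the Wikipedia "Hilbert curve" page (sorting points by this index is the whole "spatial ordering"):

```python
def xy2d(n, x, y):
    """Distance along the Hilbert curve of point (x, y) on an n x n grid,
    where n is a power of two. Adapted from the iterative algorithm on
    the Wikipedia "Hilbert curve" page."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve lines up.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Sorting a point cloud by curve index yields the visiting order:
points = [(3, 1), (0, 0), (2, 2), (0, 3)]
points.sort(key=lambda p: xy2d(4, p[0], p[1]))
# -> [(0, 0), (0, 3), (2, 2), (3, 1)]
```

Consecutive cells along this ordering are always grid neighbours, which is exactly the "no long jumps" property the post relies on.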


They can even be explained to a surprisingly low-tech operator, with the pre-calculation done at a high-tech level: http://www2.isye.gatech.edu/~jjb/research/mow/mow.pdf


> There might also be better approaches than Hilbert Curves. For example, we could view it as an instance of the Traveling Salesman Problem with a couple of hundred points; it should be possible to have a good heuristic solution for that. On the other hand, a TSP solution doesn't necessarily only have short jumps, so it might not be that good?

TSP is actually very amenable to heuristics and state of the art branch-and-bound algorithms can often find optimal solutions even for instances with thousands of points.

Does anyone here know if there is a good open-source solver that we could throw this problem instance at?


The issue isn't about approximating a TSP instance, but ensuring that points aren't too far apart from each other, even at the cost of increasing total length.

I can't immediately think of a reduction that would factor this in, so the best approach may be to just reduce this to the appropriate ILP instance and use a mixed-integer solver like GLPK (Gurobi is also free for students).

Incidentally, achieving a polynomially-sized ILP formulation for TSP isn't quite obvious. Wikipedia has a good explanation of how to do this: https://en.wikipedia.org/wiki/Travelling_salesman_problem#In.... I'm not sure if the metric TSP has a simpler formulation.

Edit: Now that I think about it, it may be difficult to express the constraint that edges should be balanced out in a linear way. I happened to write about a similar problem a while ago (https://modalduality.org/posts/optimizing-color-coding/); I ended up giving up on finding a linear formulation and went for sequential least squares instead.


If you want to minimize long-distance walks, add an exponential weight on distance, then solve. There are many ways to go about this stuff, but the goal is to map your preferences onto the weight function.


Right, but then the problem is no longer linear and approximating is more difficult. I'm not exactly sure what you mean by "exponential weight" in a linear program, do you have an example?


I am not sure what you're asking. You're just assigning constants as weights on each trip: a>b = K1, a>c = K2.

I am saying you might set K1 = √((x1-x2)^2+(y1-y2)^2) to minimize total distance, but TSP allows for arbitrary constants. So remove the √ and long trips will be strongly avoided.

PS: As far as I know you can use any arbitrary set of constants then use a linear solver. Or am I forgetting about something?


So say you have a graph on distance, d, that you want to weight based on a preference, p, from 0-5 going from least preferred to most preferred.

You might make your weight function as w(d, p) = d * (2 ^ p)
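As a sketch with made-up numbers (`weight` and `squared_weight` are just illustrative names, not from any library):

```python
def weight(d, p):
    # d: edge distance, p: preference from 0 (least) to 5 (most preferred),
    # as in the w(d, p) = d * (2 ^ p) suggestion above.
    return d * (2 ** p)

def squared_weight(x1, y1, x2, y2):
    # Dropping the square root from the Euclidean distance penalizes long
    # edges super-linearly while every edge weight stays a plain constant:
    # two 5-unit hops cost 25 + 25 = 50, one 10-unit hop costs 100.
    return (x1 - x2) ** 2 + (y1 - y2) ** 2
```

Either way the solver still sees an ordinary TSP instance with constant edge weights; only the numbers encode the preference.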


That's fine, but `p` is a variable, not a constant, in the original post's case, right?


That doesn't seem necessary. Picking a value of p just selects how much you prefer short hops over shorter total distance.


Concorde is open source and I believe holds the record for largest instance of TSP solved optimally (85,900 cities): http://www.math.uwaterloo.ca/tsp/concorde.html

Note that the licensing terms do restrict its free use to "academic research."


> Note that the licensing terms do restrict its free use to "academic research."

Then it is not "open source" (violates "No Discrimination Against Fields of Endeavor") nor is it free software (violates the freedom to use for any purpose). It's proprietary -- just because the source code is available doesn't make something "open source" nor does it make it free software.


Unfortunately, such terms are ubiquitous in operations research.


OptaPlanner is an open source constraint satisfaction solver (https://www.optaplanner.org/). Here is an explanation of how it can be used to solve the Vehicle Routing Problem, closely related to TSP: https://www.optaplanner.org/learn/useCases/vehicleRoutingPro...


You might get a laugh out of the fact that Hilbert curves are used to approximate Euclidean TSP. If I remember correctly it yields a log n approximation where n is the number of cities.


> TSP is actually very amenable to heuristics and state of the art branch-and-bound algorithms can often find optimal solutions even for instances with thousands of points.

In my own research (inverse problems related to signal recovery), I've noticed that a lot of NP-hard problems are actually easily solvable provided the data has some kind of structure to it. I know very little about computational complexity theory, but I've read some papers describing how there is often a phase transition where a high SNR puts the problem in P territory, while a sufficiently low SNR moves it into NP territory (though it's still solvable). Beyond the phase transition, a solution is information-theoretically impossible. I wonder if most of these real-world TSP problems actually lie in P, but I don't know how one would go about showing that. (I haven't performed a literature search, but I wouldn't be surprised if someone has already demonstrated this.)


That's very unlikely to be true in general, as the proofs that a problem is NP-hard usually rely on instances with a very high signal-to-noise ratio -- i.e. in order to prove that solving the problem is at least as hard as solving an arbitrary boolean circuit, we need to find a subset of instances which behave like wires and boolean gates do, and the resulting "circuits" are very heavily structured.


If you wanted to guarantee you "only have short jumps", it would probably be good to consider that you can teleport to the top of the towers and hang-glide down from them (at the cost of some loading time) much more easily than climbing up a cliff.


Given that this wasn't about actually walking the map, but just about finding the locations still missing (and those locations were few and sparse), this really isn't necessary.

It would be, if you'd want to do a speedrun, though.


> TSP is actually very amenable to heuristics and state of the art branch-and-bound algorithms can often find optimal solutions even for instances with thousands of points.

With that many points, how do you prove that your solution is optimal?


Often in approximations you can prove an upper bound on the optimal solution, even if you don't know the optimal solution itself.

An easy example is by using duality when solving linear programs.

And sometimes your upper bound is close enough to what you already have so you can just say something like "well my solution gets 190 points, I have an upper bound of 190.5 points, and all point rewards are integral so I must have the optimal solution."
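To make that integrality argument concrete (the 190/190.5 numbers are from this comment, everything else is made up):

```python
import math

best_found = 190      # objective value of the solution we already have
upper_bound = 190.5   # bound from, say, an LP relaxation / dual solution

# All point rewards are integers, so the true optimum is an integer no
# larger than 190.5, i.e. at most floor(190.5) = 190 -- which our
# solution already achieves, proving it optimal.
is_provably_optimal = best_found >= math.floor(upper_bound)
```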


TSP in its decision form ("is there a tour of length at most k?") is NP-complete, so (by the definition of NP) it's much easier to check a proposed answer than to find that answer.


Hmm. So there's a polynomial-time method for verifying that you've found the shortest path in TSP, but not for finding it in the first place?


Yes. Crash-course complexity theory:

This is easy to understand with integer factorization: it is very hard to factor large integers. Yet, if I were to give you two numbers, it is very easy to verify that their product is the integer you were looking at. So factorization is easy to verify, but hard to do (side note: this is an intentional illustrative simplification. Integer factorization is not known to actually be "hard", and it is not known whether "hard" is even a thing. See below).

And that's the definition of an NP problem: it has a polynomial-time deterministic algorithm to verify a solution.

An NP-complete problem is a problem that is at least as hard as any NP problem. That is, if you have an algorithm for an NP-complete problem, you can take any NP problem, transform it efficiently into that problem, use your algorithm and transform the solution efficiently back. Thus, if you've solved an NP-complete problem efficiently, you can solve all NP problems efficiently. TSP is NP-complete. Whether or not Integer Factorization is NP-complete is unknown.

Lastly, there is the question of whether there are problems that are in NP (that is, have a polynomial algorithm to verify a solution) but not in P (that is, have a polynomial algorithm to find a solution). That's the famous P vs. NP question, which is currently undecided.
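The factoring illustration above, in code (naive trial division, purely for illustration): verifying a claimed solution is a single multiplication, while finding one takes a search.

```python
def verify(n, p, q):
    # Checking a proposed factorization: one multiplication, essentially free.
    return p > 1 and q > 1 and p * q == n

def factor(n):
    # Finding a factorization: naive trial division, slow for large n.
    for p in range(2, int(n ** 0.5) + 1):
        if n % p == 0:
            return p, n // p
    return None  # n is prime

p, q = factor(2021)
print(verify(2021, p, q))  # True (2021 = 43 * 47)
```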


(I experimented with using space-filling curves in Cities: Skylines (a SimCity clone) a while ago; here's a writeup for any who might be interested: https://inventingsituations.net/2015/11/28/space-filling-cur... )


This is really interesting, thanks for linking :)


> So I started on the onerous task of finding the last 17 locations.

A French guy (Xalikah) who did the first, manually planned 100% speedrun of the game had a similar problem; he spent a few hours with a couple of folks helping him check his map for obscure place names he was missing, and when he was down to the last one, someone joked "99.81% speedrun," and people were suggesting he do a slow, systematic scroll over the map so they could look for missing place names. He wound up sleeping five hours or so, and in the morning remembered that he'd skipped using some bridge somewhere. 49 hours!

It's kinda wild that game worlds are now large enough that you can reasonably use algorithms not only to write games but to get 100% completion playing them as well.


> It's kinda wild that game worlds are now large enough that

I haven't played Breath of the Wild yet, but e.g. Chrono Trigger has an enormous game world as well. Did they really get much bigger since?


It is hard to make an apples-to-apples comparison against Chrono Trigger, but you'd not be grossly wrong if you described BotW as having a 100× bigger map. It is also much, much more densely populated with things to do, enemies to beat, mushrooms to pick, etc.


> mushrooms to pick

And durian. Delicious, delicious durian.


2-opt can probably improve on the Hilbert solution (https://en.wikipedia.org/wiki/2-opt)

Is the world planar? There's an epsilon approximation scheme for planar TSP that's linear in the graph size (but exponential in 1/ε)
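A minimal 2-opt sketch (quadratic and brute-force, the names are mine): keep reversing a segment of the tour whenever doing so shortens it.

```python
import math

def tour_length(tour):
    # Total length of the closed tour through the given points.
    return sum(math.dist(tour[i], tour[(i + 1) % len(tour)])
               for i in range(len(tour)))

def two_opt(tour):
    # Keep reversing segments while some reversal shortens the tour.
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                cand = tour[:i] + tour[i:j][::-1] + tour[j:]
                if tour_length(cand) < tour_length(tour) - 1e-9:
                    tour, improved = cand, True
    return tour

# A self-crossing tour around the unit square gets untangled to length 4:
tour = two_opt([(0, 0), (1, 1), (1, 0), (0, 1)])
```

One could seed it with the Hilbert ordering and let 2-opt clean up the local crossings.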


I'm confused by the grammar - is "100%" being used as a verb? Is that correct in English?


Strictly speaking, you can use any noun as a verb in English. The grammar allows for it, though most nouns will not have a meaningful verb counterpart semantically, or the usage won't be one most people will understand (i.e. it won't be colloquial).

So in a speedrunning context, I would definitely say it is acceptable, though it might not be among the greater population. But the grammar nazis have nothing on this one.


Yes it is being used as a verb. It's at least common terminology among speedrunners.

Many games, including the one in question, have an in-game progress tracker. To "100%" the game is to complete everything necessary to make it reach 100%.

For games without such a tracker, the community usually reaches a consensus on what is considered to be 100%.


If it isn't, I'm invoking my right to English incorrectly, given that it's not my native language ;)

(yes, I'm being tongue-in-cheek by using "English" as a verb here :) )


There's even a relevant Calvin and Hobbes comic:

http://www.gocomics.com/calvinandhobbes/1993/01/25


Why is the Hilbert Curve any better in this situation than say lexicographical order on (x,y)?


It's pretty easy to imagine a counter-example that would have you traverse the length of the map, when the best route would be to detour on an existing traversal. Consider:

    1...2...3...4...5
    7...............6
    8.......9........
    13..12.....11..10
versus

    1...2...3...4...5
    13..............6
    12......9........
    11..10......8...7


Good point, though I'm not convinced Hilbert really avoids this. For example, look at n=3 here:

http://www.texample.net/media/tikz/examples/PNG/hilbert-curv...

This might lead to, say,

    . . . . . . . .
    1 . . . . . . 2
    4 . . . . . . 3
    . . . . . . . .

when

    . . . . . . . .
    1 . . . . . . 4
    2 . . . . . . 3
    . . . . . . . .

is more efficient.

Generally speaking, it seems this is basically TSP.


I tried to describe here https://www.reddit.com/r/programming/comments/6oxra8/using_h... why, even though this is theoretically a TSP instance, I don't consider that necessarily the best way to treat it. tl;dr: at least to me, at the time, for this problem, the speedup from an optimal solution wouldn't have outweighed the additional thinking time to get to that optimal solution (and, most importantly, to actually make it useful; after all, I wasn't actually drawing a path, I was outputting a list of names).

You can even see that in the argument I added to the post; while a zig-zag line might've been, in theory, a stupider, easier way to solve the problem, in practice it would've meant spending some time thinking about the right discretization. With a Hilbert-curve, I could just choose some n that is definitely large enough and be done with it and the additional cost of the more complicated curve doesn't really factor in, as I could just copy-paste it anyway.

But yes. The theoretical problem is a TSP and with the right set of tools, I could've added some efficiency to the search by viewing it as such.


I think it is a TSP only if you start at the first point in the ordered list. If you start at any other point, you will be going in only one direction (top-bottom/left-right or bottom-top/left-right) (see how the space is filled).


Given the precision of the points, lexicographical order on (x,y) would essentially be lexicographical order on just x, which then induces a lot of jumping around the map.


Yeah, sorry I meant like lexicographic on x, then swapping between lexi and reverse-lexi on y. Basically snaking back and forth.
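That snaking order is easy to write down as a sort key; a sketch (the band height `cell` is an arbitrary discretization parameter):

```python
def snake_key(point, cell=10):
    # Boustrophedon ("snake") order: sweep horizontal bands of height
    # `cell` from top to bottom, alternating direction within each band.
    x, y = point
    band = y // cell
    return (band, x if band % 2 == 0 else -x)

# The second band is traversed right-to-left, so there is no long jump
# back to the left edge between bands:
pts = sorted([(5, 3), (50, 3), (50, 13), (5, 13)], key=snake_key)
# -> [(5, 3), (50, 3), (50, 13), (5, 13)]
```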


Basically, what others have replied is true. But, in essence, yes, I now believe that this would've worked too, if you choose an appropriate discretization of the grid. Hilbert Curves probably were simply the Hammer I had lying around. In the end, it didn't really matter, given that the actual Hilbert Curve part was mostly copy-pasted, but at least it gave me the opportunity to actually use them (and learn from it).


It's a long walk from (1, 10) to (2, 1).


Yeah sorry I meant like snaking back and forth, which you're right is not the same as pure lexicographic.


Yeah, the length of such a snake is equal to that of the Hilbert curve.


Yes, but we wouldn't actually be walking the whole snake, just as we are not walking the whole Hilbert curve. We are walking the point cloud in the order dictated by either, and those orders will, in general, differ.


I have added a section about this question to the post. Let me know what you think :)




The returned file turns out to not actually be JSON (that'd be too easy, I guess) but some kind of JavaScript code, which is then presumably eval'd at some point to get the data:

    /**/jQuery31106443585752152035_1500757689075(
    /* json-data */)
Is that just JSONP?
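Peeling such a wrapper off is mechanical; a rough sketch (the callback-name pattern here is an assumption based on the snippet above):

```python
import json
import re

def unwrap_jsonp(body):
    # Strip an optional /*...*/ prefix and the callbackName( ... ) wrapper,
    # then parse whatever was passed to the callback as JSON.
    m = re.match(r"(?s)\s*(?:/\*.*?\*/)?\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", body)
    if m is None:
        raise ValueError("not a JSONP response")
    return json.loads(m.group(1))

print(unwrap_jsonp('/**/jQuery123_456({"a": 1})'))  # {'a': 1}
```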


Yes, this is just a standard JSONP response as served through jQuery; not sure why this wasn't clarified.


My guess is the author has never heard of JSONP before. Obligatory: https://xkcd.com/1053/


I have seen JSONP. But I'm not a frontend dev, and in my experience the web suffers from a persistent overspecialization of terms. I didn't mention the term JSONP because I didn't want to end up calling something JSONP just to have someone point out that this isn't really JSONP but floobleworp, which is when you use jQuery and add the comment symbols to make it safe against XSS, or something in that regard.

Also, I still consider JSONP an incredibly gross idea :)


Spoilers! The Hacker News bot posting the map image into a Facebook post was a good way to make sure that I now know where all the south shrines are :( https://www.facebook.com/hnbot/posts/1427803187274958


Excellent! I wonder if there are more algorithms that can be used in this game. Binary search is an algorithm that can easily be done by a person (it doesn't need a ton of iterations to locate the result).


Binary search would make a good bar bet. “Pick a number between 1 and 1000. I’ll make guesses, and you just tell me whether they’re too low or too high. I’ll bet you $10 that I can guess the number in 10 tries or fewer.”
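The bet in code (a sketch; 2^10 = 1024 >= 1000 is why 10 guesses always suffice):

```python
def guesses_needed(secret, lo=1, hi=1000):
    # Binary search: each guess halves the remaining interval, so any
    # secret in 1..1000 is found within ceil(log2(1000)) = 10 guesses.
    tries = 0
    while lo <= hi:
        tries += 1
        mid = (lo + hi) // 2
        if mid == secret:
            return tries
        if mid < secret:   # "too low"
            lo = mid + 1
        else:              # "too high"
            hi = mid - 1

worst = max(guesses_needed(n) for n in range(1, 1001))  # -> 10
```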


A Zelda game that depends on Google Maps. I'm getting old.


The game doesn't. He references the zeldadungeon.net map, which uses the Google Maps JavaScript API. Nintendo would be full of morons if they ever did something like that. Just read the fine print.


Except that this doesn't take into account elevation.


As this was only about efficiently enumerating the missing locations, not walking them, that's fine. I.e. I had to find the ~5 needles in the haystack of hundreds that I needed to visit. Given that sparseness, it was most efficient to just jump to the nearest warp point and directly go there.

Elevation and the like would be more interesting to figure out what's the most efficient way to collect all Korok Seeds, for example, when the interesting point cloud isn't as sparse. But then you'd definitely want to view it as a TSP problem anyway.



