Hacker News
Algorithm Solves Graph Isomorphism in Record Time (2015) (quantamagazine.org)
212 points by howard941 on May 24, 2019 | 72 comments



Here's the updated paper with some significant corrections http://people.cs.uchicago.edu/~laci/upcc-fix.pdf


There's an expository paper by Harald Helfgott that goes into earlier work, more foundational stuff, etc — even has exercises: https://arxiv.org/abs/1710.04574

(Originally presented for the Bourbaki Seminar, so not exactly for the non-mathematician, but has more context than reading the paper directly.)


Those are just the corrections, not the full paper.


This is the full paper the OP was referring to: http://people.cs.uchicago.edu/~laci/17groups/version2.1.pdf

The fixes linked above are mentioned in the WARNING below the abstract and explained in Section 18.1.


On page 2, in the "coherent configuration" section, what does the x with the - through it mean? I haven't been able to search my way to an answer


Slightly off-topic, this article has one of the best expositions I've seen (the Arthur-Merlin stuff) of P, NP, NP-complete, witnesses, and the idea of "evidence" for what complexity class the problem is in.


Absolutely, I think the author did a magnificent job! Reading the article, all the intricacies (P/NP, polynomial/exponential, blind test, graph coloring) are quite clear and easy to comprehend.

Here are some gentle articles on zero-knowledge proofs in case anyone is curious:

https://blog.cryptographyengineering.com/2014/11/27/zero-kno...

https://blockgeeks.com/guides/what-is-zksnarks/


Unfortunately I found all of the story-like exposition infuriating: they never even say what the new conjectured lower bound is.


Likely it's the lower bound (of something) that an important conjecture in the paper gives? That is, it's not some new kind of a lower bound, but a way to refer to it?


100% this. One of the cleanest descriptions of key insights and broader scientific landscape ("glide path" from quasipolynomial to P) I have seen. Erica, if you are seeing this, nice work and I'll be watching for more.


Agreed! I was already prepared to be annoyed by sketchy metaphors that didn't quite make sense, but this was delightful and very informative to read.


"That would be the most interesting possibility, Trevisan said, since it would make graph isomorphism the first natural problem to have a quasi-polynomial algorithm but no polynomial algorithm. "

"Proving it would amount to solving the P versus NP problem, since it would mean that graph isomorphism separates the problems in P from the NP-complete problems."

(exp((log n)^O(1)))

Interesting that there could be a much more varied landscape than was previously thought.


One thing that's bothered me ever since I heard about complexity theory is why there is no explanation given as to why polynomial and exponential complexities are given such special status. It's just taken for granted that polynomials and exponentials are the two classes you care about... with no explanation as to why the rest aren't interesting.

EDIT: Edited to clarify, sorry the question was unclear. Thanks for all the replies, but yes, I know the basics :-) I'm not asking about why polynomial complexities themselves are special. Rather, I was trying to ask why we only ever talk about polynomials and exponentials. The only thing that's ever justified is polynomial-time reducibility within a class (which also inherently defines the P class itself). That part is fine. But then once that's established, people suddenly start talking about exponentials. Sorry, but what just happened? There are an infinite number of complexities in between to consider! Why should I suddenly not care about everything in between? Why am I not taught a single algorithm whose complexity is between these? is my question.


Well, in mathematics there is a somewhat similar phenomenon concerning the growth of finitely generated groups. If you have a finitely generated group, pick a generating set as an alphabet and a natural number n, then ask how many elements of the group you can build by using only words of length at most n. This is the growth function, whose growth rate does not change (up to equivalence) if you pick another generating set.

See [1] the Tits alternative, which implies that if your group is linear (huh) then the growth function is either polynomial or exponential. Then there is a result by Gromov [2] which says that if the growth is polynomial then the group is virtually nilpotent. There was a long-standing question whether there exist finitely generated groups of intermediate growth, and the first example was found by Grigorchuk [3].
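For a concrete feel, two standard textbook examples (my own addition, not from the links above; γ denotes the growth function with respect to the standard generators):

    \gamma_{\mathbb{Z}^2}(n) = \#\{(a,b) \in \mathbb{Z}^2 : |a|+|b| \le n\} = 2n^2 + 2n + 1 \quad \text{(polynomial)}
    \gamma_{F_2}(n) = 2\cdot 3^{n} - 1 \quad \text{(free group on two generators: exponential)}

The Grigorchuk group [3] grows strictly faster than every polynomial and strictly slower than every exponential.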

So it's not interesting until it becomes interesting due to a newly discovered fact.

[1] https://en.wikipedia.org/wiki/Tits_alternative

[2] https://en.wikipedia.org/wiki/Gromov%27s_theorem_on_groups_o...

[3] https://en.wikipedia.org/wiki/Grigorchuk_group


Wow that's definitely interesting :-) thanks for sharing this! Makes you wonder if there's a deeper connection between these two areas!


> Why am I not taught a single algorithm whose complexity is between these? is my question.

Despite the fact that there are infinitely many such algorithms, it's a very small infinity, where infinity is approximately equal to 2.

There are the algorithms for factoring large numbers, and the graph isomorphism problem. All these algorithms are a lot of work to implement with lots of fiddly bits, and it's very difficult to show precisely what their complexity is. (Unless you get one of the fiddly bits wrong, in which case it's easy to show that they're either polynomial or exponential time.) So they're generally bad teaching examples. If the professor gives one as a homework assignment, the students will fail it, and if the professor goes through the algorithm and the proof in class it will be an exercise in "But I don't understand the step here where you just waved your hands and moved on to the next step."

It's like asking why we don't care about the stuff between Mars and Jupiter; it's not that there's nothing there, it's that they're less interesting than Mars or Jupiter and you can't do Useful Science on them with undergraduate time/money commitments.


> Despite the fact that there are infinitely many such algorithms, it's a very small infinity, where infinity is approximately equal to 2. There are the algorithms for factoring large numbers, and the graph isomorphism problem.

I assume you mean "interesting" ones? Since they could just show a program that counts up to n^log(n) and voila, you have a super-polynomial sub-exponential algorithm.
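Something like this, say: a deliberately useless Python sketch of that counting idea. The function name and body are made up purely for illustration.

    import math

    def silly_superpoly(n):
        # Busy-wait for roughly n^(log n) steps, then return the count.
        # n^(ln n) = e^((ln n)^2): eventually larger than any fixed n^k,
        # yet 2^(o(n)), i.e. sub-exponential.
        steps = int(n ** math.log(n))
        count = 0
        for _ in range(steps):
            count += 1
        return count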

In which case... I still don't think it's just algorithms for these problems? https://en.wikipedia.org/wiki/Adleman%E2%80%93Pomerance%E2%8...

But if they're all still complicated, and this is why they're not taught, then I feel like I'd have appreciated being told this at least...


Good lord! Reply after reply after reply that simply refuse to speak to the question you asked!

You're quite right that complexity theory takes us to a hierarchy far richer than simply polynomial and exponential [0]. Your question about the emphasis on those two in particular is a good one. Might be worth submitting on cs.stackexchange.com as a 'soft question'.

[0] https://en.wikipedia.org/wiki/Time_complexity#Table_of_commo...


Thanks, yeah, might do that at some point. :)


You may like reading Scott Aaronson's P vs NP survey (2017), which may answer some of your questions about complexity theory: https://www.scottaaronson.com/papers/pnp.pdf

In particular, the section relevant to your question is "1.2.2 The Polynomial-Time Objection" on page 7, quoted in full below:

> Objection: But why should we draw the border of efficiency at the polynomial functions, as opposed to any other class of functions—for example, functions upper-bounded by n^2, or functions of the form n^((log n)^c) (called quasipolynomial functions)?

> Response: There’s a good theoretical answer to this: it’s because polynomials are the smallest class of functions that contains the linear functions, and that’s closed under basic operations like addition, multiplication, and composition. For this reason, they’re the smallest class that ensures that we can compose “efficient algorithms” a constant number of times, and still get an algorithm that’s efficient overall. For the same reason, polynomials are also the smallest class that ensures that our “set of efficiently solvable problems” is independent of the low-level details of the machine model.

> Having said that, much of algorithms research is about lowering the order of the polynomial, for problems already known to be in P, and theoretical computer scientists do use looser notions like quasipolynomial time whenever they’re needed.

To add a couple of points of my own:

- We need to consider at least polynomial functions, for the closed-ness reason (though in practice the exponent is usually small), and superpolynomial functions (in the theoretical/asymptotic sense, slower than any polynomial, even n^100000 or whatever!) are already slow enough not to be considered efficient. For that matter, do you learn any single exponential algorithm in (say) an undergraduate course? If you're in an area where exponential algorithms are of interest though, then it's not actually unusual to study sub-exponential (and superpolynomial) ones, as those are the most efficient we have.

- There are not many "simple" superpolynomial subexponential algorithms, suitable for a typical first course. Factoring could be taught though, if the students are all interested in number theory I guess. When I took a course on "algebra and computation" (problems in group theory etc), we did encounter quite a few such algorithms.


> Scott Aaronson's P vs NP survey

Thanks for the link, but again, it only explains why polynomials are interesting, not why we don't care about the things in between.

> For that matter do you learn any single exponential algorithm in (say) an undergraduate course?

Fibonacci? And basically the brute-force solution to pretty much any problem they give you in a first course?

> There are not many "simple" superpolynomial subexponential algorithms, suitable for a typical first course.

Yeah, I'm coming to the conclusion that this is the crux of it. It'd be nice if they at least mentioned that this is why.


In a typical introductory course, there are two subtly different contexts in which the asymptotic complexity comes up:

* Case 1: Analysis of algorithms: This is where you write down some algorithm/program, and analyze how long it takes. Here, you're limited by the kind of algorithms you can write down easily; you're not likely to encounter “polynomial” in general, but rather specific typical functions like n, n log n, n^2, n^3, 2^n, k^n, etc. Functions like n^13 or n^(log n) are equally exotic and unlikely to be encountered this way (as there are no natural/simple algorithms with that complexity); perhaps the most “exotic” it gets are the n^(log 3) and n^(log 7) of Karatsuba's and Strassen's algorithms respectively (both happen to be polynomial; a small Karatsuba sketch follows this list).

* Case 2: Computational complexity theory: Here you care about which complexity class a problem / language lies in, like P, NP, PSPACE, whatever. As we've already established that polynomials are the smallest closed class, and anything superpolynomial is definitely not “fast”, here the only distinction of note is between polynomial and superpolynomial, as between “fast” and “not fast”. From your telling it would seem that exponential complexity is somehow treated as especially interesting, but in my experience this is not the case. If exponential time comes up at all, it's mainly because that turns out to be the result of analysis of some natural algorithm (as in Case 1 above), and “exponential” is interesting only insofar as it implies “not polynomial”.
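Here is the promised Karatsuba sketch (my own illustrative Python, not from any particular course). The recurrence T(n) = 3T(n/2) + O(n) is what gives the n^(log 3) bound:

    def karatsuba(x, y):
        # Multiply nonnegative integers in ~n^(log2 3) ~ n^1.585 digit operations.
        if x < 10 or y < 10:                        # base case: single-digit factor
            return x * y
        m = max(len(str(x)), len(str(y))) // 2      # split point, in decimal digits
        xh, xl = divmod(x, 10 ** m)                 # x = xh*10^m + xl
        yh, yl = divmod(y, 10 ** m)
        z0 = karatsuba(xl, yl)
        z2 = karatsuba(xh, yh)
        z1 = karatsuba(xl + xh, yl + yh) - z0 - z2  # only three recursive products
        return z2 * 10 ** (2 * m) + z1 * 10 ** m + z0

    # karatsuba(1234, 5678) == 1234 * 5678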

My impression/guess is that you're combining (1) the fact that superpolynomial-subexponential algorithms don't turn up naturally (just as, say, n^53 doesn't either), (2) the fact that we do talk about polynomial complexity, and (3) the fact that exponential algorithms do turn up naturally, to claim that we care about polynomial and subexponential complexity but nothing in between, which doesn't seem to be the case.


> Yeah I'm getting to the conclusion this is the crux of it. It'd be nice if they at least mentioned this is why?

A lot of people really struggle through these introductory courses. It wouldn't be very kind or helpful to stop and point out that the stuff they're struggling with is the easy stuff.

When undergrads do go a bit further, it tends to be about things like parallel algorithms, sub-linear algorithms, and space-bounded algorithms. Those are plenty interesting and tend to be more useful to know about.


I think the answer is much more trivial than you think. Polynomial and exponential complexities are given such special status simply because when we start classifying "natural" problems, many of them fall into one of these two classes, and not many into the intermediate classes. If they fell into polynomial and sub-exponential time, computer science textbooks would have chapters dedicated to those two instead. At the end of the day, CS is used to make building technology easier, and so computer scientists are naturally interested in "natural" problems and not artificially designed problems.

Now, why "natural" problems so easily fall into these two categories and not others is a different mystery and not what you asked.


It's also a robust distinction because polynomials are closed under addition, multiplication and composition.

Converting an algorithm from a Turing machine to a random-access model will often give you a polynomial speedup, but not an exponential one, so "polynomial time" means the same for both.

If you have a problem A, and a polynomial reduction from A to B, and a polynomial algorithm for B, you have a polynomial algorithm for A.
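Spelled out as a back-of-the-envelope bound (the constants c, c' and exponents a, b >= 1 are my own placeholders): if the reduction runs in c*n^a time (so its output has size at most c*n^a) and B's algorithm runs in c'*m^b time, then

    \text{time}_A(n) \;\le\; c\,n^a + c'\,(c\,n^a)^b \;=\; O\!\left(n^{ab}\right)

which is still a polynomial.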

Sometimes algorithms can be polynomial but intractable (or even exponential but practical for some real cases), but it's less "neat" or clean to reason about those things. (Very useful, though, just more done outside the academy.)

As for things that aren't polynomial, I guess exponential things just happen to be the most common kind? Not sure myself why "pseudo-polynomial" bounds aren't more common.


Polynomials are convenient because they allow you to mostly ignore the details of the underlying model of computation. E.g. if you try to simulate a multi-headed Turing machine with a single-headed one, you'll spend a lot of time moving between different positions on the tape, increasing the runtime complexity by a polynomial factor. By considering all polynomials as essentially equivalent, you can choose whichever abstraction you want without having to think too hard about the overhead it adds. That's of course bad if you actually want to implement the fastest algorithm, but it makes proving (and stating) theorems a lot simpler.


Yup, this is the justification for why polynomial reductions should define everything within a class, but it doesn't seem to justify why we should only focus on the polynomial and exponential classes to begin with, given there are infinitely many classes in between (and beyond exponential). Sorry my question didn't make it clear that's what I was asking about; I edited it to hopefully clarify.


Complexity theorists do care about reductions within subexponential time classes, but it's not a great topic for P vs NP focused Complexity Theory 101. And when you do focus on P vs NP, the dichotomy is quite natural because of the https://en.wikipedia.org/wiki/Exponential_time_hypothesis, the conjecture that a broad class of NP-complete problems are exponential-time.


The class of polynomial time algorithms is closed under some operations that you often see in algorithms:
- If you run two polynomial time algorithms in parallel, the result is a polynomial time algorithm again.
- If one operation of a polynomial time algorithm is calling another polynomial time algorithm, then the result runs in polynomial time again.


By this justification, exponentials wouldn't be any more interesting than the stuff between exponentials and polynomials either, right? So why are we still generally taught both, but only these two? Aren't there an infinite number of complexities above polynomial to worry about? is my point.


"Exponential" is often colloquial short-hand for EXPTIME, which is closed under some useful operations (because there's an "any polynomial" in the exponent.)

For one thing, it's robust to reasonable changes of model, for "No true Scotsman" definitions of "reasonable".


You're talking about closure of EXPTIME under polynomial transformations. You could do similar for quasipolynomial and such; there's nothing about EXPTIME that makes it special in this regard. Edited my comment slightly to hopefully clarify.


Exponential is interesting for P=?NP because the naive method for NP problems runs in exponential time.


> One thing that's bothered me ever since I heard about complexity theory is why there is no explanation given as to why polynomial complexities are given such special status. It's just taken for granted that polynomials are the class you care about... with no explanation as to why.

The reason is that if we take something more fine-grained than polynomial, whether the runtime is in this class or not can strongly depend on the machine model (e.g. Turing machine, register machine (counter machine, pointer machine, random-access stored-program machine), etc.). On the other hand, for nearly all machine models, one can show that they can be emulated by another machine model with at most a polynomial-time overhead. This is seen as a strong sign that polynomial-time is probably what we are looking for.

Another strong sign is that there exist problem classes (NP-complete, coNP-complete) to which every problem in NP/coNP can be reduced by a polynomial-time reduction (that such a class exists is IMHO a very surprising result). This is seen as another strong sign that polynomial-time could be a good idea.


> if we take something more fine-grained than polynomial

I didn't mean more fine-grained than polynomial. I meant between polynomial and exponential.

> On the other hand, for nearly all machine models, one can show that they can be emulated by another machine model with at most a polynomial-time overhead. This is seen as a strong sign that polynomial-time is probably what we are looking for.

> Another strong sign is that there exist problem classes (NPcomplete, coNPcomplete) to which every problem in NP/coNP can be reduced by a polynomial-time reduction (that such a class exists is IMHO a very surprising result). This is seen as another strong sign that polynomial-time could be a good idea.

These just say why we care about polynomial-time reducibility as a means for specifying a class. It doesn't say why we only look at the polynomial and exponential classes though, which is what I'm saying. There are infinitely many classes in between. Why just focus on these two?


There are only two "natural" problem classes known that are not either definitely in P or definitely in NP, and this article concerns one of them. (The other is integer factorization / discrete logs). And I'm not sure anybody believes there are any natural problems that are inherently in this in between state.


The thing is though -- at the very least they could be honest and clear about this just being the state of affairs, not being something more fundamental (if it isn't believed to be). I (and I'm sure you) see over and over again so many people (probably including myself at some point) who end up being left with the impression that graph isomorphism and factorization are NP-hard, because (a) they hear those problems are hard, and (b) they're told the polynomials are the easy problems, and (c) the only non-polynomial algorithms they've ever seen are exponential-time.

(Btw, I think you mean neither definitely P nor definitely NP-hard.)


> who end up being left with the impression that graph isomorphism and factorization are NP-hard, because (a) they hear those problems are hard, and (b) they're told the polynomials are the easy problems, and (c) the only non-polynomial algorithms they've ever seen are exponential-time.

In a decent textbook on computational complexity theory, you can read that a problem is NP-hard if for every problem in NP there exists a polynomial-time reduction to it. Nobody claims that such a reduction exists, e.g. for integer factorization and graph isomorphism.


When you start listing algorithms, you see that their speeds fall in a line. There are the log ones, the polys (with n lg n between n and n^2), the exponentials, and higher.

So why the distinction between poly and exp? Say a computer performs about 10^10 operations per second, and there are about 3.16×10^7 seconds in a year. Suppose your problem has size n=100. Then an n^3 algorithm takes 3.17×10^{-12} years. A 2^n algorithm takes 4.02×10^{12} years. The first is perfectly practical, the second is not (its time is longer than the age of the universe).
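A quick sanity check of those figures in Python, under the same 10^10 operations/second assumption:

    ops_per_year = 1e10 * 3.16e7           # operations/second * seconds/year

    n = 100
    print(n ** 3 / ops_per_year)           # ~3.2e-12 years for the n^3 algorithm
    print(2 ** n / ops_per_year)           # ~4.0e+12 years for the 2^n algorithm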


Thanks! But my question wasn't as basic as this :-) I tried to edit to clarify.


Because O(e^x) > O(x^n), for all Integer values of "n".

So Polynomial is simply a class that we know is strictly "faster" (well, asymptotically faster) than any exponential function.

In practice, I think real-life humans only care for "small polynomials". An algorithm with complexity (x^5000) will be growing too quickly to be used in practice, even if its big-Oh is technically polynomial.

Well, I say that, except 3SAT solvers are a thing and that's provably NP-complete. So lots of people bang their heads against the NP-complete problems anyway.


> Because O(e^x) > O(x^n), for all Integer values of "n"

No, that just tells me why we make a distinction between exponentials and polynomials, not why we always focus on those two in particular. It's not like there aren't other complexities in between about which you can make the same kind of statement...


Hmmm, I think you're trying to divine a deeper meaning in something that doesn't necessarily exist.

Complexity theory itself is a coarse approximation of reality and performance. Complexity theory tells you nothing about how fast (or slow) an algorithm is in practice: quicksort typically is faster than merge sort, despite both being O(n * log(n)).

Heck, quicksort can be faster than radix sort in many cases. Another case: Bitonic sort is asymptotically slower at O(n * log^2(n)), but Bitonic sort lends itself to parallelism very well, so it's "faster" in practice (even if it's not work-efficient).

So why do people use complexity analysis at all? It's completely irrelevant to determining quicksort vs mergesort vs bitonic sort!

Well, simple: because it's relatively easy to calculate, and relatively easy to understand. Polynomial time (P) and Nondeterministic Polynomial time (NP) are both "easy" to describe and understand (for a... somewhat interesting definition of easy). At least, in the greater scope of mathematics they're concise and "simple" definitions.

----------

A great number of algorithms in Knuth's "The Art of Computer Programming" are calculated out to the precise number of assembly-language statements executed... including the constant. No "big-O" estimates involved.

This deeper analysis is great to read once or twice, but read enough of it and you realize how much harder it is to think about. Complexity theory is far faster to analyze and apply to algorithms, even if it's an incomplete picture. As such, emphasis should be on "ease of use" and "convenience" as opposed to any real amount of practical rigor.

----------

The other issue is that P vs NP is the great unsolved mathematical problem of our lifetime. So a huge number of professors introduce the concept of P vs NP so that their students can be ready for a major mathematical debate relevant to the field.

-----------

EDIT: In practical optimization, I think most people care about "Linear", "Sublinear", and "Superlinear". For example, the Painter's Algorithm for GPUs is superlinear: O(n * log(n)). But a depth-test based approach to GPU scene painting is linear: O(n). Linear is "good enough" so optimization basically stops there.

Any algorithm that visits all data is going to be, at best, linear. So that's why GPU programmers probably are happy with "depth test", because Asymptotically Linear complexity is the best you can reasonably hope for.


Maybe a dumb question, but what exactly are you talking about between exponential and polynomial?


It's not a dumb question at all - it gets beyond most undergraduate classes on computational complexity. What's interesting is that the algorithm presented for this problem falls into exactly one of those classes: the quasi-polynomials, which grow like the exponential of a polynomial in log x.

For more see: https://math.stackexchange.com/questions/226628/orders-of-gr...


It's not a dumb question! It's actually a typical exam question -- "write down a time complexity that is faster than polynomial but slower than exponential". It makes you think outside the box they spend the rest of the semester shoving you into :-) As an example, if you consider that n^k is polynomial but let k increase with n, then you can get n^log(n). Can you show that that's slower than exponential?
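One standard way to see it (my own sketch; logs base 2): rewrite everything as a power of 2 and compare exponents.

    n^{\log n} = 2^{(\log n)^2}, \qquad n^k = 2^{k \log n}, \qquad c^n = 2^{n \log c}
    (\log n)^2 \text{ eventually exceeds } k \log n \text{ for every fixed } k \;\Rightarrow\; \text{super-polynomial}
    (\log n)^2 = o(n) \;\Rightarrow\; n^{\log n} = 2^{o(n)} \;\Rightarrow\; \text{sub-exponential}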


I think it's just that many natural algorithms happen to have polynomial time complexity, since it's basically the complexity of iterated loops. Same with logarithms.


Because polynomial complexity is the measure of what makes a tractable problem [1]. If the question is why - there's no formal reason. Tractability is really just a subjective human standard: even high-degree polynomial time is often considered not enough for a solution to be worth computing.

[1] https://en.wikipedia.org/wiki/Cobham%27s_thesis


Yes, my question was indeed why. At the very least, I would hope for explicit mentions that this is an arbitrary choice when the subject is being taught, and mentions of algorithms with running times between polynomials and exponentials. Otherwise it gives people the wrong impression that there isn't anything in between worth considering, which is misleading and just wrong.


That's exactly how I felt about polynomial time definition when I was just starting to study complexity theory. Now, after a two year complexity-focused master program, I realized a few things which changed the way I perceive it:

1. P is a rough, approximate definition of «feasible» problems.

P doesn't perfectly capture problems we intuitively understand as feasible (e.g., P includes problems which are solved in O(n^10000) time), but it is good in other respects: its definition is independent of the computation model (but the model has to be able to simulate a Turing machine with at most polynomial slowdown, which is true for all reasonable models); P has complete problems; P is closed under many ways of combining algorithms (e.g. if you run a poly-time algorithm poly times, what you get is also a poly-time algorithm). So P isn't a perfect way to define feasible problems, but it's good enough (or maybe the best we have?).

There may exist other ways to formalize feasible computations, which could fix P's drawbacks. Maybe complexity theorists will come up with one later, when they get a better understanding of computations. But until then, they work with what they've got.

2. Even if you don't buy all this «feasibility» interpretation, P will still make perfect sense for theory. Especially, when you ask questions about its relation to other classes.

It is natural for computer scientists to consider variations of some computation model and try to understand which of those variations are stronger and how much stronger they are. For example, randomized and nondeterministic Turing machines are variations of the usual deterministic Turing machine (DTM). But the former two have their own unique superpowers, namely randomness and nondeterminism. How powerful are these superpowers? Can a DTM provided with such a superpower solve more problems? And if so, can we compensate for the DTM's lack of superpower by giving it more running time? How much time would suffice for that?

In these terms we can say that the P-vs-NP question actually asks whether a polynomial-time DTM's lack of nondeterminism can be compensated for by allowing it to run poly(n) times longer. Similarly, if you ask the same question with «randomness» instead of «nondeterminism», you get another famous problem, P-vs-ZPP (or P-vs-BPP, depending on your definition of randomness).

Computer scientists asked similar questions for models other than the Turing machine, and successfully answered them. For example, for finite automata we already know how many extra states are needed to go from a nondeterministic automaton to a deterministic one. Similarly, for communication complexity we know something about the relation between nondeterministic, deterministic, and randomized complexity.

EDIT: Ooops, I replied to the original version of your post, before you clarified what you meant.


even small graphs can be made to look very different just by moving their nodes around

I'm curious: for the graph isomorphism problem, how are the graphs defined then? It seems there must be a geometric component in there because if they were defined just in terms of abstract sets of vertices and edges between them, it would be rather trivial to show if two graphs are the same, wouldn't it?


Graphs are given as abstract sets of vertices and sets of edges (as adjacency lists or matrices). The problem is that corresponding vertices in the two graphs may not have the same label, so to solve graph isomorphism naively, one would need to try all n! possible bijections between the vertex sets (where n is the number of vertices).
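A minimal Python sketch of exactly that naive approach (illustrative only; the function name and edge-list format are my own, and it assumes simple graphs without self-loops):

    from itertools import permutations

    def isomorphic_by_brute_force(edges_g, edges_h, n):
        # Try all n! relabelings of G's vertices 0..n-1 and check whether any
        # of them maps G's edge set onto H's. Hopeless beyond tiny n.
        g = {frozenset(e) for e in edges_g}
        h = {frozenset(e) for e in edges_h}
        for perm in permutations(range(n)):          # perm[v] = new label of vertex v
            if {frozenset((perm[u], perm[v])) for u, v in g} == h:
                return True
        return False

    # The path 0-1-2 and the path 0-2-1 are isomorphic (swap labels 1 and 2):
    print(isomorphic_by_brute_force([(0, 1), (1, 2)], [(0, 2), (2, 1)], 3))  # True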


Wikipedia has a nice picture of two graphs that are isomorphic: https://en.wikipedia.org/wiki/Graph_isomorphism

It should be noted that one of the commonly used algorithms, called VF2, and descendants of it basically just check all possible bijections, although they use some heuristics to prune the search tree.


I think pictorial representations are what is confusing the grandparent. The problem isn't about comparing two graphs that look different visually, it's about comparing two graphs which have differently named labels (but might still have the same structure). The graph isomorphism algorithm doesn't know anything about where on the page you drew the nodes, just what is connected to what


Thanks much for the clarification. I didn't know that one was "allowed" to change labels. It makes sense now.


Not if the number of nodes is high, no. The complexity of the naive algorithm grows exponentially, making it unmanageable very fast.

In order to show that one node in the first graph corresponds to a particular node (i.e. is isomorphic) in the second graph, it's not enough to check that they have the same number of outgoing edges. You then have to check all the connected nodes, and make sure that those are also isomorphic to the respective nodes in the same roles in the other graph.

This makes the problem recursive and dependent on the rest of the graph; you can't simply split the graph and solve each sub-problem fast.


The problem is to show that the graphs are the same or different up to relabeling of the vertices. I.e. the graphs

    0-1-2     0-2-1
are the same if you swap the labels 1 and 2. You could try to brute force it by trying all possible permutations of vertex labels, but there are n! of those to try, so that doesn't help much. Any polynomial-ish algorithm would need to do something clever that takes advantage of the structure of the problem.


It's actually pretty hard even for humans to eyeball a graph and figure it out. For example, the wiki article on the Petersen graph (https://en.wikipedia.org/wiki/Petersen_graph) has a few drawings that you can't easily tell are the same graph.


A graph is defined as it always is: a set of edges that connect nodes by their IDs. We can use arrow notation for convenience. Imagine you have a graph: a->b b->c c->d d->b

Is that the same as this graph? a'->b' b'->c' c'->a' d'->a'

You can sort the edges alphabetically in the first graph. Then you could take the second graph's labels and figure out if there is a way to "relabel" it so that, when you sort those edges alphabetically, you get the same result.

There are |V|! possible ways to relabel |V| vertices. Other algorithms are better than "try everything", but this shows why it's not trivial. Like a puzzle, you can put the pieces into place, and everything looks right, but then when you get to the end you realize one piece doesn't fit. Does that mean it's not-isomorphic? or maybe you just need to re-arrange the pieces differently? That's why the complexity is so high.


”It seems there must be a geometric component in there”

No, there isn’t.

”if they were defined just in terms of abstract sets of vertices and edges between them, it would be rather trivial to show if two graphs are the same”

No, it isn’t, unless you know a novel argument that hasn’t been thought of yet by mathematicians.

It is fairly simple for graphs with very few vertices or very few edges, but in general, it isn’t easy at all.


You might say that it's easy in general (since all but a handful of isomorphism classes are easily distinguished), but hard in full generality. :)


Explanation of the current state of review: https://cstheory.stackexchange.com/a/40373


"Monads are monoids in the category of endofunctors. What's the problem?" -- that's how every paper looks at me!


Is the algo practical, implemented by say major generic graph libraries?


No, graph libraries usually rely on some variation of VF2 and maybe (but I haven't checked) in some cases on the more complicated nauty/traces algorithms.


(Basically speaking) no-one does Graph Isomorphism with VF2; they use (as you say) Nauty, or some system which derives from that.

Things like VF2 can be used for sub-graph isomorphism (and various variants of that problem). Nauty (and this paper) are useless for solving that problem.

I work in this area, and it's interesting (to me at least) that you would imagine the algorithms for graph isomorphism and sub-graph isomorphism would be quite similar and they really aren't.


Still, the more well-known graph libraries seem to avoid Nauty, probably because it is not that easy to implement. From a quick search:
- NetworkX: VF2
- JGraphT: VF2 and some newer "color refinement" algorithm - seems to resemble Nauty
- BGL: VF2
- SnapGraph: does not seem to contain graph isomorphism
- igraph: Bliss and VF2

So are you convinced that Nauty is still the best thing that we have nowadays? I was looking at the paper last summer with the intention to implement it for Julia, but then got sidetracked with other stuff. Maybe I should resume that.


Nauty is I think the best trade-off. The improvements in Saucy and Bliss (which have mostly been added to Nauty now) are useful.

The basic algorithm isn't that hard -- I have a few implementations lying around (nothing in Julia). Traces is better, but is much harder to implement and it isn't clear the improvement is worth it for someone implementing from scratch.

There are some improvements, particularly for graph isomorphism (as opposed to automorphism), but the basic algorithm framework is the same.


As you work in this area, is there any such thing as a clique bestiary? I spend a lot of time looking at large-ish graphs in Gephi but it's hard to find tools/code about shapes/structures rather than quantitative analysis of degree or betweenness. Not a scientist, it's a means to an end for domain exploration for me.


(2017). This is old news, I thought for a moment he had a new paper. Please add the year.


(2015)


Technically 2017 was the latest update to the paper, but still not as groundbreaking as the submission title.



