GetACoder.com: Solve P Vs NP (getacoder.com)
110 points by olalonde on Aug 16, 2010 | 73 comments



It is funny that the author chose the SUBSET-SUM problem. You may be amazed, but it is a special case of the KNAPSACK problem, for which we have:

Fully polynomial approximation schemes.

Pseudo-polynomial time algorithms.

In other words, you may find that for all practical purposes this problem can be solved so fast that it doesn't really warrant the NP badge, in some sense. You can say that the problem is an NP problem which disguises itself as a P problem. It would be really fun to use David Pisinger's knapsack algorithm codes, http://www.diku.dk/hjemmesider/ansatte/pisinger/codes.html and then bait the bidder into finding a set for which the algorithm is NOT fast (hint: That is also pretty hard :)
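To make the pseudo-polynomial point concrete, here is a small sketch (in Python, not Pisinger's actual codes) of the classic dynamic program for SUBSET-SUM. It runs in O(n * k) time, where k is the number of distinct reachable sums - bounded by the sum of the absolute values of the input - so it is fast whenever the numbers themselves are not astronomically large:

```python
def subset_sum_zero(nums):
    """Return True iff some non-empty subset of nums sums to 0.

    Classic pseudo-polynomial dynamic program: O(n * k) time, where
    k is the number of distinct sums reachable so far (at most twice
    the sum of |x| over nums).
    """
    reachable = set()  # sums achievable by non-empty subsets seen so far
    for x in nums:
        # Extend every previously reachable sum by x, and add x alone.
        reachable |= {s + x for s in reachable} | {x}
        if 0 in reachable:
            return True
    return False

# -2 - 3 + 15 - 10 == 0, so this prints True
print(subset_sum_zero([-2, -3, 15, 14, 7, -10]))
```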


Yes. There is even a conjecture that, for basically all NP problems, all reasonable [1] instances will be solvable in expected polynomial time. Even if P != NP.

[1] Add some hand-waving about probability distributions here.


Does that include factoring products of large primes? If a good approximation existed for that, wouldn't it make public-key encryption pretty useless?


Included in that handwave about probability distributions was a caveat about not being handed pathological instances of problems. Encryption harnesses pathological instances and wouldn't be covered by such a handwave.

(Is there ever any other reason to generate multi-hundred digit numbers and try to factor them? Honest question, if anybody's got a fun answer, though I am asking about something actually useful that you know about, not something hypothetical.)


Actually, it's quite hard to come up with good primes for RSA. They need to avoid lots of properties, because researchers have found efficient attacks on numbers with those properties.


Factoring numbers is already suspected to be much easier than NP-complete.

That is because factoring is in the intersection of NP and co-NP. Co-NP is the set of problems where a "NO" answer is easily checkable with a certificate. Whether NP = co-NP is as open a question as P = NP. (I.e., there is no known proof either way, but everyone expects them to be unequal.)


Public-key encryption schemes would simply need to avoid reasonable circumstances. I suspect "product of two large primes" falls outside the "reasonable circumstances" for factoring.


It's even harder than that. There are lots of known attacks if you are not careful with your choice of prime numbers. ('Careful' here means 'choosing pathological instances'.)


>You can say that the problem is an NP problem which disguises itself as a P problem. It would be really fun to use David Pisinger's knapsack algorithm codes, http://www.diku.dk/hjemmesider/ansatte/pisinger/codes.html and then bait the bidder into finding a set for which the algorithm is NOT fast (hint: That is also pretty hard :)

This makes me think that the post could be a phishing test for programmers with strong maths/comp-sci backgrounds, something Googlish?

Please someone try this!


That would be fun, but alas, I don't have the time. One could simply submit Pisinger's "decomp" codes from the page I linked above, but then you are not the author of the work...


I have a question, as a CS student.

Why is finding a set for which the algorithm does not work (edit:) WELL so hard? I have admittedly not read anything about -- or indeed, heard of -- KNAPSACK, but my impression was that we could estimate running time based on various measures of complexity of the problem. Is this incorrect, or is measuring the complexity of NP problems simply incredibly difficult?


For all NP problems, there are trivial instances. For example, a given instance might have structure which effectively makes it a P problem. What makes NP problems interesting is that there are also hard instances where the search breaks down and you end up walking an enormous solution space. Most NP solvers "cheat" by looking at the structure of an instance in order to cut down the amount of searching they need to do. How good they are depends on how aggressively they can cut down the solution space. A lot of research is going into this - which is really a bet on P not being equal to NP.

The KNAPSACK problem is one of the "weak" NP problems in the sense that there is a pseudo-polynomial solution, see

http://en.wikipedia.org/wiki/Pseudo-polynomial_time

Intuitively, this means that KNAPSACK is still hard, but easier than most other NP problems. Pisinger's codes are so good that for most inputs the algorithm terminates quickly. There are still hard instances out there, but they are few and far between, which means a search for a hard instance can become rather cumbersome. Further, all realistic instances are probably realistically solvable, so the hard instances we lack are more or less of theoretical interest.

So when we call NP problems hard, it is because it has been proven that there are some nasty instances among them for which even the cleverest solver algorithm for the problem gives up and resorts to basically searching the whole solution space in exponential time.


But WHY are they trivial? Because we can extract metadata that help us prune the solution pool? And if that's the case for the KNAPSACK problem, does this hold for all such problems?


Yes, the idea is to look at the structure of the problem and use it to prune the solution pool aggressively. The usual method is branch-and-bound, but the Wikipedia article does not seem to convey the idea especially well, so dig around for a better reference.

Sometimes we are lucky, however, and there is more structure to a problem than what branch-and-bound alone gives us. For KNAPSACK, remember that we have a burglar with a knapsack who has just broken into a house. Each item in the house has a given profit and a given weight. The problem is to maximize the profit while keeping the total weight low enough to fit in the knapsack. In the 0-1 KNAPSACK problem, there is only one of each item, so we either leave it (0) or pick it (1).

The inherent structure is that we can order the items by the ratio p/w of profit over weight. Items with a high p/w ratio tend to be items we want in the knapsack, whereas items with a low p/w ratio tend not to be. It turns out you can preprocess and prune some items this way before you start the branch-and-bound algorithm.
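A toy sketch of the p/w ordering idea (the items, profits, weights, and capacity are made-up illustration data, not from Pisinger's codes):

```python
# Toy sketch of the profit/weight ordering used to seed branch-and-bound.
items = [
    ("laptop", 600, 3),   # (name, profit, weight)
    ("tv",     400, 20),
    ("watch",  300, 1),
    ("books",   50, 10),
]
capacity = 15

# Sort by profit-per-weight ratio, best first.
items.sort(key=lambda it: it[1] / it[2], reverse=True)

# Greedy fill in ratio order: a cheap lower bound on the optimum.
# The first item that no longer fits is the "break item" the
# branch-and-bound window forms around.
total_w = total_p = 0
for name, p, w in items:
    if total_w + w <= capacity:
        total_w += w
        total_p += p

print(total_p, total_w)  # prints: 950 14
```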

But for this problem it turns out that there is a critical item, the break item, and a window of items around it. If you can identify this window, solving it is enough to solve the whole knapsack problem; there is a theorem from the '80s stating this. It turns out that for knapsack problems with thousands of items, the window tends to be fairly small, some 20-30 items.

Pisinger's codes work by being extremely clever at finding the window. It is a smart branch-and-bound/dynamic-programming hybrid, and as soon as the window is found, it is almost done.

Of course, there are knapsack instances where the window is almost the whole item set, and then the algorithm fails. But it turns out to be rather hard to produce an instance from a real-world problem which exhibits the large-window structure. Hell, Pisinger's codes are often faster than sorting the items by the p/w ratio - which makes for a great discussion on complexity :)

So what did I mean by trivial instances? In the SUBSET-SUM problem, this is a trivial instance: {-3,-2,-1,1,2,3}. This one too: {1,2,-3}. Or how about this one: {1,-1,337}. In the first two, the whole set works. In the latter, it is easy to see that 337 can never be part of the sum, as its inclusion leads to a value so great we can never hope to get back to 0. Even if we construct extremely large sets, it is easy to construct them in a way such that they can be pruned down to a small set and then solved.
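The 337 argument generalizes: any element whose absolute value exceeds the combined absolute values of all the other elements can never take part in a zero sum, so it can be deleted up front. A tiny sketch of that preprocessing idea:

```python
def prune_dominated(nums):
    """Drop elements that can never be part of a zero-sum subset:
    if |x| is larger than the sum of everyone else's absolute values,
    including x could never be cancelled back to 0."""
    total_abs = sum(abs(x) for x in nums)
    return [x for x in nums if abs(x) <= total_abs - abs(x)]

print(prune_dominated([1, -1, 337]))  # prints: [1, -1]
```

Repeating the pass until nothing changes prunes even more, since removing one element can expose others as unremovable too.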


You have been most helpful. Thank you indeed.


Old joke, but still funny. There was a similar "job" posted on the same site a couple of years ago, only that that one asked for solving the halting problem: http://docs.google.com/viewer?a=v&q=cache:WVkySfVPHCoJ:b...


Haha and it's posted by AlanT!


The posts by KurtG and georgecantor are good too.


As an extensive (and generally happy) user of Elance and similar sites, this really made me laugh! It is a fantastic example of the pitfalls of such sites.

Whatever the project, you get a bunch of replies (the majority from India) who either just say 'yes we can do that' or come back with boilerplate drivel without understanding the requirement.

One Indian guy responded with a $300 bid and said: "I can do this for you in php or javascript. If you want this work in one language, I can charge less."


He could've done it in phpjs and killed two birds with one stone.

My bid was rejected and my account suspended for completing the work before my bid was accepted. I just hope those Clay guys see it so I can earn my $300.


Hey, it's business, not academia. When giving out a task, you have to be prepared that not all of your requirements will be met.


Hrm... you know, there could be something here. If people kept posting variations of NP problems as jobs on sites like GetACoder as a honeypot, you could easily devise a blacklist, from all the bids, of people you'd never want to hire for a real project.


That's actually kind of brilliant. I'm looking at this reply that reads "Hello, Easy job for me,please send me a PM ." and thinking, "you stay away from me." This one is also golden, "Read your requirement and everything has been understood. We have already delivered a couple of similar projects and given our strong capabilities. I am confident we can deliver a powerful end-result to you as well."


I think most of them just skim the description and bid on a lot of projects without really reading them and hope to get some of them awarded. Being non-native English speakers compounds the problem.

Or am I being too gentle? It's hard to miss the title if you know what it means.


You've got it right. Derek Sivers recommends doing this for any serious rentacoder/vworker/getacoder projects you post:

""" At the end of your post, write something like, “VERY IMPORTANT: To separate you from the spammers, please write I AM REAL as the first line of your bid. We will delete all bids that do not start with this phrase, since most bidders never read the requirements. Thank you for being one who does.” """


This doesn't really work all the time, as some of them skim through the description looking for exactly this sort of thing.


Of course it's not a sufficient criterion by itself; Sivers' point was that a lot of them won't even do this, so you can at least save yourself the trouble of evaluating them.


This is why everyone who calls themselves a programmer or developer needs a strong computer science background -- university provided or not. Without it you can't accurately assess the difficulty of a problem, even seemingly simple ones like this.


"Without it you can't accurately assess the difficulty of a problem, even seemingly simple ones like this."

And with a background, you'd recognize hard problems?

Sure, very well-known NP-complete problems you'll recognize, but there are plenty of other problems that are equivalent to them, which you might stumble upon without even realizing it (I recently wrote a post about one such problem: http://www.loopycode.com/a-surprisingly-hard-problem-post-co...).


Even with one you may miss it, but you're definitely going to miss it without one.


Post's Correspondence Problem is nice. As an exercise, you may try to figure out how to simulate a Turing machine in a correspondence system. If you've done so, you basically have a proof of the undecidability of the problem.

I worked on this, when I was bored in theoretical computer science classes.


That's why I learned how to prove the complexity class of certain problems ;)


Well, in this case you mainly need strong common sense and the ability to use Google/Wikipedia. Any programmer who doesn't automatically Google something he doesn't recognise or understand isn't worth his salt either.


That's only useful if you happen to get the academic name of the problem when you're trying to solve it. What if it just turns out that the problem you're solving resolves to the knapsack or travelling salesman problems? You're not going to be able to Google TSP unless you already know what TSP is.


At least the academic world knows this problem is hard to solve. If only HR databases or interest rate swap pricers were so easy to reason about.


"PS: The program should run in polynomial time. "

Is that optional?

(Maybe I've been reading too many specs that reference RFC 2119.)


Sure seems optional to me; he should have used MUST. :)


And he did not say polynomial in what.

I can make the program run in time polynomial in the sum of the input numbers. (I.e., pseudo-polynomial in the input.)


"Hello, Easy job for me". This thing is hilarious.


Don't worry, phpdeveloper111 has got this.


To be fair, the algorithm for solving this problem is trivial (but it doesn't run in polynomial time). Still hilarious :D


You can make a dynamic programming approach run in polynomial time in the size of the input. If you specify the input in unary, that is.


Since the post only specified polynomial time and placed no constraint on space, this is actually solvable (e.g., it's not too hard to come up with a polynomial-time implementation that uses a number of VMs growing exponentially with the size of the set).


Each VM does some work. If the number of VMs grows exponentially, the total work grows exponentially. That's not P.


But the getacoder poster only specified that it had to be done in polynomial time, and didn't clarify whether that had to be total CPU time (i.e., "work") or wall clock time.

Given that underspecification, it's perfectly justifiable to use an exponential amount of resources to achieve the goal.


What? Super-polynomial space implies super-polynomial time.

You can have an algorithm that uses exponential time and only polynomial space, but not the other way around.


Some non-conventional computers [1][2] use exponential space in polynomial time.

[1] Lipton, R. DNA solution of hard computational problems. Science 268 (1995), 542-545.

[2] Head, T. Aqueous Solutions of Algorithmic Problems: Emphasizing Knights on a 3 x 3. Lecture Notes in Computer Science, Vol. 2340.


I skimmed [2] and it seems very interesting.

Still, regardless of whether P!=NP, it is known that P is contained in NP which is contained in PSPACE. These classes are defined for certain computational models. If you change the computational model then those definitions might not make sense.


>Some non conventional computers [1][2] do exponential space in polynomial time.

For under $500 ...


In the same vein as the DNA computing work cited above, you can actually do exponential work in polynomial time using a photocopier.

EDIT: here's a link to a paper describing this approach: http://www.springerlink.com/content/j5213p8761224304/


Whoa... I never thought I'd see my dad's work reffed on HN.


In the real world, taking exponential space implies taking exponential time. The fastest you can transmit information is the speed of light, so it will take exponential time to communicate with the furthest pieces of your exponentially large system.


Depends on the problem and how you distribute data.

e.g., each node can generate its own subset to check, so you only need to distribute the program and input. That can be done using a multicast tree of some kind which brings distribution costs back down by a log.


But if you only have a finite amount of hardware available (which is probably the case), each node's data set grows exponentially with input size.

If your node population increases exponentially with input size, you're still running into the speed-of-light delay problem mentioned above (cube root of exponential is still exponential). If your solution is polynomial time with polynomially many nodes, your space cost is also polynomial.


This could be said about almost all "polynomial time" algorithms though. The bigger the loop, the larger the variables you'll need (in general), and the more time you'll need to communicate between the pieces.


OK, but the amount of data you need to access will be bounded by a polynomial...


You know, if that guy from Mumbai could actually solve P vs NP for a cool million, that's a bargain. That is, if you get the rights to the solution. :D


VinayDeolalikar: I think I have done this. http://i.imgur.com/Oedd0.png


Love the reply by one "PierreDeFermat"


I greatly appreciate that style of response over the "this is a joke, you idiots!" style or the twee exaggerated response that accomplishes the same end.


I can't remember much of the P/NP stuff from my comp sci university days.

Wouldn't this work? O(n * k)?

  <?php

  $input = array(-2, -3, 15, 14, 7, -10);

  // Set of subset sums seen so far, keyed by sum value.
  $sums = array();

  foreach ($input as $val) {
    // foreach iterates over a copy of $sums, so the keys
    // added below are not revisited within this pass.
    foreach ($sums as $sum => $unused) {
      $sums[$sum + $val] = true;
    }
    $sums[$val] = true;

    if (array_key_exists(0, $sums)) {
      print "yes\n";
      break;
    }
  }



This is awesome. I will send him the program which runs every Turing machine simultaneously on his problem instance, and checks the output of each to see if it is a correct solution.

If he manages to prove that my submission doesn't solve his problem in polynomial time, I'll refund his money, but I'll also send his proof on to the Clay Institute.


There are infinitely many Turing machines.


But there are only countably many Turing machines, and you only have to evaluate countably many steps of each, so all you need is a counting of NxN.
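Concretely, that counting of NxN is the familiar dovetailing schedule: walk the finite diagonals so that machine i's step j is eventually reached for every pair (i, j). A sketch of just the schedule, in Python, with the actual Turing-machine stepping left abstract:

```python
from itertools import islice

def dovetail_schedule():
    """Yield (machine_index, step_number) pairs so that every pair
    in N x N is eventually produced -- walk the finite diagonals."""
    d = 0
    while True:
        for i in range(d + 1):
            yield (i, d - i)
        d += 1

# First few entries of the schedule: every (i, j) shows up eventually.
print(list(islice(dovetail_schedule(), 6)))
# prints: [(0, 0), (0, 1), (1, 0), (0, 2), (1, 1), (2, 0)]
```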


When do you accept? When do you reject? This is bogus, and can't even solve problems in P in polynomial time.


You accept when you find a Turing machine that outputs something which turns out to be the polynomial-time verifiable certificate for the NP problem.

You never reject, because the requirement was only to accept in polynomial time. (If you prefer, you could reject in exponential time by exhaustive search, of course.)

For another description see here:

http://en.wikipedia.org/wiki/P%3DNP#Polynomial-time_algorith...


What am I missing? I can see the solution for those 6 numbers {−2, −3, 15, 14, 7, −10} or is that the point, the more numbers you add, the longer it takes to solve and p=np is that it should get all the answers instantly?

Confuzzled!


> I can see the solution for those 6 numbers {−2, −3, 15, 14, 7, −10}

The program should be able to work on any set given by the user.

> the more numbers you add, the longer it takes to solve and p=np is that it should get all the answers instantly?

The more numbers there are, the longer it takes. If and only if P=NP, there is a solution which always finds the answer in polynomial time.


To be fair, the description is bad and misleading. The submitter probably knew that the responders would miss that the input might be arbitrarily long.


Don't miss the bid by "VinayDeolalikar" below.


Or the one by some dude called Fermat. Too bad he ran out of space...


Too bad I don't have an account there. As the requirements do not mandate the computational time, the simple brute-force approach would work.



