NP-complete problem solved with biological motors (arstechnica.com)
87 points by YeGoblynQueenne on Feb 28, 2016 | 26 comments



Having worked in both quantum information and biology, I have a few things to say about this article...

1) The 1st paragraph really annoys me. Quantum computers don't solve NP-complete problems by trying every solution. This is a common misconception. In fact, it's widely believed that quantum computers can't solve NP-complete problems in polynomial time, though they may get a moderate speedup over classical computers thanks to their superior searching capability (Grover's algorithm gives a quadratic speedup for unstructured search).

2) The biological computer in the article also doesn't solve NP-complete problems in polynomial time. Rather, it proposes a highly parallel machine that divides the runtime by a very large constant. The real claim is to make a low-energy parallel processor that makes it easier to throw lots of processors at the problem. It does not alter the complexity classes. The linked PNAS article states, "it is inherent to combinatorial and NP-complete problems (assuming P != NP) that the exploration of the entire solution space requires the use of exponentially increasing amounts of some resource, such as time, space, or material. In the present case this fundamental requirement manifests itself in the number of agents needed, which grows exponentially with 2^N. Effectively we are trading the need of time for the need of molecular mass." Unfortunately, the experiment hasn't quite got this far, since "the error rates of this first device are too large for scaling up to problems containing more than ∼ 10 variables." It may constitute a step in the right direction, though.

3) The popular science article (as opposed to the original research it links to) is probably too focused on trying to make this about P vs. NP and misses the actual accomplishment: that the researchers can control a microbiological system outside of their own brains well enough to compute with it. This is pretty cool and may be valuable progress in nano/biotech even if it doesn't end up being a viable computer.


In grad school I always heard Scott Aaronson harp on these issues too. He was fond of saying that quantum algorithms arrange amplitudes just so, ensuring that wrong answers experience cancelling amplitudes and are therefore highly unlikely to be the observed outcomes.

It's not about "trying everything in parallel" but rather shoving amplitude around to ensure that correct answers are overwhelmingly more likely to be observed.


This is spot on, and what they actually managed to build is super interesting.


Ugh. This again.

http://www.scottaaronson.com/papers/npcomplete.pdf

"this totally scales in polynomial time... but we lose statistical significance as n increases"

"this totally scales in polynomial time... but scales exponentially in the number of proteins needed"


The linked article is a little over-hyped, but the paper authors themselves acknowledge that their system is simply a really cool example of parallel computing and doesn't solve or avoid the P?=NP problem. From the paper:

"...it is inherent to combinatorial and NP-complete problems (assuming P! = NP) that the exploration of the entire solution space requires the use of exponentially increasing amounts of some resource, such as time, space, or material. In the present case this fundamental requirement manifests itself in the number of agents needed, which grows exponentially with 2N. Effectively we are trading the need of time for the need of molecular mass."

They could have implemented many different parallelizable algorithms, so the choice to solve an instance of subset sum was probably for PR/the awesome factor (not to be underestimated as an academic motivation).


That's a cool paper. It touches on a question I've had before about how much of our algorithmic reasoning is predicated on the implementation of CPUs and our current tech stack.

For example, I understand that it's been proved that sorting can at best be done in O(n log n), which I guess is related to the number of 1-to-1 comparisons that need to be done.

However, how does sorting, say, the physical length of sticks come into this? If you stand them all in a bundle you can easily select out the tallest, and then the next tallest, etc, by e.g., lowering a sticky horizontal plate.

It's beyond my cleverness to think about how that could be implemented in a useful way for a computer, but could we prove that it's not possible? Could we have custom-designed units for specialized operations that take advantage of differences between the digital and analog worlds?


I'm going to provide an avenue for thinking about why the stick sorting example is problematic.

Before starting, you have to specify the size of the plate / the sorting machine. Let's say it supports 10,000 sticks.

I'm going to give you 20,000 sticks.

Does your machine still solve the problem in O(n)? Nope, it's back to O(n log n).

The issue with complexity theory is that it's for _arbitrarily large N_, and solutions that work intuitively for small N often run into walls, especially physical solutions.

EDIT:

BTW, I had to stop and think for a bit on that one. This question made me smile; it reminded me of why I enjoyed school. Great question, where the answer isn't obvious and elucidates something about the nature of complexity and how it disagrees with intuition.


I don't think this is the reason the sorting machine can beat O(n log n). Just as any real sorting machine is limited in size, every real computer is limited as well. Just as the machine can't hold arbitrarily many (or arbitrarily large) sticks, your computer doesn't have enough memory to hold arbitrarily many (or arbitrarily large) numbers. Complexity theory is predicated on idealized computers (Turing machines with infinitely long tapes, or register machines able to hold infinitely large numbers).


But isn't this just radix sort, effectively? That's linear-time, no problem. There's no fundamental rule saying that sorting has to be O(n log n); it's only sorting-by-comparison that needs O(n log n) comparisons.

Though I guess one might point out that, without direct (and infinitely precise!) comparisons, eventually you'll have enough sticks that you won't really be able to tell which is the largest.
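
For concreteness, here's a rough LSD radix sort sketch in Python (mine, just to illustrate the point): it never compares two elements against each other, only buckets them by digits, so it runs in O(n * k) for k digits rather than O(n log n).

    def radix_sort(nums, base=10):
        """LSD radix sort for non-negative integers: no element-to-element
        comparisons, just bucketing by one digit at a time."""
        if not nums:
            return []
        digits_needed = len(str(max(nums)))  # digits in the largest value
        for d in range(digits_needed):
            buckets = [[] for _ in range(base)]
            for x in nums:
                buckets[(x // base ** d) % base].append(x)
            nums = [x for bucket in buckets for x in bucket]
        return nums

    print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
    # [2, 24, 45, 66, 75, 90, 170, 802]

Of course, k grows with the size of the values, which is the analogue of the precision caveat above: the "linear" time quietly depends on how finely you can distinguish the sticks.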


You can build an arbitrarily large sticky plate in linear time, before using the plate. It doesn't add a log(n) factor.


Well, I don't think that's true.

Eventually, the plate will be higher density than the material available at the point of its construction, so it will need to take material from outside the plate's current area to fill it.

As the plate extends further into lower-density regions, it will have to extend its materials gathering exercises further and further away from the actual construction site, so they'll take longer and longer to get there.

Even if the plate is very low density indeed, eventually you'll run out of planet and have to start dismantling the moon - which is quite a long way away - unless the plate has a lower density than the volume of space the moon encloses.


> However, how does sorting, say, the physical length of sticks come into this? If you stand them all in a bundle you can easily select out the tallest, and then the next tallest, etc, by e.g., lowering a sticky horizontal plate.

But here you are only linear in the number of plate-lowering and longest-stick-removal operations. The actual number of comparisons you do in this example is still very much quadratic: in every step, you have to check every stick to see whether or not it touches the plate; you've just found a very fast way to do that check (but it's still linear in the number of sticks remaining).
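
To put a number on that, here's a minimal Python simulation of the sticky-plate procedure on a serial machine (my own sketch, nothing from the paper), counting the stick checks:

    import random

    def sticky_plate_sort(sticks):
        """Simulate lowering a sticky plate: repeatedly pull out the tallest
        remaining stick. Each lowering checks every remaining stick once."""
        remaining = list(sticks)
        ordered = []
        checks = 0
        while remaining:
            tallest = remaining[0]
            for s in remaining:        # the plate has to "touch" every stick
                checks += 1
                if s > tallest:
                    tallest = s
            remaining.remove(tallest)  # peel the tallest stick off the plate
            ordered.append(tallest)
        return ordered, checks

    _, checks = sticky_plate_sort([random.random() for _ in range(1000)])
    print(checks)  # 500500, i.e. n*(n+1)/2 checks: quadratic in n

The parallelism of the plate hides those checks in the physics; a serial simulation has to pay for each one.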


Note 1: Comparison sorting is O(n log (n)).

In domains where the range of values is smaller than the number of elements (so there are duplicate values), sorting can be faster.

https://en.wikipedia.org/wiki/Radix_sort
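
The simplest version of that is counting sort, which radix sort applies one digit at a time. A rough Python sketch (mine, just to make the "can be faster" concrete):

    def counting_sort(nums, max_value):
        """Counting sort: O(n + k) for integers in 0..max_value (k = max_value).
        Beats O(n log n) when the value range is small relative to n."""
        counts = [0] * (max_value + 1)
        for x in nums:
            counts[x] += 1
        out = []
        for value, count in enumerate(counts):
            out.extend([value] * count)
        return out

    print(counting_sort([3, 1, 3, 0, 2, 1, 3], max_value=3))
    # [0, 1, 1, 2, 3, 3, 3]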

Note 2: for arbitrarily large sticks, is "lowering a sticky horizontal plate" a constant-cost operation? Or is it a factor of K (the difference between the longest and shortest object)?


O(n log n) is the bound for a serial version of the algorithm: a head that can read or write one value (stick) at a time. Parallel sorts achieve better bounds. Your sticky horizontal plate is essentially N parallel reads, each scanning through values, continually asking if it's reached its stick.


This is called spaghetti sort. The analysis on Wikipedia should help.


The NP-complete problem here is merely used as an intensive benchmark. It's still just "brute-forcing" the solution.


Sounds very click-baity.

1) Quantum computers are not believed to be able to solve NP-complete problems efficiently. They have their own complexity class, BQP, which is suspected not to contain the NP-complete problems.

2) If they solved just one configuration, as the article says, that doesn't prove anything. They must be able to solve any problem in that class of problems.

3) However, even if they "just" found a way to solve an NP-complete problem 10000 times more energy efficiently, that's impressive, even if it won't scale. (And it won't, because it's NP-complete.) Just don't do those clickbait things, OK?


>> Sounds very click-baity.

Oh dear. Not again :) [Edit: by that I mean that I had another submission the same day whose title readers considered clickbaity]

I guess I could have changed the title to "NP Complete problem tackled with parallel computation using biological motors" or something similar; that would have been more informative and less of a strong claim. But the submission guidelines also say to avoid changing the title (unless it's clickbaity), so in some cases you just have to make a judgment call.

I think in this case the title is a bit ambiguous. You could read it as "NP complete problem class solved" or "NP complete problem instance solved". But I think the author does a good job of detailing the result, they do link back to the original paper, and there's plenty more to the article than just a catchy title. Also it's on Ars, which generally has a good record of not trying to pull the wool over their readers' eyes. So really, I didn't think it would be read as an attempt at generating an undeserved, er, buzz.

So I erred on the side of keeping the title as it was.

Well, anyway- this was a bit unexpected (two titles read as clickbaity in one day) so I'm a bit worried now- I don't want to come across as a ... clickbaiter? I'll err on the side of caution on the next few submissions and see what happens.


No worries, it's the title of the original article, and I'm criticising mainly that. But the article itself definitely has its own share of problems/inaccuracies.


Isn't solving one NP-complete problem solving all NP-complete problems? Reducibility is how you almost always prove something is NP-complete.


You're confusing solving an instance of subset-sum with solving subset-sum.

If you solve subset-sum, yes, you've solved all NP-complete problems.

If you solve an instance of subset-sum, you've only solved that instance of subset-sum.

The issue here is that to show you've solved subset sum, you need to show that you've solved it for _all possible inputs_.

Subset sum is "given a set (or multiset) of integers, is there a non-empty subset whose sum is zero?".

If I give you the set {-2, -1, 0, 1, 2}, you can tell me in O(n) time that a subset of this sums to zero.

That doesn't mean you've solved subset sum, it means you've solved an instance of subset sum. To solve subset sum, you need to provide a proof showing that you have an algorithm that works for ANY set.
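
For concreteness, the brute-force decision procedure looks something like this (a minimal Python sketch, not anyone's actual algorithm); it has to consider up to 2^n - 1 non-empty subsets, which is exactly the exponential blow-up the paper's device also runs into:

    from itertools import chain, combinations

    def has_zero_subset(nums):
        """Brute-force subset sum: check every non-empty subset for a zero sum.
        There are 2^n - 1 of them, so this is exponential in len(nums)."""
        subsets = chain.from_iterable(
            combinations(nums, k) for k in range(1, len(nums) + 1))
        return any(sum(s) == 0 for s in subsets)

    print(has_zero_subset([-2, -1, 0, 1, 2]))  # True, e.g. {0} or {-1, 1}
    print(has_zero_subset([1, 2, 4]))          # False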

The work here gets results for a specific configuration. To show it solving subset-sum, it would need to be shown to work for all possible sets.


Yes, because transforming an instance of one NP-complete problem into an instance of another is itself a problem solvable in polynomial time.


These comments are ridiculous. Nothing in the article claimed they could solve NP-complete problems in polynomial time. They didn't even claim it was very practical. But it's a super interesting way to do computation.


I'm not seeing how this solves an NP-complete problem at all.

First, the actual press release from the researcher: http://nicolau.lab.mcgill.ca/?p=1207

However,

> We provide a proof-of-concept demonstration of such a device by solving, in a parallel fashion, the small instance {2, 5, 9} of the subset sum problem, which is a benchmark NP-complete problem.

> Actin filaments exploring the {2, 5, 9} device. The movie shows a time lapse of 200 typical fluorescence micrographs of actin filaments (shown in white) moving through a network (shown in blue) encoding the SSP with the set {2, 5, 9}. Exits labeled with green numbers represent correct results, and magenta numbers represent incorrect results.

They have 17 different exits. What isn't clear to me is how the exits from the device map to actual solutions or non-solutions of the subset-sum problem. {2, 5, 9} has only 8 subsets (including the empty set), so there are more exits than what I'd call "possible solutions". Roughly half of the exits map to a "correct" answer, roughly half do not. (If you ignore the "which subsets?" question and just ask "is there a solution?", then I suppose half makes sense, in that half the exits are "no solution" and the other half are "a solution exists".)

But my other question, and the one I feel like neither site adequately answers, is: how do you set up the device that the protein moves through? If it requires as many exits as there are possible solutions to the problem (i.e., one for each brute-force "attempt"), and at least as many gates, if not more, then creating the device is still hard and has the same scaling issues that a computer does. That you can evaluate the device quickly becomes meaningless: you must build it first.

Edit: The 17 exits correspond to the sums, not the subsets. (I was also missing how they never specified what the target sum was; this is how.) So, "correct" exits are exits where there exists a subset that sums to the number of that exit. (They're solving all possible values for the sum for a given set.) As an example, exit 14 is "correct" because there exists a subset, {9, 5}, which sums to 14.
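
As a sanity check on that reading (mine, not taken from the paper), the achievable subset sums of {2, 5, 9} are easy to enumerate in Python, and they match the 17 exits labelled 0 through 16:

    from itertools import chain, combinations

    numbers = [2, 5, 9]
    total = sum(numbers)  # 16, so exits are labelled 0..16 (17 of them)

    # Every sum reachable by some subset (the empty set contributes 0).
    reachable = {sum(s)
                 for s in chain.from_iterable(
                     combinations(numbers, k) for k in range(len(numbers) + 1))}

    for exit_label in range(total + 1):
        print(exit_label, "correct" if exit_label in reachable else "incorrect")
    # 8 of the 17 exits are "correct": 0, 2, 5, 7, 9, 11, 14, 16

which lines up with roughly half the exits being green in the movie.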

Edit: Also, here's the paper[1]. This is the weirdest hyperlinking job I've seen in a while. The only link Ars seems to give to the researcher's press release is in the image credit (and even then, only to the site's home, not the release itself…). The press release only links to the paper on the page title?

[1]: http://www.pnas.org/content/early/2016/02/17/1510825113.full


The Ars article links to the abstract. There's a line at the bottom of the article with the text:

PNAS, 2016. DOI: 10.1073/pnas.1510825113 (About DOIs).

The DOI is a link to the abstract. I'm not sure why they don't link to the full text directly, but it's not restricted access.


I do not believe that an actin-based system for parallel computation could ever be more energy efficient than ceramic hardware.

It might give off less waste heat than your standard desktop PC; but that is not a valid comparison when you consider other, specialized types of circuitry.

Furthermore, the energetic cost of producing and purifying the protein should be counted as a significant expense in any calculation of efficiency, and the fact that actin filaments are not permanently stable in solution makes this system untenable.




