The Easiest Hard Problem: Balanced Number Partitioning

cperciva · on Jan 3, 2008

This is a generally good article, but it gets two things irritatingly wrong. First, it presents the problem as only existing in the case of partitioning into two sets; in fact, the problem is usually considered in the case of partitioning into three or more sets, since the two-set case is really just the (far more widely known) Subset-Sum problem.

Second, the article gives an example of a two-set partitioning problem and then states that "the only sure way to find the one perfect partition is to check all possible partitions", i.e., that the optimal algorithm runs in O(2^N) time. In fact, there is a trivial O(sqrt(2)^N) algorithm for the problem.

auferstehung · on Jan 4, 2008

It is my understanding that the hard part is when m/n ~1 (P/N~1 in wikipedia terms). Does your O(sqrt(2)^N) algorithm work well for 1000 numbers between 1 and 2^1000. Wikipedia says, "We give efficient algorithms for both small N and small P cases below," which would seem to imply the answer is no.

Regardless, the most interesting aspect of the article to me was the existence of a phase transition between easy and hard parts of this type of problem. I found the reference to Stephan Mertens work relating this computational problem to statistical physics, specifically spin glasses, the most interesting. See samples of his work at http://arxiv.org/find/all/1/all:+AND+stephan+mertens/0/1/0/a...

Incidentally, Angell has a new paper out on the glass transition. http://arxiv.org/abs/0712.4233

bayareaguy · on Jan 3, 2008

Where can I read more about the O(sqrt(2)^N) algorithm?

cperciva · on Jan 3, 2008

The gist of the algorithm is covered in http://en.wikipedia.org/wiki/Subset_sum_problem#Exponential_...

Going from the O(2^(N/2) N) algorithm described in the wikipedia article to O(2^(N/2)) just involves keeping the lists of subset sums sorted during the construction process, thereby avoiding the need for the expensive sorting step.

That said, the O(2^(N/2)) algorithm requires O(2^(N/2)) space, making it less than ideal; in most circumstances, an algorithm which generates the lists of subset sums for each half of the list on-demand using a priority queue is better -- this approach takes O(2^(N/2) N) time, but only O(2^(N/4)) memory.