There's a simple iterative alternative for generating power sets. For each item of a set, you will either include it or exclude it in one of the results. If you count from 0 to 2^n - 1, the binary digits of your counter will enumerate every combination of these possibilities. In Java 8:
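Roughly along these lines (a sketch, not the commenter's exact code; the three-element items list is just a placeholder):

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PowerSet {
    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c");   // placeholder input set
        // Each counter value from 0 to 2^n - 1 is a bit mask: bit i set means "include item i".
        List<List<String>> powerSet = IntStream.range(0, 1 << items.size())
                .mapToObj(mask -> IntStream.range(0, items.size())
                        .filter(i -> (mask & (1 << i)) != 0)
                        .mapToObj(items::get)
                        .collect(Collectors.toList()))
                .collect(Collectors.toList());
        powerSet.forEach(System.out::println);                // [], [a], [b], [a, b], [c], ...
    }
}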
If you enjoy this sort of thing, it is also how Knuth's Volume 4 starts. In it, he goes over many different schemes for generating combinatorial objects.
My favorites are the ones he calls "loopless", since they do not run a loop during each iteration of the algorithm. So, in this example, there would be no need for the filter step that ultimately has to loop (or hash, or do something else non-trivial) to find the items to filter out. In particular, generating all combinations of binary digits requires something like 4 assignments and one test per iteration.
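Knuth's actual loopless algorithms take some care to state, but as a small taste of constant-work-per-step generation, the reflected binary Gray code visits every n-bit string while changing only one bit from one value to the next. This snippet is only an illustration of that sequence, not the loopless formulation itself:

public class GrayCode {
    public static void main(String[] args) {
        int n = 4;                                    // number of bits; arbitrary example size
        for (int i = 0; i < (1 << n); i++) {
            int gray = i ^ (i >>> 1);                 // consecutive values differ in exactly one bit
            String bits = Integer.toBinaryString(gray);
            System.out.println(String.format("%" + n + "s", bits).replace(' ', '0'));
        }
    }
}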
The ridiculously interesting ones are based on De Bruijn sequences. Basically, imagine a combination lock that shifts in one character every time you press a key. Is it possible to test all combinations of digits without "wasting" a single keypress?
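That is exactly what a De Bruijn sequence gives you: every n-character window occurs exactly once. As an illustration (my own sketch, not from the article), the greedy "prefer-one" construction for the binary case:

import java.util.HashSet;
import java.util.Set;

public class DeBruijn {
    public static void main(String[] args) {
        int n = 3;                                    // window (combination) length; arbitrary example
        StringBuilder seq = new StringBuilder();
        for (int i = 0; i < n; i++) {
            seq.append('0');                          // start with n zeros
        }
        Set<String> seen = new HashSet<>();
        seen.add(seq.toString());
        while (true) {
            String tail = seq.substring(seq.length() - (n - 1));
            if (seen.add(tail + "1")) {               // greedily prefer appending a 1
                seq.append('1');
            } else if (seen.add(tail + "0")) {        // otherwise fall back to a 0
                seq.append('0');
            } else {
                break;                                // every n-bit window has now appeared
            }
        }
        System.out.println(seq);                      // n = 3 prints 0001110100
    }
}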
My personal interest is in generalized LFSRs, which can be used to generate de Bruijn sequences. From the wiki page for LFSR:
"Non-binary Galois LFSR
Binary Galois LFSRs like the ones shown above can be generalized to any q-ary alphabet {0, 1, ..., q − 1} (e.g., for binary, q = 2, and the alphabet is simply {0, 1}). In this case, the exclusive-or component is generalized to addition modulo-q (note that XOR is addition modulo 2), and the feedback bit (output bit) is multiplied (modulo-q) by a q-ary value, which is constant for each specific tap point. Note that this is also a generalization of the binary case, where the feedback is multiplied by either 0 (no feedback, i.e., no tap) or 1 (feedback is present). Given an appropriate tap configuration, such LFSRs can be used to generate Galois fields for arbitrary prime values of q."
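For the binary case, the widely used 16-bit Galois LFSR (taps 0xB400, a maximal-length configuration) translates almost directly to code; here is a sketch in Java, where the XOR is the modulo-2 addition that the quote generalizes to modulo-q:

public class GaloisLfsr {
    public static void main(String[] args) {
        int seed = 0xACE1;                 // any nonzero 16-bit start state
        int lfsr = seed;
        int period = 0;
        do {
            int lsb = lfsr & 1;            // output (feedback) bit
            lfsr >>>= 1;
            if (lsb != 0) {
                lfsr ^= 0xB400;            // toggle the taps: addition modulo 2
            }
            period++;
        } while (lfsr != seed);
        System.out.println(period);        // 65535 = 2^16 - 1 for a maximal-length LFSR
    }
}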
Here is a SPITBOL/SNOBOL4 solution. It assumes the number of items in the set is not greater than the size of the alphabet (&alphabet).
* routine stolen from gimpel
* algorithm from peck and schrack
define('p(s)t,n,c,k','p_init') :(p_end)
p_init n = size(s)
r = array('2:' n, 0)
&alphabet len(n) . y
x = array('2:' n, y)
k = n + 1
p_0 k = k - 1
x[k] len(1) . s1 tab(k) . s2 = s2 s1 :s(p_0)
define('p(s)i,k')
p = s :(return)
p k = size(s)
p_1 s = replace(x[k],y,s) :f(p_2)
r[k] = r[k] + 1
k = eq(remdr(r[k], k),0) k - 1 :s(p_1)
p = s :(return)
p_2 define('p(s)t,n,s1,s2','p_init') :(freturn)
p_end
* example: all permutations of string abcdefgh
s = 'abcdefgh'
abc output = p(s) :s(abc)f(end)
end
Here is another solution that only returns the unique permutations. The items of the set must first be sorted or grouped, e.g., a string like "cabcd" could be given as "ccabd", "adbcc", "abccd", etc. Duplicate items must be adjacent.
define('r(s,ors)c,f,s1,a,d,os')
:(r_end)
r ors rtab(1) len(1) . c :f(freturn)
s (span(c) | null) . f =
s arb . s1 len(1) . d c = :f(r_1)
r = s1 f c d s :(return)
r_1 ors break(c) . os
r = r(s,os) f :s(return)f(freturn)
r_end
s = 'abcdefgh'
output = s
x01 output = r(output,s) :s(x01)
end
>"For each item of a set, you will either include it or exclude it in one of the results."
I found this interesting. This may be a silly question (my combinatorics knowledge is limited), but what exactly is the relationship between power sets and "N choose K"?
The classic recurrence is if you're choosing K elements out of N, it either has the first element (in which case you choose K-1 elements out of the remaining N-1) or it doesn't (in which case you choose K elements out of the remaining N-1), so:
N choose K = (N-1 choose K-1) + (N-1 choose K).
Also, if you sum (N choose K) for all values of K you get 2^N.
The combinations (n choose k) tell you how many ways there are to take k things from n, and so they count the size-k subsets within the power set. And since the sets of k things are obviously distinct from the sets of j != k things, there is no overlap. This gives the well-known result that 2^n = the sum over k of (n choose k).
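A quick sanity check of both identities, using the recurrence above (n = 10 is an arbitrary choice):

public class Binomials {
    // The recurrence from the parent comment: C(n, k) = C(n-1, k-1) + C(n-1, k).
    static long choose(int n, int k) {
        if (k == 0 || k == n) {
            return 1;
        }
        return choose(n - 1, k - 1) + choose(n - 1, k);
    }

    public static void main(String[] args) {
        int n = 10;
        long sum = 0;
        for (int k = 0; k <= n; k++) {
            sum += choose(n, k);           // number of k-element subsets
        }
        System.out.println(sum);           // 1024 = 2^10, the size of the power set
    }
}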
If anyone wants to know how deep down the rabbit hole this stuff goes, they should read this blog post [1] on writing a Levenshtein Automaton to speed up fuzzy matching in Lucene by 100 times. It gets deep!
Posts like this are why I love HN. I recently wrote a program to find anagrams of a given string (a Countdown solver if you live in the UK).
It includes a really naive method for generating all possible permutations of the string, but reading this post I can immediately see a far better way to do it.
That's my weekend taken care of! Thanks to the poster and author.
There was a recent post about anagrams and the author just put all the letters in alphabetical order and then compared the words, which was more efficient than generating permutations.
I ran an anagram website in the 90s, and that was what we did. We stored word lists with a key of all their letters sorted. We then calculated anagram scores on demand because I couldn't work out a nice enough way to store those.
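A sketch of that sorted-letters key idea (the tiny word list is only a stand-in for a real dictionary):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Anagrams {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("listen", "silent", "enlist", "google");
        Map<String, List<String>> byKey = new HashMap<>();
        for (String word : words) {
            char[] letters = word.toCharArray();
            Arrays.sort(letters);                     // the key: the word's letters in sorted order
            String key = new String(letters);
            byKey.computeIfAbsent(key, k -> new ArrayList<>()).add(word);
        }
        // Words that share a key are anagrams of one another.
        System.out.println(byKey.get("eilnst"));      // [listen, silent, enlist]
    }
}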
Python recursive solution for generating permutations:
def permutation(prefix, s):
    n = len(s)
    if n == 0:
        print(prefix)
    else:
        for i in range(n):
            # move s[i] onto the prefix and permute the remaining characters
            permutation(prefix + s[i], s[:i] + s[i+1:])

s = 'abcd'
prefix = ''
permutation(prefix, s)
You'd need to be finding the permutations of a list ~1000 elements long, but for such a list there are far too many permutations for the request to be meaningful.
Nice post. I don't know why there isn't a bigger push toward experiments with tree-like massive combinatorial expansion under lazy evaluation, so that the solution space stays somewhat "reasonable": assume you have unlimited combinatorial expansion capability, then add restrictions so that once evaluation starts most of the combinations are discarded, leaving a "sparse combinatorial expansion", analogous to a convex solution space in geometry.
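One cheap way to play with that idea (a sketch, not from the article; it uses eager pruning rather than true lazy evaluation, but the effect on the explored space is similar): expand the subset tree, and cut off every branch that already violates a restriction.

import java.util.ArrayList;
import java.util.List;

public class PrunedExpansion {
    // Expand the subset tree, but abandon any branch whose running sum already exceeds
    // the limit; since the values are positive, nothing below that branch can recover.
    static void expand(int[] values, int index, int sum, int limit,
                       List<Integer> chosen, List<List<Integer>> results) {
        if (sum > limit) {
            return;                                   // prune: discard the whole subtree
        }
        if (index == values.length) {
            results.add(new ArrayList<>(chosen));
            return;
        }
        expand(values, index + 1, sum, limit, chosen, results);                   // exclude values[index]
        chosen.add(values[index]);
        expand(values, index + 1, sum + values[index], limit, chosen, results);   // include values[index]
        chosen.remove(chosen.size() - 1);
    }

    public static void main(String[] args) {
        int[] values = {3, 5, 8, 13};                 // arbitrary example data
        List<List<Integer>> results = new ArrayList<>();
        expand(values, 0, 0, 10, new ArrayList<>(), results);
        System.out.println(results);                  // only subsets with sum <= 10 survive
    }
}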
I don't know how efficient it is, but the bsd glob() function in their c stdlib can generate unique permutations with brace syntax. If you wanted all permutations of "abc", you would pass it "{a,b,c}{a,b,c}{a,b,c}"
It's also easy to get to bsd's glob() on non-bsd platforms with Perl:
For the second-to-last level (which will be entered n! times) it does n iterations, which gives us a (not so tight) lower bound of n * n! instead of the optimal n!.
The issue is the use of an array for visited instead of, say, a linked list where each level gets a list of only the unvisited nodes.
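This isn't the linked-list variant described above, but the same goal of never re-scanning a visited array can also be reached by swapping in place, so each level only looks at the still-unused tail of the array (a sketch):

public class Permutations {
    // Positions before 'depth' are fixed; everything from 'depth' onward is still unused,
    // so there is no visited[] array to scan on each iteration.
    static void permute(char[] items, int depth) {
        if (depth == items.length) {
            System.out.println(new String(items));
            return;
        }
        for (int i = depth; i < items.length; i++) {
            swap(items, depth, i);
            permute(items, depth + 1);
            swap(items, depth, i);                    // restore before trying the next choice
        }
    }

    static void swap(char[] a, int i, int j) {
        char t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        permute("abc".toCharArray(), 0);              // prints all 3! = 6 permutations
    }
}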
In many Prolog implementations, this predicate is already implemented as permutation/2, but the point here is to show that you can also implement it easily yourself, at least in the input->output direction, where the first argument is fully instantiated.
To solve both the second and third task, let us relate a list to one of its subsequences:
I am curious where you would use something like this. I've worked with Minimum Edit Distance, which is similar (what is the minimum number of steps to convert one string into another).
Not sure where you would need to pull together all variations of the string and apply it to a problem.
It is true that, for complexity reasons, enumerating all combinations is usually avoided. However, in my opinion it is useful to know how to do things the basic way when learning more advanced algorithms such as RANSAC, Apriori and others...
ha, fun.. I used to choke so much on power sets; after reading a lot I finally got the recursive version; here in python (I know, but I couldn't find my lisp/ml version, so..)
It doesn't contribute much to the conversation to note that your favorite language has a canned procedure for solving an algorithmic problem like this. If you're genuinely interested in demonstrating Haskell's capabilities it would be far more constructive to offer an implementation of such a routine, that it might be compared directly to the others in the article and this thread.
There's a similar "canned procedure" a guy posted in Python; why wasn't he judged with the same severity? Is there some kind of bias against Haskell around here?