Hacker News new | past | comments | ask | show | jobs | submit login
Generating all permutations, combinations, and power set of a string (2012) (exceptional-code.blogspot.com)
144 points by rimher on May 5, 2017 | hide | past | favorite | 44 comments



There's a simple iterative alternative for generating power sets. For each item of a set, you will either include it or exclude it in one of the results. If you count from 0 to 2^n, the binary digits of your counter will enumerate every combination of these possibilities. In Java 8:

    static <T> Set<Set<T>> power(List<T> a) {
      return IntStream.range(0, 1 << a.size())
      .mapToObj(x -> IntStream.range(0, a.size())
        .filter(y -> (1 << y & x) != 0)
        .mapToObj(y -> a.get(y))
        .collect(Collectors.toSet())
      ).collect(Collectors.toSet());
    }
In JS/ES6:

    function power(a) {
      var r = [];
      for(var x = (1 << a.length) - 1; x >= 0; x--) {
        r.push(a.filter((y, i) => 1 << i & x));
      }
      return r;
    }
In K6:

    {x@&:'+!(#x)#2}


If you enjoy this sort of thing, it is also the start of Knuth's volume 4. In it, he goes over many different schemes of generating things.

My favorites are the ones he calls "loopless" since they do not have a loop during each iteration of the algorithm. So, in this example, there is no need for the filter line that will ultimately have to loop (or hash or something else not trivial) to find the items to filter out). In particular, to generate all permutations of binary digits requires something like 4 assignments and one test for each iteration.

The ridiculously interesting ones are based on De Bruijn sequences. Basically, think if you have a combination lock that constantly shifts in 1 character when you type. Is it possible to test all combinations of digits without "wasting" a keypress?


The Wikipedia page for de Bruijn sequences is pretty good. https://en.wikipedia.org/wiki/De_Bruijn_sequence

My personal interest is in generalized LFSRs, which can be used to generate de Bruijn sequences. From the wiki page for LFSR:

"Non-binary Galois LFSR

Binary Galois LFSRs like the ones shown above can be generalized to any q-ary alphabet {0, 1, ..., q − 1} (e.g., for binary, q = 2, and the alphabet is simply {0, 1}). In this case, the exclusive-or component is generalized to addition modulo-q (note that XOR is addition modulo 2), and the feedback bit (output bit) is multiplied (modulo-q) by a q-ary value, which is constant for each specific tap point. Note that this is also a generalization of the binary case, where the feedback is multiplied by either 0 (no feedback, i.e., no tap) or 1 (feedback is present). Given an appropriate tap configuration, such LFSRs can be used to generate Galois fields for arbitrary prime values of q."


Here is a spitbol/snobol4 solution. Assumes the number of items in the set is not greater than the alphabet.

   * routine stolen from gimpel
   * algorithm from peck and schrack 

    define('p(s)t,n,c,k','p_init') :(p_end)
   p_init n = size(s)
    r = array('2:' n, 0)
    &alphabet len(n) . y
    x = array('2:' n, y)
    k = n + 1
   p_0 k = k - 1
    x[k] len(1) . s1 tab(k) . s2 = s2 s1  :s(p_0) 
    define('p(s)i,k') 
    p = s :(return)
   p k = size(s)
   p_1 s = replace(x[k],y,s) :f(p_2)
    r[k] = r[k] + 1
    k = eq(remdr(r[k], k),0) k - 1 :s(p_1)
    p = s :(return)
   p_2 define('p(s)t,n,s1,s2','p_init') :(freturn)
   p_end

   * example: all permutations of string abcdefgh
    s = 'abcdefgh'
   abc output = p(s) :s(abc)f(end)
   end
Here is another solution that only returns the unique permutations. The items of the set must first be sorted or grouped, e.g, a string like "cabcd" could be given as "ccabd", "adbcc", "abccd", etc. Duplicate items must be adjacent.

    define('r(s,ors)c,f,s1,a,d,os') 
    :(r_end)
   r ors rtab(1) len(1) . c :f(freturn)
    s (span(c) | null) . f =
    s arb . s1 len(1) . d c = :f(r_1)
    r = s1 f c d s :(return)
   r_1 ors break(c) . os
    r = r(s,os) f :s(return)f(freturn)
   r_end

    s = 'abcdefgh' 
    output = s
   x01 output = r(output,s) :s(x01)
   end


>"For each item of a set, you will either include it or exclude it in one of the results."

I found this interesting. This may be a silly question(my combinatorics knowledge is limited) but what exactly is the relationship between power sets and "N choose K"?


An example can be helpful:

Given Set A = {1, 2, 3}

This is the power Set of A P(A) = {{}, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}} (All the subsets of A including the empty set and A itself)

3 choose 2 = 3 (If you look in the power set, you'll see that there are only 3 ways to choose 2 items out of 3)

Note that order does not matter.


The classic recurrence is if you're choosing K elements out of N, it either has the first element (in which case you choose K-1 elements out of the remaining N-1) or it doesn't (in which case you choose K elements out of the remaining N-1), so:

N choose K = (N-1 choose K-1) + (N-1 choose K).

Also, if you sum (N choose K) for all values of K you get 2^N.


I came at this from the perspective of the Binomial theorem:

(x + y)^n = sum{k=0,n} (n choose k) x^(n-k) y^k

And since a powerset is:

sum{k=0,n} (n choose k)

The way we can turn the first equation into a powerset is by setting x=1 and y=1, hence:

2^n = sum{k=0,n} (n choose k)


Thanks for the good explanations. Cheers.


The combinations (n choose k) tell you how many ways there are to take k things from n, and so count that subset of the power set. And since the sets of k things are obviously distinct from the sets of j != k things, there is no overlap. This gives the well-known result that 2^n = Sum n choose k.


I prefer the use of iterators, instead of generating a whole collection. An example of a permutation iterator in C++ is:

    class Permutations
    {
    public:
        Permutations(int val_n) : n(val_n), _more(true)
        {
            a = new int[n];
            for (int i = 0; i < n; i++)
                a[i] = i;
        }
        ~Permutations()
        {
            delete []a;
        }
        bool more() { return _more; }
        void next()
        {
            _more = false;
            for (int i = n-2; i >= 0; i--)
                if (a[i] < a[i+1])
                {
                    _more = true;
                    for (int j = n-1; j > i; j--)
                        if (a[j] > a[i])
                        {
                            swap(a[j], a[i]);
                            break;
                        }
                    i++;
                    for (int j = n-1; j > i; j--, i++)
                        swap(a[j], a[i]);
                    break;
                }
        }
        int operator[](int i) { return a[i]; }
            
        int n;
    private:
        void swap(int &x, int &y)
        {
            int h = x;
            x = y;
            y = h;
        }
        bool _more;
        int *a;
    };


If anyone wants to know how deep down the rabbit hole this stuff goes, they should read this blog post [1] on writing a Levenshtein Automaton to speed up fuzzy matching in Lucene by 100 times. It gets deep!

[1]: http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is...


I am going to have to go through [1] referenced in the post.

[1]: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.6...


You can use binary to help generate your combinations of k elements (every number of n bits with k 1s represents a combination of n-choose-k): http://alquerubim.blogspot.com.br/2012/05/combinacoes-de-dig...

And you can use factoradics to generate your permutations (because the leftmost digit tells you which is the next element): http://alquerubim.blogspot.com.br/2012/06/combinacoes-de-dig...

So you transform your problem into counting. You just have to choose a suitable base.


Posts like this are why I love HN. I recently wrote a program to find anagrams of a given string (a Countdown solver if you live in the UK).

It includes a really naive method for generating all possible permutations of the string, but reading this post I can immediately see a far better way to do it.

That's my weekend taken care of! thanks to the poster and author.


"Good morning, that's a nice tnettenba." [1]

[1] https://www.youtube.com/watch?v=lsFAokXCxTI


There was a recent post about anagrams and the author just put all the letters in alphabetical order and then compared the words, which was more efficient than generating permutations.


I ran an anagram website in the 90s, and that was what we did. We stored word lists with a key of all their letters sorted. We then calculated anagram scores on demand because I couldn't work out a nice enough way to store those.

We had about 300 hits/day so it didn't matter.


That is what I did for this challenge: https://www.reddit.com/r/dailyprogrammer/comments/52enht/201...

https://github.com/Rhebel/DailyProgrammer/blob/master/DailyP...

I know my code is super verbose compared to other people's but I'd rather be understood than clever.


Sounds interesting, do you have a link?



Python recursive solution for generating permutations:

  s = 'abcd'
  prefix = ''
  permutation(prefix, s)

  def permutation(prefix, s):
      n = len(s)
      if n == 0:
          print(prefix)
      else:
          for i in range(n):
              permutation(prefix + s[i], s[:i] + s[i+1:])


An iterative Python recipe can be found in the docs: https://docs.python.org/3/library/itertools.html#itertools.p...


Straight from the horse's mouth! Thanks, Raymond :)


Wouldn't you hit Python's recursive limit rather quickly doing this?

There might be a way to avoid messing with the recursive depth if you swapped out the main part of the for loop for a generator though.


You'd need to be finding the permutations for a list ~1000 elements long, but for such a list there are far too many permutations for such a request to be meaningful.


Python standard lib...

  import itertools
  list(itertools.permutations('abc'))


Nice post. I don't know why there is no a bigger push in experiments/investigations with tree-like massive combinatorial expansion with lazy-evaluation, so the solution space can be somewhat "reasonable" (e.g. assuming you have infinite combinatorial expansion capability, adding restrictions, so once you start the evaluation most of the combinations get discarded, so you'll have to deal with a "sparse combinatorial expansion" analog to a convex space for solutions in geometry).


In Java there are libraries for this (as with most things): https://github.com/tomgibara/permute


I don't know how efficient it is, but the bsd glob() function in their c stdlib can generate unique permutations with brace syntax. If you wanted all permutations of "abc", you would pass it "{a,b,c}{a,b,c}{a,b,c}"

It's also easy to get to bsd's glob() on non-bsd platforms with Perl:

  perl -MFile::Glob=bsd_glob -e 'print bsd_glob("{a,b,c}{a,b,c}{a,b,c}\n");'


Note the sub-optimality of generatePermutations.

For the second but last level (which will be entered n! times) it is doing n iterations which gives us a (not so tight) lower bound of n * n! instead of the optimal n!.

The issue is the use of an array for visited instead of, say, a linked list where each level gets a list of only unvisited nodes.


Very nice, thank you for sharing!

For comparison, here are Prolog solutions for these tasks.

The first building block is a relation between a list, one of its elements, and the list of remaining elements:

    list_element_rest([E|Ls], E, Ls).
    list_element_rest([L|Ls0], E, [L|Ls]) :-
            list_element_rest(Ls0, E, Ls).
In the following, I assume you have set double_quotes to chars, so that you can easily work on a string as a list of characters:

    :- set_prolog_flag(double_quotes, chars).
Here is a sample interaction, using this relation:

    ?- list_element_rest("abc", E, Ls).
    E = a,
    Ls = [b, c] ;
    E = b,
    Ls = [a, c] ;
    E = c,
    Ls = [a, b] ;
    false.
Importantly, we can use it in all directions, for example also to insert an element into a list:

    ?- list_element_rest(Ls, a, "bc").
    Ls = [a, b, c] ;
    Ls = [b, a, c] ;
    Ls = [b, c, a] ;
    false.
Using list_element_rest/3 as a building block, we can define the relation between a list and one of its permutations as follows:

    list_permutation([], []).
    list_permutation([L|Ls], Ps) :-
            list_permutation(Ls, Ps0),
            list_element_rest(Ps, L, Ps0).
This completes the first task. Example:

    ?- list_permutation("abc", Ps).
    Ps = [a, b, c] ;
    Ps = [b, a, c] ;
    Ps = [b, c, a] ;
    Ps = [a, c, b] ;
    Ps = [c, a, b] ;
    Ps = [c, b, a] ;
    false.
In many Prolog implementations, this predicate is already implemented as permutation/2, but the point here is to show that you can also implement it easily yourself, at least in the input->output direction, where the first argument is fully instantiated.

To solve both the second and third task, let us relate a list to one of its subsequences:

    subseq([]) --> [].
    subseq([L|Ls]) --> ([] | [L]), subseq(Ls).
With this definition, you can for example get all 2-element combinations of "abcd" as follows:

    ?- length(Ls, 2), phrase(subseq("abcd"), Ls).
    Ls = [c, d] ;
    Ls = [b, d] ;
    Ls = [b, c] ;
    Ls = [a, d] ;
    Ls = [a, c] ;
    Ls = [a, b] ;
    false.
The powerset is a natural generalization of this query, where you simply omit the first goal:

    ?- phrase(subseq("abcd"), Ls).
    Ls = [] ;
    Ls = [d] ;
    Ls = [c] ;
    Ls = [c, d] ;
    Ls = [b] ;
    Ls = [b, d] ;
    Ls = [b, c] ;
    Ls = [b, c, d] ;
    Ls = [a] ;
    Ls = [a, d] ;
    Ls = [a, c] ;
    Ls = [a, c, d] ;
    Ls = [a, b] ;
    Ls = [a, b, d] ;
    Ls = [a, b, c] ;
    Ls = [a, b, c, d].


Generating necklaces and bracelets is fun too, see eg. http://www.sciencedirect.com/science/article/pii/S0304397512...


I am curious where you would use something like this. I've worked with Minimum Edit Distance, which is similar (what is the minimum number of steps to convert one string into another).

Not sure where you would need to pull together all variations of the string and apply it to a problem.


It is true that due to complexity reasons,enumerating all combinations is usually avoided. However, in my opinion it is useful to know how to do things the basic way when learning more advanced algorithms such as RANSAC, APRIORI and others...


Thanks! I've been looking into this topic and the article is very helpful.


ha, fun.. I used to choke so much on powersets, after reading a lot I finally got the recursive version; here in python (I know, but I couldn't find back my lisp/ml version so..)

https://gist.github.com/agumonkey/6ab51dfa759ef1993c55


imo the coolest way to generate permutations https://en.m.wikipedia.org/wiki/Factorial_number_system



Interesting thread.

Apropos of permutations, I had written these two posts a few months ago:

Permutation facts:

https://jugad2.blogspot.in/2016/10/by-vasudev-ram-nicomachus...

perm_facts version with list comps and functional programming:

https://jugad2.blogspot.in/2016/10/permfacts-version-with-li...

The first post above also has some interesting facts about factorials, their occurrences and applications, which was a revelation to me.


In Haskell:

    import Data.List ( subsequences )
    powerset = subsequences


It doesn't contribute much to the conversation to note that your favorite language has a canned procedure for solving an algorithmic problem like this. If you're genuinely interested in demonstrating Haskell's capabilities it would be far more constructive to offer an implementation of such a routine, that it might be compared directly to the others in the article and this thread.


There's a similar "canned procedure" a guy posted in python, why he wasn't judged with the same severity, is there some kind of bias against Haskell around here?


We'll have to agree to disagree.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: