For anyone interested in implementations of the Sieve of Erastothenes, here is a very interesting paper about functional ones, which are harder than they appear at first:
https://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf
it uses Nietzsche which can be found in my repos. On my three year old computer, it computes the first ten million primes in about 40 seconds (up to, and including 179424673). That is only about 4 times slower than the optimized non-lazy naive sieve.
The sieve of Eratosthenes can be adapted to build tables of other number properties: smallest prime factor, totient function, number of unique prime factors (omega), product of unique prime factors (radical).
I first learned of this sieve while reviewing the code for the Primecoin proof-of-work. Primecoin is a cryptocurrency that tries to have its PoW provide greater utility than hashing, but its PoW otherwise is similar to hashing. The task is to produce Cunningham chains [1] of prime numbers. The reference implementation of Primecoin computes primes using the Sieve of Eratosthenes (w/arbitrary precision integer math).
There's a small optimization on the original article which I think your implementation doesn't do: "At each stage we know that every composite number up to p^2 has definitely been marked as such"..
Assuming your implementation is working fine (didn't test it), I think the change would be j = i * i on the inline for:
void sieve()
{
for(int i = 2; i < n; i++)
for (int j = i*i; j < n; j += i) p[j]++;
}
It's tricky to explain, but take for instance the numbers from 1 to 20. When i=2, it will clear [4, 6, 8, 10, 12, 14, 16, 18, 20] as non-primes. So when i=3, you can skip 6 (which was cleared by i=2) and start directly at 9 (i * i = 3 * 3) to clear [9, 12, 15, 18]. Also, when i=4, you can skip the 8 and 12 (which were cleared by i=2 and i=3) and start on 16 to clear [16, 20]. Finally, when i=5, you can start directly at 25 (out of bounds in this case) because 10, 15 and 20 were already cleared by the previous iterations..
Edit: But as other posters are saying below, the original sieves algorithm shouldn't even have to test i=4 because the numbers it clears (naivelly [4,8,12,16,20]) have already been cleared by the previous iterations of the algorithm.
Suppose we have a composite number K that is less than P^2.
Then K = A * B, for some A and B not equal to 1, (which is the definition of composite). Let us say that A is less than or equal to B. (If this is not the case, switch the labels.)
Then A must be less than P:
If A is greater than P, K = A * B > A * P > P^2. But we chose K such that K < P^2.
If A is less than P, then K will have been marked during the cycle of Q, a prime divisor of A, which will occur before P. Therefore, we do not need to examine any numbers below P^2 -- they are either prime or have been marked.
Maybe stop the outer loop as soon as i * i > n then as well.
Also, where's the check whether `i` is actually prime before you start crossing off multiples? You don't need to cross off multiples of composite numbers.
> Also, where's the check whether `i` is actually prime before you start crossing of multiples? You don't need to cross off multiples of composite numbers.
You are absolutely right as the original method seems to specify that. I've edited my answer to show it.
So the function would be something like (untested):
void sieve()
{
for (int i = 2; i*i < n; i++)
if (p[i] == 1)
for (int j = i*i; j < n; j += i) p[j]++;
}
There's the problem, as other posters are saying, that if p[j] == 1 is because it was cleared as non-prime, but I've just tested p[i] == 1 for primality for coherence with the original author.
My Project Euler prime sieving / counting code was
vector<uint8_t> is_prime(n, true);
// Sieve
for (uint64_t k = 3; k * k < n; k += 2)
if (is_prime[k])
for (uint64_t multiple = k * k; multiple < n; multiple += 2 * k)
is_prime[multiple] = false;
It only loops over odd numbers; it only crosses off odd multiples of prime numbers. Note the stop condition of the outer loop and starting point of the inner loop.
Note that this code is memory-bound. You can get nearly 10x speed up if you implement a segmented sieve (sieving [2,n] in intervals, keeping memory costs more or less fixed)
It fills the array with the number of factors each number has, by taking each number less than n, and for each integer multiple of it less than n incrementing that cell.
> in other words he's not doing the test that something is prime, and crossing out the multiples of composites as well.
This doesn't change the outcome, since the multiples of composites are also multiples of primes and never primes.
> There are other problems as well, he adds one each time to p[j], but tests to see if it's ==1, so that won't work.
You're wrong here -- testing equality to 1 for the number of times an entry in the array was hit is exactly testing of primality post-sieve. If it was only touched once, then it has nothing that divides it (since the only cycle it was incremented on is the one starting there).
> it's certainly not the actual Sieve of Eratosthenes for any one of several reasons
If you initialized the array entries to 1 and started indexing your inner loop at j= i * i, it's exactly the sieve (modulo multiple strikeouts of removed numbers).
>> ... he's not doing the test that something is prime, and crossing out the multiples of composites as well.
> This doesn't change the outcome, since the multiples of composites are also multiples of primes and never primes.
It doesn't change the outcome (I said that somewhere else) but it changes the algorithm. The algorithm now produces primes, yes, but it's not the Sieve of Eratosthenes.
>> ... he adds one each time to p[j], but tests to see if it's ==1, so that won't work.
> You're wrong here -- testing equality to 1 for the number of times an entry in the array was hit is exactly testing of primality post-sieve. If it was only touched once, then it has nothing that divides it (since the only cycle it was incremented on is the one starting there).
You're right, so this is definitely a different algorithm. In this algorithm what he does is:
Initialise a vector V to 0
For every number n>=2:
For every multiple m>=n of n:
Add one to V[m]
The primes are those with V[n]==1. Yes, that works, I was mistaken because I was expecting the SoE. This is not that.
>> it's certainly not the actual Sieve of Eratosthenes for any one of several reasons
> If you initialized the array entries to 1 and started indexing your inner loop at j= i \* i, it's exactly the sieve (modulo multiple strikeouts of removed numbers).
... and conditioned on striking out multiples of primes you've already identified.
There are two key components to the Sieve of Eratosthenes. One is that you walk down the vector instead of doing divisions, the other is that you identify the primes so far and only strike out those multiples.
In particular, this code must have the outer loop run all the way to n whereas the SoE only runs to sqrt(n). That's a key part of being the Sieve.
As observed by stabbles[0], the time complexity of the code presented here is worse than the time complexity of the Sieve of Eratosthenes, therefore it's a different algorithm.
I was wrong when I claimed the output would be wrong - see my comments here[0]. The output is correct, but the code is not the Sieve of Eratosthenes. I was misled by the expectation that it would be.
Back about 1964, when I was learning to program (IBM 7090 in assembly language), I did it at assembly time in the surprisingly powerful IBM 7090 macro assembly language. Probably somewhat slower.
JFYI, I find it simpler to use this method, (same method of halving and doubling) used often for Roman numerals multiplication, but in the version that uses two colums only, crossing out the lines not needed (more similar to the "sieve" approach BTW):
Thanks for fixing the link - I was going to remove this comment for the sake of being tidy (sys admin thinking) but it appears I can't delete it, only edit it.
Reminds me of some code I wrote for a codeeval challenge for returning the sum of all primes up to n inclusive. IE: sum_primes(2) = 2, sum_primes(3) = 2+3 = 5 = sum_primes(4)...
It's odd that this version doesn't use a set(), but dictionary/list.
Anyway, I'll have to refactoring my code and measure against op - although I only used the python as a reference to sanity-check my naive c++ code:
I usually use this sieve any time I need to work with small prime numbers, but I dug deeper into probabilistic primality tests recently. If you try to run a sieve on large numbers you quickly realize that it is not feasible. Probabilistic primality tests like the Miller-Rabin are very fast and incredibly accurate, yet they are nearly as simple as this sieve. I didn't fully trust RSA until I learned enough about Miller-Rabin to convince myself the prime numbers it uses are safe.
Upon a stack of bits, about so tall,
just think, O traveller, what we could do
if every other bit was set to false
beginning with (but not including) two?
Now take two lowly numbers, A and B
that equal three and two. What would happen
if they, by twos and ones respectively,
were incremented in a nested fashion,
And if we falsified, at every turn,
the value offset by A groups of Bs?
Then to the aether let those bits return,
to fly back home to Eratosthenes!
The Eratosthenes sieve is pretty basic. I would expect any programmer to be able to do it on the whiteboard. Now the quadratic sieve, that is a challenge. The GNFS even more so.