This is really elegant! To make the explanation more concrete, here's a snapshot of the algorithm's state (the dict D) after it has processed numbers up to 10 (inclusive):
12: [3, 2]
25: [5]
49: [7]
So each prime up to 10 is present exactly once. When we process 12, we move 2 to the list for 14, and 3 to the list for 15.
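For anyone who wants to reproduce that snapshot, here is a minimal sketch of the kind of generator being discussed (illustrative only, not the OP's exact code); tracing it up to 10 gives exactly the dict above:

def gen_primes():
    # D maps each upcoming composite to the list of primes that will next "hit" it.
    D = {}
    q = 2
    while True:
        if q in D:
            # q is composite: advance each of its primes to the next multiple.
            for p in D.pop(q):
                D.setdefault(q + p, []).append(p)
        else:
            # q is prime: the first multiple worth tracking is q*q.
            yield q
            D[q * q] = [q]
        q += 1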
The Sorenson paper linked in the post is also a great read; I remember reading it a couple of years ago and it is very clear. It adds many further optimizations on top of the standard sieve of Eratosthenes (e.g. wheel factorization is a generalization of the idea of considering only numbers in {1, 5} mod 6, or {1, 7, 11, 13, 17, 19, 23, 29} mod 30, or …), and segmenting is what you need to do to avoid running out of memory.
However, if you just want the nth prime, it's even faster to compute π(n) by combinatorial/analytic methods, instead of a sieve: the same author has https://github.com/kimwalisch/primecount for that. (Try the ten-trillionth prime number.)
My favorite is this FRACTRAN program: (17/91, 78/85, 19/51, 23/38, 29/33, 77/29, 95/23, 77/19, 1/17, 11/13, 13/11, 15/14, 15/2, 55/1) with input 2.
It doesn't quite produce primes directly. It gives a sequence of integers, some of which are powers of 2. Those powers of 2 are 2^2, 2^3, 2^5, 2^7, 2^11, ..., i.e., the sequence of 2 to the power of primes.
For those who have never seen FRACTRAN: it is an esoteric programming language invented by John Conway, and it is Turing-complete.
Here's how you execute a FRACTRAN program, with input N:
1. Step through the list of fractions in order until you find a fraction f such that fN is an integer, or you run out of fractions.
2. If you found such an f, replace N with fN and go back to step 1.
3. If no fraction works, halt. The output is the sequence of values N takes along the way.
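For anyone who wants to play with this, here's a tiny interpreter for that rule in Python (an illustrative sketch; the fraction list is the standard 14-fraction PRIMEGAME):

from fractions import Fraction

# Conway's PRIMEGAME, as quoted above.
PRIMEGAME = [Fraction(a, b) for a, b in [
    (17, 91), (78, 85), (19, 51), (23, 38), (29, 33), (77, 29), (95, 23),
    (77, 19), (1, 17), (11, 13), (13, 11), (15, 14), (15, 2), (55, 1)]]

def fractran(program, n, max_steps):
    # Repeatedly replace N by f*N for the first fraction f that keeps it an integer.
    for _ in range(max_steps):
        for f in program:
            if (n * f).denominator == 1:
                n = int(n * f)
                yield n
                break
        else:
            return  # no fraction applies: halt

# The powers of 2 that show up are 2^2, 2^3, 2^5, 2^7, ...
for value in fractran(PRIMEGAME, 2, 500_000):
    if value > 1 and value & (value - 1) == 0:
        print(value.bit_length() - 1)  # 2, 3, 5, 7, 11, ...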
The last time this formula came up, I added some references to the Wikipedia article (current version: https://en.wikipedia.org/w/index.php?title=Formula_for_prime...) about the "worthlessness" of such formulas, but it doesn't deter people from being interested in that formula… as long as they don't imagine it's useful in any way, it's fine to enjoy its perverse cleverness, I suppose.
The video does well to quote Wilf on “the disease was preferable to the cure” and to end with “Could it [C. P. Willans] be a pseudonym for someone who didn't want to tarnish their reputation by writing about useless prime-generating formulas?” :-) Dudley (who mentions a large list of other such formulas) also has a nice conclusion:
> The conclusion to be drawn from all this, I think, is that formulas for formulas' sake do not advance the mathematical enterprise. Formulas should be useful. If not, they should be astounding, elegant, enlightening, simple, or have some other redeeming value.
Wow this is a great optimization! You're under-selling it :) It changes the memory requirement from O(π(n)) to O(π(√n)): e.g. after processing numbers up to 100, the OP's algorithm is holding all 25 primes below 100 in its dict, while this version only needs to track the primes up to 10.
This seems to be a matter of the above source reporting a variation of the algorithm that is asymptotically slower than the algorithm given in the paper *and* uses asymptotically more memory. Thanks for posting the paper!
I enjoyed this, but I am a little stuck on how stuck the author is on yield. If one ends up creating a large list of prime numbers with this generator, why not just return a large list of prime numbers? The author clearly notes how optimizing Python code is "odd" when you can more easily switch languages, but languages like C/C++/Rust would not have pretty code because they do not have yield. It is obvious that returning an array of prime numbers is what one would do in C/C++/Rust.
I understand that not using yield would mean it is not strictly a "generator", but the spirit of this problem is generating the sequence of primes (unless I am missing something).
The list you need to process is large enough that it would be prohibitive to put it all in memory at once. Furthermore, you do not know up front how many primes you will need to look at. Generating and dealing with them iteratively therefore makes for simple code and a lot fewer headaches.
Why wouldn't someone just make a C++ class that keeps some state and call a method that gives them as many primes as they want, but runs 100x faster than python? Why would python yield somehow be better than this?
Project Euler is always about finding the right algorithm, and not raw performance.
With the right algorithm you can solve it in Python in a max of 30 seconds. With the wrong one, you sometimes can't solve it in C++ in the lifetime of the universe.
Therefore you should program in the language that you find it easiest to express yourself in. And not in a language chosen for raw speed.
You'll see the win pretty quickly when you have to search a large range for primes.
This isn't just shifting the goal posts, you are now talking about something completely different.
> With the right algorithm you can solve it in Python in a max of 30 seconds
Then you can do the same thing in C++, and when it runs the C++ program will be done 100x faster. A lot of modern C++ is as clean as Python, just a bit more explicit about types.
> With the wrong one, you sometimes can't solve it in C++ in the lifetime of the universe.
Nothing about this makes any sense.
> Therefore you should program in the language that you find it easiest to express yourself in. And not in a language chosen for raw speed.
If you continue on with python and need more speed you will run around in circles trying to get it while the C++ version is already done. Anything where speed can be a bottleneck is not fit to be written in pure python.
This pattern plays out over and over again. If something is resource intensive, a scripting language is not a long term solution.
Absolutely none of your arguments make sense in the context of a series of programming puzzles of fixed size. Back in the real world, there is a real need for exploratory programming on mathematical problems. I prefer to use Python for that. If your problems are statistical, R is a better choice. C++ is a poor fit for exploratory programming.
Looking to the future, I would choose Rust over C++ for most new projects where people currently use C++. It seems bizarre to me that C++ evangelists can talk about type safety and ignore memory safety, when memory safety is a giant problem for security and correctness in any large program.
So, yeah. C++ is great for what it's great for. But if I'm doing some exploratory back-of-the-envelope calculations, I'm still reaching for Python. And yield is one of the reasons why.
It works fine for me. Are you sure this isn't just because you don't have a lot of practice with C++?
> It seems bizarre to me that C++ evangelists can talk about type safety and ignore memory safety, when memory safety is a giant problem for security and correctness in any large program.
This has nothing to do with the current thread, but this isn't really a problem in modern C++. Are you sure you are up to date with modern C++? Most programs don't look too different from Python, thanks to range-based for loops and auto type deduction.
I have no idea why you are trying to evangelize C++ here. I use a wide variety of languages for different purposes. For Project Euler type tasks I switched to Python about 15 years ago because of yield, and the convenience of using objects as hash keys. If you make different choices and they work for you, then have fun.
You seem to be saying there is some advantage to python because of the yield feature and I just don't think that's true. You can keep state in a class and call a method. There are a lot of disadvantages though.
The generator is already maintaining a dictionary containing multiple lists that hold all the primes found so far. Even if it instead kept yielding a list of all the primes so far (rather than the most recent prime) it would take less than twice the memory usage.
I'd say the flexibility to not need to know how big of a prime you'll need is the main draw.
I obviously have a version without the memory issue.
The programming flexibility is indeed the win.
def primes():
    yield 2
    yield 3
    f = {}   # maps an upcoming odd composite to 2*(the prime that owns it)
    n = 5
    # Recursively pull primes p, sieving the odd numbers below p*p as we go.
    for p in primes():
        if p == 2:
            continue
        p2 = p*p
        while n < p2:
            if n in f:
                # n is composite: slide its prime's marker to the next free odd multiple
                double_p = f.pop(n)
                m = n + double_p
                while m in f:
                    m += double_p
                f[m] = double_p
            else:
                yield n
            n += 2
        # n is now exactly p*p, the first multiple of p worth tracking
        f[n] = 2*p
This allocates a dictionary and the primes() generator for every single prime generated though, so it still has the memory issue, unless there's something I'm missing about how it works?
Put a debugging statement in and you'll verify that the dictionary only contains primes up to the square root of the one just generated.
To do that it recursively contains a second prime generator that only produces up to the square root. Which contains one that produces up to the fourth root. And so on until you get down to 3. The work and memory for those recursive generators is a rounding error compared to the initial prime generator. The work and memory saved by not keeping unnecessary primes is a significant win.
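If you want to see it holding up without instrumenting the code, here's a quick external check (a sketch, assuming the primes() generator above):

import itertools

# The 100,000th prime; the nested generators only go as deep as repeated square
# roots of this value, and each level's dict holds only the primes up to the
# square root of what that level has produced so far.
print(next(itertools.islice(primes(), 99_999, None)))  # 1299709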
"I can't try to rewrite it in C++ or Rust for now, due to the lack of generator support; the yield statement is what makes this code so nice and elegant, and alternative idioms are much less convenient."
That is the context of the first sentence in your comment, pretty much the only part I left alone.
Everything I replied to is the opinion you personally expressed.
And note that the author had already preempted your comment in the first place:
> When we want a list of all the primes below some known limit, gen_primes_upto is great, and performs fairly well. There are two issues with it, though:
> We have to know what the limit is ahead of time; this isn't always possible or convenient.
It's a bit contrived, but let's say we've got some iterator of unknown size and we want to have a prime for each iteration.
We can do the following in Python, which saves us from having to pre-compute the primes _or_ know how long `some_iterator` is:
primes = gen_primes()
for item, prime in zip(some_iterator, primes):
    ...  # do something with each (item, prime) pair
This implementation runs until it crashes. So you need a way to output the values as you go, or they will be lost when you run out of memory. It also lets you control how many values are generated if you just want to see a few without actually using all the RAM.
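For example (assuming gen_primes from the post), you can peek at just the first few primes without ever committing to an upper bound:

import itertools

# Only the first ten primes are computed; the generator does no work beyond what's consumed.
print(list(itertools.islice(gen_primes(), 10)))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]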
Can I just say how surprised I am that so many people have a favourite prime number generator.
I can’t say I have a favourite but I do enjoy multiplying two random prime numbers with each other and using the random result when I need a random number with added random.
A way to add even more random is to generate a list of N random primes, then shuffle it M times before multiplying the elements. N and M are of course random positive integers.
To add maximum random, sometimes I like to return a string with a random Gertrude Stein quote, segfault, or make the PC speaker beep instead of returning a number. So random!
Nice, I like the optimization replacing the list, and just hunting for the next 'free' composite for the prime.
Teaching my kids coding, we also arrived at most of the other optimizations, in Janet:
(defn primes-sieve []
  (fiber/new (fn []
    (yield 2) # the only even prime, yield it explicitly.
    (let [sieve (table/new (math/pow 2 12))]
      (loop [n :range [3 nil 2]] # only odd numbers
        (if-let [current-composite (sieve n)]
          (do
            (each prime-factor current-composite
              (let [next-composite-key (+ n (* 2 prime-factor))] # add 2*prime-factor, else composite would be even
                (if-let [next-composite (sieve next-composite-key)]
                  (array/push next-composite prime-factor)
                  (put sieve next-composite-key @[prime-factor]))))
            (put sieve n nil))
          (do
            (yield n)
            (put sieve (* n n) @[n])) # next composite is n^2; anything less is a composite of a smaller prime, already in the sieve
          ))))))
import itertools as it

def erat3():
    D = {9: 3, 25: 5}
    yield 2
    yield 3
    yield 5
    MASK = (1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0)
    MODULOS = frozenset((1, 7, 11, 13, 17, 19, 23, 29))

    for q in it.compress(
            it.islice(it.count(7), 0, None, 2),
            it.cycle(MASK)):
        p = D.pop(q, None)
        if p is None:
            D[q*q] = q
            yield q
        else:
            x = q + 2*p
            while x in D or (x % 30) not in MODULOS:
                x += 2*p
            D[x] = p
This adds two things over the OP's initial version, both mentioned in the post: wheel factorization (with a wheel of 30), and the "linear probing" to avoid lists. It's possible to keep improving (up to a point) by using e.g. a wheel of 210 instead of 30, and so on, if one doesn't mind hard-coding more and more.
The point isn't hardcoding those numbers. The point is that you don't iterate on any multiples of those numbers (which we know are composite), and doing so allows us to eliminate 2/3 of the iterations. It's an extension of enumerating over only odd numbers, which eliminates 1/2 of the iterations.
Hardcoding those values is a semi-requirement of the above technique.
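For the curious, here's one way to derive those hard-coded constants from the wheel of 30 (just an illustrative sketch, not part of the original recipe):

# Residues mod 30 coprime to 2, 3 and 5: the candidates the wheel keeps.
MODULOS = frozenset(r for r in range(30) if all(r % p for p in (2, 3, 5)))
print(sorted(MODULOS))  # [1, 7, 11, 13, 17, 19, 23, 29]

# MASK flags which of the odd numbers 7, 9, 11, ..., 35 (one full period,
# starting where erat3 starts) survive the wheel.
MASK = tuple(1 if n % 30 in MODULOS else 0 for n in range(7, 37, 2))
print(MASK)  # (1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0)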
from collections import defaultdict

d = defaultdict(list)
# Magic
for i in range(2, 10000):
    for y in i in d and d.pop(i) or [i]:
        d[i+y] += (y,)
# Not sure how to print this in a succinct way
print(sorted(list(set().union(*d.values()))))
from collections import defaultdict

d = defaultdict(list)
for i in range(2, 10000):
    for y in d.pop(i, [i]):
        d[i + y].append(y)

for y in sorted(y for x in d.values() for y in x):
    print(y)
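A quick way to check the output, if sympy happens to be installed (a hypothetical check, not part of the snippet above):

from sympy import primerange

# The dict's values should contain every prime below 10000 exactly once.
assert sorted(y for x in d.values() for y in x) == list(primerange(2, 10000))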
You'll never be in Cicada 3301 with that kinda attitude, my guy!
But to counter your point: people do have to write those libraries that you depend on. Others have to validate them, and yet still others are doing their best efforts to invalidate them and attack them.
But you are correct that most people should not roll their own cryptography suites.