Yeah, that's the thing with silly examples (like my Mandelbrot program). For real code, I frequently need to write low level loops which aren't easily expressed as parallel operations. If numpy doesn't have it, or if you can't figure out how to parallelize it, you're screwed.
Moreover, for some very common things in signal processing, like working with evens/odds or left/right parts of an array, the parallel numpy operation will create lots of temporary arrays and copies.
And for what it's worth, your version of mandel should work with PyPy. So you can have your cake and eat it too.
EDIT: I should add the reason my code is "strange" is because I wrote it so I could do a one-to-one comparison with other languages which don't have builtin complex numbers. Maybe I should've cleaned that up before posting.
> Moreover, for some very common things in signal processing, like working with evens/odds or left/right parts of an array, the parallel numpy operation will create lots of temporary arrays and copies.
IIUC, this isn't correct: given an ndarray A, `A[::2, 1::2]` will provide a (no-copy) view of the even rows and odd columns of A. Same with `A[:len(A)//2]` to get only the first half of A (note the integer division `//` in Python 3).
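A quick sanity check with `np.shares_memory` (the array and names here are just illustrative) showing that basic slicing returns views, not copies:

```python
import numpy as np

# Illustrative array; any shape behaves the same way.
A = np.arange(16, dtype=float).reshape(4, 4)

evens = A[::2, 1::2]    # even rows, odd columns: a strided view
half = A[:len(A) // 2]  # first half of the rows: also a view

# Neither slice copied any data.
print(np.shares_memory(A, evens))  # True
print(np.shares_memory(A, half))   # True

# Writing through a view mutates A in place.
evens[0, 0] = -1.0
print(A[0, 1])  # -1.0
```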
> And for what it's worth, your version of mandel should work with PyPy. So you can have your cake and eat it too.
Indeed, most of the scipy stack works with PyPy, it's great.
Unnecessary temporary arrays are definitely a major source of inefficiency when working with NumPy, but recent versions of NumPy go to heroic lengths (via Python reference counting) to avoid them in many cases:
https://github.com/numpy/numpy/blob/v1.18.3/numpy/core/src/m...
So in this case, NumPy would actually only make one temporary copy, effectively translating the loop into the following:
for j in range(255):
    u = z**2  # create a new squared array
    u += c    # add in-place
    z = u     # replace the old array
This gets rid of temporary arrays, but it still isn't optimal if z is large. Memory locality means it's faster to apply an elementwise operation like z**2 + c in a single pass over the data, rather than in two separate passes.
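For what it's worth, the elided pattern can also be written out explicitly with NumPy's out= arguments, double-buffering so that nothing is allocated per iteration (a sketch; step_inplace and the loop count are just illustrative):

```python
import numpy as np

def step_inplace(z, c, n=255):
    # Double-buffered version of z = z**2 + c: np.multiply writes into
    # a preallocated array and the add is in-place, so after the one
    # initial allocation no temporaries are created at all.
    u = np.empty_like(z)
    for _ in range(n):
        np.multiply(z, z, out=u)  # u = z*z, written into the spare buffer
        u += c                    # in-place add
        z, u = u, z               # swap buffers instead of allocating
    return z
```

Note that this still makes two passes over memory per iteration (one for the multiply, one for the add), which is exactly the locality problem described above.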
Explicitly unrolling loopy code (e.g., in pypy or Numba) is one easy way to achieve this, but you have to write more code.
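For example, the fused single-pass version of the update step is just the naive loop, written in plain Python here so PyPy can JIT it (Numba would want essentially the same code under @njit; the function name and list inputs are mine):

```python
def iterate(z, c, n=255):
    # Fused loop: each element of z is read once and written once per
    # iteration, with no intermediate arrays at all.
    for _ in range(n):
        for i in range(len(z)):
            z[i] = z[i] * z[i] + c[i]
    return z

# Works on plain Python lists of complex numbers:
zs = [0.1 + 0.1j, 0.2 - 0.05j]
cs = [0.05 + 0j, -0.1 + 0.02j]
iterate(zs, cs, n=3)
```

(A real mandel would also track when each point escapes, but the update is where the per-pass cost lives.)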
Gotcha. I feel like I remember some numpy or scipy way of creating complex ufunc ops and applying them simultaneously, but maybe I'm misremembering or thinking np.vectorize was fast?