Yeah, that's the thing with silly examples (like my Mandelbrot program). For real code, I frequently need to write low-level loops that aren't easily expressed as parallel operations. If numpy doesn't have it, or if you can't figure out how to parallelize it, you're screwed.

Moreover, for some very common things in signal processing, like working with evens/odds or left/right parts of an array, the parallel numpy operation will create lots of temporary arrays and copies.

And for what it's worth, your version of mandel should work with PyPy. So you can have your cake and eat it too.

EDIT: I should add the reason my code is "strange" is because I wrote it so I could do a one-to-one comparison with other languages which don't have builtin complex numbers. Maybe I should've cleaned that up before posting.




> Moreover, for some very common things in signal processing, like working with evens/odds or left/right parts of an array, the parallel numpy operation will create lots of temporary arrays and copies.

iiuc, this isn't correct: given an ndarray A, `A[::2, 1::2]` will give a (no-copy) view of the even rows and odd columns of A. Same with `A[:len(A)//2]` (note the integer division) to get a view of the first half of A.
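
For example, a quick sketch (hypothetical array A, checked with `np.shares_memory`):

    import numpy as np

    A = np.arange(16).reshape(4, 4)

    evens = A[::2]         # every other row: a view, not a copy
    half = A[:len(A)//2]   # first half of the rows: also a view

    print(np.shares_memory(A, evens))  # True -> no data was copied
    print(np.shares_memory(A, half))   # True

    evens[0, 0] = 99   # writing through the view...
    print(A[0, 0])     # ...modifies A: prints 99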

> And for what it's worth, your version of mandel should work with PyPy. So you can have your cake and eat it too.

Indeed, most of the SciPy stack works with PyPy; it's great.


As soon as you do any operations (addition, subtraction, etc.) on those views, you're going to get temporary arrays.

For instance, your Mandelbrot example doesn't even use views and it creates two temporaries the size of the entire array on each iteration:

    for j in range(255):
        t = z**2    # create a new squared array
        u = t + c   # create a new summed array
        z = u       # replace the old array
And all of this is ignoring how inconsistent numpy is about when it creates a view and when it creates a copy.
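
To illustrate that inconsistency, a small sketch (my own example): basic slicing returns views, while fancy and boolean indexing return copies:

    import numpy as np

    A = np.arange(10)

    v = A[::2]        # basic slicing: a view
    c = A[[0, 2, 4]]  # fancy (integer-array) indexing: a copy
    b = A[A > 5]      # boolean indexing: also a copy

    print(np.shares_memory(A, v))  # True  -> writes to v show up in A
    print(np.shares_memory(A, c))  # False -> c is an independent array
    print(np.shares_memory(A, b))  # False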


Unnecessary temporary arrays are definitely a major source of inefficiency when working with NumPy, but recent versions of NumPy go to heroic lengths (via Python reference counting) to avoid creating them in many cases: https://github.com/numpy/numpy/blob/v1.18.3/numpy/core/src/m...

So in this case, NumPy would actually only make one temporary copy, effectively translating the loop into the following:

    for j in range(255):
        u = z**2   # create a new squared array
        u += c     # add in-place 
        z = u      # replace the old array


Your general point is correct, although in this specific instance, replacing the loop body with

    z **= 2
    z += c
gets rid of the temps. But yes, there are cases where that isn't possible.
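
The same thing can also be spelled with explicit out= arguments, which makes the in-place behavior unambiguous (a sketch, assuming z and c are same-shaped complex arrays; the setup values here are made up):

    import numpy as np

    c = np.linspace(-2, 1, 500) + 1j * np.linspace(-1.5, 1.5, 500)
    z = np.zeros_like(c)

    for j in range(255):
        np.square(z, out=z)  # z = z**2, written in place
        np.add(z, c, out=z)  # z = z + c, written in place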


This gets rid of temporary arrays, but it still isn't optimal if z is large. Memory locality means it's faster to apply a scalar operation like z**2 + c in a single pass, rather than in two separate passes.

Explicitly writing the computation as a loop (e.g., in PyPy or Numba) is one easy way to achieve this, but you have to write more code.
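
E.g., a minimal Numba sketch of a fused single-pass loop (my own example, names made up, not from the parent comment):

    import numpy as np
    from numba import njit

    @njit
    def mandel_step(z, c):
        # One fused pass: each element is read and written once,
        # with no temporary arrays in between.
        for i in range(z.shape[0]):
            z[i] = z[i] * z[i] + c[i]

    c = np.linspace(-2, 1, 1000).astype(np.complex128)
    z = np.zeros_like(c)
    for j in range(255):
        mandel_step(z, c)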

Julia has some really nice syntax that lets you write things in this clean vectorized way but still get efficient code: https://julialang.org/blog/2017/01/moredots/


Gotcha. I feel like I remember some numpy or scipy way of composing ufunc ops and applying them in a single pass, but maybe I'm misremembering, or I'm thinking of np.vectorize being fast?
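
(For reference, np.vectorize at least is not fast: the NumPy docs describe it as a convenience wrapper that is essentially a Python-level for loop, so something like the following won't beat explicit loops:)

    import numpy as np

    # np.vectorize broadcasts like a ufunc, but it still calls the
    # Python function once per element, so it is essentially a
    # Python-level loop in disguise.
    step = np.vectorize(lambda z, c: z**2 + c)

    c = np.linspace(-2, 1, 1000).astype(np.complex128)
    z = np.zeros_like(c)
    z = step(z, c)  # convenient, but allocates a new array each call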



