Knowing where you are: custom array indices in Julia

goatlover · on April 21, 2017

I really like how Julia handles arrays.

VHRanger · on April 21, 2017

Agreed. One big win here over numpy is that there's the `push!` function on Julia arrays (equivalent to the C++ vector's `push_back`).

Numpy arrays on the other hand are awkward to use in custom algorithms because once the memory is reserved, you can't grow the array at either end.

zeptomu · on April 21, 2017

I think that is some kind of feature. Numpy tries hard to guarantee efficient access to its ndarrays and this is simpler to implement using fixed arrays and mostly you know your bounds at initialization time anyway.

Btw. you can grow them using hstack and vstack although I am not sure how efficient that is.

xaedes · on April 21, 2017

Growing with vstack or hstack is very slow. I have seen code where a big csv with time series data is parsed line by line the corresponding numpy array was grown using vstack for each row. worst case: quadratic runtime

xapata · on April 22, 2017

That'd be so much faster using a generator and a NumPy.concatenate at the end. Orders of magnitude faster, if we're talking about a large CSV.

dnautics · on April 22, 2017

There are fixed arrays in Julia too, for those who absolutely must eke out that last 1% gain in performance

xapata · on April 22, 2017

It ain't just a 1% gain. Use a Python list if you want to append/pop.

_Wintermute · on April 22, 2017

A deque is probably a better option for performance.

xapata · on April 22, 2017

Depends on how much it grows. I'd actually expect very little difference for most append/pop situations. A deque is there for appendleft and popleft.

dnautics · on April 23, 2017

that may be so in Python, but have you checked the performance differences in Julia?

rjeli · on April 21, 2017

Yep, a problem I repeatedly run into with Numpy is that my data is too big to fit in memory as Python data, but can easily fit as Numpy matrices. It seems incrementally building Numpy matrices is very awkward and inefficient

edit: if anyone knows a solution to this, please let me know...

inlineint · on April 22, 2017

Under the hood this `push!` function have to do memory reallocation and copying the entire array when the array grows to be larger than some internal buffer, which happens each time the array becomes larger than a^n, where a is usually, but not necessary, selected to be equal to 2.

The amortized complexity is θ(1), but individual insertions might take O(n) time.

It means that it is not hard to have the same in numpy, with the same performance, the only difference is that the growing have to be done manually and that the push_back function has to be called as something like

    x, x_len = push_back(x, x_len, 10)

with push_back being defined like

    def push_back(x, x_len, value):
        if len(x) == x_len:
            x = np.concatenate((x, np.empty(max(len(x), 1)))
        x[x_len] = value
        return x, x_len + 1

so that the x variable is supposed to be updated after each insertion.

tgb · on April 22, 2017

I think the standard recommendation is to know how much you need to allocate and allocate it in advance. You can call numpy.zeros( size ) to get the appropriate array, then fill it with your actual data, for example. If you don't know how much you need, you can try to over-estimate it and then shrink the array (i.e. allocate one of the right size and copy) after your done.

tempay · on April 22, 2017

Where are you filling your matrix from?

codezero · on April 21, 2017

I remember being really intrigued when I ran into Fortran programs that used negative array indices. It's pretty handy in a lot of situations.

Cyph0n · on April 21, 2017

It's commonly used in Python too:

>> last_elem = arr[-1]

>> rev_arr = arr[::-1] # Reverse array

mbauman · on April 21, 2017

That is something completely different. Where Fortran allows negative indexing is when you initialize your array such the the indices are offset in the same manner as described in the linked article.

Cyph0n · on April 21, 2017

I see. I'm obviously not familiar with Fortran :P

rch · on April 21, 2017

Interesting. It would be nice to surface this approach as a simple spreadsheet app so people could develop an intuitive feel for the concept.

jacobolus · on April 21, 2017

> surface this approach as a simple spreadsheet app

I don’t understand what you’re proposing. You want a spreadsheet where the row and column headings have arbitrary starting point?

rch · on April 21, 2017

I'm thinking about snipping out regions, creating views, and propagating indices across sheets.

reggieband · on April 21, 2017

This is very interesting to me but not really revolutionary since I am so used to languages like javascript where array indices don't matter the same way they do in C. However, I do see the utility in being able to generate new "views" of an existing array by offsetting all indices by a set amount.

First thing that comes to mind is time-series data - like a sliding window based on some concept of "now" where indices can be offset by the current date/time. (e.g. timeSeries[0] == now, timeSeries[-1] == one second ago, timeSeries[1] == one second in the future, etc.)

I actually find these linear-algebra based languages even more mind-bending than functional based languages.

JeremyBanks · on April 22, 2017

If you do this in JavaScript, you won't be able to use many standard array operations or get standard array performance. The specification explicitly only supports non nonnegative indices for real arrays.

You're just using Object/map behavior, and if you're doing that with negative indices on an Array Object you're misusing it.

killin_dan · on April 22, 2017

This is possible in ruby too, probably in a lot of languages.

I've rewritten someobject#[] plenty of times to have cleaner code.

one-more-minute · on April 22, 2017

One difference in other languages is that the extra indirection will add overhead. You have to (at the very least) hop through an extra function call, and even that will become significant if you're writing an image kernel.

Julia can inline through all the `getindex!` functions so you get the same performance as with a hand-coded C or Fortran array.

jksmith · on April 22, 2017

All Wirth languages did this. Great for clarity when a specific mapping is needed. Damn I miss me some Modula 2.

evacchi · on April 22, 2017

Nim is very much Oberon inspired and supports this

jksmith · on April 22, 2017

Been keeping up with Nimrod for a long time, and would love to make some money with it. Just doesn't have the libs I need. If you want to see Nim get some traction, have it spit out some golang.