Hacker News new | past | comments | ask | show | jobs | submit login
Knowing where you are: custom array indices in Julia (julialang.org)
98 points by jrevels on April 21, 2017 | hide | past | favorite | 28 comments



I really like how Julia handles arrays.


Agreed. One big win here over numpy is that there's the `push!` function on Julia arrays (equivalent to the C++ vector's `push_back`).

Numpy arrays on the other hand are awkward to use in custom algorithms because once the memory is reserved, you can't grow the array at either end.


I think that is some kind of feature. Numpy tries hard to guarantee efficient access to its ndarrays and this is simpler to implement using fixed arrays and mostly you know your bounds at initialization time anyway.

Btw. you can grow them using hstack and vstack although I am not sure how efficient that is.


Growing with vstack or hstack is very slow. I have seen code where a big csv with time series data is parsed line by line the corresponding numpy array was grown using vstack for each row. worst case: quadratic runtime


That'd be so much faster using a generator and a NumPy.concatenate at the end. Orders of magnitude faster, if we're talking about a large CSV.


There are fixed arrays in Julia too, for those who absolutely must eke out that last 1% gain in performance


It ain't just a 1% gain. Use a Python list if you want to append/pop.


A deque is probably a better option for performance.


Depends on how much it grows. I'd actually expect very little difference for most append/pop situations. A deque is there for appendleft and popleft.


that may be so in Python, but have you checked the performance differences in Julia?


Yep, a problem I repeatedly run into with Numpy is that my data is too big to fit in memory as Python data, but can easily fit as Numpy matrices. It seems incrementally building Numpy matrices is very awkward and inefficient

edit: if anyone knows a solution to this, please let me know...


Under the hood this `push!` function have to do memory reallocation and copying the entire array when the array grows to be larger than some internal buffer, which happens each time the array becomes larger than a^n, where a is usually, but not necessary, selected to be equal to 2.

The amortized complexity is θ(1), but individual insertions might take O(n) time.

It means that it is not hard to have the same in numpy, with the same performance, the only difference is that the growing have to be done manually and that the push_back function has to be called as something like

    x, x_len = push_back(x, x_len, 10)
with push_back being defined like

    def push_back(x, x_len, value):
        if len(x) == x_len:
            x = np.concatenate((x, np.empty(max(len(x), 1)))
        x[x_len] = value
        return x, x_len + 1
            
so that the x variable is supposed to be updated after each insertion.


I think the standard recommendation is to know how much you need to allocate and allocate it in advance. You can call numpy.zeros( size ) to get the appropriate array, then fill it with your actual data, for example. If you don't know how much you need, you can try to over-estimate it and then shrink the array (i.e. allocate one of the right size and copy) after your done.


Where are you filling your matrix from?


I remember being really intrigued when I ran into Fortran programs that used negative array indices. It's pretty handy in a lot of situations.


It's commonly used in Python too:

>> last_elem = arr[-1]

>> rev_arr = arr[::-1] # Reverse array


That is something completely different. Where Fortran allows negative indexing is when you initialize your array such the the indices are offset in the same manner as described in the linked article.


I see. I'm obviously not familiar with Fortran :P


Interesting. It would be nice to surface this approach as a simple spreadsheet app so people could develop an intuitive feel for the concept.


> surface this approach as a simple spreadsheet app

I don’t understand what you’re proposing. You want a spreadsheet where the row and column headings have arbitrary starting point?


I'm thinking about snipping out regions, creating views, and propagating indices across sheets.


This is very interesting to me but not really revolutionary since I am so used to languages like javascript where array indices don't matter the same way they do in C. However, I do see the utility in being able to generate new "views" of an existing array by offsetting all indices by a set amount.

First thing that comes to mind is time-series data - like a sliding window based on some concept of "now" where indices can be offset by the current date/time. (e.g. timeSeries[0] == now, timeSeries[-1] == one second ago, timeSeries[1] == one second in the future, etc.)

I actually find these linear-algebra based languages even more mind-bending than functional based languages.


If you do this in JavaScript, you won't be able to use many standard array operations or get standard array performance. The specification explicitly only supports non nonnegative indices for real arrays.

You're just using Object/map behavior, and if you're doing that with negative indices on an Array Object you're misusing it.


This is possible in ruby too, probably in a lot of languages.

I've rewritten someobject#[] plenty of times to have cleaner code.


One difference in other languages is that the extra indirection will add overhead. You have to (at the very least) hop through an extra function call, and even that will become significant if you're writing an image kernel.

Julia can inline through all the `getindex!` functions so you get the same performance as with a hand-coded C or Fortran array.


All Wirth languages did this. Great for clarity when a specific mapping is needed. Damn I miss me some Modula 2.


Nim is very much Oberon inspired and supports this


Been keeping up with Nimrod for a long time, and would love to make some money with it. Just doesn't have the libs I need. If you want to see Nim get some traction, have it spit out some golang.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: