A Visual Intro to NumPy and Data Representation

lejar · on June 26, 2019

Nice overview! One thing I think you should add, which I find immensely useful, is reordering arrays using indexing.

Take for example:

    In [2]: numpy.array([1, 2, 3])[[0, 2, 1]]
    Out[2]: array([1, 3, 2])

You index using a list and it gives you a view of the array with the new order (the underlying array is not changed and there is no copy being done).

quietbritishjim · on June 26, 2019

Using "fancy" indices like this does result in a copy because it can't be represented as a simple slice of the original array. A good explanation is here (it's from 2008 but still true):

https://scipy-cookbook.readthedocs.io/items/ViewsVsCopies.ht...

You can verify there's a copy by changing the new array after putting the result in a new variable (see above link for why this makes a difference) and verifying the old one is unchanged:

    >>> import numpy as np
    >>> x = np.array([1, 2, 3])
    >>> y = x[[0, 2, 1]]
    >>> y[0] = 3
    >>> y
    array([3, 3, 2])
    >>> x
    array([1, 2, 3])

Edit:

But a view can be based on a slice that includes a skip parameter, and in fact you can even slice in multiple dimensions and it will still be a view. That is worth discussing in the article:

    >>> x = np.array([np.arange(7), np.arange(7)+1]*3)
    >>> y = x[4:1:-2, 1:5:2]
    >>> y
    array([[1, 3],
           [1, 3]])
    >>> y[0,0] = 99
    >>> x
    array([[ 0,  1,  2,  3,  4,  5,  6],
           [ 1,  2,  3,  4,  5,  6,  7],
           [ 0,  1,  2,  3,  4,  5,  6],
           [ 1,  2,  3,  4,  5,  6,  7],
           [ 0, 99,  2,  3,  4,  5,  6],
           [ 1,  2,  3,  4,  5,  6,  7]])
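
A rough sketch of why that slice can stay a view: a strided slice only needs a different start offset and different strides over the same buffer. The variable names `row_stride`, `col_stride` and `expected_strides` are mine, not from the article:

```python
import numpy as np

x = np.array([np.arange(7), np.arange(7) + 1] * 3)
y = x[4:1:-2, 1:5:2]

# Bytes to step between consecutive rows / columns of the original array.
row_stride, col_stride = x.strides

# A step of -2 over rows and 2 over columns just scales the strides.
expected_strides = (-2 * row_stride, 2 * col_stride)

print(y.strides == expected_strides)
print(np.shares_memory(x, y))
```

A fancy index such as `x[[0, 2, 1]]` has no such constant step, which is why it cannot be expressed this way.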

improbable22 · on June 26, 2019

A related fun fact, when slicing several dimensions:

    >>> a = np.arange(9).reshape(3,3) # a matrix
    >>> a[0:3,0:3]          # ranges are treated independently
    array([[0, 1, 2],
           [3, 4, 5],
           [6, 7, 8]])
    >>> a[[0,1,2],[0,1,2]]  # but arrays are treated at once
    array([0, 4, 8])
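
If you want index arrays to be treated independently, the way ranges are, `np.ix_` is one option. A small sketch, with `rows` and `cols` chosen arbitrarily for illustration:

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
rows = [0, 2]
cols = [0, 1]

# Index arrays are paired up: elements (0, 0) and (2, 1).
paired = a[rows, cols]

# np.ix_ builds an open mesh: rows 0 and 2 crossed with columns 0 and 1.
grid = a[np.ix_(rows, cols)]

print(paired)
print(grid)
```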

ovi256 · on June 26, 2019

A copy-on-write mechanism triggered by `y[0] = 3` would look the same and pass the test you devised, so you can't eliminate the possibility that it exists.

A better way would be to track memory use. A copy being created by either `y = x[[0, 2, 1]]` or `y[0] = 3` would show as a memory increase.
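
Another option that avoids measuring memory, sketched here on the assumption that `np.shares_memory` is an acceptable check: ask NumPy directly whether the two arrays overlap, before writing to either one, so a hypothetical copy-on-write step could not interfere.

```python
import numpy as np

x = np.array([1, 2, 3])
fancy = x[[0, 2, 1]]   # indexed with a list
sliced = x[::-1]       # indexed with a slice

# Checked before any write to either result.
fancy_shares = np.shares_memory(x, fancy)
sliced_shares = np.shares_memory(x, sliced)

print(fancy_shares, sliced_shares)
```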

jcims · on June 26, 2019

As an aside, one of my major challenges grokking numpy and pandas is the semantically dense syntax like the above. I know that the layers of bracing have an impact but it's difficult for me to tell where it is applied and/or described.
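
For what it's worth, here is one way to pull the one-liner `numpy.array([1, 2, 3])[[0, 2, 1]]` apart into named steps. The names `arr`, `order`, `reordered` and `same` are mine:

```python
import numpy as np

# Parentheses are the function call; the brackets inside are a list literal.
arr = np.array([1, 2, 3])

# A plain Python list of positions.
order = [0, 2, 1]

# Brackets directly after an array are the indexing operator.
reordered = arr[order]

# np.take spells out the same operation as a function call.
same = np.take(arr, order)

print(reordered, same)
```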

grenoire · on June 26, 2019

Pretty, but not particularly in-depth.

Also, nitpick but I can't hold it in: Why isn't the MSE np.mean(np.square(predictions - labels))? That's even breez-ier!

manojlds · on June 26, 2019

I think it's generally done this way because of the way the formula is represented mathematically.
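
A quick sketch showing the two spellings agree, assuming the article's version is the sum-based form that mirrors the formula. The sample values for `predictions` and `labels` are made up:

```python
import numpy as np

predictions = np.array([1.0, 1.0, 1.0])
labels = np.array([1.0, 2.0, 3.0])
n = len(predictions)

# Mirrors the formula: (1/n) * sum of squared differences.
mse_sum = (1 / n) * np.sum(np.square(predictions - labels))

# The shorter spelling suggested above.
mse_mean = np.mean(np.square(predictions - labels))

print(np.isclose(mse_sum, mse_mean))
```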

milliams · on June 26, 2019

I like this. One change I would make is on the aggregation and indexing section, change the representation of single values (as opposed to single-element arrays) to not be in a coloured box. It's important that the result of these operations is a different type.
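
To illustrate the type difference, a minimal sketch (the names `data`, `total`, `first` and `first_slice` are mine):

```python
import numpy as np

data = np.array([1, 2, 3])

total = data.sum()        # aggregation: a NumPy scalar, not an array
first = data[0]           # integer index: also a NumPy scalar
first_slice = data[0:1]   # slice: a one-element array

print(type(total), type(first), type(first_slice))
print(isinstance(total, np.ndarray), isinstance(first_slice, np.ndarray))
```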

pard68 · on June 26, 2019

NumPy was a huge boon in college. I had mostly gotten my homework process down to editing a LaTeX file alongside the CSV files for my datasets; when I compiled, it would first crunch the numbers with NumPy, export the results as TeX, and then build a PDF.

alanbernstein · on June 26, 2019

Care to share an example?

pard68 · on June 27, 2019

I might still have something. I didn't version control it, but it might be on Dropbox still.

dintech · on June 26, 2019

This is excellent. I'd love to see even more on Pandas.

dintech · on June 26, 2019

And now I see that you've already started one!

https://jalammar.github.io/gentle-visual-intro-to-data-analy...

iandanforth · on June 26, 2019

It would be good to mention the @ operator in the matrix multiplication section.

https://alysivji.github.io/python-matrix-multiplication-oper...
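
A minimal sketch of the two spellings side by side. The matrices `A` and `B` are arbitrary examples, and `@` needs Python 3.5 or later:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

product_dot = A.dot(B)   # method spelling
product_at = A @ B       # operator spelling of the same matrix product

print(np.array_equal(product_dot, product_at))
```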

improbable22 · on June 26, 2019

A warning sign that your faith in 0-based indexing may be faltering -- catching yourself writing comments like this :)

    # element at the top right. i.e. (1, 2) aka (0, 1) in python
    A[0, 0] * B[0, 1] + A[0, 1] * B[1, 1]
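
A quick check that the hand-written expression really is the 0-based `(0, 1)` entry of the product, using arbitrary 2x2 matrices:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Row 0 of A times column 1 of B, written out by hand.
top_right = A[0, 0] * B[0, 1] + A[0, 1] * B[1, 1]

print(top_right == (A @ B)[0, 1])
```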

iandanforth · on June 26, 2019

That's called the "Matlab Hangover"

1-6 · on June 28, 2019

Wow, this is so timely! I love the visual references. I'm still a little confused about the section on Matrix Indexing. Overall, great work!

Vaslo · on June 26, 2019

Good stuff! I'll definitely look for more from you!

tjpaudio · on June 26, 2019

Nice page, but unless you have never used software for math before, I am not sure it's very useful.

xvilka · on June 26, 2019

Would be nice to have something like this, but for Julia.

improbable22 · on June 26, 2019

There is this, though shorter: https://julia.guide/broadcasting