>In grad school I learned that it all makes beautiful sense. Is there a page you...

btilly · on Oct 27, 2020

Page 238 has the key intuition. But to really understand it you will need some build up to understand linear functions.

Matrices are simply a way to write down linear functions, and matrix multiplication is function composition. All of the algebraic properties of function composition are therefore true of matrices and vice versa.

That intuition also explains why matrices come up so often. For example consider Calculus. The key idea behind the differential calculus is that if y is close to y0 then f(y) is approximately f(y0) + f'(y0) (y - y0).

In the multivariable calculus the same thing is true. Except that f'(y0) is a linear function. Which means that when we write it down we have to write down a matrix.

saeranv · on Oct 27, 2020

Add me to the list of people who have the same question, but with a slightly different perspective.

I understand the interpretation of matrix multiplication as a linear combinations, but I would also like to understand what is occurring geometrically. If I matrix multiply a 2D vector <a, b> by the basis vectors <<0, 1>, <1, 0>>, then I understand I am "weighting" an x, and y unit vector (as defined by my basis) by the x-component and y-component of my <a, b> vector, and then adding them together. So geometrically, I'm stretching by a factor of a, and b, then summing to get my resulting vector.

So, the way I'm trying to think of it geometrically is: any matrix is just a representation of some kind of basis (i.e some kind of linear system). I picture this as a sort of orthogonal cartesian grid that we squish and stretch while still being a series of intersecting parallel lines.

Matrix multiplication, is therefore taking in some vector, and "weighting" the basis vectors represented by our matrix by the components of that vector, and then summing them. To put it another way, we're essentially mapping a vector into that new basis system represented by our matrix.

Is that correct? Is there a better way to try and think of matrix multiplication geometrically?

jimhefferon · on Oct 27, 2020

The subsection on matrix multiplication is on p 228. But, of course, a person can't just dip in, they need to be in the flow of the prior 227 pages.

jfarmer · on Oct 27, 2020

There are two relevant facts:

    1. Linear maps are determined by their behavior on basis elements
    2. The composition of linear maps is still linear

Together, this means you can work out how the composition of two linear maps affects the coefficients of each vector (relative to a fixed basis). If you do this, you get the "formula" for matrix multiplication.

Let V be an n-dimensional vector space. Let {e_1, e_2, e_3, ..., e_n} be a basis for V.

That means any v in V can be written

    v = a_1*e_1 + a_2*e_2 + ... + a_n*e_n

Let f: V → V be a linear map. Let's see what it means to apply f to v. We have

    f(v) = f(a_1*e_1 + a_2*e_2 + ... + a_n*e_n)
         = f(a_1*e_1) + f(a_2*e_2) + ... + f(a_n*e_n)
         = a_1*f(e_1) + a_2*f(e_2) + ... + a_n*f(e_n)

In other words, if we know the values of f(e_1), f(e_2), ..., f(e_n) then we can calculate the value of f(v) for any v.

Every choice of value for f(e_1), f(e_2), ... is valid and determines a unique linear map.

For n=2, use the standard basis where e_1 is (1,0) and e_2 is (0,1).

Write f(1,0) as (a,c) and write f(0,1) as (b,d). This is just "relabeling" the values f(1,0) — under the standard basis we know that f(1,0) looks like (a,c) for some values of (a,c).

Write v = (x,y) so that

    v = (x, y)
      = x*(1,0) + y*(0,1)
      = x*e_1 + y*e_2

Then we have:

    f(v) = f(x, y)
         = x*f(e_1) + y*f(e_2)
         = x*f(1,0) + y*f(0,1)
         = x*(a,c) + y*(b,d)
         = (x*a, x*c) + (y*b, y*d)
         = (x*a + y*b, x*c + y*d)

Or written using matrix notation:

    [ a b ] [x]   [x*a + y*b]
    [ c d ] [y] = [x*c + y*d]

So that's the "formula" for applying a matrix to a single vector. It's determined entirely by the four values (a,b,c,d), but really it's determined by the action of the map on the basis vectors.

Now let f,g be linear maps and calculate g(f(v)) in the same way. You'll get the "formula" for matrix multiplication.

In other words, matrix notation, matrix multiplication formulas, etc. are "just" compact ways of representing the behavior of linear maps.

commandlinefan · on Oct 27, 2020

I was afraid to ask the same question. I can do it, I can program it, and I can check to see if it's correct, but I can't for the life of me understand why somebody saw fit to describe matrix multiplication the way they did.

jordan_curve · on Oct 27, 2020

Matrix multiplication is the composition of linear maps. It’s sometimes lost in the more computational approach, but we can think of this geometrically. If you know the first matrix rotates the plan by some amount, and the 2nd rotates it in the same direction as well, then the product must be the rotation matrix that rotates by the sum of the original rotations. That’s a very simple example. Since every nonsingular matrix is just a change of basis, you can get a rich geometric understanding for multiplying matrices. Moreover when things get more complicated, we can use invariants like the determinant and the trace to help guide our intuition.

I’d highly suggest watching 3Blue1Brown’s videos on Linear Algebra. He won’t get you to understanding everything (you’ll need to sit down and do problems for that) but he will help you see what intuition is out there in a very beautiful way. He makes a very good point that often when we go through the computations without the geometric intuition, we can spend a ton of time crunching numbers to see results that should have been obvious.

mixedmath · on Oct 27, 2020

There is a subsection called matrix multiplication that is about this, and which explicitly mentions the proof of associativity both in the "clear" way and in the "slog through indices" way.