Linear Algebra, at least at my school, is taught pretty poorly. Instead of teaching the beauty of transformations, the course is bogged down in numerical nonsense and tedious calculations (who wants to find the inverse of a 3x3 matrix? Bueller? Bueller?). Only after learning Algebra and homomorphisms, isomorphisms and automorphisms did I appreciate the importance of linear transformations. Stuff like Singular Value Decomposition gets a lot more interesting once you know some basic Algebra. I suppose Linear can't get too abstract because non-math majors have to take it, but starting from generalized ideas of transformations is a far better way to teach it imo.
That was exactly my experience. I struggled with matrix theory at uni doing some bullshit exercises, but only started to grasp the topic when I needed to apply a linear transformation in a game.
I think the situation has improved somewhat as visualization tools have become easier to use. We made this simple visual [1] to help people understand what they might get out of linear algebra, and it was easy enough for some statisticians to accomplish.
Nice site, but it's worth giving some info about yourself on the site and why I should trust your advice, given that these books are expensive.
In elementary machine learning, you give two options. You should really include Introduction to Statistical Learning, by the same folks who wrote ESL. It's a great book that covers the same ground as ESL but with less math.
At mine we have an "applied" linear algebra course for engineering students and the "normal" one that math people take.
I didn't pass the "normal" one but I think that's because I had another 300 level math course, capstone, and four other CS courses at the same time. I'm certain I wouldn't have passed the applied one, it looked very tedious and that's usually what gets me with homework.
It took until I started learning differential geometry in the form of General Relativity to arrive at this insight, even though I feel like the notion of a matrix as a linear map was drilled in pretty thoroughly. The notion of matrix multiplication as function composition was presented almost as an interesting side effect of matrix multiplication -- that is, multiplication by these rules came first, and, hey, look, they compose!
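A quick numpy sketch of that point, using a rotation and a scaling as hypothetical example maps: applying the two maps one after the other gives the same result as applying their product matrix once, which is exactly the "multiplication is composition" observation.

```python
import numpy as np

# Two linear maps on R^2: a 90-degree rotation and an axis scaling.
rotate = np.array([[0.0, -1.0],
                   [1.0,  0.0]])
scale = np.array([[2.0, 0.0],
                  [0.0, 3.0]])

v = np.array([1.0, 1.0])

# Applying the maps one after the other (function composition)...
composed_pointwise = rotate @ (scale @ v)

# ...agrees with applying the single product matrix.
composed_matrix = (rotate @ scale) @ v

assert np.allclose(composed_pointwise, composed_matrix)
```

The matrices and vector here are arbitrary; any pair of linear maps would do.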
Personally I found the tensor algebra perspective much more intuitive than either of these, with matrices thrown in mostly as a computational device. Even a vector (through the dot product) is just a linear function on other vectors, and the notion of function composition carries through to that and to higher-order tensors.
Covariance and contravariance are a little more complicated to completely grok, but for most applications in Euclidean space (where the metric is the identity function) the distinction is of more theoretical interest anyway.
A metric is a distance function. Defining a metric on a space is one of the ways you create a topology.
I'm not sure what the parent means by the metric being the identity function, however. The Euclidean metric is basically the hypotenuse of a triangle parameterized by two vectors. The adjacent and opposite sides of the triangle are measured to be the Euclidean norm of each vector (their length), and the hypotenuse is the shortest distance between them.
The Euclidean metric is not the only metric - you can define distance however you'd like as long as it's consistent. But I'm not sure how the identity function works as a metric, because that would map a vector to another vector, not a scalar.
In differential geometry the metric [1] is a tensor that defines the relationship of vectors in the space to vectors in the tangent space. The identity function as a metric means that you are in a locally flat space where geodesics (the path taken by traveling in a given direction) are straight lines.
A metric in a traditional metric space is a global distance function; you can use the metric tensor in a Riemannian manifold to allow integration to find the distance between two points.
The metric in a vector space is a dot product. If you just have one vector space, it’s not that interesting then, but if you have many vector spaces all glued together (like a tangent space to a curved surface), then looking at how the dot product varies between nearby tangent spaces tells you a lot about the surface (Gaussian curvature and so on).
In my high school matrices were first taught in geometry class, starting with using matrices as affine transformations in 2-d and then 3-d, and using that to teach concepts like what eigenvectors/values are, the equivalence of matrix and function composition, etc.
That was taught right after a unit on complex numbers and trigonometry so that we could see the parallels between composing polynomial functions on complex numbers and composing affine transformations.
To this day I think that was one of the most beautiful and eye opening lessons I've had in mathematics.
In hindsight, I think I got lucky that the teachers who wrote the curriculum this way were math, physics, and comp sci masters/phd's who looked at their own educations and decided that geometry class was a great Trojan horse for linear algebra.
You certainly were lucky to be taught Linear Algebra in such a manner! I came to understand the importance of such an approach only after a lot of head-scratching and self-study. IMO, a beautiful and important branch of "Practical" Maths has been needlessly obscured by the pedantic formalism espoused by the teaching community. Linear Algebra SHOULD always be taught alongside Coordinate/Analytic Geometry and Trigonometry for proper intuition.
I found the book "Practical Linear Algebra: A Geometry Toolbox" very helpful in my study.
FWIW, I was told that matrices are linear maps pretty early on in my education. Are there any college level linear algebra / matrix calculations courses that don't tell students about that?
When I went through university the standard set of courses was a Calculus course that was mostly about derivatives, a second one that was mostly about integrals, and a third Calculus course about multi-variable Calculus. That third course necessarily had to teach matrices, and taught them as rote calculations. There was a follow-up differential equations course which refreshed people's memories of matrices... as a rote calculation.
It was done this way because the multi-variable Calculus course was a prerequisite for a lot of physics+engineering courses. So a lot of students wanted to take that sequence. Differential equations were a prerequisite for some other advanced courses. Linear algebra was pretty much just for math majors.
It's possible if you learned matrices outside the context of a linear algebra class that you might be mystified about what's actually happening.
I took Linear Algebra the same semester I took Computer Graphics, which worked out really well for me - the first half of LA taught me everything I needed to know about transformation matrices, and the second half of CG covered 3D graphics in OpenGL. The first half of CG was all 2D graphics stuff, and the second half of LA was about eigenvectors/values - I've forgotten everything from that part of the class.
> FWIW, I was told that matrices are linear maps pretty early on in my education. Are there any college level linear algebra / matrix calculations courses that don't tell students about that?
I know they taught us about matrices in high school, but I don't recall them talking about any applications at all. I think the topic was pretty drained of context, just rote application of the rules for add/subtract/multiply/etc.
Sure they'll tell you that in passing, but won't really explain what that actually means. Certainly Linear Algebra 1 at college was a lot of apply this method to this object (that we'll call a matrix) to calculate this thing called a determinant. Don't worry about what it is or what it represents, just check if it's zero or not. If it's not zero apply this other method to calculate its inverse. Repeat.
Same here in the UK - a big chunk of the Oxford School Mathematics Project O-level syllabus in the early 80s comprised the various transformations and the matrix operations needed to create them.
I find Strang's text to be unnecessarily tedious. Both of Lang's LA textbooks (Intro to LA, and LA) take linear maps as the core point of the text.
When I first was introduced to matrices (high school) it was in the context of systems of equations. Matrices were a shorthand for writing out the equations and happened to have interesting rules for addition etc. It took me a while to think about them as functions on their own right and not just tables. This post is my attempt to relearn them as functions which has helped me develop a much stronger intuition for linear algebra. That’s my motivation for this post and why I decided to work on it. Feedback is more than welcome.
What got me for a while was the concept of a tensor:
For example: What is a tensor?
Wrong way to answer it: Well, the number 5 is a tensor. So's a row vector. So's a column vector. So's the dot product and the cross product. So's a two-dimensional matrix. So's a four-dimensional matrix, just... don't ask me to write one on the board, eh? So's this Greek letter with smaller Greek letters arranged on its top right and bottom right. Literally anything you can think of is a tensor, now... try to find some conceptual unity.
Then coordinate-free fanaticism kicked in, robbing the purported explanations of any explanatory power in terms of practical applications of tensors. The only thing they could do was shift indices around.
What finally made it stick is decomposing every mathematical concept into three parts:
1. Intuition, or why we have the concept to begin with.
2. Definitions, or the axioms which "are" the concept in some formal sense.
3. Implementations, or how we write specific instances of the concept down, including things like the source code of software which implements the concept.
If you ask a mathematician, a tensor is an element of a tensor product, just like a vector is an element of a vector space. This moves the question to "what is a tensor product?", which you can think of as a way to turn bilinear maps into linear maps (this is an informal statement of the universal property of the tensor product; you also need a proof of existence of such an object, but it's easy for vector spaces and alright for modules after seeing enough algebra).
Crikey, I hope I never have to talk to that mathematician! That's a terse, unintuitive definition that isn't very helpful unless you're already familiar with the concepts. (Also maybe you meant linear maps into bilinear?)
Reminds me of the time an algebraist mentioned to me that he was working on profinite group theory. I asked what a profinite group was, and he immediately replied 'an inverse limit of an inverse system', with no follow up. Well thanks buddy, that really opened my eyes.
Math is just a much deeper topic than most others. The things people do in research level math can take a really long time to explain to a lay person because of the many layers of abstraction involved.
It is a very deep and specialised topic. However, there are ways to convey intuition to a 'mathematically mature' audience, and there are quick definitions that are correct but unenlightening. I much prefer the former :)
No, it turns bilinear maps into linear ones! If you have three R-modules (one can read K-vector spaces if unfamiliar with modules) N,M,P and a bilinear map N×M→P then there is a unique linear map N⊗M→P compatible with the map N×M→N⊗M which is part of the structure of a tensor product. (What's really going on here in fancy terms is the so-called Hom-Tensor adjunction, because the _⊗M functor is adjoint to the Hom(M,_) functor, but just thinking about bilinear and linear maps is much clearer)
I think the ideas behind the coordinate-free formulation of tensor calculus make it relatively easy though.
A tensor is a function that takes an ordered set of N covariant vectors (i.e. row vectors) and M contravariant vectors (i.e. column vectors) and spits out a real number. It has to be linear in each of its arguments.
I'm pretty sure all the complicated transforms follow from that definition (though you may have to assume the Leibniz rule - I can't remember), and from ordinary calculus.
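That definition is easy to make concrete in numpy. Below is a minimal sketch, assuming a (1,1)-tensor on R^3 stored as an ordinary array: it eats one row (covariant) vector and one column (contravariant) vector, returns a real number, and is linear in each slot. The names `T`, `apply_tensor`, etc. are all made up for illustration.

```python
import numpy as np

# A (1,1)-tensor on R^3, stored as an ordinary 3x3 array.
T = np.arange(9.0).reshape(3, 3)

def apply_tensor(T, w, v):
    """Contract the tensor against one covector w and one vector v."""
    return np.einsum("i,ij,j->", w, T, v)

w = np.array([1.0, 0.0, 2.0])   # covariant (row) vector
v = np.array([0.0, 1.0, 1.0])   # contravariant (column) vector
u = np.array([1.0, 1.0, 0.0])

# Linearity in each argument, checked here for the v slot:
a, b = 2.0, -1.0
lhs = apply_tensor(T, w, a * v + b * u)
rhs = a * apply_tensor(T, w, v) + b * apply_tensor(T, w, u)
assert np.isclose(lhs, rhs)
```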
As a layman, the word "tensor" always intimidated me. As a programmer, I was surprised when I found out that a tensor is just a multi-dimensional array (where the number of dimensions can be as small as 0). That was a concept I was already quite comfortable with.
That's a bit like saying a vector is 'a row of numbers'. Not incorrect, but missing the point. What matters is what vectors do. It's the properties like pointwise addition, scalar multiplication, and existence of an inverse that make vectors vectors.
You're confusing a tensor with its representation. Tensors are objects which must obey a certain set of rules. (Which rules depends on whether you're talking to a mathematician or a physicist.)
It's a nice article - you focus on matrices as a kind of operator that takes a vector as input and produces another vector. This is one side of the coin.
The other interpretation is that matrices are functions that take two arguments (a row vector and a column vector) and produce a real number. IMO this interpretation opens the door to deeper mathematics. It links in to the idea that a column vector is a functional on a row vector (and vice versa), giving you the notion of dual space, and ultimately leading on to differential forms. It also makes tensor analysis much more natural in general.
If you're going to attempt a definition like this you need some more conditions (approaching concepts like linearity, for instance). Otherwise you can have a black box like x^3y^3-u^3v^3 that takes in [u,v] and [x,y]^T and spits out a real number; that's not a matrix-y operation.
I'm describing a somewhat unusual way of thinking about vectors, matrices etc. At least, it's unusual from the perspective of someone with an engineering / CS background.
First think about row and column vectors. A row vector and a column vector can be combined via standard matrix multiplication to produce a real number. From that perspective, a row vector is a function that takes a column vector and returns a real number. Similarly, column vectors take row vectors as arguments and produce real numbers.
It turns out that row (column) vectors are the only linear functions on column (row) vectors; in the finite-dimensional case this is an instance of the Riesz representation theorem. If I give you a linear function on row vectors, you can find a column vector so that computing my function is equivalent to calculating a matrix multiply with that column vector.
Now on to matrices. Matrices take one row vector and one column vector and produce a real number. I can feed a matrix a single argument - the row vector, say - so that it becomes a function that takes one more argument (the column vector) before it returns a real number. Sort of like currying in functional programming. But as we said, the only linear functions that map column vectors into real numbers are row vectors. So by feeding our matrix one row vector, we've produced another row vector. This is the "matrices transform vectors" perspective in the OP's article. But I think the "Matrices are linear functions" perspective is more general and more powerful.
This perspective of vectors, matrices, etc... as functions might seem needlessly convoluted. But I think it's the right way to think about these objects. Tricky concepts like the tensor product and vector space duality become relatively trivial once you come to see all these objects as functions.
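The currying idea above is easy to see numerically. A rough sketch (the matrix and vectors are arbitrary placeholders): fully applied, the matrix takes a row vector and a column vector to a real number; partially applied to just the row vector, what's left over is itself a row vector, i.e. a linear function still waiting for its column-vector argument.

```python
import numpy as np

M = np.array([[1.0, 2.0],
              [3.0, 4.0]])
row = np.array([1.0, -1.0])   # a row vector
col = np.array([2.0, 5.0])    # a column vector

# Fully applied: two arguments in, one real number out.
full = row @ M @ col

# Partially applied ("curried") on the row vector: the result
# is another row vector, a function awaiting a column vector.
curried = row @ M
assert np.isclose(curried @ col, full)
```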
I appreciate you trying to explain it, however I believe it would have really helped if you started from the examples where this way of thinking is useful: you mentioned tensor product and vector space duality, however unless someone is already familiar with these concepts (I'm not) then it does seem needlessly convoluted. Are there any practical applications of these concepts that you can describe?
I haven't been involved in abstract math in close to a decade, but I think it's a description of the (general) inner product. So, a generalization of the dot product. The classic dot product is that operation with the identity matrix. My understanding is that using matrices that way is very common in physics.
Yes they can. This follows from singular value decomposition. Let S be the matrix representation of a shear transformation. There exist rotation matrices R, B and a diagonal matrix D such that S = RDC, where C is the transpose of B. D is the matrix representation of a scaling transformation and R, B are the matrix representations of rotation transformations. Since S is a product of rotation and scaling matrices, its corresponding linear transformation is a composition of rotations and scalings.
It would ordinarily be weird to represent shear transformations using rotations and scalings because shear matrices are elementary. But it checks out.
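The decomposition described above can be checked directly with numpy's SVD. A small sketch for a 2-d shear (note that the orthogonal factors returned by `np.linalg.svd` may include reflections, not pure rotations, so this is the decomposition up to sign):

```python
import numpy as np

# A 2-d shear matrix.
S = np.array([[1.0, 1.0],
              [0.0, 1.0]])

# SVD factors it as (orthogonal) * (diagonal scaling) * (orthogonal).
U, singular_values, Vt = np.linalg.svd(S)
D = np.diag(singular_values)

assert np.allclose(U @ D @ Vt, S)          # S = R D B^T in the notation above
assert np.allclose(U.T @ U, np.eye(2))     # U is orthogonal
assert np.allclose(Vt @ Vt.T, np.eye(2))   # Vt is orthogonal
```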
OK, point taken. I considered "scaling" in a less general sense (scalar multiple of the unit matrix), while you want to allow arbitrary diagonal entries. My definition is to my knowledge the common one in linear algebra textbooks because in yours, the feasible maps depend on the chosen basis.
EDIT: To state my point more clearly: in textbooks, "scaling" is the linear map that is induced by the "scalar multiplication" in the definition of the vector space (that is why both terms start with "scal").
Same for the orthogonal matrices, or the diagonal matrices, or the symmetric matrices, or the unit determinant matrices, or the singular matrices... They are all sets of Lebesgue measure zero.
Orthogonal, diagonal, symmetric, and unit-determinant matrices are all sub-groups though, which makes them 'more special' than all shearing matrices.
Singular matrices are special in the sense that they keep the matrix monoid from being a group. My category theory isn't strong enough to characterize it, but this probably also has a name.
Edit: I think the singular matrices are the 'kernel' of the right adjoint of the forgetful functor from the category of groups to the category of monoids.
Though I must admit a lot of that sentence is my stringing together words I only vaguely know.
No, I wasn't, but I did confuse the terms. Shear can be done without the extra dimension. Skew transforms require the extra dimension, as does translation.
One of the things that always irked me about the term "linear transformation" is that it doesn't include affine transformations, which is funny because back in elementary school you learn that a "linear equation" looks like Mx + b. Of course, as the article notes, "linearity" for vector spaces (or modules) means linearity in the arguments, while "linear" for a child in school means "something like a line on graph paper". This is yet another example of terminology in the way mathematics is taught, possibly for historical reasons, that leads to even more confusion.
PS. in case you didn't know, affine transformations are not linear:
f(x) = mx + b =>
f(x+y) = m(x+y) + b ≠ (mx + b) + (my + b) = f(x) + f(y),
f(cx) = cmx + b ≠ c(mx + b) = c·f(x)
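The failure is easy to check numerically with concrete (arbitrarily chosen) values of m, b, x, y, and c:

```python
# A concrete check that an affine map f(x) = m*x + b with b != 0
# fails both linearity conditions.
m, b = 2.0, 1.0
f = lambda x: m * x + b

x, y, c = 3.0, 4.0, 5.0
assert f(x + y) != f(x) + f(y)   # additivity fails: 15 vs 16 (off by the extra b)
assert f(c * x) != c * f(x)      # homogeneity fails: 31 vs 35
```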
The matrix/function stuff is elementary enough that I understand it intuitively (I suck at math), although it's neat to be reminded that given a enough independent points you can reconstruct the function (this breaks a variety of bad ciphers, sometimes including ciphers that otherwise look strong).
The kernel post actually does some neat stuff with the kernel, which I found more intuitively accessible than (say) what Strang does with nullspaces.
I strongly disagree skipping determinants provides a more intuitive approach to linear algebra. I don't know your background, but I'd venture a guess you feel it does because the Laplace expansion formula for computing the determinant[1] feels uninspired and out of place.
The reason determinants are hard to teach (in my opinion) is because a rigorous derivation of their formula isn't possible without first teaching multilinear algebra and constructing the exterior algebra. Once you do those things, the natural geometric interpretation of the determinant basically falls onto your lap. But it's still very useful for e.g. computing eigenvalues and using the characteristic polynomial, so it's taught before that context can be formalized.
Professors shouldn't teach determinants in the context of matrices, at least not at first. That's heavily computation-focused, and the symbol pushing looks really unmotivated and strange to students. Instead they should teach the basis-free definition of determinants (i.e. focus on the linear map, not the matrix transformation representing the linear map for some basis). Then the determinant is "only" the volume of the image of the unit hypercube under the linear transformation, which is where the parallelepiped comes in. If the linear transformation is invertible, the unit hypercube is transformed from an n-dimensional cube into an n-dimensional parallelogram, from which you can geometrically see the way the linear map transforms the entire vector space it's defined over.
3Blue1Brown has a very good video on the geometry underlying the determinant[2]. For a more rigorous presentation which constructs the exterior algebra and derives the determinant formula using the wedge product, Noam Elkies has notes[3][4] for when he teaches Math 55A at Harvard. Incidentally Noam Elkies uses Axler's book, and while he obviously approves of it he's pretty upfront in asserting that the determinant should be taught anyway[5].
I agree, the way I still see determinant is as the 'volume scaling factor' of a linear transformation.
This means it makes sense that det(A) = 0 means A is non-invertible. It also makes a lot of sense when the jacobian pops up in the multi-dimensional chain rule.
Given the above, and the Cayley–Hamilton theorem, I never really had to know why the determinant was calculated the way it is. The above give enough of an interface to work with it.
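The 'volume scaling factor' view is easy to demo in numpy with an arbitrary example matrix: the unit square has area 1, and its image under A is a parallelogram of area |det(A)|; a rank-deficient matrix squashes it flat, hence det = 0 and non-invertibility.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

# The unit square spanned by e1, e2 has area 1; its image under A
# is a parallelogram whose area is |det(A)|.
area_scale = abs(np.linalg.det(A))
assert np.isclose(area_scale, 6.0)

# det = 0 means the image is squashed flat, so the map is non-invertible.
B = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # second row is a multiple of the first
assert np.isclose(np.linalg.det(B), 0.0)
```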
It recently occurred to me that if you use that matrices represent linear functions, you don't have to do tedious math to prove that matrix multiplication is associative (that is, (A * B) * C = A * (B * C), which allows us to write A * B * C without brackets, since it doesn't matter how we place the brackets anyway).
For a matrix M, denote f_M(x) = M * x. Then f_{A * B}(x) = f_A(f_B(x)), so that f_{(A * B) * C}(x) = f_{A * B}(f_C(x)) = f_A(f_B(f_C(x))), and also f_{A * (B * C)}(x) = f_A(f_{B * C}(x)) = f_A(f_B(f_C(x))).
So f_{(A * B) * C} = f_{A * (B * C)}, and since a linear map determines its matrix uniquely, (A * B) * C = A * (B * C).
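For whatever it's worth, here's a numerical sanity check of the same identity with random matrices (up to floating-point error, hence `allclose`):

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((3, 3)) for _ in range(3))

# Associativity of matrix multiplication, numerically:
assert np.allclose((A @ B) @ C, A @ (B @ C))

# ...which is just composition of the underlying functions:
x = rng.standard_normal(3)
assert np.allclose((A @ B @ C) @ x, A @ (B @ (C @ x)))
```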
Nice! The illustrations + color coding for the vectors are very useful.
Here is a video tutorial that goes through some of the same topics (build up matrix product from the general principle of a linear function with vector inputs):
https://www.youtube.com/watch?v=WfrwVMTgrfc
Good, intuitive introduction to matrices. Next steps could be showing that there are infinitely many different matrix representations of a linear map (different from the polynomials) and they can be used for function spaces, too.
Thanks for the feedback. I go into this in the next post on eigenvectors here: https://www.dhruvonmath.com/2019/02/25/eigenvectors/. I start by discussing basis vectors which I believe is what you’re looking for in your comment.
I just skimmed the article quickly. Are there other ways to learn about matrices? If you don't treat them as linear maps, they are just boring grids of numbers and matrix multiplication doesn't make any sense.
The basic equivalence is fine, but what about all the other things you can do with matrices but can't do with functions? For example, what is the equivalent of the transpose for functions? How about eigenvalues or Gaussian elimination?
Nice article. That's how I learned matrices in high school in Germany. Maybe it's different here in the US, I'll have to take a look at my daughters' textbooks.
That would heavily depend on whether you are coming at it from a theoretical math p.o.v. or a more applied p.o.v.
Not that the applied approach should leave out the theory, because theoretical material like this article gives a great and intuitive understanding of linear algebra. However, the more theoretical treatments also set up things like rings, modules, and even category theory, which are much less useful from an applied perspective.
For the theoretical approach I've heard good things about 'linear algebra done right'. I imagine it is less appealing for the applied approach. All I can say is be wary of the 'shut up and calculate' mindset in linear algebra. Getting the ideas behind the concepts is essentially a shortcut to understanding linear algebra without any downsides.
So yes, this is equivalent to FinVect over the field from which the entries of the matrices are drawn.
The difference is that here you construct the category from a simpler premise. To construct FinVect you need to include all set objects with structure satisfying some axioms.
The category of matrices is simply positive integers with as morphisms n x m matrices between the two integers. Composition is matrix multiplication.
Here [1] is a nice overview. If you can follow what is going on there, it is worth while looking at II, III and IV.
I suppose that technically, the 'arrows are matrices' definition rules out infinite dimensional vector spaces, but I'd guess that OP meant to include them.
An argument against would be to keep to a small category.
What could possibly be a more basic understanding of a matrix in mathematics? There's a reason they teach you Linear Algebra first, before anything else.
Hey! I wrote this article - “we” is referring to people who had a similar educational experience to me. I was introduced to matrices as a tool for solutions to systems of equations. I always wish I was taught the functional perspective from the beginning.