I, too, spent a long time staring at expressions like
“half-invert p(x, v) to get v(x, p) s.t.
p(x, v(x, q)) = q
then the Legendre transform is
H(x, p) = p v(x, p) – L(x, v(x, p))”
And I did come to one of the same conclusions as this article: if we're talking pure mathematics, these “thermodynamic” expressions like (∂L/∂v)_x, (∂L/∂p)_x are deeply easy to get confused about, and in fact you should just say “the derivative of the function with respect to its first argument, holding the other arguments constant”, and therefore introduce different functions which compute the same value under different symbols, say
Λ(x, p) = L(x, v(x, p))
∂₂Λ(x, p) = ∂₂L(x, v(x, p)) ∂₂v(x, p)
so that you're not scratching your head about “why is the derivative of L with respect to v showing up here, v is now a function isn't it?”
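To make that bookkeeping concrete, here's a minimal sympy sketch (with a toy Lagrangian of my own choosing, purely illustrative): it builds Λ(x, p) = L(x, v(x, p)) as its own function and checks that its p-derivative is the chain-rule product ∂₂L ∂₂v, with each factor evaluated at the right arguments.

    import sympy as sp

    x = sp.symbols('x')
    v, p = sp.symbols('v p', positive=True)

    # A toy Lagrangian (illustrative only), convex in the velocity v.
    def L(x_, v_):
        return v_**4 / 4 - x_**2 / 2

    # p(x, v) = dL/dv, then half-invert to get v(x, p).
    p_of_v = sp.diff(L(x, v), v)                      # = v**3
    v_of_p = sp.solve(sp.Eq(p, p_of_v), v)[0]         # = p**(1/3)

    # Lambda(x, p) = L(x, v(x, p)) is its own function of (x, p).
    Lam = L(x, v_of_p)

    # Chain rule: d2 Lambda(x, p) = d2 L(x, v(x, p)) * d2 v(x, p).
    lhs = sp.diff(Lam, p)
    rhs = sp.diff(L(x, v), v).subs(v, v_of_p) * sp.diff(v_of_p, p)
    print(sp.simplify(lhs - rhs))                     # -> 0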
The formulation of first derivatives as inverse functions is new to me but makes sense.
However, I do think that we do even worse with linear algebra. I believe I could walk up to any college senior in physics and they wouldn't know that “the determinant is the product of the eigenvalues,” but this should be as well-known as “the mitochondria are the powerhouse of the cell.” I think this is because we introduce a complicated way to calculate determinants and then we use determinants to calculate the eigenvalues?
Agreed, the way thermodynamics is often taught is such a mess.
My personal and controversial [0] take is that the free energy should really be seen as the Legendre transform of the entropy, not of the energy.
I know it is ultimately semantics, but this viewpoint makes the passage from the micro-canonical to the canonical ensemble so much nicer. In particular, the saddle point approximation for the canonical partition function makes it natural that the ensembles are equivalent in the thermodynamic limit... through a Legendre transform!
Bonus corollary: the statement mentioned in the blog about the derivatives being each other's inverses is just saying that T(E) and E(T) in respectively the micro-canonical and the canonical ensemble define the same relation between E and T.
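To spell out the saddle point (a rough sketch, in units where k_B = 1, glossing over prefactors): write the canonical partition function as Z(beta) = int dE omega(E) e^{-beta E} = int dE e^{S(E) - beta E}. In the thermodynamic limit the integral is dominated by the E* where S'(E*) = beta, so -ln Z ≈ beta E* - S(E*) = min_E [beta E - S(E)], i.e. beta F is (up to sign conventions) the Legendre transform of S with respect to E, and the dominating E* is exactly the canonical mean energy.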
I'm not even sure if it makes sense to view it as a Legendre transform. Or well, it is one, I'm just not sure if it's a good definition.
You get the free energy for 'free' if you use a Lagrange multiplier to maximize entropy while keeping the energy fixed (temperature is the inverse of that Lagrange parameter). In one fell swoop this shows why temperature is a thing and why minimizing the free energy is important.
The Legendre transform just returns the value of the constraint from the minimized function, but at that point why bother?
I do agree that it makes more sense to see the free energy as a Legendre transform of the entropy; that's kind of what you end up doing if you maximize entropy in this way.
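A quick numerical check of that picture (with toy energy levels and a target mean energy of my own choosing): maximize the Shannon entropy of a discrete distribution subject to normalization and a fixed mean energy, and the optimizer lands on a Boltzmann distribution whose beta is just the Lagrange multiplier of the energy constraint.

    import numpy as np
    from scipy.optimize import minimize

    # Toy system: a few discrete energy levels, with the mean energy held fixed.
    E = np.array([0.0, 1.0, 2.0, 3.0])
    E_target = 1.2

    def neg_entropy(p):
        p = np.clip(p, 1e-12, None)
        return np.sum(p * np.log(p))   # minimize -S  <=>  maximize entropy

    cons = [
        {"type": "eq", "fun": lambda p: np.sum(p) - 1.0},          # normalization
        {"type": "eq", "fun": lambda p: np.sum(p * E) - E_target},  # fixed mean energy
    ]
    p0 = np.full(len(E), 0.25)
    res = minimize(neg_entropy, p0, bounds=[(0, 1)] * len(E), constraints=cons)
    p = res.x

    # Read off beta from two levels: p_i ~ exp(-beta * E_i).
    beta = np.log(p[0] / p[1]) / (E[1] - E[0])
    print(p)
    print(np.exp(-beta * E) / np.sum(np.exp(-beta * E)))  # should match p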
I’ve taken an excellent graduate class in thermodynamics, and I’ve never seen a definition of enthalpy that is both coherent and involves Legendre transforms.
Here’s my definition: the internal energy of the stuff in a box is a useful quantity, and one can call it E. But E is the energy needed to assemble the stuff in the box if you start with an empty box of the appropriate volume. This makes physical sense, and it’s perfectly fine for calculating things related to, say, anything that happens in a vacuum. Or anything that happens in a rigid box.
But we live in a very large atmosphere, so we mostly do experiments at constant pressure. If you take a flexible baggy and assemble its contents, you need the energy to make the contents (that’s E) and also some extra energy to displace air to make room for the contents, and the latter part requires extra energy equal to P (the constant atmospheric pressure) times V (the volume of the bag).
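(For a sense of scale, with round numbers of my own: a 1-liter baggy at atmospheric pressure needs roughly P V ≈ 101 kPa × 0.001 m³ ≈ 100 J of extra energy just to push the air aside, on top of whatever E its contents required.)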
So we give E + PV the fancy name “enthalpy”, and it turns out to be useful. Maybe there’s a Legendre transform somewhere, but I’ve never seen any use for it other than to try, poorly, to convince someone of its existence. And it’s genuinely awkward — energy and enthalpy are fairly general physical quantities that one could, in principle, measure, and one already needs to start constraining the system to think of them as functions of anything sensible.
And then the HVAC industry seems to have borrowed the term “enthalpy” to mean, roughly, “temperature and humidity”. And “energy” means “temperature” or maybe “heat” but probably actually means “enthalpy, but only the thermal part and not the chemical part”. You’ll be lucky to find any math at all, let alone a Legendre transformation. Don’t get me started on “pressure”.
My rant is about how most of thermo starts with U(S, V, N), whereas I would prefer to start with S(U, V, N). Either way the differential reads:
dU = T dS - p dV + mu dN
which, if we follow the standard way, is just saying that
T = (dU/dS)_{V, N}
- p = (dU/dV)_{S, N}
mu = (dU/dN)_{S, V}
where the derivatives are really just partial derivatives so I really should have written them with a curly d. Mathematically enthalpy is the Legendre transform where we eliminate V in favor of p:
H(S, p, N) = (U(S, V, N) + p V)_{V = V*(p)}
where V* extremizes the term in parentheses, i.e.
(dU/dV) (S, V*(p), N) + p = 0
Of course this equation is just the above definition of the pressure, but now we are meant to solve it to find V* as a function of p (and S and N), and then plug that back in to get H as a function of p.
The Legendre transform property ensures that:
(dH/dp)_{S,N} = V*(p)
You should note that taking the derivative of H wrt p would in principle also induce a term proportional to dV* / dp since in the definition of H there is V* which depends on p. But that term cancels out because V* is an extremum! So that is why dH/dp gives just V*(p) which is the inverse function of p(V) that you get from (minus) dU/dV. This is the "inverse of derivatives" property of the Legendre transform mentioned in the original post.
Clearly the enthalpy is the nicer gadget to have if you work at constant pressure since then one of its arguments is simply held constant.
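If it helps to watch that envelope-theorem cancellation happen, here is a small sympy sketch with a toy U(V) = c/V at fixed S and N (a made-up form, not any particular gas): solve dU/dV + p = 0 for V*, build H = U + pV at V*, and check that dH/dp = V*.

    import sympy as sp

    V, p, c = sp.symbols('V p c', positive=True)

    # Toy internal energy at fixed S and N (illustrative only, not a real gas).
    U = c / V

    # Pressure from the fundamental relation: -p = dU/dV, i.e. dU/dV + p = 0.
    V_star = sp.solve(sp.Eq(sp.diff(U, V) + p, 0), V)[0]    # = sqrt(c/p)

    # Enthalpy as the Legendre transform eliminating V in favor of p.
    H = sp.simplify((U + p * V).subs(V, V_star))             # = 2*sqrt(c*p)

    # The dV*/dp terms cancel (V* is an extremum), leaving dH/dp = V*(p).
    print(sp.simplify(sp.diff(H, p) - V_star))                # -> 0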
> I think this is because we introduce a complicated way to calculate determinants and then we use determinants to calculate the eigenvalues?
Yes, the determinant should be taught and defined as the volume of the parallelepiped in n-dimensions defined by the columns of the given square matrix. This perspective makes it immediately obvious that the eigenvalues scale the parallelepiped in each of its dimensions (a basis of eigenvectors makes it even simpler). Of course the volume (determinant) must be the product of these scaling factors (eigenvalues)! Since algebra is too convenient for solving problems, this geometric intuition is often an afterthought if it's even taught at all.
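A two-line numerical check (nothing deep, just numpy on a random matrix):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))

    eigs = np.linalg.eigvals(A)       # complex in general, in conjugate pairs for real A
    print(np.linalg.det(A))           # determinant computed via LU factorization
    print(np.prod(eigs).real)         # product of eigenvalues; agrees to rounding error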
As anecdata, this was taught in first-year university mathematics for math, engineering, physics, chemistry, etc. students in 1981 at all three universities in Perth, Western Australia, aka "the most isolated city in the world" [1]
It never occurred to me that geometric parallels would not be given in linear algebra courses.
I know programmers like to blame mathematicians for writing functions with lots of one letter variable names, but it's the physicists who insist on doing so without defining any of them.
You want to know what V is? It's clearly the potential, we defined it six papers ago! Oh, you were wondering what its type is? Well, it's usually a scalar field. No, don't write the parameter as t, that changes the whole meaning!
Sussman has a great guest lecture that mentions exactly these sorts of issues, and how he found it much easier to verify the work he and his grad students did in mathematical physics after developing the "mechanics programming" notation he explains in Structure and Interpretation of Classical Mechanics and Functional Differential Geometry.
It’s ironic to see the link to that post, which implicitly assumes that functions are defined on R^n, under a post about the Legendre transform, whose point is that functions are defined on state spaces of systems, and only become represented by functions on R^n once we parameterize the state spaces by state variables. So the value of f at a given point doesn’t depend on what your favorite letter (aka state variable) is, but the value of f’ certainly does. And the Legendre transform, as is actually explained, albeit cryptically, in Goldstein, comes from the fact that we have a 2d state space - the phase space of a 1d system with config variable y, velocity variable x, and momentum variable u - on which we have non-linearly related variables x and u.
In the Legendre transform, what we have (the y variable is a red herring, and I will ignore it; everything happens "pointwise in y") is a curve in the u-x plane, which we lift to u-x-z space in two ways -- that is, we find functions f and g defined for the points on that curve such that: 1) if the curve is parametrized by x, so that f is a function of x, then df/dx = u; 2) if the curve is parametrized by u, so that g is a function of u, then dg/du = x. (Why do we want this? Presumably because when x is velocity and f is energy, u is momentum, and we want g to have the same property going back. And yes, there are conditions under which one can parametrize a curve by one of the coordinates, either locally or globally; one such is that u is a monotone increasing function of x - that corresponds to convexity of f.) Of course now "derivatives of f and g are inverse" is tautological.
If we already know f(x), but know neither u nor g, we could set u = df/dx and try to compute g. Or we could do it the way Goldstein does it: dg/du = x, so dg = x du (this is an ODE); integrating it "by parts", g = int x du = xu - int u dx = xu - f.
(In advanced speak, the u-x curve is a Lagrangian in the u-x plane, which is symplectic as every sum of a vector space and its dual is; the functions f and g correspond to lifts of this Lagrangian to Legendrians based on the choice of "canonical" 1-forms u dx and x du, respectively, so that df - u dx = 0 and dg - x du = 0.)
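Here is the same construction done mechanically in a few lines of sympy, with a made-up convex f, just to watch the tautology happen: set u = df/dx, invert to x(u), define g = x u - f along the curve, and check that dg/du gives back x(u).

    import sympy as sp

    x = sp.symbols('x', real=True)
    u = sp.symbols('u', positive=True)

    f = sp.exp(x)                                  # a convex f picked for illustration
    u_of_x = sp.diff(f, x)                         # u = df/dx = exp(x)
    x_of_u = sp.solve(sp.Eq(u, u_of_x), x)[0]      # invert along the curve: x = log(u)

    g = sp.simplify((x * u - f).subs(x, x_of_u))   # g = x*u - f = u*log(u) - u

    # dg/du = x(u): the derivatives of f and g are inverse functions of each other.
    print(sp.simplify(sp.diff(g, u) - x_of_u))     # -> 0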
That link is about something more fundamental wrt the underlying calculus, and how the differential notation used in these explanations can often be confusing, especially when you have differentials whose variable is a function of other variables. That's all it's trying to clear up.
Thank you so much for this link. I was having trouble following some of the notation that came up with automatic differentiation and I think this clears it up.
Very nice article, kudos to the author. The inverse relation between Jacobians generalizes to a duality statement via the symplectic structure of the phase space; the section https://en.wikipedia.org/wiki/Hamiltonian_mechanics#From_sym... on Wikipedia has some details. This symplectic duality is my preferred way of looking at Hamiltonian-Lagrangian transitions.
It’s neat! To be fair, as a physicist, I did not understand the Legendre transform essentially until taking convex optimization (where it is known as the Fenchel conjugate).
Many sources, but all of them are reasonable and give a constructive definition that actually explains what it does: we can characterize a function either by its graph, or its supporting hyperplanes (when it is a closed, convex function).
While the observation is almost silly, it has very deep consequences for different characterizations of problems and other constructions!
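For reference, the formula behind that picture: the convex (Fenchel) conjugate is f*(y) = sup_x ( <y, x> - f(x) ), so f*(y) records the best affine minorant of f with slope y; and for a closed convex f the biconjugate recovers f, i.e. f** = f, which is exactly the "graph vs. supporting hyperplanes" duality.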
Helliwell & Sahakian Modern Classical Mechanics at least seems to do a much better job of explaining the Legendre transform than Goldstein, but it still never mentions the convexity requirement on f.
I feel like understanding the general convex conjugate and then seeing the Legendre transform as a special case is almost more intuitive.
How beautiful that this blog post was made years after I was struggling in thermodynamics to understand these transforms. Now if someone could make a post for the Laplace transform with the audience being people familiar with the Fourier transform.
Really enjoyed this post, it is both understandable and revelatory. I had some introduction to the concepts here from calculus and physics classes but my mathematics interests are more along the branches of Abstract Algebra than Analysis so I didn't expect to enjoy it much, and I wonder if I would have hung on for as long if it didn't have the poor exposition first (and the promise of a better presentation).
I wonder if more math-related material, if given in this "look how confusing; now wait, look at it this way" style, would be more engaging overall. Perhaps replacing the first part with a demonstration instead of mocking established representations. But maybe there is something to the "you're not alone, this way of looking at it is confusing and hand-wavy" framing, even if done deliberately, just to give comfort to students making sense of a concept for the first time. Especially with math, I think many people would be more eager to learn it if that initial uncomfortable and confusing stage were considered normal for everyone.
Also, side question, is the content of this post considered Tropical Mathematics?
Legendre transform moves a ruler (tangent line) along a convex function, measuring how much "lag" the function accumulated while "accelerating" up to its velocity at a certain moment, relative to having constant velocity for its entire history.
The larger the function's 2nd derivative is, the smaller the transform value is.
And vice versa. Since the transform is written in terms of the original function's derivative, not its "x value", the derivative of the transform is inversely proportional to the derivative of the function.
I found this explanation quite bad. Poorly motivated and making a priori regularity assumptions that are not necessary. The quoted explanation by Arnold is much better.