With meager assumptions and
a standard setup,
we are in a vector space
and want a notion of distance,
that is, a vector space norm.
The three norms people think of
first are L1, L2, and L-infinity:
L1 is from absolute values.
L2 is from squares.
And L-infinity is from the absolute value
of the largest value (i.e.,
the worst case).
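
For concreteness, here is a minimal sketch of the three norms on a plain list of numbers. The function names and the sample vector are just illustrative choices, not anything standard.

```python
import math

def norm_l1(x):
    # L1: sum of absolute values
    return sum(abs(v) for v in x)

def norm_l2(x):
    # L2: square root of the sum of squares
    return math.sqrt(sum(v * v for v in x))

def norm_linf(x):
    # L-infinity: largest absolute value, i.e., the worst case
    return max(abs(v) for v in x)

x = [3.0, -4.0, 0.0]  # illustrative vector
l1, l2, linf = norm_l1(x), norm_l2(x), norm_linf(x)
print(l1, l2, linf)
```

For any vector the three values come out ordered: L-infinity is at most L2, which is at most L1.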
But in addition it would be nice
if the vector space with a norm
were also an inner product space
with the norm coming from the inner product.
Then, right, bingo, presto, for
the standard inner product the norm
we get is L2.
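
One way to see that L2 is the one that comes from an inner product: a norm comes from an inner product exactly when it satisfies the parallelogram law, ||x+y||^2 + ||x-y||^2 = 2||x||^2 + 2||y||^2. The sketch below checks that identity on one pair of vectors; the helper name `parallelogram_gap` and the test vectors are made up for illustration.

```python
import math

def norm_l1(x):
    return sum(abs(v) for v in x)

def norm_l2(x):
    return math.sqrt(sum(v * v for v in x))

def norm_linf(x):
    return max(abs(v) for v in x)

def parallelogram_gap(norm, x, y):
    # ||x+y||^2 + ||x-y||^2 - 2||x||^2 - 2||y||^2
    # This is zero for this pair (x, y) when the parallelogram law holds.
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    return norm(s) ** 2 + norm(d) ** 2 - 2 * norm(x) ** 2 - 2 * norm(y) ** 2

x = [1.0, 0.0]
y = [0.0, 1.0]
gap_l1 = parallelogram_gap(norm_l1, x, y)
gap_l2 = parallelogram_gap(norm_l2, x, y)
gap_linf = parallelogram_gap(norm_linf, x, y)
print(gap_l1, gap_l2, gap_linf)
```

The gap is zero (up to rounding) only for L2, so on this pair alone L1 and L-infinity already fail to come from any inner product.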
Why an inner product space? Well,
with a lot of generality and
meager assumptions, we have a
Hilbert space, that is,
a complete inner product space.
The core of the proof of completeness
is just the Minkowski inequality.
Being in a Hilbert space has a
lot of advantages: E.g., we
get orthogonality and
can take projections and, thus,
get as close as possible in our L2
norm. We
like projections, e.g., the
Pythagorean theorem. E.g., in
regression in statistics,
we like that the
total sum of squares is the
sum of the regression sum of
squares and the error sum
of squares --
right, the Pythagorean theorem.
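
A small numerical check of that sum of squares identity, as a sketch: ordinary least squares for a straight line with an intercept, on made-up data. The data values and variable names are assumptions for illustration only.

```python
# Made-up data for a straight-line fit y ~ intercept + slope * x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least squares slope and intercept
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x
fitted = [intercept + slope * x for x in xs]

sst = sum((y - mean_y) ** 2 for y in ys)               # total sum of squares
ssr = sum((f - mean_y) ** 2 for f in fitted)           # regression sum of squares
sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))    # error sum of squares

# The two legs of the right triangle: their inner product should be about 0
cross = sum((f - mean_y) * (y - f) for y, f in zip(ys, fitted))

print(sst, ssr + sse, cross)
```

The identity holds because the residual vector is orthogonal to the fitted-minus-mean vector, which is what `cross` checks.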
We have some nice separation
results. We can use Fourier
theory. And there's more.
And there are some good convergence
results: If we converge in L2,
then we also converge in other
good ways.
One reason for liking a Hilbert
space is that the L2 real valued
random variables form a Hilbert
space, and there convergence in L2
means almost sure convergence
(the best kind) of at least
a subsequence and, often in practice,
the whole sequence. So, we
connect nicely with measure theory.
We have some representation
results: A continuous linear functional
on a Hilbert space is just
a point in the space
applied with the inner product.
We like linear functionals and
like knowing that on a Hilbert
space they are so simple.
Working with L1 and L-infinity
is generally much less pleasant.
That is, we really like Hilbert
space.
Net, we rush to a Hilbert space
and its L2 norm from its
inner product whenever we can.
You get to do projections
as in the Pythagorean theorem.
The coefficients you need in the
projections are just the
values of some inner products.
With random variables, those
coefficients are covariances,
that is, much the same as
correlations, that we commonly
can estimate from data.
In the multivariate Gaussian
case, uncorrelated implies
independent.
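
To make the point about projection coefficients concrete, here is a rough sketch with made-up samples: after centering, the coefficient for projecting y onto x is one inner product divided by another, which is the sample covariance over the sample variance. The names `inner` and `center` and the data are illustrative assumptions.

```python
def inner(u, v):
    # Standard inner product on lists of numbers
    return sum(a * b for a, b in zip(u, v))

def center(u):
    # Subtract the sample mean
    m = sum(u) / len(u)
    return [a - m for a in u]

# Made-up samples of two variables
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
xc, yc = center(x), center(y)

# Projection coefficient: <y, x> / <x, x> on the centered data,
# i.e., sample covariance over sample variance (the 1/n factors cancel)
coef = inner(yc, xc) / inner(xc, xc)

# What is left over after the projection is orthogonal to x
residual = [b - coef * a for a, b in zip(xc, yc)]
leftover = inner(residual, xc)

def sq_dist(c):
    # Squared L2 distance from yc to c * xc
    return sum((b - c * a) ** 2 for a, b in zip(xc, yc))

print(coef, leftover, sq_dist(coef), sq_dist(coef + 0.1))
```

Any other coefficient gives a larger squared L2 distance, which is the "as close as possible" property of the projection.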
Fourier theory is easier in
L2 than in L1. E.g., in
classic Fourier series, the
partial sums give the best
approximation in the L2 norm, and
that comes from the
L2 orthogonality of the
harmonics.
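
A sketch of the Fourier series point, using f(x) = x on [-pi, pi] as an assumed example, with sine coefficients b_n = 2(-1)^(n+1)/n and the inner product (1/pi) times the integral over [-pi, pi]. Because the harmonics are orthogonal, the squared L2 error of the N-term partial sum is just ||f||^2 minus the sum of the squared coefficients. The code compares that formula with a direct midpoint-rule integration; the function names are made up.

```python
import math

def b(n):
    # Sine coefficient of f(x) = x on [-pi, pi]
    return 2.0 * (-1) ** (n + 1) / n

def partial_sum(x, n_terms):
    return sum(b(n) * math.sin(n * x) for n in range(1, n_terms + 1))

def sq_error_numeric(n_terms, m=4000):
    # Midpoint rule for (1/pi) * integral of (x - S_N(x))^2 over [-pi, pi]
    h = 2 * math.pi / m
    total = 0.0
    for k in range(m):
        x = -math.pi + (k + 0.5) * h
        total += (x - partial_sum(x, n_terms)) ** 2
    return total * h / math.pi

def sq_error_from_orthogonality(n_terms):
    # ||f||^2 = (1/pi) * integral of x^2 over [-pi, pi] = 2 * pi^2 / 3
    norm_sq = 2 * math.pi ** 2 / 3
    return norm_sq - sum(b(n) ** 2 for n in range(1, n_terms + 1))

print(sq_error_numeric(5), sq_error_from_orthogonality(5))
```

The two numbers agree up to the integration error, and the error only goes down as more harmonics are added.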
Yes, L-infinity can
also be nice: The
uniform limit of a sequence
of continuous functions
is continuous.
Or, with L2, often get a
Hilbert space but with
L1 or L-infinity usually
get at best just a Banach space --
that is, a complete, normed
vector space. Then, yes, can
get the Hahn-Banach theorem,
but the same thing in Hilbert
space is easier.
There is a sense in which L1 and
L-infinity are duals of
each other, but L2 is self-dual
which is nicer.
Filling in all these details and
more is part of functional analysis
101. There it's tough to miss at least
three books of W. Rudin:
Principles of Mathematical Analysis,
Real and Complex Analysis,
and Functional Analysis.
There's more, but I've
got some bugs to get out
of the software of my
Web pages!
I like the question -- asked
it myself at the NIST early in my career.
The answer I gave here is better
than what people told me then.
I've indicated likely most of the
main points, but my answer here is
rough and ready (I typed too fast),
and a quite polished answer is
also possible -- I just don't have
time today
to dig out my grad school
course notes, scan through Rudin,
Dunford and Schwartz,
Kolmogorov and Fomin, much of
digital filtering, much of
multi-variate statistics, etc.
I intuit that you're getting at the real answer with the self-dual. That makes a lot of sense. Also, from a practical perspective L2 is very nice because it causes the problem of error reduction to be quadratic, so it scales well.
No: If what you want is the L-infinity
norm, then go for it. A standard place
for that is numerical approximations of
special functions -- want guarantees on
the worst case error. And there is some
math to help achieve that. It's
sometimes called Chebyshev approximation.
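
As a toy version of the worst-case idea (not Chebyshev approximation proper, which fits polynomials or rational functions), consider approximating a list of numbers by a single constant. The constant that minimizes the worst-case error is the midrange, while the constant that minimizes the sum of squares is the mean. The data and function names below are made up for illustration.

```python
def worst_case_error(data, c):
    # L-infinity error of approximating every value by the constant c
    return max(abs(v - c) for v in data)

def sum_sq_error(data, c):
    # Squared L2 error of approximating every value by the constant c
    return sum((v - c) ** 2 for v in data)

data = [0.0, 1.0, 2.0, 3.0, 10.0]  # made-up values with one outlier

mean = sum(data) / len(data)             # best constant in L2
midrange = (min(data) + max(data)) / 2   # best constant in L-infinity

print(worst_case_error(data, midrange), worst_case_error(data, mean))
print(sum_sq_error(data, mean), sum_sq_error(data, midrange))
```

Each choice wins under its own norm and loses under the other, so the norm really should be picked by what the application needs to guarantee.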
But, in practice, the usual situation,
e.g., signal processing, multi-variate
statistics, there's no good reason
not to use L2 and many biggie reasons
to use it. E.g., for a given box
of data, commonly the better tools in
L2 just let you do better.
Or
to the customer: "If you will go for
a good L2 approximation, then we
are in good shape. If you insist
on L1 or L-infinity, then we will
need a lot more data and still
won't do as well."
Again, a biggie example is just
classic Fourier series. Sure,
if you are really concerned about
the Gibbs phenomenon, then maybe
work on that. Otherwise, L2 is the
place to be.
E.g., L1 and L-infinity can commonly
take you into linear programming.
Generally you will be much happier
with the tools available to you
in L2.
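
To spell out the linear programming remark, here is one standard way to write the fitting problems, as a sketch. The notation is assumed for illustration: data rows a_i, responses y_i, coefficients beta, and auxiliary variables t_i and s that bound the absolute residuals.

```latex
\begin{aligned}
\text{L1 fit:}\quad
  & \min_{\beta,\,t}\ \sum_{i=1}^{n} t_i
  \quad \text{s.t.}\quad -t_i \le y_i - a_i^{\top}\beta \le t_i,
  \quad i = 1,\dots,n, \\
\text{L-infinity fit:}\quad
  & \min_{\beta,\,s}\ s
  \quad \text{s.t.}\quad -s \le y_i - a_i^{\top}\beta \le s,
  \quad i = 1,\dots,n, \\
\text{L2 fit:}\quad
  & \min_{\beta}\ \sum_{i=1}^{n} \bigl(y_i - a_i^{\top}\beta\bigr)^2
  \quad \Longleftrightarrow \quad A^{\top}A\,\beta = A^{\top}y .
\end{aligned}
```

The first two need an LP solver, while the L2 fit reduces to one linear system (the normal equations, with A the matrix whose rows are the a_i).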
Again, just now I just don't
have time for a more full,
complete, and polished explanation.
A really good explanation would
require much of a good ugrad
and Master's in math, with
concentration on analysis and
a wide range of applications.
I've been there, done that but
just don't have time to
write out even a good summary of
all that material here.