We want all that for probability theory. We want countably infinite so that we can discuss, say, the event that a coin never or always comes up heads. We don't want uncountably infinite because it would create a big mess in the theory.
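To make that concrete (my notation, not from the thread): if H_n is the event that flip n comes up heads, then "the coin never comes up heads" is the countable intersection of the complements,

    (not H_1) and (not H_2) and (not H_3) and ...

which is exactly the kind of thing that closure under countably many operations lets you form.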
Who should I read for an introduction to that "big mess"?
So many of the cases I'm actually interested in require real-valued (continuous) inputs and outcomes. While one can quantize these to create an approximation of something countable, it seems like a much simpler theory would be possible if it were built from the ground up to handle these common non-discrete real-world cases, rather than trying to shoe-horn them into standard probability theory. I was hoping this might be the direction that Terry was headed, with the emphasis on probabilistic methods.
I think you're misunderstanding. Standard basic (measure-theoretic) probability theory is designed to handle common non-discrete real-world cases: continuous random variables like height, temperature, etc. They're not approximated by something countable; instead theorems proving that they have the sort of behavior you'd want are established by proving them for a countable approximation, then taking limits. It is exactly like integration: prove things for step functions, then make the steps infinitely thin. The theory is clean and straightforward.
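To make the analogy concrete (a standard construction, not tied to any particular book): given a nonnegative random variable X, let X_n be X rounded down to the nearest multiple of 1/2^n and capped at n. Each X_n takes only countably many values, X_n increases up to X, and one defines

    E[X] = lim E[X_n].

Essentially all of the continuous-variable machinery is built from limits of this kind.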
Here's where it gets less basic: say you want to look at temperature over time, but you don't want to model temperature as a variable that's measured daily, or hourly, or even every second (secondly?), but you want to model it as a process that evolves in continuous time. That's where the theory gets messy. Not necessarily at the level of a user of this theory, but definitely at the level of proving that the math you want to use is allowed.
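You can see the shape of the problem in a few lines of Python (a sketch of mine, assuming numpy; the parameters are made up). Simulating a random-walk approximation to Brownian motion on a grid is trivial; proving that refining the grid yields a well-defined continuous-time process with the properties you want is where the hard math lives:

    import numpy as np

    def brownian_path(T=1.0, n_steps=1000, seed=0):
        # Random-walk approximation to Brownian motion on [0, T]:
        # independent Gaussian increments, each with variance dt.
        rng = np.random.default_rng(seed)
        dt = T / n_steps
        increments = rng.normal(0.0, np.sqrt(dt), size=n_steps)
        times = np.linspace(0.0, T, n_steps + 1)
        path = np.concatenate([[0.0], np.cumsum(increments)])
        return times, path

    times, path = brownian_path()
    print(path[-1])  # the value at time T is approximately N(0, T)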
If you actually need an introduction to that sort of probability, Lawrence Evans (Berkeley) has some old lecture notes aimed at undergrads [1] that he turned into a book [2]. If (more likely) you want standard measure-theoretic probability theory (as opposed to what's taught to undergraduates), David Pollard's book is pretty good [3].
I'm sure that @graycat will scoff at those recommendations, but his reading list would be considered excessively hardcore and time consuming even for a graduate student in math, which I'm assuming you're not.
ps: after looking at it again, the intro in Evans's notes is as gentle as it's going to get, so start there. And (as you'll find out, unless you're some sort of savant) this shit's hard. If you actually want to understand this stuff, graduate coursework is probably the only practical way to do it.
You're right, and I misunderstood. I'm a computer programmer trying to rapidly learn enough about probability theory to be able to communicate with some theoretical statisticians regarding causality, confounding, and longitudinal data analysis. I have a decent intuitive grasp of what's happening, but no ability to convey anything with proper terminology. I could certainly use a better grasp of the basics, and I'm trying to figure out where to start. Thanks for the links.
Okay, for that stuff probability theory is too abstract. For basic basics, Edward Tufte has a $2 ebook that's pretty good: Data Analysis For Politics And Policy[1] and for terminology in causality, Rubin has a short open access paper[2]. For a freshman-stats level treatment, OpenStax college's book looks legitimate but I haven't actually read it carefully[3].
Thanks for your references. I have always thought that many statistics/probability based explanations are ad hoc. They are ad hoc because they explain pre-selected facts, and their predictions are just confirming instances (cf. positive vs. confirming instance from Larry Laudan, a philosopher of science). Your point "definitely at the level of proving that the math you want to use is allowed" hints in that direction.
Probability's hard to teach. You can give informal statements and kind of wave your hands at the underlying theory, or you can give a rigorous well-founded treatment that's intellectually satisfying. But the rigorous foundation uses math that's a step or two beyond what undergraduate math majors learn. It's not necessarily harder than what math majors see, but it's a ton of extra material to teach, when the payoff is that you can now (after half a year) prove that the conditional probability is well-defined as
Pr(A | B) = Pr(A and B) / Pr(B)
instead of just telling it to students and drawing a few diagrams that drive the point home.
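A quick sanity check with a fair die (my example): take B = "the roll is even" and A = "the roll is 2". Then Pr(A and B) = 1/6 and Pr(B) = 1/2, so Pr(A | B) = (1/6) / (1/2) = 1/3, matching the diagram-level intuition that 2 is one of three equally likely even outcomes.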
But I think it's more pragmatic than ad hoc. Any deep theory of probability that doesn't deliver
Pr(A | B) = Pr(A and B) / Pr(B)
is basically useless since that's how random phenomena seem to behave in real life. Having a deeper theory is useful because it allows you to derive other implications of that theory and makes certain calculations much easier. But if the theory disagrees with phenomena that we want to model, that can be a problem.
> I'm sure that @graycat will scoff at those recommendations, but his reading list would be considered excessively hardcore and time consuming even for a graduate student in math, which I'm assuming you're not.
Probability and stochastic processes based on measure theory are not very popular in the US, even in graduate math departments.

Uh, scoff, scoff. Okay?
The full measure theoretic details of stochastic processes in continuous time can be a bit of a challenge. That topic can be important, e.g., for Brownian motion and stochastic differential equations used in mathematical finance. Of course, there is Karatzas and Shreve, Brownian Motion and Stochastic Calculus, and Chung and Williams, Introduction to Stochastic Integration. And there's much more, especially from Russia and France.
But, otherwise, usually in practice, what people are interested in is either (1) second order stationary stochastic processes, e.g., as in electronic or acoustical signals and noise, where they are commonly interested in power spectral estimation, digital filtering, maybe Wiener filtering, the fast Fourier transform, etc., or (2) what is in, say, Cinlar, Introduction to Stochastic Processes.
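To give a flavor of (1) (a sketch of mine, not from any of these texts; numpy assumed, and the signal is made up): a crude power spectral estimate is just the normalized squared magnitude of the FFT of the signal:

    import numpy as np

    # Made-up test signal: a 50 Hz sinusoid in noise, sampled at 1 kHz.
    fs = 1000.0
    t = np.arange(0, 1.0, 1.0 / fs)
    rng = np.random.default_rng(0)
    x = np.sin(2 * np.pi * 50 * t) + 0.5 * rng.standard_normal(t.size)

    # Periodogram: squared FFT magnitude, normalized by fs * N.
    psd = np.abs(np.fft.rfft(x)) ** 2 / (fs * t.size)
    freqs = np.fft.rfftfreq(t.size, d=1.0 / fs)

    print(freqs[np.argmax(psd)])  # the peak should land near 50.0 Hz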
In Cinlar, for the continuous time case, you get a good introduction to the Poisson process (the vanilla arrival process, e.g., like clicks at a Geiger counter, new sessions at a Web site, and much more).
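As a concrete illustration (my sketch, not Cinlar's): the gaps between Poisson arrivals are i.i.d. exponential random variables, so simulating the process takes only a few lines:

    import numpy as np

    def poisson_arrivals(rate, horizon, seed=0):
        # Arrival times on [0, horizon]: interarrival gaps are
        # i.i.d. exponential with mean 1/rate.
        rng = np.random.default_rng(seed)
        times = []
        t = rng.exponential(1.0 / rate)
        while t < horizon:
            times.append(t)
            t += rng.exponential(1.0 / rate)
        return np.array(times)

    arrivals = poisson_arrivals(rate=2.0, horizon=10.0)
    print(len(arrivals))  # Poisson distributed with mean rate * horizon = 20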
You also get what else people are mostly interested in in practice: Markov processes in discrete time with a discrete state space (that is, the values are discrete). The case of Markov processes in continuous time and discrete state space is not so tough if the jumps are driven by just a Poisson process. But there is still more in Cinlar.
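A minimal sketch of the discrete-time, discrete-state case (mine, with a made-up two-state "weather" chain): simulation is just repeated sampling from the row of a transition matrix:

    import numpy as np

    # Made-up two-state chain: state 0 = dry, state 1 = rainy.
    P = np.array([[0.9, 0.1],
                  [0.5, 0.5]])

    def simulate_chain(P, start, n_steps, seed=0):
        # The next state depends only on the current state (Markov property).
        rng = np.random.default_rng(seed)
        states = [start]
        for _ in range(n_steps):
            states.append(rng.choice(len(P), p=P[states[-1]]))
        return states

    path = simulate_chain(P, start=0, n_steps=10000)
    print(sum(s == 1 for s in path) / len(path))  # near the stationary value 1/6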
And there are other good texts on stochastic processes. For (1), look at some of the texts used by EEs. The measure theory approach is in Doob, Stochastic Processes, Loeve, Probability Theory, and several more texts by quite good authors. E.g., without measure theory, you can just dive in via Blackman and Tukey, The Measurement of Power Spectra ....

With all these sources, you are able to get by without measure theory. Yes, without measure theory, at some places you will have to not ask to understand too much and just skip over some details to get back to the applied stuff.
But for measure theory, the Durrett text seems to get a student to that unusually quickly.

For more, at the MIT Web site, there is an on-line course in mathematical finance that avoids measure theory. They want to use the Radon-Nikodym theorem and Ito integration but still avoid measure theory.
Uh, the Radon-Nikodym theorem is a generalization of the fundamental theorem of calculus. Once you see it, it's dirt simple, but a good proof takes a bit of work; or follow von Neumann's proof that knocks it all off in one stroke (it's in Rudin, Real and Complex Analysis).
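For the record (the standard statement, nothing special to those MIT notes): if nu and mu are sigma-finite measures with nu absolutely continuous with respect to mu, then there is a density f = d nu / d mu with

    nu(A) = integral over A of f d mu

for every measurable A. Compare the fundamental theorem of calculus, which recovers F(b) - F(a) as the integral of F'; Radon-Nikodym says every suitably continuous measure is recovered from a density in the same way.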
All of this is fine for real-valued inputs and outcomes. It's the number of events (coin flips, measurements, etc) that we're restricting to be countable.
No, the number of events is necessarily either finite or uncountable. Indeed, it is a nice exercise to show that there are no countably infinite sigma algebras (extra credit for a solution!). You just can't take uncountably many events, take their union, and assume that the result is also an event.
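(A hint for that exercise, not a full solution: show that an infinite sigma algebra contains an infinite sequence of nonempty, pairwise disjoint events; distinct subsets of that sequence then give distinct countable unions, so the sigma algebra has at least as many members as there are subsets of the naturals, which is uncountably many.)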
No problem. For the foundations I outlined, you can work just fine with continuous functions, measurable functions, stochastic processes, random variables taking values on the real line, in the complex plane, in finite dimensional real or complex vector spaces with, say, the usual topology, Hilbert and Banach spaces, etc. You can do multi-dimensional Markov processes, and much more. And you can have each point on the real line be an event. Fine. But you just can't take the uncountable union of any set of such events and assume that the result is also an event.
As for the event that a random variable takes a value >= 0? Fine. Or, let the Borel subsets of the real line be the smallest sigma algebra that contains all the open sets, e.g., all the open intervals. Then for a Borel set A and a real valued random variable X, you can ask for the probability that X is in A.
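(Spelled out, using the standard definitions: X being a random variable means exactly that the preimage {X in A} is an event for every Borel set A, so Pr(X in A) is always defined, and at no point do you need an uncountable union.)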
I believe you will find that you will have a solid foundation for what you want.

To see all this stuff, you need more than just the sparse definitions; instead, you need an actual text and maybe a course. Recently I looked at the on-line materials from MIT and didn't see such a course. Graduate probability is not all that popular in the US; stochastic processes in continuous time are still less popular.
To study graduate probability, I'd recommend a good undergraduate major in pure math with good coverage of, say, W. Rudin, Principles of Mathematical Analysis. Then good coverage of linear algebra from more than one of the best known texts. Likely also spend as much time as you can in Halmos, Finite Dimensional Vector Spaces. E.g., at one time, Halmos, Rudin, and Spivak, Calculus on Manifolds, were the three main texts for Harvard's famous Math 55. Get good at proving the theorems.

I also recommend Fleming, Functions of Several Variables. Then, sure, Royden, Real Analysis. Couldn't be prettier. If not in a hurry, then the real half of Rudin's Real and Complex Analysis. Especially if you like Fourier theory!
Then, of the probability books, I believe that the nicest first book is L. Breiman, Probability. He wrote that before he went consulting and came back and did CART and random forests. Next, K. Chung, A Course in Probability Theory. Next, J. Neveu, Mathematical Foundations of the Calculus of Probability. Then, Loeve, Probability Theory. Loeve is huge -- mostly just use it for reference or browse. E.g., it has sufficient statistics and stationary stochastic processes (the EEs love that), which IIRC are not in the other books. IIRC, both Breiman and Neveu were Loeve students at Berkeley.
If you do well with Breiman, then for graduate probability you can likely stop there. Else, Chung will then be fast and easy reading and will reinforce what you learned in Breiman. Neveu is elegant, and my favorite, but you deserve extra credit for each workable exercise you can find (not actually work, you understand, just find!). Sure, some of the exercises are terrific, half a course in a few lines of an exercise. E.g., he has one of those on statistical decision theory or some such. And see the Tulcea material in the back.
Then there's more that you can do on stochastic processes, potential theory via Brownian motion, e.g., for mathematical finance, stochastic optimal control, and more.