The really cool thing about a lot of machine learning algorithms is how they converge. There are usually multiple, independent rationales for choosing or developing a particular algorithm, and the fact that they all land in the same place ends up saying very deep things about information theory.
For instance, actuaries, statisticians, and "neural network guys" (a combination of computer scientists, applied mathematicians & biologists, before it really concretized into a discipline) all independently invented logistic regression (and within each discipline, it usually got invented in a couple of different contexts before being unified under the same framework). They were all "trying" to do different things in terms of how they were thinking about the problem, but they ended up with the same model structure.
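As a concrete illustration of that shared structure, here is a minimal numpy sketch (my own toy code, not any of the historical derivations): a linear score pushed through the logistic function, fit by gradient ascent on the log-likelihood. The gradient `X.T @ (y - p)` is the same expression whether you arrive at it as a statistician fitting a GLM with a logit link or as a "neural network guy" training a single sigmoid unit under cross-entropy loss.

```python
import numpy as np

def sigmoid(z):
    # The logistic link: maps a linear score to a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, steps=2000):
    # Maximum-likelihood fit by gradient ascent on the log-likelihood.
    # The same update drops out of the GLM view (logit link) and the
    # neural-network view (sigmoid unit + cross-entropy loss).
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w)
        w += lr * X.T @ (y - p) / len(y)
    return w

# Toy data: one feature plus an intercept column, labels drawn from
# a known logistic model so we can check the fit recovers its shape.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = (rng.random(500) < sigmoid(2.0 * x - 0.5)).astype(float)
X = np.column_stack([x, np.ones_like(x)])
w = fit_logistic(X, y)
```

Whichever story you tell about it, the fitted object is the same: a weight vector defining a linear decision boundary in log-odds space.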