BMW is a for-profit business. MIT is a non-profit educational institution.
I'll agree that copyright in the digital era is a mess, but the line in this case is fairly clear. Yes, fair use (regarding parodies/satires) is convoluted, but the general rule makes intuitive sense to me: unless your parody is blatantly for-profit/self-advancement, you're covered.
There is a better alternative: surveying students. Standardized tests attempt to assign a single number to teacher performance by quantifying student performance on one three-hour test. A survey examines dozens of metrics and lets students draw on more than 100 hours of classroom experience watching their teacher.
We're discarding some of our most valuable data. In most schools, student evaluations of teachers aren't even administered, let alone analyzed and weighted in teacher assessment. Skeptical? See the research below. Properly constructed surveys yield very accurate results: students are surprisingly honest in their responses, and they value a hard, fair teacher who actually teaches over an "easy A" teacher.
Take a look at Ronald Ferguson's work and the MET Project [1]. I can't find the original article, but this New York Times article [2] is a decent summary.
Student surveys could certainly contribute to the evaluation of teachers, but I doubt they're a viable replacement for academic tests. Note also that the survey results are being validated against current test-based methods. The researchers claim "good agreement," but without a real publication it's hard to tell what that means. Does it mean that in most cases the students are honest (i.e., that the disagreement is caused by dishonesty)? Does it mean that in most cases students appreciate teachers who make them learn (i.e., that the disagreement is caused by students who like their teachers even though they themselves are not learning)? Or is the disagreement inherent in the fact that one measure is better than the other at gauging teacher performance?
For the questions about relative ranking of contributions, sure. But the entropy results seem reasonable, since they might average over a large number of per-developer preferences.
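A minimal sketch of what I mean by averaging, assuming hypothetical per-developer preference distributions (the data, the number of developers, and the number of contribution types are all made up for illustration, not taken from the study):

```python
# Hypothetical sketch (all names and numbers are assumptions, not from the
# study): average Shannon entropy over many per-developer preference
# distributions.
import math
import random

def shannon_entropy(weights):
    """Entropy in bits of a preference distribution given as raw weights."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)

random.seed(0)

# Simulate 1000 developers, each with preference weights over 5 contribution
# types; both counts are arbitrary.
developers = [[random.random() for _ in range(5)] for _ in range(1000)]

per_developer = [shannon_entropy(d) for d in developers]
average = sum(per_developer) / len(per_developer)
print(f"average entropy over {len(per_developer)} developers: {average:.3f} bits")
```

Any single developer's distribution here is noisy, but the average across many developers is quite stable, which is the sense in which an aggregate entropy figure can look reasonable even when individual rankings are questionable.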