It sounds reversed to me: shouldn't the "cherry" be supervised learning and the "icing" be reinforcement learning? At least insofar as reinforcement learning is closer to the "cake" of unsupervised learning, since a reinforcement learning system needs less feedback to work (a binary correctness signal rather than an n-dimensional label signal).
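To put rough numbers on the "less feedback" point, here is a minimal sketch. It assumes the label signal is a choice among n equally likely classes, which is my own simplification, and the function names are made up for illustration. It only compares the upper bound on information per feedback signal, nothing more.

```python
import math


def max_bits_binary_reward() -> float:
    # A right/wrong signal has two outcomes, so it carries at most 1 bit.
    return math.log2(2)


def max_bits_label(n_classes: int) -> float:
    # Assumption: the label picks one of n_classes equally likely classes,
    # so it carries at most log2(n_classes) bits.
    return math.log2(n_classes)


if __name__ == "__main__":
    for n in (2, 10, 1024):
        print(n, max_bits_binary_reward(), max_bits_label(n))
```

With two classes the two signals are equivalent; the gap only opens up as the label space grows.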

It might also be argued that most "unsupervised learning" in animals can be broken down into a relatively simple unsupervised segment (e.g., an "am I eating nice food?" partition function) and a more complicated reinforcement segment (e.g., a "what is the best next thing to do to obtain nice food?" function). I'm sure someone like Yann LeCun is familiar with such arguments, though.