This may be the Achilles' heel of the freenix world: since almost anything can be worked around with some scripting, people actively argue against putting facilities like this in the right place in the system's architecture. Add 1 to the Cathedral's score.
... Except why exactly are the kernel and libc the right place to put this?
Remember that code in the kernel runs outside memory protection. Moving complex code out of the kernel is pretty much always a win unless there are pressing performance reasons to do otherwise. This code runs only once each time a program is started, so performance is certainly not a good reason to put it in the kernel.
What I really want to reiterate here is that using fat_elf binaries would in absolutely no way make shipping software to multiple different Linux platforms easier. The reason it is hard is that when someone says Linux, pretty much the only guarantee you have about the system is that it runs a Linux kernel, and even that can be so old or so strange that you can't trust anything about it. All fat_elf gives us is a way to stuff multiple binaries into a single file, and we can do that already without much fuss. It does not in any way give us true "Universal Binaries", because to get those we would have to either agree on a common subset of libraries that a Linux system should always ship (and agree on indefinite binary compatibility for those libraries), or ship a meaningful portion of the entire platform in every binary.
If you want what fat_elf gives, you can get it with a 20-line shell script that you concatenate with a bunch of binaries, and it could be argued that this is the cleaner and better place to put it, considering it's only an ugly hack anyway. It's just that people look at fat_elf and see something that isn't there.
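To make the userland approach concrete, here is a minimal sketch of the dispatch step such a launcher would need, written in Python rather than shell. Everything in it is an assumption for illustration: the names `ALIASES`, `select_binary` and `main`, and the layout of one prebuilt binary per architecture sitting in a single directory, are hypothetical and are not part of fat_elf or of any existing tool.

```python
import os
import platform
import sys

# Hypothetical alias table: maps machine names as reported by the OS
# to the file names assumed for the bundled binaries.
ALIASES = {
    "amd64": "x86_64",
    "arm64": "aarch64",
    "i386": "x86",
    "i486": "x86",
    "i586": "x86",
    "i686": "x86",
}


def select_binary(machine, available):
    """Return the bundled binary name matching `machine`, or None."""
    arch = ALIASES.get(machine.lower(), machine.lower())
    return arch if arch in available else None


def main(bundle_dir):
    """Pick the binary for this machine from bundle_dir and exec it."""
    available = set(os.listdir(bundle_dir))
    choice = select_binary(platform.machine(), available)
    if choice is None:
        sys.exit(f"no binary for {platform.machine()} in {bundle_dir}")
    path = os.path.join(bundle_dir, choice)
    # Replace this process with the chosen binary, passing arguments through.
    os.execv(path, [path] + sys.argv[1:])
```

Calling `main("bin")` from a small wrapper would launch whichever of `bin/x86_64`, `bin/aarch64` and so on matches the host. Note that this only picks an instruction set; it does nothing about the library compatibility problem described above.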
The architectural location I was alluding to is a fat binary format itself, because that has the greatest impact on customer and user experience. How you parcel out the supporting code between kernel and userland would flow from that invariant... if I were Linus for a day.
Or, perhaps even better, adopt a small, architecture-neutral IR like LLVM-BC, so that a workstation of any architecture could choose to compile it just-in-time at launch, or at install time.
Either way, the greatest tragedy here is small thinking. I would love to see Linux rise to the level where users don't need to know what an instruction-set architecture is.