With all the AI discussion of late, I've been thinking more about the alignment problem.
Like, let's say you have a true AGI that's completely superior to human intellect, AND you've found a way to align this thing. Can those alignment techniques also be used on the lesser minds of humanity?
"Oh don't worry. Alignment is probably achievable by how we build the AGI."
Yeah, I mean, maybe. Or maybe it turns out alignment is only possible at all if it works generally on anything with 'intelligence'.
My suspicion is that alignment is either not possible, because an intelligent agent can always do something just because, or we're going to have to worry about whoever wins the alignment race aligning the rest of us.