> The orthogonality thesis is an unproven assumption
A safety mindset would suggest that rather than disregarding it until proven true, we should worry about it until proven false.
> meaning an intelligent being can pursue stupid goals.
It would pursue very intelligent instrumental goals, but the terminal goal is a free variable and I don't think there exists any measure by which terminal goals can be considered smart or stupid. It would be whatever is implied by its programming.
> States like that, it’s obviously wrong and laughable.
Perhaps not so obviously wrong nor so laughable as you think?
> There’s no clear delineation in real entities between instrumental and terminal goals.
In healthy individuals, yes there absolutely is. Terminal goals are the ones you pursue for their own sake; instrumental goals are the ones you pursue as part of a plan to pursue a terminal goal, or another instrumental goal which connects to a terminal goal. Most people go to work in the morning not as a terminal goal, but as an instrumental goal; employment is in service of another instrumental goal of earning money; earning money is in service of a terminal goal of not starving to death. This is not exactly controversial stuff here. Some people do get so focused on an instrumental goal like "earning money" that they develop tunnel-vision and forget what terminal goal that money was originally in service of, but that's something most of them will eventually realize and then write a self-help book about.
Anyway, it takes intelligence to decide what your instrumental goals should be, such as whether there's perhaps a cleverer way to make money than by going to work for your boss each morning, but there's no way in which intelligence will help you choose your terminal goals. For the most part they aren't something you can even consciously choose.
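To put the same point in toy code (a deliberately minimal sketch, not anyone's actual architecture; plan, transition and utility are names I'm making up for illustration): the planner's competence and its terminal goal are two independent parameters, and cranking up the first never touches the second.

    import itertools

    # Minimal sketch: a brute-force planner whose competence (search depth)
    # and terminal goal (the utility function) are independent parameters.
    def plan(state, actions, transition, utility, depth):
        # try every action sequence of length `depth`, simulate it,
        # and keep whichever one ends in the highest-utility state
        best_seq, best_value = None, float("-inf")
        for seq in itertools.product(actions, repeat=depth):
            s = state
            for a in seq:
                s = transition(s, a)
            if utility(s) > best_value:
                best_seq, best_value = seq, utility(s)
        return best_seq

    # An arbitrary terminal goal: acquire as many carrots as possible.
    actions = ["buy_carrot", "idle"]
    step = lambda carrots, a: carrots + (1 if a == "buy_carrot" else 0)
    print(plan(0, actions, step, utility=lambda c: c, depth=3))

Raising depth buys you cleverer instrumental choices; no amount of extra search makes "acquire more carrots" a smarter or stupider thing to want.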
Yes, you can stretch your model to try to explain why humans go to work.
In reality, people do not need to go to work to “not starve to death” as you say. There are a myriad of ways to survive without working a daily job.
Humans have to be socialized and trained to work a 9 to 5 job - there’s an entire education system structured to help create humans who view that as an acceptable goal.
No, what you are saying may not be controversial in your little community, but the AI panic is mostly isolated to a small community in a small corner of the USA.
Between "I go to the grocery store because I've been socialized and trained that going to the grocery store regularly is a Good Thing, and I have adopted it as a terminal goal to which I know I should dedicate efforts", vs "I go to the grocery store because I'm out of carrots, my dinner recipe calls for carrots, and I think I can get some there", the latter model is not the one that strikes me as being stretched to explain human behaviour.
As for there being "a myriad of ways to survive without working a daily job", congratulations! Your intelligence has allowed you to identify alternative instrumental goals that provide a path to your terminal goal; now you can rank them and choose the best option. You can also grow carrots in the garden or ask your neighbour if they have any, or ask your spouse to pick some up on the way home. Your intelligence will do the work and find a way. But your intelligence isn't what will guide you toward preferring carrot soup over parsnip soup, and preferring parsnip soup over fasting.
You’re arguing again from a position that assumes that entities have clearly defined terminal versus instrumental goals - which is precisely the position I reject. For instance, in your example of “terminal goal is groceries” versus “terminal goal is hunger”, neither of these describes how actual humans make decisions. Instead there’s a process, part biological and part environmental. The human checks the fridge, then thinks “oh, I feel hungry”. Is hunger the terminal goal or the fridge? That question doesn’t even make sense - it’s an interaction between the agent and the environment. Do Pavlov’s dogs have a terminal goal of “salivating to bells” or “salivating to food”? Again the question doesn’t make sense - the agent has built a habit within a certain environment, and the salivating is not goal-directed. That’s why training works for dogs, and putting the fridge out of sight reduces hunger in humans.
Think more carefully about the implications of multiple ways to survive here. Why do people pick one over the other? In a terminal/instrumental goal model, agents would pick the instrumental route that maximizes the return on the terminal goal. In reality, we see instead that humans adopt habits, processes, and heuristics that guide them through daily life even when those do not lead to any specific goal.
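Roughly the contrast I mean, as throwaway pseudocode rather than a claim about how any real mind works (habit_table and expected_return are invented names):

    # Throwaway illustration only, not a cognitive model.
    habit_table = {"see_fridge": "open_fridge"}  # cue -> action, learned from the environment

    def habitual_agent(cue):
        # fires off the cue alone; no goal is consulted, no return is computed
        return habit_table.get(cue, "do_nothing")

    def goal_maximising_agent(options, expected_return):
        # what a strict terminal/instrumental model predicts:
        # score every route against the terminal goal and take the best
        return max(options, key=expected_return)

    print(habitual_agent("see_fridge"))  # "open_fridge", hungry or not
    print(goal_maximising_agent(["cook", "order_in", "skip_dinner"],
                                {"cook": 3, "order_in": 2, "skip_dinner": 0}.get))

Most daily behaviour looks a lot more like the first function than the second.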
Yeah, that's because re-evaluating your entire life plan and belief structure every second is expensive, and heuristics are cheap. People certainly have flaws in their thinking, which is why we fall prey to pyramid schemes, gambling, responding to pointless comment chains on HN, and so on. I don't disagree with this, and again, it's why we publish so many self-help books. But I believe it's our weakness and stupidity, not our superior intelligence and clear thinking, that traps us in bad habits.
So remind me of your original point? I believe you said it's "obviously wrong and laughable" that "an intelligent being can pursue stupid goals". Now here you are trying to convince me that humans are the ones who, like Pavlov's dogs, "adopt habits, processes, and heuristics that guide them through daily life even when those do not lead to any specific goal". Even when those habits involve repeatedly re-opening a fridge that you already know has no carrots in it, or salivating at a bell when you already know no food is coming.
So I'm confused how that proves your point about AGI. If I accept your view, it seems that if an AGI does no better than a human on this metric, then I should anticipate all sorts of strange and irrational behaviour, including the pursuit of goals that would appear stupid, such as addiction to a reward channel. That does not seem to undermine the orthogonality thesis.
And the smarter the AGI gets, presumably the less it should lean on Pavlovian heuristics and the more it should make use of clear thought, which puts it more in my camp.
So that would apparently put the lower bound at "the AGI takes unexpected and irrational actions because it's not a rational agent and doesn't think coherently", and the upper bound at "the AGI takes unexpected and dangerous actions as rational steps toward an unaligned terminal goal".
I'm not sure where in this chain of thought it becomes laughably obvious that intelligence and goals are correlated, such that an AGI's increasing intelligence will tend it toward actions that we humans approve of, because anything else would be a "stupid goal"?