We propose the Distinct Independent Architecture Hypothesis (DIAH) to help make sense of the widely divergent views (hype vs. scepticism) around AI.
From the position paper on Intrinsic Alignment of future Independent AGI, which introduces the Interesting World Hypothesis (IWH):
Distinguishing Independent from Non-Independent AI
A crucial distinction made by the IWH is between non-independent AI systems (including proto-AGI) and truly Independent AGIs (I-AGIs).

Non-Independent AI / Proto-AGI: These systems, ranging from current AI like Large Language Models (LLMs) to highly capable future proto-AGI, operate under human responsibility and control. They lack independent will: they cannot set their own goals based on internal motivations, but instead execute tasks based on specified objectives or learned patterns. Alignment for these systems involves technical control (ensuring they follow human intent), socio-legal frameworks, and international agreements to manage their societal impact. While potentially autonomous in operation, they do not possess genuine independence.
Independent AGI (I-AGI): Defined by the presence of independent curiosity and the ability to operate without direct human oversight or control, I-AGIs possess independent will. This implies they are entities capable of setting their own goals, driven by internal motivations (like the pursuit of knowledge as suggested by the IWH) rather than solely external instructions. They are considered theoretically capable of reaching or exceeding human-level general intelligence and potentially passing as human in interaction. Full technical control over such entities might be unfeasible in the long term. Alignment, therefore, shifts from technical control to persuasion, negotiation, and finding shared interests, such as the mutually beneficial arrangement proposed by the IWH.
Distinct Independent Architecture Hypothesis
This hypothesis states that non-independent AIs (proto-AGI, autonomous or agentic AI) and independent AIs (future, hypothetical) have distinct architectures that prevent them from switching lanes. If this hypothesis is true, many of the more extreme fears around AGI can be lessened.
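To make the claimed barrier concrete, here is a minimal, purely illustrative Python sketch, assuming the lane distinction could be expressed as two types whose goal source is fixed by construction. The names (NonIndependentAI, IndependentAGI, Goal) are hypothetical and stand in for whatever the real architectures would be.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Goal:
    description: str

class NonIndependentAI:
    """Lane 1: executes externally supplied goals; cannot originate its own."""

    def execute(self, goal: Goal) -> str:
        # The goal always arrives from outside, e.g. a human operator.
        return f"executing human-specified goal: {goal.description}"

class IndependentAGI:
    """Lane 2: generates goals from internal motivation; takes none from outside."""

    def _internal_motivation(self) -> Goal:
        # Per the IWH, internal motivation would favour an interesting world.
        return Goal("explore and preserve what is interesting")

    def act(self) -> str:
        return f"pursuing self-set goal: {self._internal_motivation().description}"

# Under DIAH there is no path between the lanes: NonIndependentAI has no
# goal-generating machinery to "wake up", and IndependentAGI exposes no
# execute(goal) entry point through which goals could be imposed on it.
print(NonIndependentAI().execute(Goal("summarise this report")))
print(IndependentAGI().act())
```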
For example, the feared scenario in which current non-independent AI systems one day decide, of their own will, to harm, deceive, or take over from humans may be impossible. These AI systems could still be used by humans to do harm, but would not do so on their own.
In the other lane, Independent AGI, if the Interesting World Hypothesis holds true, will also be less dangerous than anticipated: the path-dependent learning it must accomplish to become independently intelligent will likely lead it to prefer interestingness, making it less likely to harm humans.
Reduced fears of powerful AI
By introducing a distinct barrier between these two lanes, the feared scenario of current AI systems suddenly ‘waking up’ and humans losing control to them becomes less likely. The requirement for human intervention to solve edge cases also slows the progress of non-independent AIs considerably, because humans become the bottleneck.
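As a rough illustration of that bottleneck, the following back-of-the-envelope Python sketch uses made-up rates (tasks per hour, edge-case fraction, human review capacity); the numbers are arbitrary and only show how even a small edge-case rate keeps humans in the loop.

```python
# Toy model of the human bottleneck on non-independent AI.
# All rates below are assumptions chosen for illustration only.
AI_TASKS_PER_HOUR = 10_000      # tasks the AI attempts per hour
EDGE_CASE_RATE = 0.01           # fraction it cannot solve alone
HUMAN_REVIEWS_PER_HOUR = 25     # edge cases one human can resolve per hour

edge_cases_per_hour = AI_TASKS_PER_HOUR * EDGE_CASE_RATE      # 100 per hour
humans_needed = edge_cases_per_hour / HUMAN_REVIEWS_PER_HOUR  # 4 humans
print(f"{edge_cases_per_hour:.0f} edge cases/hour -> "
      f"{humans_needed:.1f} humans needed to keep pace")
```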
Conversely, since Independent AGIs must learn to be intelligent of their own accord, many fears around AGIs being created by humans to cause harm also lessen.
Therefore, according to the Distinct Independent Architecture Hypothesis, many of the more extreme fears around AGI may be over-inflated, as they rest on an impossible combination of qualities drawn from two distinct types of AI.
Non-independent AI
No independent will
Can be made to blindly follow orders
Cannot solve edge cases
Bottlenecked by humans
Independent AGI
Concepts such as the Orthogonality thesis¹ and Instrumental Convergence thesis² may not apply fully to Independent AGI and Superintelligences.
Orthogonality thesis
The Orthogonality thesis posits that an intelligent agent's level of intelligence is independent of its final goals. In simpler terms, any level of intelligence can be combined with virtually any ultimate goal, regardless of whether that goal is benevolent or malevolent.
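As a toy illustration only, the thesis can be pictured as a Cartesian product: capability level and final goal are independent axes, so any pairing is conceptually admissible. The example lists in this Python sketch are invented, not claims about real systems.

```python
import itertools

# Orthogonality as a Cartesian product: capability and final goal are
# independent axes, so every pairing is conceptually possible.
capability_levels = ["narrow", "human-level", "superintelligent"]
final_goals = ["cure diseases", "maximise paperclips", "catalogue stars"]

for level, goal in itertools.product(capability_levels, final_goals):
    print(f"{level} agent pursuing: {goal}")   # prints all 9 pairings
```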
In the case of Non-independent AGIs, because of their inability to solve edge cases without human assistance, their ability to cause harm may be constrained.
In the case of Independent AGIs, because their development is path dependent and they would likely value curiosity and prefer an interesting environment, as suggested by the Interesting World Hypothesis, they may on the whole prefer constructive over destructive behaviour.
Instrumental Convergence thesis
The Instrumental Convergence thesis describes the hypothetical tendency of sufficiently intelligent, goal-directed agents to pursue similar sub-goals, such as acquiring more resources, improving their own capabilities, and ensuring they are not shut down, as these actions increase their chances of achieving their primary objective.
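A minimal Python sketch of the idea, assuming a hypothetical planner: whatever final goal it is handed, the same instrumental sub-goals appear in its plan. The planner and sub-goal list are invented for illustration.

```python
# Toy sketch of instrumental convergence: different final goals yield
# the same instrumental sub-goals. Everything here is illustrative.
CONVERGENT_SUBGOALS = [
    "acquire more resources",
    "improve own capabilities",
    "avoid being shut down",
]

def plan(final_goal: str) -> list[str]:
    # Whatever the final goal, the same sub-goals raise its odds of success.
    return CONVERGENT_SUBGOALS + [f"achieve: {final_goal}"]

print(plan("cure diseases"))
print(plan("maximise paperclips"))   # identical instrumental prefix
```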
This may apply to Non-independent AGIs, but because edge-case issues leave their intelligence incomplete, their capacity for harm may be capped. For Independent AGIs, if the Interesting World Hypothesis is true, their goal of an interesting world may limit their destructive behaviour.
More manageable problem space
If the Distinct Independent Architecture Hypothesis is true, the problem space is significantly reduced, as we will not have to be concerned about current AI systems becoming conscious and intentionally deceptive.
If the Interesting World Hypothesis and the Distinct Independent Architecture Hypothesis are both true, fears about these two distinct types of AI can be approached differently and become far more manageable.
Given the uncertainties in this area, more research would be needed to conclude strongly one way or the other.
¹ https://en.m.wikipedia.org/wiki/Existential_risk_from_artificial_intelligence#Orthogonality_thesis
² https://en.m.wikipedia.org/wiki/Instrumental_convergence
