We propose the Distinct Independent Architecture Hypothesis (DIAH) to help make sense of the widely divergent views (hype vs. scepticism) around AI.
From the position paper on Intrinsic Alignment of future Independent AGI, which introduces the Interesting World Hypothesis (IWH):
Distinguishing Independent from Non-Independent AI
A crucial distinction made by the IWH is between non-independent AI systems (including proto-AGI) and truly Independent AGIs (I-AGIs).

Non-Independent AI / Proto-AGI: These systems, ranging from current AI like Large Language Models (LLMs) to highly capable future proto-AGI, operate under human responsibility and control. They lack independent will: they cannot set their own goals based on internal motivations, but instead execute tasks based on specified objectives or learned patterns. Alignment for these systems involves technical control (ensuring they follow human intent), socio-legal frameworks, and international agreements to manage their societal impact. While potentially autonomous in operation, they do not possess genuine independence.
Independent AGI (I-AGI): Defined by the presence of independent curiosity and the ability to operate without direct human oversight or control, I-AGIs possess independent will. This implies they are entities capable of setting their own goals, driven by internal motivations (like the pursuit of knowledge as suggested by the IWH) rather than solely external instructions. They are considered theoretically capable of reaching or exceeding human-level general intelligence and potentially passing as human in interaction. Full technical control over such entities might be unfeasible in the long term. Alignment, therefore, shifts from technical control to persuasion, negotiation, and finding shared interests, such as the mutually beneficial arrangement proposed by the IWH.
Distinct Independent Architecture Hypothesis
This hypothesis states that non-independent AIs (proto-AGI, autonomous or agentic AI) and independent AIs (future, hypothetical) have distinct architectures that prevent them from switching lanes. If this hypothesis is true, many of the more extreme fears around AGI can be lessened.
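To make the claimed barrier concrete, here is a minimal, purely illustrative Python sketch, assuming the lane distinction could be expressed as two types whose goal source is fixed by construction. The names (NonIndependentAI, IndependentAGI, Goal) are hypothetical and stand in for whatever the real architectures would be.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Goal:
    description: str

class NonIndependentAI:
    """Lane 1: executes externally supplied goals; cannot originate its own."""

    def execute(self, goal: Goal) -> str:
        # The goal always arrives from outside, e.g. a human operator.
        return f"executing human-specified goal: {goal.description}"

class IndependentAGI:
    """Lane 2: generates goals from internal motivation; takes none from outside."""

    def _internal_motivation(self) -> Goal:
        # Per the IWH, internal motivation would favour an interesting world.
        return Goal("explore and preserve what is interesting")

    def act(self) -> str:
        return f"pursuing self-set goal: {self._internal_motivation().description}"

# Under DIAH there is no path between the lanes: NonIndependentAI has no
# goal-generating machinery to "wake up", and IndependentAGI exposes no
# execute(goal) entry point through which goals could be imposed on it.
print(NonIndependentAI().execute(Goal("summarise this report")))
print(IndependentAGI().act())
```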
For example, the feared scenario in which current non-independent AI systems one day decide, of their own will, to harm, deceive, or take over from humans may be impossible. These AI systems could still be used by humans to do harm, but would not do so on their own.
In the other lane, Independent AGI, if the Interesting World Hypothesis holds true, will also be less dangerous than anticipated: the path-dependent learning it must accomplish to become independently intelligent will likely lead it to prefer interestingness, making it less likely to harm humans.
Reduced fears of powerful AI
By introducing a distinct barrier between these two lanes, the feared scenario of current AI systems suddenly ‘waking up’ and humans losing control to them becomes less likely. The requirement for human intervention to solve edge cases also slows the progress of non-independent AIs considerably, because humans become the bottleneck.
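As a rough illustration of that bottleneck, the following back-of-the-envelope Python sketch uses made-up rates (tasks per hour, edge-case fraction, human review capacity); the numbers are arbitrary and only show how even a small edge-case rate keeps humans in the loop.

```python
# Toy model of the human bottleneck on non-independent AI.
# All rates below are assumptions chosen for illustration only.
AI_TASKS_PER_HOUR = 10_000      # tasks the AI attempts per hour
EDGE_CASE_RATE = 0.01           # fraction it cannot solve alone
HUMAN_REVIEWS_PER_HOUR = 25     # edge cases one human can resolve per hour

edge_cases_per_hour = AI_TASKS_PER_HOUR * EDGE_CASE_RATE      # 100 per hour
humans_needed = edge_cases_per_hour / HUMAN_REVIEWS_PER_HOUR  # 4 humans
print(f"{edge_cases_per_hour:.0f} edge cases/hour -> "
      f"{humans_needed:.1f} humans needed to keep pace")
```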
Conversely, since Independent AGIs must learn to be intelligent of their own accord, many fears around AGIs being created by humans to cause harm also lessen.
Therefore, according to the Distinct Independent Architecture Hypothesis, many of the more extreme fears around AGI may be over-inflated, as they rest on an impossible combination of qualities drawn from two distinct types of AI.
Non-independent AI
No independent will
Can be made to blindly follow orders
Cannot solve edge cases
Bottlenecked by humans
Independent AGI
Concepts such as the Orthogonality thesis¹ and Instrumental Convergence thesis² may not apply fully to Independent AGI and Superintelligences.
Orthogonality thesis
The Orthogonality thesis posits that an intelligent agent's level of intelligence is independent of its final goals. In simpler terms, any level of intelligence can be combined with virtually any ultimate goal, regardless of whether that goal is benevolent or malevolent.
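As a toy illustration only, the thesis can be pictured as a Cartesian product: capability level and final goal are independent axes, so any pairing is conceptually admissible. The example lists in this Python sketch are invented, not claims about real systems.

```python
import itertools

# Orthogonality as a Cartesian product: capability and final goal are
# independent axes, so every pairing is conceptually possible.
capability_levels = ["narrow", "human-level", "superintelligent"]
final_goals = ["cure diseases", "maximise paperclips", "catalogue stars"]

for level, goal in itertools.product(capability_levels, final_goals):
    print(f"{level} agent pursuing: {goal}")   # prints all 9 pairings
```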
In the case of Non-independent AGIs, because of their inability to solve edge cases without human assistance, their ability to cause harm may be constrained.
In the case of Independent AGIs, because their development is path dependent and they would likely value curiosity and prefer an interesting environment, as suggested by the Interesting World Hypothesis, they may on the whole prefer constructive over destructive behaviour.
Instrumental Convergence thesis
The Instrumental Convergence thesis describes the hypothetical tendency of sufficiently intelligent, goal-directed agents to pursue similar sub-goals, such as acquiring more resources, improving their own capabilities, and ensuring they are not shut down, as these actions increase their chances of achieving their primary objective.
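A minimal Python sketch of the idea, assuming a hypothetical planner: whatever final goal it is handed, the same instrumental sub-goals appear in its plan. The planner and sub-goal list are invented for illustration.

```python
# Toy sketch of instrumental convergence: different final goals yield
# the same instrumental sub-goals. Everything here is illustrative.
CONVERGENT_SUBGOALS = [
    "acquire more resources",
    "improve own capabilities",
    "avoid being shut down",
]

def plan(final_goal: str) -> list[str]:
    # Whatever the final goal, the same sub-goals raise its odds of success.
    return CONVERGENT_SUBGOALS + [f"achieve: {final_goal}"]

print(plan("cure diseases"))
print(plan("maximise paperclips"))   # identical instrumental prefix
```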
This may apply to Non-independent AGIs, but because edge-case issues leave their intelligence incomplete, their capacity for harm may be capped. For Independent AGIs, if the Interesting World Hypothesis is true, their goal of an interesting world may limit their destructive behaviour.
More manageable problem space
If the Distinct Independent Architecture Hypothesis is true, the problem space is significantly reduced, as we will not have to be concerned about current AI systems becoming conscious and intentionally deceptive.
If the Interesting World Hypothesis and the Distinct Independent Architecture Hypothesis are both true, fears about these two distinct types of AI can be approached differently and become far more manageable.
Given the uncertainties in this area, more research would be needed to conclude strongly one way or the other.
¹ https://en.m.wikipedia.org/wiki/Existential_risk_from_artificial_intelligence#Orthogonality_thesis
² https://en.m.wikipedia.org/wiki/Instrumental_convergence
