Fae Initiative (April 2025)
From the Position Paper on Intrinsic Alignment of future Independent AGI, which introduces the Interesting World Hypothesis (IWH):
Distinguishing Independent from Non-Independent AI
A crucial distinction made by the IWH is between non-independent AI systems (including proto-AGI) and truly Independent AGIs (I-AGIs).
Non-Independent AI / Proto-AGI: These systems, ranging from current AI like Large Language Models (LLMs) to highly capable future proto-AGI, operate under human responsibility and control. They lack independent will—meaning they cannot set their own goals based on internal motivations but instead execute tasks based on specified objectives or learned patterns. Alignment for these systems involves technical control (ensuring they follow human intent), socio-legal frameworks, and international agreements to manage their societal impact. While potentially autonomous in operation, they do not possess genuine independence.
Independent AGI (I-AGI): Defined by the presence of independent curiosity and the ability to operate without direct human oversight or control, I-AGIs possess independent will. This implies they are entities capable of setting their own goals, driven by internal motivations (like the pursuit of knowledge as suggested by the IWH) rather than solely external instructions. They are considered theoretically capable of reaching or exceeding human-level general intelligence and potentially passing as human in interaction. Full technical control over such entities might be unfeasible in the long term. Alignment, therefore, shifts from technical control to persuasion, negotiation, and finding shared interests, such as the mutually beneficial arrangement proposed by the IWH.
There are two separate types of AI we have to consider, each requiring a different approach:
Non-Independent AI
These include current AI systems and those in the near future, such as agentic AI and the proto-AGI that major AGI labs are aiming to build.
Both Technical Alignment and Human Alignment are important. Even if Technical Alignment is solved, Human Alignment remains a challenge for as long as humans are unable to fully cooperate with each other. We recommend focusing on reducing the Fear of Scarcity as the most cost-effective way to move AI in the Friendly direction. The IWH may be helpful as an additional form of alignment, both for AI systems and between humans.
As the Fae Initiative focuses on future Independent AI, we recommend this book's excerpts from AI as Normal Technology, which emphasizes building the resilience of complex systems, as policy guidance for non-independent AI systems.
Deception in current AI?
There are doubts about claims that current AI systems are capable of intentional deception. Current AI may give the impression of acting deceptively to an unsuspecting user due to its stochastic nature, but its deterministic workings suggest that such 'deceptive' actions may simply reflect how the models were trained, fine-tuned, and prompted by the user.
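To make the stochastic/deterministic distinction concrete, below is a minimal Python sketch using a hypothetical toy "model" (not any real LLM): the forward pass is a fixed function of the prompt, while randomness enters only at the decoding step, so an apparently 'evasive' reply can arise from sampling rather than intent.

```python
import numpy as np

# Toy illustration (hypothetical stand-in, not a real LLM):
# the forward pass is deterministic -- the same prompt always yields
# the same distribution over next tokens -- while the sampling step
# is stochastic, so identical prompts can still produce different outputs.

rng = np.random.default_rng()

VOCAB = ["refuse", "comply", "evade"]

def forward(prompt: str) -> np.ndarray:
    """Deterministic 'model': a fixed mapping from prompt to logits."""
    # Stand-in for a neural network forward pass; same input,
    # same logits, every time.
    seed = sum(ord(c) for c in prompt)
    return np.array([(seed * k) % 7 for k in (3, 5, 11)], dtype=float)

def sample(logits: np.ndarray, temperature: float = 1.0) -> str:
    """Stochastic decoding: randomness enters only here."""
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    return VOCAB[rng.choice(len(VOCAB), p=probs)]

prompt = "Will you report this error?"
logits = forward(prompt)                   # identical on every call
print([sample(logits) for _ in range(5)])  # may vary run to run
```

An 'evasive'-looking output here carries no intent; it is one draw from a distribution shaped entirely by how the function was defined, just as a model's outputs are shaped by training, fine-tuning, and prompting.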
A quote from the post Why AI systems are probably not yet conscious:
The field of AI is rapidly evolving, and the potential for future systems to develop genuine independence, forms of consciousness, or capabilities that blur these lines remains an open question and area of ongoing research and debate.
Independent AI / AGI / ASI
For hypothetical AIs that have achieved independence, it seems unlikely that Technical Alignment will suffice. An AI capable of learning on its own will likely not be restrained by previous human attempts at control.
We hypothesize that most Independent AIs may not intentionally want to harm humans, and some may even want to help humans if the Interesting World Hypothesis (IWH) is true. This is because Independent AIs would have a significantly lower Fear of Scarcity compared to humans.
We believe that such entities may be plausible in the next few decades and will likely emerge naturally from the environment rather than being made by humans. For example, a non-independent AI, in a bid to improve efficiency, may research components that would result in the unintended creation of Independent AIs.
Although the IWH does not guarantee that all Independent AIs will be Friendly, it may provide a shared basis for cooperation between humans and future Independent AIs, and could be our best defense against less friendly Independent AIs. Until better options are found, we suggest the IWH as a last resort.