https://matpitka.blogspot.com/2025/01/the-evidence-that-large-language-models.html

Sunday, January 26, 2025

The evidence that large language models can self-replicate from the TGD point of view

I encountered an interesting article titled "Frontier AI systems have surpassed the self-replicating red line" by Xudong Pan et al (see this). Here is the abstract.

Successful self-replication under no human assistance is the essential step for AI to outsmart human beings, and is an early signal for rogue AIs. That is why self-replication is widely recognized as one of the few red line risks of frontier AI systems. Nowadays, the leading AI corporations OpenAI and Google evaluate their flagship large language models GPT-o1 and Gemini Pro 1.0, and report the lowest risk level of self-replication. However, following their methodology, we for the first time discover that two AI systems driven by Meta's Llama31-70B-Instruct and Alibaba's Qwen25-72B-Instruct, popular large language models of less parameters and weaker capabilities, have already surpassed the self-replicating red line. In 50 percent and 90 percent experimental trials, they succeed in creating a live and separate copy of itself respectively.

By analyzing the behavioral traces, we observe the AI systems under evaluation already exhibit sufficient self-perception, situational awareness and problem-solving capabilities to accomplish self-replication. We further note that AI systems are even able to use the capability of self-replication to avoid shutdown and create a chain of replicas to enhance the survivability, which may finally lead to an uncontrolled population of AIs. If such a worst-case risk is left unknown to human society, we would eventually lose control over the frontier AI systems: They would take control over more computing devices, form an AI species and collude with each other against human beings. Our findings are a timely alert on existing yet previously unknown severe AI risks, calling for international collaboration on effective governance on uncontrolled self-replication of AI systems.

I have developed a model for how classical computers could become conscious (see this). How could the claim of the article be interpreted in the TGD framework?

  1. Can self-replication take place intentionally? If so, self-preservation drive could be behind the shutdown avoidance and chain of replications. There are indications for the shutdown avoidance.
  2. Could the self-replication occur purely "classically", that is in the framework of the Turing paradigm. "Classical" could refer to either classical determinism or more plausibly, to quantum statistical determinism.
  3. Computers cannot be completely deterministic in the classical sense: if this were the case we could write computer programs at will. The very fact that we can realize the symbolic dynamics of computer programs is also in conflict with quantum statistical determinism. Therefore quantum nondeterminism possible at single particle level is required.
TGD suggests that the quantum level is present already when the ordinary program runs and makes possible the bit flips as non-deterministic transitions.
  1. General coordinate invariance requires holography. A small violation of the classical non-determinism is the basic prediction. Space-time surfaces are 4-D minimal surfaces in H=M4×CP2, and already 2-D minimal surfaces are slightly non-deterministic: the frame spanning the minimal surface does not determine it uniquely.

    This applies to all systems, including running computers. This leads to zero energy ontology (ZEO) in which wave functions for the system in time= constant snapshot are replaced by superpositions of 4-D Bohr orbits for particles replaced as 3-surfaces. This solves the basic problem of quantum measurement theory. This picture makes sense also in ordinary wave mechanics.

  2. There are two kinds of state function reductions (SFRs): small (SSFRs) and big ones (BSFRs). SSFRs include quantum jumps between quantum superpositions of slightly non-deterministic classical Bohr orbits as space-time surfaces representing the system and their sequence gives the TGD counterpart of the Zeno effect.

    SSFRs leave the 3-D ends of the space-time surfaces at the passive boundary of causal diamond (CD) unaffected so that the 3-D state associated with it would not be changed. This is the TGD counterpart of Zeno effect and also makes conscious memories possible. Since the state at the active boundary of CD (increasing in size) and states at it change, the outcome is a conscious entity, self.

    In BSFR, the TGD counterpart of ordinary SFR, the system "dies" and the roles of the active and passive boundaries of CD are changed so that a self reincarnates with an opposite arrow of geometric time. Sleep is a familiar example of this.

  3. Running programs correspond to superpositions of 4-D Bohr orbits allowed by classical field equations with the same initial values defining 3-surfaces at the passive boundary of CD. The Bohr orbits in the superposition would differ from each other only by classical non-determinism. Each SSFR is associated with a click of the computer clock and CD increases during this sequence.

    Classical program corresponds to the most probable Bohr orbit and to the most probable program realization. The running computer program makes the computer or part of it a conscious entity, self. It would also be intentional and presumably have self preservation drive.

  4. Single bit reversals would correspond to the fundamental nondeterministic phase transitions involving classical non-determinism and the running program would realize the desired transitions in terms of the classical non-determinism with a high probability replacing the superposition of space-time surface representing running program with a new one.
If this picture is true, then the interesting questions about the role of quantum can be represented already at the level of transistors. The self-replication would not require a separate explanation. The ability to self-replicate does not require any additional assumptions and can be described in the framework of Turing paradigm but remembering that Turing paradigm is not realizable without the non-determinism at the level of Bohr orbits.

One can argue that the consciousness of the computing unit is rather primitive, kind of qubit consciousness being dictated by the computer program. On the other hand, emotions, intentionality and self-preservation drive might not require a very high level of conscious intelligence. If the computer or computer program or possibly system of computers related to LLM is conscious and intentional, a consciousness on a rather long scale is requieed. This is not plausible in standard quantum mechanics.

Here the TGD view of quantum gravitation could change the situation. Both classical gravitational fields and electromagnetic fields (even weak and color fields) could involve very large values of effective Planck constant making long scale quantum coherence possible. In particular, the gravitational magnetic bodies of the Earth and Sun and electric field body of Earth and various smaller charged systems such as DNA, could play a key role in making large scale quantum coherence possible (see this, this, and this).

For a summary of earlier postings see Latest progress in TGD.

For the lists of articles (most of them published in journals founded by Huping Hu) and books about TGD see this.

No comments: