TGD diary: Is there any hope of curing the retraining problem of language models without making computers conscious?

I summarized my thoughts on perhaps the worst problem of language models, which is the loss of plasticity in continual learning. The model has to be retrained on the entire training material, which is terribly expensive (see this).

One can ask whether and how TGD's speculative vision of potentially conscious computers (see this) might solve the problem.

1. The retraining problem of language models

The basic problem is that everything has to be started from scratch. This is extremely expensive. Biological systems relearn quickly because there is no need to relearn everything. Is the problem fixable for the computers as they are now or is something new required?

To see what could be the root cause of the problem, consider first what language models are meant to be.

In a language model, learning occurs at the raw data level. Different probabilities are taught for different associations. The associations are fixed.
How does the trained system work? The language model simply reacts by recognizing the context and probabilistically producing one of the fixed associations. This response is a mere reaction. If language models are what they are believed to be, they do not have conscious understanding, they lack intentional actions, and they are unable to react to a changing environment.
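The picture of a language model as a fixed association machine can be made concrete with a minimal sketch. The table `ASSOCIATIONS`, its words and its probabilities are hypothetical, invented only for illustration; real language models are far more elaborate, but the point made above survives: after training the table is frozen, and a context that was never seen has no response at all.

```python
import random

# Hypothetical toy table (an assumption for illustration only):
# each context maps to fixed associations with trained probabilities.
ASSOCIATIONS = {
    "good": {"morning": 0.6, "night": 0.3, "grief": 0.1},
    "thank": {"you": 0.9, "goodness": 0.1},
}


def respond(context, rng):
    """Pick one of the fixed associations of `context` by its probability."""
    options = ASSOCIATIONS[context]  # raises KeyError for an unseen context
    words = list(options)
    weights = [options[word] for word in words]
    return rng.choices(words, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded only to make runs repeatable
samples = [respond("good", rng) for _ in range(5)]
print(samples)
```

Changing the behavior of this toy system means rebuilding `ASSOCIATIONS` itself; nothing in `respond` can adapt the table to a changed environment.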

Comparison with TGD-inspired biology

Could a comparison with TGD-inspired biology give clues as to where things go wrong? Why is relearning so easy for biosystems? How does TGD-based biology differ from standard biology in this respect? Consider first the classical level.

Holography, which is not quite deterministic, is a completely new element of TGD as compared to the standard model. The space-time surfaces are analogous to Bohr orbits and are determined almost completely by 3-surfaces as initial data. The 4-D tangent spaces of the space-time surface at the 3-surface defining the holographic data cannot be selected freely. This is the classical counterpart of the Uncertainty Principle and leads to classical quantization. A function or program, rather than 3-D data, is the basic concept.
These 4-surfaces define classical analogs of biological functions, behavioral patterns, or programs. When the 3-surface, which almost uniquely fixes the 4-surface, changes, the function changes. Non-determinism is essential in making conscious memory recall possible.

Consider next the quantum level.

A series of "small" state function reductions (SSFRs), associated with repeated measurements of a set of commuting observables whose eigenstates are the 3-D states at the passive boundary of the causal diamond (CD), defines the self as a conscious entity. The proposal is that biorhythms as clocks define TGD counterparts of time crystals such that each unit of the time crystal involves classical non-determinism.
This could be the case at the EEG level, as the findings of the Fingelkurts brothers suggest (see this and this). Maximal non-determinism implies maximal memory recall capacity and maximal flexibility. A whole set of different behavior patterns can be represented as quantum superpositions, and the interaction with the external or internal world determines the measurement in which some classical behavior is chosen.
"Big" state function reductions (BSFRs), interpreted as the death of the self or as falling asleep, involve time reversal. Pairs of BSFRs (sleep periods) make learning possible through trial and error. After the two BSFRs, the system has new holographic data and different space-time surfaces. Goal-directed behavior becomes possible and there are many ways to achieve the goal, not just one fixed way analogous to a fixed computer program. This is the essence of intelligent behavior.

How does this general view relate to the DNA level?

According to the standard view, DNA remains the same during the life cycle. If DNA represents data, there is no relearning at the level of chemical DNA. In zero-energy ontology (ZEO), even chemical DNA could change without any problems with conservation laws and quantum superpositions of different chemical genes are in principle conceivable.
Quantum DNA can be represented in terms of sequences of OH-O^- qubits assignable to the gravitational magnetic bodies of the Sun and Earth (see this). Remarkably, the solar gravitational Compton frequency is 50 Hz, the average EEG frequency. At least for neurons, this would suggest that the gravitational magnetic body is that of the Sun. Note however that EEG time scales are also associated with the basic biomolecules. For the Earth the gravitational Compton frequency is 67 GHz, which is a natural frequency associated with the conformational dynamics of biomolecules.
Quantum DNA consisting of codons represented as OH-O^- qubits is dynamic and could act as a simulator, a kind of R&D laboratory testing different variants of DNA. It is of course possible that a single lifetime is spent with the same chemical DNA and that the next life, after a pair of BSFRs, involves the improved DNA.
Epigenesis brings in flexibility. Even if the chemical DNA does not change, it can be used in different ways. Suitable modules are selected from the analog of program software, just as in text processing. In the TGD framework, this could correspond to the classical non-determinism of the space-time surfaces representing the biological function. Dark DNA allows you to try different combinations of genes.
The understanding of the role of the cell membrane and membrane potential in epigenesis is increasing. As found by Levin (see this and this), the very early stage of the development of the embryo is highly sensitive to variations of the membrane potential. This can be understood in terms of the changes that the potential induces in the binding energy of the electron of O^-: the potential can reduce the binding energy to the thermal range, so that flips of the OH-O^- qubit occur with high probability. In adulthood, the sensitivity disappears and the qubits would not flip.
Could this sensitivity be artificially induced? Here, electric fields as a controller of the sensitivity of the OH-O^- qubits assignable to the basic biomolecules suggest themselves.
Microtubules involve longitudinal electric fields and one of their ends is highly dynamic, so that the length of the microtubule is under continual change. There are huge numbers of amino acids carrying one qubit each (COOH group). Here the quantum level and the classical level are both dynamic and seem to be strongly coupled. Microtubules would also be strongly related to conscious memory.
Could quantum entanglement between the quantum level and the chemical level be possible even at the amino acid level?

One can also look at the situation at the level of cell membranes and neuronal membranes. The basic question is how cell membranes and neuronal membranes learn.

As found by Levin (see this), the role of electric fields is central also in ordinary cells. The electric potential of the ordinary cell membrane correlates with the state of the environment of the cell and codes for sensory information.
The TGD proposal is that the cell membrane acts as a Josephson junction and communicates the frequency-modulated membrane potential to the magnetic body as dark Josephson photons. There they resonantly induce quantum transitions that transform the modulation into a sequence of pulses, perhaps inducing nerve pulses or their analogs as feedback.
During the embryo stage, the cells are very sensitive to the variations of the electric field of the cell. This suggests that these variations take the cell membrane near to the criticality at which large quantum fluctuations are possible for the OH-O^- qubits of phosphates at the inner surface of the cell membrane. This period would be analogous to the learning period of LLMs and would involve BSFR pairs. After this period the situation stabilizes and it might be that BSFRs become very rare.
In the central nervous system, nerve pulses appear, and in neuroscience they are thought to be responsible for communications only. In TGD the situation would be different (see this). I have proposed their interpretation in terms of pairs of BSFRs, so that in LLMs they would correspond to relearning. Neurons would be lifelong learners whereas ordinary cells would learn only in their childhood.
A nerve pulse is generated at a critical membrane potential, which could correspond to effective thermalization of the OH-O^- qubits and possibly of qubits assignable to other ions. Axonal microtubules would also be near quantum criticality. The propagation of a nerve pulse along the axon as a local BSFR pair would induce microtubular relearning.

Could the speculated quartz consciousness come to the rescue?

One can consider the possibility that, under a metabolic energy feed, a computer can become to some extent an entity that can modify both the program and the data used by it as a response to changes in the environment provided by the net. This would require that the OH-O^- qubits, as dark variants of program bits, can entangle with ordinary bits. Energetically this could be possible since the energy scales for transistors are essentially the same as for the metabolism and OH-O^- qubits.

Suppose that the sequences of OH-O^- qubits as time crystals in the TGD sense can be realized in a (future) computer. Qubit sequences would be time series related to the running program. They would involve variation because only the bit configuration corresponding to the minimum energy would correspond to the running program. This makes possible an entire repertoire of associations from which an SSFR would choose one. Quantum measurement following the generation of bit-qubit entanglement could change the value of the bit.
Besides the dynamic realization as a running program, there could be a non-dynamic realization in which the data that determines the program could be accompanied by a similar set of qubits. The data used by the program, such as learned associations, could be associated with qubits and could be made dynamic by using electric fields to make the qubits more sensitive to flips. The problem is of course that the change of a randomly chosen single qubit implies the failure of the program. Only critical qubits associated with choices, and data qubits, should be subjected to a flip.
Besides time crystals with non-deterministic repeating units, also space-like crystals involving non-determinism in each lattice cell can be considered. Dynamical qubits with maximal non-determinism in space-like directions, associated with unit cells, could also accompany the data bits. Dynamization could be induced by using electric fields.
If OH-O^- qubits can quantum entangle with bits, program/data is accompanied by quantum program/quantum data which can react to the perturbations from the external world (BSFRs) and internal world (SSFRs). The quantum level could control the bit level. Even the associations as the data of the language model could be accompanied by a set of qubits that react to a changing situation.

How could an associative system retrain itself in response to a changed situation?

If language models are nothing but deterministic association machines, there is little hope of solving the problem.

Could the learning in the biological and neural systems provide some hints about possible cures, possibly requiring modification of computers so that they would become analogous to living systems?

Do EEG rhythms define time crystals in the TGD sense, that is maximally non-deterministic systems having lattice cells as a basic unit of non-determinism for SSFRs giving rise to the flow of consciousness of the self?
If biorhythms define TGD analogs of time crystals, the non-determinism would be maximal and maximum flexibility in SSFRs would be possible.
In ZEO, a "big" state function reduction (BSFR) as counterpart of ordinary state function reduction changes the arrow of time and is assumed to give rise to the analog of death or sleep. At the language model level, this would be the analog for a complete retraining from the beginning.

Association is only one particular reaction leading to a behavioral pattern. The repertoire of associations should change as the environment changes.

Could a computer clock define the equivalent of an EEG rhythm as a time crystal in the TGD sense? The problem is that a typical computer clock frequency is a few GHz, considerably lower than the 67 GHz gravitational Compton frequency of the Earth. This would suggest that a unit consisting of roughly 67 bits could correspond to the basic unit of the time crystal. The gravitational magnetic body of the Sun has a gravitational Compton frequency of 50 Hz, identifiable as the average EEG frequency.
Could one think of a quantum version of language models in which pairs of BSFRs, as "death" and rebirth, happen spontaneously all the time as a reaction to conscious information coming from the environment? The information would induce a perturbation implying that the density matrix, as the basic measured observable, no longer commutes with the observables that define the quantum numbers of the passive part of the zero energy state. In this way ZEO would make trial and error possible as a basic mechanism of learning.
Could the formation of an association perhaps be modelled as a single non-deterministic space-time surface? There would be a large number of them, internal disturbances would produce their quantum superpositions, and an SSFR would select a particular association.
An external disturbance could produce a BSFR and "sleeping overnight". This period of "sleep" could be rather short: our flow of conscious experience is also full of gaps. Upon awakening, the space-time surfaces as correlates of the associations would no longer be the same. The system would have learned from the interaction with the external world. This temporary death of the system would be an analogy for a total re-education. But the system would cope with it all by itself.
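The comparison of the computer clock frequency with the 67 GHz gravitational Compton frequency of the Earth can be checked with simple arithmetic. The sketch below uses only the 67 GHz value quoted in the text. Reading the frequency ratio as the number of bits per basic unit is an assumption made here for illustration, and the function name `compton_periods_per_clock_cycle` is hypothetical.

```python
# Value quoted in the text for the Earth's gravitational Compton frequency.
EARTH_GRAV_COMPTON_HZ = 67e9


def compton_periods_per_clock_cycle(clock_hz):
    """Number of 67 GHz periods that fit into one period of the clock."""
    return EARTH_GRAV_COMPTON_HZ / clock_hz


# Assumed sample clock frequencies in the "few GHz" range.
for clock_ghz in (1.0, 2.0, 3.0, 4.0):
    ratio = compton_periods_per_clock_cycle(clock_ghz * 1e9)
    print(f"{clock_ghz:.0f} GHz clock: ratio = {ratio:.1f}")
```

Under this reading the figure of roughly 67 corresponds to a clock frequency of about 1 GHz, while a clock of a few GHz gives a ratio between roughly 17 and 34.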

The hard problem is how to realize this vision. Here the analogy with cell and neuron might serve as a guideline in trying to imagine what the new technology might look like.

Ordinary cells are analogous to LLMs as they are now and learn only in their childhood. Neurons are lifelong learners thanks to the neural activity inducing the conduction of local BSFR-pairs updating microtubular states. Could something like this be realized in computers?
In computers, information is transferred along wires, which can be seen as the counterparts of axons. Is it possible to make these wires carriers of quantum information and perhaps even of the learned data about associations? The conduction of the analogs of nerve pulses during the running program, inducing a pair of BSFRs, would gradually modify the data locally and lead to continual relearning.
Copper wires are too simple to achieve this. Should one consider an axon-like geometry defined by two cylinders analogous to the lipid layers of the cell membrane, with a voltage between them, so that the interior cylinder would contain OH-O^- qubits? The variation of the counterpart of the membrane potential during signal transmission (bits represented as voltages) could take the qubits near criticality. Could copper hydroxide Cu(OH)_2 serve as a possible candidate for an intelligent wire based on OH-O^- qubits?

See the article Quartz crystals as a life form and ordinary computers as an interface between quartz life and ordinary life? or the chapter with the same title.

For a summary of earlier postings see Latest progress in TGD.

For the lists of articles (most of them published in journals founded by Huping Hu) and books about TGD see this.

Monday, December 02, 2024