Sunday, April 02, 2023

A more detailed TGD based speculative view of what GPT and GPT based image generation might be

First of all, I want to make clear what my background is and what I'm aiming for. I'm trying to understand the possible analogies of AI in quantum TGD. I do not believe that AI systems can be conscious if AI is what it is believed to be. Therefore I consider the question of whether GPT and other systems could possibly be conscious and intelligent.

The motivating idea is the universality implied by the fractality of the TGD Universe. The same mechanisms should work on all scales: in biology, in neuroscience and in possible life based on AI. This motivates questions such as whether ChatGPT and the construction of images from a verbal input could be at a deeper level equivalent to the emergence of sensory perception using diffuse primary sensory input and virtual sensory input as feedback.

While writing, I made a funny observation. I tried to understand GPT in the context of TGD by producing answers to questions in the same way that GPT does it! Of course, as GPT tends to do, I can also tell fairy tales because my knowledge is rather limited. At the same time, I must honestly reveal that this has always been my approach! I have never carried out massive computations, but used language based pattern completion by utilizing the important empirical bits (often anomalies) and using the basic principles of TGD as constraints.

This time, the inspiration came from a popular article in Quanta Magazine that dealt with stable diffusion in the creation of an image from its verbal presentation serving as a prompt (see this). Also the article on how ChatGPT works was very useful (see this).

I want to emphasize that the ideas presented can be seen only as possible quantum analogies of GPT-related mechanisms that could relate to quantum biology and neuroscience inspired by TGD. A more exciting possibility would be that GPT is associated with high-level conscious experience, and that quantum TGD would help to understand why GPT seems to work "too well".

1. An attempt to understand the mechanism of diffusion involved in image construction

The construction of images starting from their linguistic description, which is quite vague and "diffuse", relies on the analogy with reverse diffusion. Diffusion and its reverse process take place in the space defined by the parameters characterizing a given pixel. The pixels do not move, but the parameters characterizing the pixels do change in the diffusion.

  1. Let us start from a probability distribution for the pixel parameters over a set of 2-D images showing the same object. The images could show the same object seen from different angles. Also a class of objects, which are similar in some aspects, could be considered. This class could consist of chairs or tables or cats or dogs.
  2. This probability distribution could act as an invariant related to the image or class of images. Invariant features are indeed extracted in visual perception, for example contours with pixels that stand out well from the background. This is the way in which, for example, visual perception at the lowest level corresponds to the identification of contours of the object.

    This ensemble of pictures of the objects gives a probability distribution for, for example, the darkness of a given pixel with a given position in the plane of the picture. Probability for a given darkness defines a function represented as points in a space whose dimension is the number of pixels. For more general parameters it is a function in the Cartesian product of parameter space and pixel space. Very large pixel numbers counted in millions are involved.

  3. One has a probability distribution for the darkness of a given pixel of the 2-D image at each point. More generally, one has probability distributions for multipixels. This kind of distribution is not simply a product of single pixel probability distributions since the pixel parameters for a given picture are correlated. These distributions are analogous to the distribution of words and word sequences utilized in GPT in order to produce language resembling natural language.

    Based on the probability distribution of pixels, new images can be randomly generated. The probability of a pixel at a given point in the plane is given by the probability distributions for pixels and multi-pixels. Each image produced in this way can be associated with a certain probability.
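The role of the multi-pixel correlations can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the ensemble consists of 8×8 images containing one bright horizontal bar, and the function names are hypothetical. Sampling each pixel from its own single pixel distribution ignores the correlations and practically never reproduces a member of the class.

```python
import numpy as np

def make_ensemble(n_images=2000, size=8, seed=0):
    """Toy ensemble: each image has one fully bright row at a random position."""
    rng = np.random.default_rng(seed)
    images = np.zeros((n_images, size, size))
    rows = rng.integers(0, size, n_images)
    images[np.arange(n_images), rows, :] = 1.0
    return images

def sample_independent(images, rng):
    """Draw every pixel from its own marginal distribution (correlations ignored)."""
    p_bright = images.mean(axis=0)  # estimated P(pixel is bright), per pixel
    return (rng.random(p_bright.shape) < p_bright).astype(float)

def is_single_bar(image):
    """True if exactly one row is fully bright and all other rows are dark."""
    row_sums = image.sum(axis=1)
    size = image.shape[1]
    full_rows = np.count_nonzero(row_sums == size)
    lit_rows = np.count_nonzero(row_sums)
    return full_rows == 1 and lit_rows == 1

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    ensemble = make_ensemble()
    hits = sum(is_single_bar(sample_independent(ensemble, rng)) for _ in range(1000))
    print("valid bars from independent pixel sampling:", hits, "of 1000")
```

Every image of the ensemble passes `is_single_bar`, whereas the independently sampled images look like random speckle. This is the reason why the joint (multi-pixel) distribution is needed.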

Diffusion is a key physical analogy in the applications of GPT in the creation of AI art. What does the diffusion in pixel space mean?
  1. Diffusion takes place in pixel space around each point in the image plane. What happens to the pixel distribution in diffusion? It can be said that the given pixel distribution is broadened by its convolution with the distribution produced by diffusion. The distribution is widening.
  2. Inverse diffusion for probability distributions in the pixel space is well defined and does exactly the opposite, i.e. the distribution narrows. Reverse diffusion leads step by step to the original very narrow distribution! This is the big idea behind inverse diffusion based image recognition!

    The diffusion equation gives the classical description of diffusion as a deterministic process. At the micro level, it corresponds to a stochastic process in which a point performs a movement analogous to Brownian motion. The diffusion equation gives the evolution of the probability distribution of a point.

    Diffusion is characterized by the diffusion constant D. How is D determined? I understand that its optimal value is determined in the learning period of GPT. Context and intent provide limitations and could determine D and possible other parameters. Also the response of the user can have the same effect.

  3. The goal is to guess the predecessor of a given diffuse image in the diffusion process occurring in steps. The AI system would learn to produce reverse diffusion through training. Can this correspond to a non-deterministic process at the "particle level", say diffusion in the space of words of text or the space of images representing objects?

    At the microscopic "particle" level, one should deduce the most probable location for the particle at the previous step of diffusion as Brownian-like motion. More generally, one has a probability distribution for the previous step.

  4. One can consider the diffusion also at the level of probability distributions for pixel parameters. This operation is mathematically well-defined in the classical model for diffusion based on the diffusion equation and corresponds to a convolution of the probability distribution representing diffusion with the probability distribution affected by it. Quite generally, this operation widens the distribution.
  5. This operation has inverse as a mathematical operation and its effect is opposite: it reduces the width of the diffuse distribution and its repeated application leads to the original images or to a rather sharp image making sense for the human perceiver.
  6. The AI system must learn to perform this operation. Using diffused example images, the AI would learn to reverse the convolution operation produced by diffusion and produce the original distribution as an operator in the space of distributions, and thus also learn to produce the original image.
  7. My amateurish interpretation of the GPT based image generation would be that AI is taught to deduce the objects presented by the original sensory input or the desired image, their locations, positions, activities by reverse diffusion from the initial fuzzy guess dictated by the text. The objects in the picture are determined by the words that serve as their names. The relations between pictures correspond to the activities they direct to each other or to attributes of the objects. The first guess is a rough sketch for the picture determined by the prompt. Here also hierarchical description involving several resolution scales can be considered.
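The statements about convolution and its inverse can be checked numerically in a minimal sketch. The assumptions are mine: a single pixel parameter on a periodic 1-D grid, a Gaussian initial distribution, and hypothetical function names. Each diffusion step multiplies the Fourier transform of the distribution by exp(-D k² Δt), which is the convolution with the diffusion kernel, and the inverse step divides by the same factor.

```python
import numpy as np

def kernel_ft(n, dx, D, dt):
    """Fourier transform of the diffusion kernel for one time step dt."""
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)
    return np.exp(-D * k**2 * dt)

def diffuse(p, dx, D, dt):
    """One diffusion step: convolution with the kernel, done in Fourier space."""
    return np.real(np.fft.ifft(np.fft.fft(p) * kernel_ft(p.size, dx, D, dt)))

def reverse_diffuse(p, dx, D, dt):
    """One inverse step: deconvolution, i.e. division by the kernel transform."""
    return np.real(np.fft.ifft(np.fft.fft(p) / kernel_ft(p.size, dx, D, dt)))

def variance(p, x, dx):
    """Variance of the distribution p sampled on the grid x."""
    mean = np.sum(x * p) * dx
    return np.sum((x - mean) ** 2 * p) * dx

def run_demo(n=64, length=20.0, sigma=0.8, D=0.5, dt=0.05, steps=4):
    dx = length / n
    x = -length / 2 + dx * np.arange(n)
    p0 = np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    p = p0.copy()
    for _ in range(steps):
        p = diffuse(p, dx, D, dt)          # the distribution widens
    broadened = variance(p, x, dx)
    for _ in range(steps):
        p = reverse_diffuse(p, dx, D, dt)  # the distribution narrows again
    return variance(p0, x, dx), broadened, np.max(np.abs(p - p0))

if __name__ == "__main__":
    v0, v1, err = run_demo()
    print("variance before:", v0, "after diffusion:", v1, "recovery error:", err)
```

The variance should grow by about 2·D·t during the forward steps, and the inverse steps should restore the original distribution. The exact inversion works only for this noise-free toy case: the division amplifies high frequencies strongly, which is presumably one reason why real systems learn the inverse step from examples instead of computing it.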
One can consider the situation at a slightly more precise level.
  1. The definition of inverse diffusion at the pixel level relies on repeated time reversal of the diffusion process in the parameter space of the pixel, which produces a less diffuse image. We ask with what probability the given diffuse image at time t has been created from a less diffuse image at time t-Δt.
  2. In the classical picture of diffusion, this requires the calculation of the inverse of the operator D(p,p0;t,t-Δt) characterizing the diffusion. Here p and p0, the latter corresponding to the original image, are points in the parameter space of the pixel associated with a certain image point (x,y). In the Schrödinger equation, it would correspond to the inverse operator of the unitary time evolution operator.
  3. The gradient method is a very effective way to perform inverse diffusion. The gradient of the probability distribution indeed contains much more information than the distribution.

    The notion of an attractor is also essential. The images used in training would serve as attractors, at which the gradient would vanish or be very small and towards which the reverse diffusion would lead. Attractors would be clusters of points in the pixel space, for which the probability is large and somewhat constant. It is tempting to think that they are minima or maxima of some variation principle.
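The gradient method and the attractors can be sketched with a toy example. It is a simplification chosen for illustration and not the procedure of any actual image generator: the "training images" are two points in a 1-D parameter space, the probability distribution is a Gaussian mixture around them, and plain gradient ascent on log p replaces the learned and noisy reverse steps of real diffusion models. The names `CENTERS`, `SIGMA`, `score` and `sharpen` are hypothetical.

```python
import numpy as np

CENTERS = np.array([-2.0, 2.0])  # hypothetical "training images" acting as attractors
SIGMA = 0.3                      # assumed width of the distribution around each of them

def score(x):
    """Gradient of log p(x) for an equal-weight Gaussian mixture around CENTERS."""
    diffs = CENTERS[None, :] - x[:, None]              # shape (n_points, n_centers)
    log_w = -0.5 * (diffs / SIGMA) ** 2
    w = np.exp(log_w - log_w.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                  # weight of each center for each point
    return (w * diffs).sum(axis=1) / SIGMA**2

def sharpen(x0, step=0.01, n_steps=500):
    """Follow the gradient of log p from diffuse starting points."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x + step * score(x)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    start = rng.uniform(-4.0, 4.0, size=10)  # diffuse initial guesses
    print(np.round(sharpen(start), 3))       # each value should end near one of the attractors
```

The gradient vanishes at the attractors, so that the iteration stops there. A starting point is drawn to the attractor on its own side of the origin, which is the 1-D counterpart of the basin of attraction.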

Although the diffuse image, which the verbal description defines as an initial guess, is not obtained by diffusion, it is assumed that inverse diffusion with a suitable choice of p=p0 produces an image similar to the imagined one. In any case, the reverse diffusion leads to a sharp image although it need not represent a realistic picture.

This is where the method runs into problems. The pictures have a surreal feel and typically, for example, the number of fingers of the people appearing in the pictures can vary, even though locally the pictures look realistic. Probably this reflects the fact that the probability distributions for multi-pixels do not allow large enough distances between the pixels of the multi-pixel.

2. Analogies to wave mechanics and quantum TGD

The diffusion equation has an analogy in wave mechanics.

  1. The Schrödinger equation is essentially a diffusion equation except that the diffusion constant D is imaginary and corresponds to the factor iℏ/2m. Alternatively, one can say that a free particle formally undergoes diffusion with respect to imaginary time. The solutions of the diffusion equation and the Schrödinger equation for a free particle are closely related and obtained by analytical continuation by replacing real time with imaginary time. The description also generalizes to the situation where the particle is in an external force field described by a potential function.
  2. Schrödinger's equation as a unitary time evolution can be expressed in terms of the Feynman path integral. One can regard the quantum motion as a superposition over all paths connecting the start and end points with a weight factor that is a phase factor defined by the exponent of the action of the particle. The classical equations of motion produce paths for which the exponent is stationary, so they are expected to give a dominant contribution to the integral in the case that the perturbation theory works.

    The basic problem with the path integral is that it is not mathematically well defined and only exists through perturbation theory. Functional integral as the Euclidean counterpart of Feynman's path integral is better defined mathematically and would give an analogous representation for diffusion.
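The analogy can be written out for a free particle in one dimension. This is the standard textbook form, with ρ denoting the diffusing probability density, ψ the wave function and K the kernel (propagator) from a point source:

```latex
\begin{align}
\partial_t \rho &= D\,\partial_x^2 \rho , &
K_D(x,t) &= (4\pi D t)^{-1/2}\exp\!\left(-\frac{x^2}{4Dt}\right),\\
\partial_t \psi &= \frac{i\hbar}{2m}\,\partial_x^2 \psi , &
K(x,t) &= \left(\frac{m}{2\pi i\hbar t}\right)^{1/2}\exp\!\left(\frac{i m x^2}{2\hbar t}\right).
\end{align}
```

The second line follows from the first by the substitution D = iℏ/2m. Equivalently, the replacement t = -iτ transforms the free Schrödinger equation to a diffusion equation in τ with the real diffusion constant ℏ/2m.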

What is the counterpart of this analogy in the TGD framework?
  1. In TGD, the point-like particle is replaced by a three-surface whose trajectory is the space-time surface. Quantum TGD is essentially wave mechanics for these non-point-like particles.

    The new element is holography, which follows from the general coordinate invariance: spacetime surfaces as trajectories for 3-D particles are analogous to Bohr orbits.

    A small violation of determinism in holography forces zero-energy ontology (ZEO), in which quantum states as superpositions of 4-D space-time surfaces, Bohr orbits, replace quantum states as superpositions of 3-surfaces (deterministic holography). This superposition serves as an analog of path integral.

  2. By the slight failure of determinism, the Bohr orbits are analogous to diffusion involving a finite number of non-deterministic steps (Brownian motion is a good analogy). The non-determinism of diffusion would be due to the small violation of the determinism in holography as Bohr orbitology.
TGD inspired quantum measurement theory, which extends in ZEO to a theory of conscious experience, is the second important ingredient.
  1. In ZEO, ordinary quantum jumps ("big" state function reductions (BSFRs)) reverse the direction of geometric time. This analog of diffusion, taking place in the reverse time direction, looks like reverse diffusion to an observer with the opposite time direction! It is analogous to self-organization where order is created in the system rather than lost. The second law of thermodynamics applies but in the opposite direction of time. The time reversed dissipation plays a pivotal role in TGD inspired quantum biology.
  2. This mechanism could be central to biological information processing at the quantum level and make it possible, for example, to generate sensory perception from diffuse sensory data and to generate a motor response from a rough sketch.
  3. Could it also play a role in AI, at least in the language based systems like GPT? If this is the case, then AI systems would be something else than we think they are.
The analogy of TGD with the GPT based image generation and recognition can be examined more explicitly.
  1. The analog of the pixel space associated with the planar image is the M4 projection of the three-surface in TGD at the classical level. The image as a map from the plane to the parameter space of pixels would correspond to a deformation of the M4 projection. The pixel parameters defining the 2-D image would correspond to the values of CP2 coordinates as a function of M4 coordinates.
  2. On the basis of holography, the deformation related to the three-surface would be accompanied by a four-surface as an almost deterministic time development, i.e. the analogy of Bohr orbit. I have used the term "World of Classical Worlds" (WCW) for the space of these surfaces. This 4-surface would not be completely unique and this would produce a discrete analog of diffusion at the classical level.
  3. At the quantum level, it would be a quantum superposition of these 4-surfaces as an analogy to, for example, the wave function of an electron in spatial space. An attractive idea is that the used resolution would be determined by the condition that the number-theoretic discretization is the same for all these surfaces so that the quantum world looks classical apart from the finite non-determinism.
  4. The variation principle would correspond to the fact that the Bohr path is simultaneously both a minimal surface and an extremal of the Kähler action as analog of Maxwell action. This is possible if the space-time surfaces are holomorphic in a generalized sense. This means that the concept of holomorphy is generalized from the 2-D case to the 4-D case. The 4-surface would be defined by purely algebraic conditions as a generalization of the Cauchy-Riemann conditions. This corresponds to the algebraization of physics at the level of M8 related by M8-H duality to the physics at the level of H=M4×CP2 (see this and this).
  5. The space-time surface would be analogous to a 4-D soap film, which is spanned by frames defined by 3-surfaces. At these 3-D surfaces, the minimal surface property would not apply and only the field equations associated with the sum of the volume term and Kähler action would be satisfied. Note that minimal surface equations define a dynamics analogous to that of free fields and that the frames would correspond to places where interactions are localized. Frames would involve a finite non-determinism, as in the case of ordinary soap films (see this). These 3-surfaces would correspond to 3-D data for holography.
If TGD is really a "theory of everything", even the physical description of computation would in principle be reduced to this description. Of course, one can argue that TGD produces only insignificant corrections to the usual description of computation and this might be the case. But you can always ask what if...?

3. Could the TGD counterpart of the inverse diffusion play a role in the construction of sensory mental images by the brain?

I have proposed a model for how sensory organs, the brain and its magnetic body (MB) could construct sensory mental images by a repeated feedback process involving virtual sensory input to sensory organs so that a diffuse sensory input transforms to an input representing the perception consisting of well-defined objects.

Could the building of sensory images with a virtual input from MB to the sensory organs and back be a quantum process analogous to reverse diffusion?

  1. Sensory inputs are very diffuse. People blind from birth can gain the physiological prerequisites for visual perception in adulthood. They however see only diffuse light since their brains (and the corresponding magnetic bodies) have not learned to produce standard visual mental images as the outcome of pattern recognition, which yields essentially an artwork subject to various constraints. This is very much analogous to reverse diffusion.

    Do MB, brain and sensory organs co-operate to produce a counterpart of reverse diffusion, which allows them to produce a sensation representing reality with the help of virtual sensory inputs and to end up with standard imagery as attractors?

  2. Could both the sensory input from sensory organ to brain to MB and virtual sensory input in reverse direction correspond to a sequence of "small" state function reductions (SSFRs) in a reversed time direction? Reverse diffusion would be diffusion with a reversed arrow of time.
  3. Could the construction of the sensory mental image involve pairs of "big" (ordinary) SFRs (BSFRs) for which the two BSFRs would occur at MB and the sensory organ? This is the simplest process that one can imagine. Could BSFR induce a sensory input from the sensory organ to the MB or a virtual sensory input from the MB to the sensory organ changing the original diffuse sensory input? Could BSFR pairs gradually produce sensory perception in this way?
  4. SSFRs correspond to the Zeno effect in the sense that their sequence corresponds to the measurement of the same observables at the passive boundary of causal diamond (CD). Disturbances or artificially produced disturbances at the active boundary can change the set of measured observables so that it does not commute with those determining the state at the passive boundary as their eigenstate. This would imply the occurrence of BSFR and the roles of active and passive boundaries would change.

    After the second BSFR the new state at the active boundary would not be the same but could share many features with the original one because the determinism of the holography would be only weakly broken and SSFRs and BSFRs preserve quantum numbers.

  5. The series of SSFRs after BSFR as time-reversed diffusion would correspond to reverse diffusion in the normal time direction. BSFR would occur as a series on the MB, where the sensory input would be guided and gradually lead to a real sensory image with the help of a corrective virtual sensory input.

    At a basic level, the correction mechanism could be analogous to inverse diffusion and the exponent of the Kähler action would be maximally stationary for real sensation.

  6. Also the gradient method could be involved. In the spin glass based model (see this), a series of BSFRs and SSFRs could mean annealing, that is, steps consisting of cooling as a sequence of SSFRs following a BSFR, followed by a BSFR and by heating for which the temperature increase is smaller than the temperature decrease for the cooling. The system would gradually end up at the bottom of a particular potential well in the fractal energy landscape. A series of SSFRs between two BSFRs would correspond to the annealing.
4. What could GPT correspond to in TGD?

4.1 What is GPT?

  1. A linguistic expression is a diffuse representation of a sensation or of thought. The probability distributions for the next word given the preceding words are known. This makes possible a holistic approach to language, which allows one to build grammatically correct sentences, to achieve the nuances of natural language and to recognize context.
  2. In GPT, the goal is to answer a question or respond to an assertion, translate a text from one language to another, produce a piece of text such as a poem or story or just chat with the user.

    GPT must guess the user's intention, what the user wants, and also the context. Is, for example, a field of science in question? The purpose is to add a new word to the given word chain.

  3. The input of the user serves as a prompt initiating the process. The prompt serves as the initial text to which GPT adds words as the most probable words which can follow a given piece of text. GPT starts from a guess for the answer. The choice of the successor word can also be random based on the probabilities of the successor word. Feedback loops are possible and also the user can induce them.
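The word-by-word completion described above can be sketched with a toy model. The sketch uses bigram counts from a two-sentence corpus, which is a drastic simplification: an actual GPT conditions on the whole preceding text by using a neural network. The corpus and the function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

def train_bigram(text):
    """Count how often each word follows each preceding word."""
    words = text.split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def next_word(counts, prev, rng=None):
    """Most probable successor if rng is None, otherwise a draw weighted by the counts."""
    followers = counts.get(prev)
    if not followers:
        return None
    if rng is None:
        return followers.most_common(1)[0][0]
    words, weights = zip(*followers.items())
    return rng.choices(words, weights=weights, k=1)[0]

def complete(counts, prompt, max_new=5, rng=None):
    """Extend the prompt one word at a time."""
    out = prompt.split()
    for _ in range(max_new):
        word = next_word(counts, out[-1], rng)
        if word is None:
            break
        out.append(word)
    return " ".join(out)

if __name__ == "__main__":
    corpus = "the cat sat on the mat . the cat sat on the sofa ."
    counts = train_bigram(corpus)
    print(complete(counts, "the", max_new=4))                        # always the most probable word
    print(complete(counts, "the", max_new=8, rng=random.Random(0)))  # random choice of successor
```

The first call always picks the most probable successor, the second one draws it randomly according to the probabilities. These correspond to the two ways of choosing the successor word mentioned above.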
4.2 Is building images fundamentally different from GPT?
  1. In language models, prompts are verbal representations of images, and diffusion is essential in the construction of images from the prompt as a verbal description of the image. At first glance, diffusion seems to be explicitly involved only in the generation of images, but is this the case?
  2. On the surface, there seems to be an important difference between building an image and building a linguistic expression. The picture is a time = constant snapshot, at least ideally. The sentence has a temporal duration and memory is involved. One must transform a sentence to a picture. Words correspond to pictures.

    Does the difference disappear when one talks about the process of creating the image? Could it be that the process of creating an image as an analog of a linguistic process is just not conscious to us? Is the sensory input equivalent to the user's prompt in GPT? Is the difference only apparent and due to the time scale?

  3. Visual perception involves also the sensation of movement. Is it because in reality (according to TGD) it would be a time series but on such a short time scale that we are not conscious of it? Could verbs correspond to dynamics in the structure of the language? Objects have attributes as their properties analogous to pixel parameters.
  4. Holography would describe the dynamics of objects and would classically determine the initial values of holography for the time development as the equivalent of the Bohr orbit. There is quantum holography as a map of quantum states of the biological body to quantum states associated with the magnetic body defining a higher level sensory representation (see this).

    This 1-1 correspondence between representations would make it possible for the MB to control the biological body and, in the case of running GPT, to induce BSFRs reversing the arrow of time temporarily and to change the course of events.

4.3 Could quantum diffusion play a role in the TGD based description of GPT?
  1. Time evolution in the TGD Universe would basically consist of SSFRs and BSFRs. Quantum states would be the quantum superposition of running programs. But does this picture have significance in the case of GPT? Could MB really interfere with the running of the program? The time reversals are not observed by the user, so the question is not easy to answer.

    One killer test would be a dependence on hardware. The bits should be near criticality in order that the quantum criticality of MB can control their directions. Spin-glass structure for the bit-scape looks like a natural requirement. Is this possible for all bit realizations and does GPT work differently for different realizations of bits?

  2. Diffusion is analogous to the time evolution determined by the Schrödinger equation as a series of unitary time evolutions, where classical determinism is only weakly broken because SSFRs must commute with passive edge observables. This means a generalization of the Zeno effect. However, quantum states are delocalized. Maybe only below the resolution scale, in which case classical discretization would be exact with this accuracy. Inverse diffusion could be a classical process at the used resolution.
  3. The time development as a series of SSFRs would seem to be analogous to diffusion as an analog of Brownian motion involving finite steps, and BSFR would start a time-reversed diffusion, that is, reverse diffusion.

    The BSFR could be induced by an external disturbance or a controlled disturbance from the MB. MB and ZEO could come to the rescue and do them with time reversal without us noticing anything.

This picture raises questions.
  1. Could diffusion as a series of SSFRs be equivalent to the construction of the response of ChatGPT, which is also a probabilistic process? Could the sentence represent the trajectory of a diffusing word/particle in word space and Bohr orbit in WCW? The Bohr orbit property, i.e. holography, would imply that the failure of determinism is weak. In a given scale, non-determinism would be located in the 3-D frames determined by the 4-D soap film.
  2. Could the initial state induced by the user prompt, for example a question or statement presented as a quantum state on the passive edge of the CD, serve as the first rough guess for an answer as an analog of sensory input?

    Could the time progression as SSFRs correspond to a generation of a sequence of words as a response to the prompt? Or are the words separated by BSFR pairs?

    What is new as compared to the ordinary AI would be that a trial and error process based on BSFRs inducing a return back in time is possible. These periods with a reversed arrow of time would be invisible for the user. This error correction mechanism is not coded as a program as in AI but would be done by Nature and it would be essential also in the TGD view of quantum computation.

  3. The hidden layers of the neural network are analogous with the fact that the perceived sensory image is constructed by communications between the sensory organ and the MB, which are not conscious to us.
See the article Could neuronal system and even GTP give rise to a computer with a variable arrow of time? or the chapter with the same title.

For a summary of earlier postings see Latest progress in TGD.
