https://matpitka.blogspot.com/2025/02/?m=0

Thursday, February 06, 2025

Conscious associative learning as an analog of sensory perception and motor action

Holography, together with the TGD based view of sensory perception, suggests that associative learning has a lot in common with sensory perception in a 4-D sense.

In the TGD framework, motor action could be seen as a time reversal of sensory perception. Motor action could involve a pair of BSFRs inducing a quantum tunnelling from one configuration of muscles to a new configuration, so that the same basic mechanism, but with a reversed arrow of geometric time, could be involved. The intention for the motor action should relate to the process of building a sensory perception as a sequence of SSFRs in a reversed time direction.

  1. At the classical level, sensory perception is not 3-dimensional but a 4-dimensional space-time surface, an almost deterministic classical time evolution representing the association A1→B1. In the case of hearing this is obvious, but for vision the time scale is so short that the percept looks like a time = constant snapshot. Actually, the geometric time duration assignable to the visual percept is of order 0.1 seconds.

    The association A→B, of which one might perhaps speak as a cognitive representation, is realized at the magnetic body (MB) of the brain as a representation of A1→B1. A→B is generated in a stepwise learning process. The goal is to construct a standardized mental image consisting of familiar objects built from standard features.

    The difference between A→B and A1→B1, rather than only the difference between B and B1, is minimized. The sequence of SSFRs keeps A fixed. A pair of BSFRs also changes A: this makes possible a trial and error process in which one starts from scratch, so to say.

  2. The sensory organ serves as a kind of screen, both for the sensory input arriving from the external world and for the virtual sensory input from the MB. The sensory input is analyzed by the brain into features in various scales and the features are sent to the magnetic body. At the MB, the features in various scales are compared to standard features and those minimizing the difference are selected.
  3. The selected features determine the virtual sensory input as a slight amplification of the contribution of the selected features. The step sensory organ → brain → MB → ... is repeated until the total sensory input at the sensory organ does not change anymore. The original percept A1→B1 is affected in the process and eventually the original A1→B1 is replaced with A→B at the level of the sensory organ. In this respect the process differs from associative learning.

    If the signals from the brain to the MB and back are realized as dark photons (which can decay to ordinary photons identifiable as biophotons), the communication is so fast that the process can converge in a reasonable time.

  4. The outcome is not realistic but essentially an artwork (see this). It must be so since A1→B1 is very noisy, so that both A1→B1 and A→B can only be guesses for what really happened. For instance, people who are physiologically blind and get their vision back can see only diffuse light since they have not learned this process in childhood. This suggests that temporary time reversals changing A, as analogs of time reversed diffusion, play an essential role. Note that a pair of BSFRs could mean a position measurement in the space of Bohr orbits selecting a single Bohr orbit and would be analogous to time reversed diffusion.

See the article A hybrid of classical and quantum computer and quantum model for associative learning or the chapter Quartz crystals as a life form and ordinary computers as an interface between quartz life and ordinary life?. For a summary of earlier postings see Latest progress in TGD.

For the lists of articles (most of them published in journals founded by Huping Hu) and books about TGD see this.

Monday, February 03, 2025

Topological and geometric thinking and the future of LLMs

On the FB page of Risto Linturi I found the outcome of his question to the O3 LLM. The question was whether the results of the thesis of the Finnish mathematician Susanna Heikkilä, related to topology, could have implications for AI and LLMs.

The work of Susanna Heikkilä (see this) discusses exterior algebras and algebras formed by differential forms. The abstract reads:

We show that, if a closed, connected, and oriented Riemannian n-manifold N admits a non-constant quasiregular mapping from the Euclidean n-space Rn, then the de Rham cohomology algebra H*dR(N) of N embeds into the exterior algebra ⋀*Rn. As a consequence, we obtain a homeomorphic classification of closed simply connected quasiregularly elliptic 4-manifolds.

From the article (see this) one learns that a closed simply connected 4-manifold is quasiregularly elliptic if and only if it is homeomorphic to either #k(S2×S2) or #j CP2 #k (CP*)2 for some j,k ∈ {0,1,2,3}. Here (CP*)2 and CP2 have opposite orientations.

Here is my humble attempt to understand this. I understand that

  1. the result reflects the universality of exterior algebras, which states that any equation relating elements of the exterior algebra of V is valid in any associative algebra containing V in which the elements of V are nilpotent (see this). This requires that the homomorphism to this kind of associative algebra is an injection, and this is what has been proven in the case studied by Heikkilä (the universal property and the embedding are written out after this list). Grassmann algebras, that is exterior algebras, might also be important in TGD and relate naturally to its differential geometry.
  2. the associative algebra is now the de Rham cohomology algebra of differential forms of a quasiregularly elliptic n-manifold, and it is shown that it can be mapped by an injection to the exterior algebra of Rn.
  3. the quasiregular map is a 4-D generalization of an ordinary conformal map (see this) satisfying non-linear elliptic partial differential equations. Quasiregular ellipticity of a manifold means the existence of this kind of map from Rn to the manifold. These 4-manifolds are very special and it is not clear whether they play any role in TGD as 4-surfaces. Certainly CP2 is in a central role in TGD, but it is not clear to me whether the connected sums #j CP2 #k (CP*)2 could appear naturally in TGD.

    It should be noticed that in TGD the holography = holomorphy vision (see this and this) generalizes the notion of complex analyticity and provides a different way to generalize the nice features of conformal invariance. Also now a non-linear analog of Laplace/d'Alembert equations emerges.
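    To spell out what the universality means, the standard universal property of the exterior algebra and the injection appearing in Heikkilä's theorem can be written as follows (a standard textbook formulation, not a quote from the thesis):

% Universal property of the exterior algebra: a linear map f from V to an
% associative algebra A whose images square to zero extends uniquely to an
% algebra homomorphism of the exterior algebra Lambda(V).
\[
f : V \to A \ \ \text{linear}, \quad f(v)^{2} = 0 \ \ \forall v \in V
\;\Longrightarrow\;
\exists !\ \tilde f : \Lambda(V) \to A, \qquad \tilde f|_{V} = f .
\]
% The injection guaranteed by quasiregular ellipticity of the n-manifold N:
\[
H^{*}_{\mathrm{dR}}(N) \;\hookrightarrow\; \Lambda^{*}\mathbb{R}^{n} .
\]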

The response of O3 is given below, and from it it becomes obvious that people in the field of LLMs are realizing that the surface geometry and topology assignable to data structures are important for LLMs. Actually, a picture very similar to TGD suggests itself, and this makes it possible to make a prophecy about the development of future mathematical ideas in this field. Also the evolution of philosophical ideas and views about quantum physics is necessary, but mathematical consistency could be the basic guideline.

A. O3 response to the questions of Linturi

Below is an exploratory discussion outlining several speculative avenues by which ideas inspired by the algebraic and geometric result you shared might inform or even drive new approaches in AI research. Keep in mind that these connections are heuristic and conceptual; drawing analogies between deep mathematics and AI system design is ambitious and requires much further work to become concrete.

  1. Geometric and Topological Data Representations

    1. Manifold Learning and the Geometry of Data: The result centers on conditions for a manifold's de Rham cohomology to embed into an exterior algebra. In AI, especially in representation learning and manifold learning, the manifold hypothesis posits that high-dimensional data lie on lower-dimensional manifolds. Understanding the algebraic structure of these manifolds (e.g., via cohomological invariants) could inspire new methods for:
      1. Feature Extraction: Designing features that respect or capture intrinsic topological invariants.
      2. Dimensionality Reduction: Developing algorithms that preserve not just distances but also deeper algebraic/topological properties.

    2. Topological Data Analysis (TDA):

      Techniques from TDA (such as persistent homology) already use algebraic invariants to summarize data shapes. Extending these ideas, one might explore whether embeddings analogous to the algebra monomorphism (from de Rham cohomology to an exterior algebra) can serve as more robust, structure-preserving embeddings in neural network architectures or as part of unsupervised learning objectives.
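      To make the TDA point concrete, here is a minimal sketch of a persistent homology computation; it assumes the Python packages numpy and ripser, and the noisy-circle data set is invented purely for illustration.

# Minimal sketch: persistent homology of a noisy circle, assuming the Python
# packages numpy and ripser are installed.
import numpy as np
from ripser import ripser

# Sample a noisy circle: data lying near a 1-dimensional manifold in R^2.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
points = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.standard_normal((200, 2))

# Persistence diagrams up to dimension 1 (connected components and loops).
diagrams = ripser(points, maxdim=1)['dgms']

# One long-lived interval in dimension 1 signals the circle's essential loop
# (first Betti number b_1 = 1); short-lived intervals are noise.
h1 = diagrams[1]
persistence = h1[:, 1] - h1[:, 0]
print("longest H1 persistence:", persistence.max())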

  2. Algebraic Structures in Neural Architectures

    1. Equivariance and Invariant Representations:

      The formal result asserts an injection of algebraic structure into an ambient exterior algebra, preserving operations and intersections. In deep learning, there is active research in designing networks that respect the symmetries or invariances of the data (for example, convolutional networks respect translation invariance). Extending these ideas:

      1. Geometric Deep Learning: One might explore neural architectures that are not only equivariant under classical groups (like rotations or translations) but also respect higher-order algebraic structures. For instance, layers could be designed to manipulate differential forms or emulate operations in exterior algebras.
      2. Invertible Networks and Monomorphisms: The idea of an algebra monomorphism (an injective, structure-preserving map) resonates with efforts in invertible neural networks and normalizing flows. Enforcing such constraints might yield models that are better at preserving information and allow for more interpretable internal representations. (A minimal numerical sketch of such an invertible layer is given at the end of this section.)

    2. Hybrid Symbolic-Numeric Systems:

      The interplay between algebra (a symbolic, rule-based system) and geometry (often linked with numerical methods) mirrors efforts to combine symbolic reasoning with deep learning. By integrating algebraic constraints (like those suggested by the embedding of cohomology) into neural architectures, one might aim for systems that are more robust in tasks requiring both perception (handled by neural nets) and structured reasoning (informed by algebraic or topological invariants).
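      As a minimal numerical illustration of the invertibility idea mentioned under Invertible Networks and Monomorphisms above, the additive coupling layer used in normalizing flows is exactly invertible, so no information is lost in the forward pass; the sketch below uses plain numpy with invented toy parameters and is not tied to any particular library.

# Minimal sketch of an invertible (information-preserving) additive coupling
# layer of the kind used in normalizing flows; plain numpy, toy parameters.
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 4))           # toy parameters of the shift network t

def t(x1):
    # Arbitrary nonlinear map; it need not be invertible itself.
    return np.tanh(x1 @ W)

def forward(x):
    # Split the input and shift one half by a function of the other half.
    x1, x2 = x[:, :4], x[:, 4:]
    return np.concatenate([x1, x2 + t(x1)], axis=1)

def inverse(y):
    # Exact inverse: subtract the same shift.
    y1, y2 = y[:, :4], y[:, 4:]
    return np.concatenate([y1, y2 - t(y1)], axis=1)

x = rng.standard_normal((3, 8))
print(np.allclose(inverse(forward(x)), x))   # True: the layer is a bijection on R^8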

    3. Differential Equations and Neural PDEs

      1. PDEs on Manifolds and Neural Solvers:

        The condition of quasi-regular ellipticity is intimately linked with properties of differential operators on manifolds. Recently, there has been interest in neural methods for solving partial differential equations (PDEs), and for incorporating physics-based constraints into learning models:

        1. Structure-Preserving Discretizations: Insights from de Rham cohomology and exterior calculus have long informed numerical methods (e.g., finite element methods). AI researchers could investigate how embedding such mathematical structure into neural PDE solvers can lead to more accurate, stable, and interpretable models. (A minimal sketch of the discrete exterior calculus identity d∘d = 0 is given at the end of this subsection.)
        2. Learning on Manifolds: When data naturally reside on curved spaces (spherical, hyperbolic, or more complex manifolds), designing networks that respect the differential geometry of these spaces may lead to more powerful and physically consistent AI models.
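        As an illustration of what structure preservation means here, the following sketch builds discrete analogs of the exterior derivative on a periodic 2-D grid and checks the identity d∘d = 0 (the curl of a gradient vanishes), which is the algebraic backbone of de Rham cohomology; plain numpy, with the grid and field invented for the example.

# Minimal sketch of a structure-preserving discretization: discrete exterior
# derivatives on a periodic 2-D grid satisfy d(d(f)) = 0 (curl of grad = 0),
# mirroring the continuum identity behind de Rham cohomology. Plain numpy.
import numpy as np

def d0(f):
    # 0-form -> 1-form: forward differences along x and y (values on grid edges).
    fx = np.roll(f, -1, axis=0) - f
    fy = np.roll(f, -1, axis=1) - f
    return fx, fy

def d1(u, v):
    # 1-form -> 2-form: discrete curl on grid plaquettes.
    return (np.roll(v, -1, axis=0) - v) - (np.roll(u, -1, axis=1) - u)

rng = np.random.default_rng(2)
f = rng.standard_normal((16, 16))        # arbitrary discrete scalar field (0-form)
u, v = d0(f)
print(np.abs(d1(u, v)).max())            # zero up to floating point rounding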

    4. Category Theory, Compositionality, and Abstract Structures

      1. Compositional and Hierarchical Models:

        The notion of an algebra monomorphism is inherently categorical, suggesting an exact embedding of one algebraic structure into another. Category theory has increasingly been applied to understand and design compositional models in AI:

        1. Structured Reasoning: One could imagine developing frameworks where layers or modules in a network are governed by categorical or algebraic rules, ensuring that the overall system composes submodules in a way that preserves essential structure.
        2. Interoperability of Representations: The idea that one mathematical structure (de Rham cohomology) can embed faithfully into another (exterior algebra) might inspire methods for merging disparate types of data representations (e.g., merging symbolic and sub-symbolic representations) in a coherent manner.

      2. Speculative Research Directions

        Given these connections, several concrete (yet challenging) research questions emerge:

        1. Can we design neural network layers that explicitly model operations analogous to the wedge product or differential operators, thereby capturing richer geometric features? (A toy sketch of such a layer is given after this list.)
        2. Is it possible to enforce structure-preserving constraints (inspired by algebra monomorphisms) within deep architectures to guarantee that learned representations maintain certain topological or algebraic invariants?
        3. Could insights from the algebraic topology of manifolds help in developing new regularization techniques or loss functions that guide networks toward more geometrically faithful representations?
        4. What role might these algebraic structures play in emergent reasoning or in integrating symbolic reasoning with deep learning, possibly leading to systems that are more capable of abstract thought?
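        As a toy answer to the first question, a layer computing antisymmetrized pairwise products of two linear images of its input mimics the wedge product of two 1-forms; the sketch below uses plain numpy, and the dimensions and "learned" weights are invented for illustration.

# Minimal sketch of a "wedge product" feature layer: antisymmetrized products
# u_i v_j - u_j v_i of two linear maps of the input, mimicking u ∧ v for
# 1-forms. Plain numpy, toy dimensions and weights.
import numpy as np

rng = np.random.default_rng(3)
d_in, d_out = 8, 5
A = rng.standard_normal((d_in, d_out))   # toy "learned" linear maps
B = rng.standard_normal((d_in, d_out))

def wedge_layer(x):
    u, v = x @ A, x @ B                            # two 1-form-like feature vectors
    w = np.einsum('bi,bj->bij', u, v)              # batched outer products
    w = w - np.transpose(w, (0, 2, 1))             # antisymmetrize: u ∧ v = u⊗v - v⊗u
    iu, ju = np.triu_indices(d_out, k=1)           # keep the independent components
    return w[:, iu, ju]                            # shape (batch, d_out*(d_out-1)/2)

x = rng.standard_normal((4, d_in))
print(wedge_layer(x).shape)                        # (4, 10)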

      3. Conclusion

        While the original result is a deep, mathematically rigorous statement in differential geometry and algebraic topology, its underlying themes (preservation of structure, the interplay between local and global properties, and the embedding of complex invariants into simpler ambient spaces) resonate with current challenges in AI.

        Whether through manifold-based learning, structure-preserving network design, or hybrid symbolic-numeric architectures, exploring these connections could lead to novel methods for building more capable and interpretable AI systems.

        These ideas remain speculative and require interdisciplinary collaboration between mathematicians and AI researchers, but they illustrate fertile ground where advanced algebraic and geometric ideas might enrich the theory and practice of AI.

    B. The TGD inspired response

    I did not find any obvious errors in the text produced by O3. The news for me is that mathematicians are apparently bringing a holistic perspective to learning models. However, the connection to the result of this Finnish mathematician Susanna Heikkilä remained unclear to me. O3's output inspires a prophecy based on TGD.

    In the following I will discuss the strong similarities of the emerging geometric view about LLMs with the basic structures appearing in TGD. After this I will speculate about a generalization of LLMs to their quantum counterparts (see this).

    1. Comparison of the geometric view of LLM with TGD

      The holistic approach means that a large number of different learning processes are studied simultaneously and an attempt is made to understand the situation using various kinds of spaces. The latent space as a surface in the feature space and the loss space as a surface over the parameter space are the basic examples. This brings in the geometry, topology and symmetries of surfaces. The picture is very similar to that in TGD.

      1. The multilayer learning process of a neural network, seen as the dynamics of the latent space as a surface in the feature space approximating it, can be required to be topology-preserving, that is shape-preserving. This means that homotopy, homology and cohomology are preserved in the learning process: the space of learned responses is, at least at the level of homotopy, homology and cohomology, equivalent to the space defined by the teaching material. An even stronger requirement would be isometricity. However, it does not seem realistic, at least from the TGD perspective.

        The topology and metric of the data space (the latent space) and of the parameter space, with the loss space defined as its submanifold, would therefore be important.

      2. I believe that it will soon be realized that the learning processes themselves can be thought of as forming a space.

        In TGD, this space would be a "world of classical worlds" identified as the space of space-time surfaces satisfying holography realized by holography = holomorphy principle. A single learning process would be, at the basic level, a 4-D Bohr orbit of a particle realized as 3-surface.

      3. People will also realize the power of holography. It poses extremely strong conditions for learning models that would also be realized in the physics of the outside world. Data item A would determine the association A→B almost entirely. The different values B would be due to the nondeterminism of holography, which is a key feature of TGD and TGD inspired theory of consciousness. This non-determinism is analogous to a discretized Brownian motion but with respect to imaginary time and modellable using the analog of Schrödinger equation. This could be the quantum counterpart for the loss of information in language models.
      4. Holography, together with the TGD based view of sensory perception, suggests that the learning process can be seen as an analog of sensory perception in a 4-D sense.
        1. At the classical level, sensory perception is not 3-dimensional but a 4-dimensional space-time surface, an almost deterministic classical time evolution representing the association A1→B1. In the case of hearing this is obvious, but for vision the time scale is so short that the percept looks like a time = constant snapshot. Actually, the geometric time duration assignable to the visual percept is of order 0.1 seconds.

          A→B, of which one might perhaps speak as a cognitive representation, is realized at the magnetic body (MB) of the brain as a representation of A1→B1. A→B is generated in a stepwise learning process. The goal is to construct a standardized mental image consisting of familiar objects built from standard features.

          The difference between A→B and A1→B1, rather than only the difference between B and B1, is minimized. The sequence of SSFRs keeps A fixed. A pair of BSFRs also changes A: this makes possible a trial and error process in which one starts from scratch, so to say.

        2. The sensory organ serves as a kind of screen, both for the sensory input arriving from the external world and for the virtual sensory input from the MB. This sensory input is analyzed by the brain into features in various scales and the features are sent to the magnetic body. At the MB, the features in various scales are compared to standard features and those minimizing the difference are selected.
        3. The selected features determine the virtual sensory input as a slight amplification of the contribution of the selected features. The step sensory organ → brain → MB → ... is repeated until the total sensory input at the sensory organ does not change anymore. The original percept A1→B1 is affected in the process and eventually the original A1→B1 is replaced with A→B at the level of the sensory organ. In this respect the process differs from associative learning.

          The outcome is not realistic but essentially an artwork (see this). It must be so since A1→B1 is very noisy, so that both A1→B1 and A→B can only be guesses for what really happened. For instance, people who are physiologically blind and get their vision back can see only diffuse light since they have not learned this process in childhood. This suggests that temporary time reversals changing A, as analogs of time reversed diffusion, play an essential role. Note that a pair of BSFRs could mean a position measurement in the space of Bohr orbits selecting a single Bohr orbit and would be analogous to time reversed diffusion.

        4. The stepwise learning process leads to a representation A → B for the association A1 → B1 realized at the level of a sensory organ serving as a kind of screen.

      5. The importance of symmetries will be, or already is, realized. One can demand that the learning process respects these symmetries. If the latent space is a subspace of a larger space with very high symmetries, this can be achieved if the learning process itself is a surface with metric and differential structures analogous to a spacetime surface in H. The symmetries would be symmetries of this higher-dimensional space analogous to WCW. This has not yet been realized by theoretical physicist colleagues, but perhaps AI people are more agile in their doctrine.

      6. In TGD, the existence of the WCW geometry requires maximal symmetries. Extended conformal invariance for space-time surfaces as holomorphic surfaces in H = M4×CP2, with one hypercomplex and 3 complex coordinates, allows solving the field equations for the space-time surface as a Bohr orbit. Thanks to classical non-determinism, the Bohr orbit can be interpreted as an association sequence, which represents a step-by-step learning process (the layer structure in a neural network) exactly. The result is an almost-deterministic holography.

        The topologically relevant part of WCW can be interpreted as a bundle, in which a 3-surface at the passive boundary of the CD is connected to a fiber consisting of the associated Bohr orbits. This fiber is discrete and corresponds to the parameter space. The points of the fiber correspond to the 3-D loci of non-determinism serving as memory seats in the TGD based model of conscious memories.

        The base space of WCW consists of 3-surfaces at the passive boundary of the CD: the 3-surface corresponds to a 3-parameter set of data items A and the Bohr orbit is a learning process creating the association A→B that connects A to the 3-parameter set B. What would this mean in LLMs? One would consider sets of data items. What would the interpretation be? Could these sets correspond to concepts as sets of their instances? Perhaps this relates to the fact that we are 3-dimensional and the 3-surfaces can be seen as basic quantum coherent information carrying units.

      7. Identifying the features is one of the tasks of language models. Features would naturally be topological invariants. In TGD, holography suggests that the features are associated with, or are, surfaces whose dimension varies from 0 (points) to 4. The huge conformal symmetries of WCW suggest that also 4-D generalizations of conformal invariants come into question (symplectic invariants would be involved as well). Also number-theoretic invariants, such as the degree of the polynomial of one hypercomplex and 3 complex coordinates of H, the Galois group and the dimension of the extension E, could be important. Ramified primes are also number theoretic invariants (a simple numerical illustration is given after this list).
      8. Dimensional reduction and its opposite operation are expected to be important. There are natural equivalents for these in TGD. This kind of operation would reduce or increase algebraic complexity. Changing the dimension of the extension of rationals containing the polynomial coefficients, changing the degree of the polynomial, and changing the size of the CD come to mind.
      9. The preservation of differential geometric structures when embedding the data structure in a higher-dimensional space (the embedding space H in TGD) is also mentioned. In other words, the products of differential forms in H are induced as products of induced forms. If the differential structure of a data structure is induced when it is embedded in a higher-dimensional space, then this is the case. The metric of the spacetime surface, gauge potentials, etc. would be induced from their equivalents in H.

        The induction of geometry that is central to the entire TGD view of physics would be part of the description of the teaching process.

      10. Fractality is central to TGD. In our physical models, the model scales the external world to a smaller or larger scale. Fractality may be one feature of LLMs that becomes important.
      11. The projection of the Clifford algebra of the gamma matrices of the embedding space H to the space-time surface defines a Clifford algebra which contains a representation of the exterior algebra. The gamma matrices of WCW are expressible in terms of the fermionic oscillator operators for the second quantized free spinor fields of H and they span the Clifford algebra of WCW. They can be regarded as generators of an infinite-dimensional algebra of supersymmetries. The fermionic Fock state basis spans a Boolean algebra and would play a key role in quantum Boolean cognition. Also these algebras would be an important element of the cognitive representations.
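      As a down-to-earth illustration of the number theoretic invariants mentioned in item 7, the following sketch computes for a toy polynomial its degree, its discriminant and the primes dividing the discriminant; the ramified primes of the extension defined by the polynomial are among the latter. It assumes the Python package sympy, and the polynomial is chosen only for illustration.

# Minimal sketch of number theoretic invariants of a polynomial: its degree,
# its discriminant, and the primes dividing the discriminant (the ramified
# primes of the corresponding extension are among them). Assumes sympy.
from sympy import Poly, Symbol, factorint

x = Symbol('x')
p = Poly(x**3 - 2, x)                    # toy polynomial defining Q(2^(1/3))

degree = p.degree()                      # degree of the polynomial (here 3)
disc = p.discriminant()                  # discriminant (here -108 = -(2^2)*(3^3))
ramified_candidates = sorted(factorint(abs(disc)).keys())

print("degree:", degree)
print("discriminant:", disc)
print("primes dividing the discriminant:", ramified_candidates)   # [2, 3]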

    2. Other ideas suggested by TGD

      TGD suggests that other ideas will probably emerge and lead to a quantum view of conscious associative learning.

      1. The idea of a possible transformation of a classical computer to a quantum computer (in some sense), or to a hybrid of an ordinary and a quantum computer, inspired by TGD based quantum biology (see this), will probably emerge.

        One can no longer avoid the challenge of constructing a quantum theory of consciousness. This forces us to seriously consider questions such as "How are conscious memories possible?" (see this). The only answer that I can imagine very probably involves the basic notions of the TGD view of quantum theory. This gives hope of building computers which are genuinely living and conscious.

        Quantum counterparts of LLMs as models for conscious associative learning will naturally emerge and my guess is that the TGD view will be a natural first trial (see this).

      2. This progress is impossible without a genuine solution to the basic problem of quantum measurement theory. This will eventually force the discovery of the notion of zero energy ontology (ZEO). ZEO would have applications to quantum LLMs. In LLMs, time reversal of the diffusion equation allows one to restore the information lost due to the noise associated with the computation (a minimal illustration is given after this list). For quantum LLMs, the same would be true for the analog of the Schrödinger equation. A pair of "big" state function reductions (BSFRs) induces a temporary time reversal and provides a universal mechanism of error correction making intelligent information processing possible: one just moves back in time and starts again.
      3. Finite measurement resolution is a key notion in TGD. There are two views of finite measurement resolution based on geometry and number theory respectively. These views are dual.

        1. The geometric view relies on inclusions of hyperfinite factors (see this): the included factor is analogous to a gauge group leaving the observed physics invariant: this view of finite measurement resolution is central in the geometric view of TGD.
        2. The second view is based on number theoretic discretization (see this and this). The geometric inclusion hierarchies correspond naturally to number theoretic inclusion hierarchies for the extensions of rationals. A space-time surface, for which the defining polynomials have coefficients in an extension E of rationals, allows in a natural way a discretization in terms of points with coordinates in E. The points of the discretization can also be regarded as points in an extension of p-adic numbers induced by E. I call these discretizations cognitive representations; they form a hierarchy corresponding to extensions of rationals.

          This leads to a p-adic description of cognition. One obtains a unique number-theoretical representation for discretization and it leads to a generalization of the Turing paradigm (see this): rational numbers are replaced by complexity hierarchies of their extensions and one ends up with number-theoretical computationalism. This gives complexity hierarchies for space-time surfaces as Bohr orbits and they correspond to an improving resolution of discretization and are realized as polynomial hierarchies.

          Holography suggests that for the minimal option the number theoretic discretization applies only to the loci of the classical non-determinism for the space-time surface as minimal surfaces. These loci define the seats of conscious memories and would be 3-D analogs of 1-D frames spanning 2-D soap films.

      4. The complementary nature of geometric and number theoretic views of TGD leads to a 4-D generalization of Langlands duality (see this and this). This adds powerful constraints also to the quantum model of associative learning.
      5. The concept of complexity, which closely relates to evolution, is central and will be included. Higher-level learning could be seen as a transition to a higher level of complexity: something new will be included here and it would be something to realize in conscious quantum learning. Complexity hierarchies, which are polynomial hierarchies represented as space-time surfaces, will probably be central.
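      The information-restoring role of time reversed diffusion mentioned in item 2 above can be illustrated with the simplest possible example: Gaussian data corrupted by Gaussian noise, for which the optimal "time reversed" step is the exact posterior mean. The sketch uses plain numpy and invented numbers; it is only meant to show that reversing the noising step recovers part of the lost information.

# Minimal sketch of the information-restoring role of time reversed diffusion:
# Gaussian data is corrupted by Gaussian noise (forward diffusion) and then
# partially restored by the exact posterior mean (the optimal reverse step).
# Plain numpy; the distributions are chosen only for illustration.
import numpy as np

rng = np.random.default_rng(4)
mu, s2 = 1.0, 0.5                        # prior: x0 ~ N(mu, s2)
t = 2.0                                  # diffusion "time" = variance of added noise

x0 = mu + np.sqrt(s2) * rng.standard_normal(10000)
xt = x0 + np.sqrt(t) * rng.standard_normal(10000)      # forward diffusion step

x0_hat = (s2 * xt + t * mu) / (s2 + t)   # reverse step: posterior mean E[x0 | xt]

print("mean squared error before reversal:", np.mean((xt - x0) ** 2))      # about t = 2.0
print("mean squared error after reversal: ", np.mean((x0_hat - x0) ** 2))  # about 0.4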

      Summa summarum: The brave guess is that quantum LLMs will emerge and that learning processes, as the formation of association sequences, will sooner or later be modelled as non-deterministic complex surfaces of a particle in a higher-dimensional complex space, which are therefore minimal surfaces satisfying the non-linear counterparts of massless field equations. This would be a generalization of TGD. On the other hand, TGD is unique on mathematical grounds. Could the learning process be reduced to the fundamental physics as it is understood in TGD? The complexity of the space-time surfaces and the notion of WCW would allow this.

    I have written an article about how classical computers could become conscious in the TGD Universe (see this). I also wrote an article about how conscious associative learning could be possible in the TGD Universe (see this). These articles give a rather concrete proposal for how ordinary computers could become conscious as fusions of ordinary and quantum computer-like systems and how the quantum analog of conscious association machine could be realized.

    The qubits would be realized in the same way as in living matter for basic biomolecules and any system involving cold plasma could involve rudimentary consciousness.

    There are also earlier articles relating to quantum computation and LLMs in the TGD Universe (see this, this, this, and this).

    For a summary of earlier postings see Latest progress in TGD.

    For the lists of articles (most of them published in journals founded by Huping Hu) and books about TGD see this.