## Sunday, February 02, 2020

### Why can AI pattern recognition differ dramatically from human pattern recognition?

I heard an interesting lecture "On Adversarial Attacks" by Anna-Mari Rusanen at the Mind and Matter Workshop (Friday, 2020). There is a popular article in Finnish about adversarial attacks by Anna-Mari Rusanen.
I did not find on the web any article in English by Anna-Mari Rusanen, but one can find an article "Towards Deep Learning Models Resistant to Adversarial Attacks" about the topic in English.

The problem with pattern recognition is that a rather small perturbation of the input can make the recognized pattern differ radically from the real pattern. For instance, small modifications of a traffic sign can change the recognized sign dramatically. This makes adversarial attacks possible, which are obviously a safety threat: consider for instance self-driving cars.

It is clear that artificial pattern recognition does not work like the human brain, and the origin of the problem is poorly understood. Someone in the audience noticed that in pattern recognition the difference between the input and the recognized standard pattern is minimized: one has a variational principle. This principle can, however, be chosen in several ways. Could the variational principles used in pattern recognition (PR) differ for artificial PR and PR by the human brain?

I know next to nothing about pattern recognition in practice but dare to ask whether the variational principle for artificial PR is completely local. One compares the proposal of the PR program to a set of standard patterns, say photos of faces, as such. For a black-and-white picture the function considered has value 1 or 0 at a given pixel depending on whether the pixel is black or white. The sum over the differences of this function for the input and the standard picture is minimized.
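The purely local principle described above can be sketched as follows. The toy patterns and the pixel-wise sum-of-differences criterion are my own illustrative assumptions, not a claim about how any real PR system works:

```python
# A minimal sketch of the purely local matching principle: binary
# images are compared pixel by pixel, and the standard pattern with
# the smallest summed difference wins. The patterns are toy data.
import numpy as np

def local_match(input_img, standards):
    """Return the index of the standard pattern whose pixel-wise
    difference from the input is smallest (purely local criterion)."""
    diffs = [np.sum(np.abs(input_img - s)) for s in standards]
    return int(np.argmin(diffs))

# Toy 4x4 binary "patterns": a vertical bar and a horizontal bar.
vertical = np.zeros((4, 4), dtype=int); vertical[:, 1] = 1
horizontal = np.zeros((4, 4), dtype=int); horizontal[1, :] = 1
standards = [vertical, horizontal]

# An input close to the vertical bar but with one flipped pixel.
noisy = vertical.copy()
noisy[0, 3] = 1

print(local_match(noisy, standards))  # 0: still matches the vertical bar
```

The point of the sketch is that the criterion sees only individual pixels: a small local perturbation moves the input a small distance in this metric, yet nothing in the principle prevents a carefully chosen small perturbation from tipping the minimum to a wrong standard pattern.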

The human brain, however, also functions holistically. Should the variational principle be (more) non-local? For instance, could one take the Fourier transform of the standard patterns and of the input and do the same local minimization for the difference of the functions defined in the space of wave vectors? The function would no longer be simple, since the Fourier transform takes continuous values, and this would require more memory resources.
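The non-local variant could be sketched in the same toy setting. Comparing Fourier magnitude spectra with an L2 norm is my own illustrative choice; one known consequence is that the magnitude spectrum is unchanged by global translations of the pattern, so the comparison becomes holistic in at least this crude sense:

```python
# A sketch of the non-local variant: instead of comparing raw pixels,
# compare the magnitudes of the 2D Fourier transforms of the input
# and the standard patterns. Toy data and the L2 criterion are
# illustrative assumptions, not a claim about real PR systems.
import numpy as np

def fourier_match(input_img, standards):
    """Return the index of the standard pattern whose Fourier
    magnitude spectrum is closest (in L2 norm) to that of the input."""
    f_in = np.abs(np.fft.fft2(input_img))
    dists = [np.linalg.norm(f_in - np.abs(np.fft.fft2(s)))
             for s in standards]
    return int(np.argmin(dists))

# Same toy patterns: a vertical bar and a horizontal bar.
vertical = np.zeros((4, 4)); vertical[:, 1] = 1
horizontal = np.zeros((4, 4)); horizontal[1, :] = 1

# A global (circular) shift leaves the magnitude spectrum unchanged,
# so the shifted bar still matches "vertical", even though a
# pixel-by-pixel comparison would find no overlapping pixels at all.
shifted = np.roll(vertical, 2, axis=1)
print(fourier_match(shifted, [vertical, horizontal]))  # 0
```

Note that discarding the phase, as done here, throws away positional information entirely; a less crude non-local principle would presumably keep both magnitude and phase but weight them differently.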

Very probably this method is used in speech recognition, where frequency space is more natural. The human brain could of course use both methods, as suggested by the left/right -- reductionistic/holistic dichotomy. In music, "right" would correspond to harmony and melody and "left" to rhythm and note durations. Could this principle apply also to vision, perhaps to all perception?

For a summary of earlier postings see Latest progress in TGD.