Language as a Universal Learning Machine
LANGUAGE PROCESSING IN HUMANS AND COMPUTERS: Part 4
Machine-learned language models have transformed everyday life: they steer us as we study, drive, and manage money. They have the potential to transform our civilization. But they hallucinate. Their realities are virtual. This 4th part of the series on language processing provides a high-level overview of the low-level details of how the learning machines work. It turns out that, even after they become capable of recognizing hallucinations and dreaming safely, as humans tend to be, the learning machines will proceed to form broader systems of false beliefs and self-confirming theories, as humans tend to do.
[I tried to make this text readable for all. Skipping the math underpinnings provided with some claims shouldn't impact the later claims. The hope is that even just the pictures at the beginning and at the end convey the main message. Suggestions for improvements are welcome :)]
Part 1 was:
Who are chatbots (and what are they to you)? Afterthoughts: Four elephants in a room with chatbots
Part 2 was:
Part 3 was:
Semantics: The Meaning of Language
THIS IS Part 4:
- 2.1. Learning causes and superstitions
- 2.2. General learning framework
- 2.3. Examples: From pigeons to perceptrons
- 3.1. Why learning is possible
- 3.2. Decomposing continuous functions: Kolmogorov-Arnold⁶
- 3.3. Wide learning
- 3.4. Approximating continuous functions: Cybenko et al.
- 3.5. Deep learning
- 4.1. Channeling through concepts
- 4.2. Static channel learning: RNN, LSTM…
- 4.3. Dynamic channel learning: Attention, Transformer…
1. Language models, celebrities, and steam engines
Anyone can drive a car. Most people even know what the engine looks like. But to fix it, you need to figure out how it works.
Anyone can chat with a chatbot. Most people know that there is a Large Language Model (LLM) under the hood. There are lots and lots and lots of articles describing what an LLM looks like. Lots of colorful pictures. Complicated meshes of small components, as if both mathematical abstraction and modular programming were still waiting to be invented. YouTube channels with fresh scoops on LLM celebrities. We get to know their parts and how they are connected, we know their performance, we even see how each of them changes a heat map of inputs into a heat map of outputs. One hotter than the other. But do we understand how they work? Experts say that they do, but they don't seem to be able to explain it even to each other, as they continue to disagree about pretty much everything.
Every child, of course, knows that it can be hard to explain what you just built. Our great civilization built lots of stuff that it couldn't explain. Steam engines had been engineered for nearly 2000 years before scientists explained how they extract work from heat. There aren't many steam engines around anymore, but there are lots of language engines and a whole industry of scientific explanations of how they extract sense from references. The leading theory is that Santa Claus descended from the mountain and gave us the transformer architecture carved in a stone tablet.

Transformers changed the world, spawned offspring and competitors… Just like steam engines. Which may be a good thing, since steam engines did not exterminate their creators just because the creators didn't understand them.
I wasn't around in the times of steam engines, but I was around in the times of bulky computers, and when the web emerged and everything changed, and when the web giants emerged and changed the web. Throughout that time, AI research seemed like an effort towards the intelligent design of intelligence. It didn't change anything, because intelligence, like life, is an evolutionary process, not a product of intelligent design¹. But now some friendly learning machines and chatbot AIs have evolved and everything is changing again. Having survived and processed the paradigm shifts of the past, I am trying to figure out the present one. Hence this course and these writings. On one hand, I probably stand no chance of saying anything that hasn't been said before. Even after a lot of honest work, I remain a short-sighted non-expert. On the other hand, there are some powerful tools and ideas that evolved in the neighborhood of AI that AI experts don't seem to be aware of. People clump into research communities, focus on the same things, and ignore the same things. Looking over the fences, neighbors sometimes understand their neighbors better than they understand themselves. This sometimes leads to trouble. An ongoing temptation. Here is a view over the fence.
2. Evolution of learning
2.1. Learning causes and superstitions
Spiders are primed to build webs. Their engineering skills to weave them are programmed in their genes. They are pretrained builders, and even their capability to choose and remember a good place for a web is automated.
Dogs and pigeons are primed to seek food. Their capabilities to learn which sources and actions bring food are automated. In a famous experiment, the physiologist Pavlov studied one of the simplest forms of learning, usually called classical conditioning.

Continuing in the same vein, the psychologist Skinner showed that pigeons could even develop a form of superstition while trying to learn where the food comes from.

Skinner fed the pigeons at completely random times, with no correlation to their behaviors. About 70% of them developed beliefs that they could conjure food. If a pigeon happened to be pecking the ground, or ruffling its feathers, just before the food arrived, it would engage in that action more frequently, which increased the chance that the food would arrive again while it was performing the action. If one of the random associations, say between food and pecking, prevails after a while, it gets promoted into a ritual dance for food. Each time, the food eventually arrives and confirms that the ritual works.
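This self-confirming loop is easy to reproduce in a toy simulation. The sketch below is only illustrative, not Skinner's protocol: the action names, probabilities, and reinforcement rule are all made-up assumptions. Food is delivered at random moments, regardless of behavior, and whatever the simulated pigeon happened to be doing at that moment gets reinforced.

```python
import random

# Toy model of Skinner's "superstitious" pigeons: food arrives at random,
# independently of behavior, yet whatever action coincides with food gets
# reinforced, so one action tends to grow into a dominant "ritual".

ACTIONS = ["peck_ground", "ruffle_feathers", "turn_left", "bob_head"]

def run_pigeon(steps=10_000, food_prob=0.05, boost=0.5, seed=None):
    rng = random.Random(seed)
    weights = {a: 1.0 for a in ACTIONS}  # equal initial preferences
    for _ in range(steps):
        # The pigeon picks an action in proportion to its current weights.
        action = rng.choices(ACTIONS, weights=[weights[a] for a in ACTIONS])[0]
        # Food arrives at random, with no regard for what the pigeon did...
        if rng.random() < food_prob:
            # ...but the action it happened to coincide with is reinforced.
            weights[action] += boost
    return weights

if __name__ == "__main__":
    for seed in range(3):
        w = run_pigeon(seed=seed)
        ritual = max(w, key=w.get)
        share = w[ritual] / sum(w.values())
        print(f"pigeon {seed}: ritual = {ritual} ({share:.0%} of preference)")
```

The loop is the rich-get-richer dynamic described above: a reinforced action is performed more often, so it coincides with the random food deliveries more often, so it is reinforced more. Most runs end with one action claiming a disproportionate share of the pigeon's behavior.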
Humans are primed to seek causes and predict effects. Like pigeons, they take coinciding events to be correlated and develop superstitions, promoting coincidences into causal theories. While pigeons end up pecking at empty surfaces to conjure grains, humans build monumental systems of false beliefs, attributing their fortunes and misfortunes, say, to the influence of stars light years away, or to their neighbor's evil eye, or to pretty much anything that can be seen, felt, or counted².
But while our causal beliefs are shared with pigeons, our capabilities to build houses and span bridges are not shared with spiders. Unlike spiders, we are not primed to build but have to learn our engineering skills. We are primed to learn.
2.2. General learning framework
A bird's eye view of the scene of learning looks something like this:

The inputs come from the left. The main characters are:
- a process F, the supervisor in supervised learning (Turing called it a "teacher"), processing input data x of type X to produce output classes or parameters y of type Y;
- an a-indexed family of functions