You don't need a PhD to understand LLMs. An LLM is just predicting the next word, but the mechanism behind that prediction is fascinating.
The Attention Mechanism
Before Transformers, models (RNNs) read text strictly left to right, one token at a time. By the time they reached the end of a long sentence, information from the beginning had faded.
Self-attention lets the model look at every word at once. "Bank" in "bank of the river" attends directly to "river", so it gets its disambiguating context instantly.
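Here is a minimal sketch of that idea in plain Python: a single attention step with tiny, made-up 2-d vectors (real models use learned, high-dimensional ones). The token vectors and the query are invented for illustration.

```python
import math

def softmax(scores):
    # Exponentiate and normalize so the weights sum to 1
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    # Score the query against every key (dot product), then
    # return the softmax-weighted average of the value vectors.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return context, weights

# Toy 2-d vectors for "bank", "of", "the", "river" (made up for illustration)
tokens = ["bank", "of", "the", "river"]
keys = values = [[1.0, 0.2], [0.1, 0.0], [0.0, 0.1], [0.9, 0.3]]
query = [1.0, 0.3]  # the vector for "bank", asking: which tokens matter to me?

context, weights = attend(query, keys, values)
for tok, w in zip(tokens, weights):
    print(f"{tok}: {w:.2f}")
```

Because the toy vector for "river" points in a similar direction to the query for "bank", it receives a much larger attention weight than "of" or "the", and so contributes more to the context vector.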
Tokens and Embeddings
Models don't read words; they read numbers (tokens). Each token is mapped to an "embedding", a vector in high-dimensional space. Words with similar meanings (king, queen) end up close together in this space.
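"Close together" is usually measured with cosine similarity. A sketch with hand-made 3-d vectors (real embeddings have hundreds or thousands of dimensions, and are learned rather than hand-written):

```python
import math

def cosine(a, b):
    # Cosine similarity: close to 1.0 means the vectors point the
    # same way (similar meaning); near 0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hand-made toy "embeddings" for illustration only
emb = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

print(cosine(emb["king"], emb["queen"]))  # high: related concepts
print(cosine(emb["king"], emb["apple"]))  # low: unrelated concepts
```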
Inference
The model computes a probability distribution over the next token. Given "The cat sat on the...", it might assign "mat" 90% and "couch" 5%. Temperature controls how much risk the model takes in choosing a lower-probability next token: low temperature sharpens the distribution toward the top choice, high temperature flattens it.
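The effect of temperature can be sketched in a few lines. The candidate words and probabilities below are the illustrative numbers from above, not real model output:

```python
import math

def apply_temperature(probs, temperature):
    # Rescale log-probabilities by 1/T, then renormalize (softmax).
    # T < 1 sharpens the distribution; T > 1 flattens it.
    logits = [math.log(p) / temperature for p in probs]
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# "The cat sat on the ..." with made-up probabilities
candidates = ["mat", "couch", "roof"]
probs = [0.90, 0.05, 0.05]

for t in (0.5, 1.0, 2.0):
    scaled = apply_temperature(probs, t)
    print(t, [f"{tok}: {p:.2f}" for tok, p in zip(candidates, scaled)])
```

At temperature 0.5 the model almost always says "mat"; at 2.0 the alternatives become much more likely, which is why higher temperature feels more "creative" and lower temperature more deterministic.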
