Large Language Model

A neural network trained on massive text corpora to generate and understand natural language, forming the foundation of modern AI assistants.

A Large Language Model (LLM) is a type of neural network trained on vast corpora of text to predict the next token in a sequence. As parameter counts (often hundreds of billions) and training data (trillions of tokens) scale up, these models develop emergent capabilities, including reasoning, translation, code generation, and conversation.
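
As a rough illustration of next-token prediction, the sketch below fits a bigram count model on a toy corpus and generates greedily from it. This is not how an LLM is implemented: an LLM replaces the count table with a neural network conditioned on a long context. The function names and the corpus are hypothetical, chosen only for this example.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_distribution(counts, prev):
    """Estimate P(next | prev) from the counts; empty dict if prev is unseen."""
    followers = counts.get(prev, {})
    total = sum(followers.values())
    return {tok: c / total for tok, c in followers.items()}

def generate(counts, start, max_new_tokens):
    """Greedy decoding: repeatedly append the most frequent follower."""
    out = [start]
    for _ in range(max_new_tokens):
        followers = counts.get(out[-1])
        if not followers:
            break  # no observed continuation for this token
        out.append(followers.most_common(1)[0][0])
    return out

# Toy corpus (an assumption for illustration; real corpora have trillions of tokens).
corpus = "the cat sat on the mat and the cat slept".split()
counts = train_bigram(corpus)
dist = next_token_distribution(counts, "the")
```

Here `dist` assigns "cat" twice the probability of "mat", because "the cat" occurs twice in the corpus and "the mat" once. An LLM produces the same kind of distribution over its vocabulary, but computes it from the whole preceding context.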

It is a truth universally acknowledged, that a single transformer in possession of a good attention mechanism, must be in want of a context window.

— paraphrased (in the spirit of Jane Austen, about LLMs)

How they work

LLMs are built on the Transformer architecture, which uses self-attention to process sequences in parallel. Training proceeds in two broad phases:

  1. Pre-training: the model learns statistical patterns of language by predicting the next token over a large corpus. This is self-supervised (the training targets come from the text itself, not from human labels) and extremely compute-intensive.
  2. Post-training: techniques like RLHF (Reinforcement Learning from Human Feedback) align the model with human preferences and improve instruction following and safety.
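
The self-attention step mentioned above can be sketched in a few lines of plain Python. This is a minimal single-head version under stated assumptions: the causal mask (each position attends only to itself and earlier positions) is typical of decoder-only LLMs but is not specified in the text, and the helper names and weight matrices are hypothetical. The loop runs sequentially here, but each position's output depends only on the inputs, not on other outputs, which is what lets real implementations compute all positions in parallel.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project(X, W):
    """Multiply each row of X (n x d_in) by W (d_in x d_out)."""
    cols = list(zip(*W))
    return [[dot(row, col) for col in cols] for row in X]

def causal_self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product attention with a causal mask (assumed)."""
    Q, K, V = project(X, Wq), project(X, Wk), project(X, Wv)
    d_k = len(K[0])
    out = []
    for i, q in enumerate(Q):
        # Position i scores itself and every earlier position.
        scores = [dot(q, K[j]) / math.sqrt(d_k) for j in range(i + 1)]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * V[j][c] for j, w in enumerate(weights))
                    for c in range(len(V[0]))])
    return out

# Two token embeddings and identity projections, purely for illustration.
identity = [[1.0, 0.0], [0.0, 1.0]]
X = [[1.0, 0.0], [0.0, 1.0]]
attended = causal_self_attention(X, identity, identity, identity)
```

With these inputs the first position can only attend to itself, so its output equals its own value vector; the second position mixes both value vectors, weighted toward the one whose key best matches its query.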

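The pre-training objective described in step 1 is usually expressed as an average cross-entropy loss over next-token predictions. The sketch below computes that loss from raw logits; it assumes the model emits one logit vector per input position and that position t is scored against token t+1. The function names and toy inputs are hypothetical.

```python
import math

def log_softmax(logits):
    """Log-probabilities from raw logits, computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def next_token_loss(logits_per_position, token_ids):
    """Average cross-entropy; logits at position t are scored against token t+1."""
    targets = token_ids[1:]
    if len(logits_per_position) != len(targets):
        raise ValueError("expected one logit vector per predicted token")
    total = 0.0
    for logits, target in zip(logits_per_position, targets):
        total -= log_softmax(logits)[target]
    return total / len(targets)

# Vocabulary of 4 tokens, sequence [0, 1, 2]: two predictions to score.
token_ids = [0, 1, 2]
uniform_logits = [[0.0] * 4, [0.0] * 4]          # model has no preference
confident_logits = [[0.0, 5.0, 0.0, 0.0],        # favours token 1
                    [0.0, 0.0, 5.0, 0.0]]        # favours token 2
uniform_loss = next_token_loss(uniform_logits, token_ids)
confident_loss = next_token_loss(confident_logits, token_ids)
```

A model that spreads probability evenly over the vocabulary gets a loss of log(vocabulary size); training lowers the loss by shifting probability toward the tokens that actually follow.
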
Key properties

Applications

LLMs power a wide range of applications:

Limitations

Despite their capabilities, LLMs have well-known limitations:

The trajectory

The field has moved rapidly from GPT-2 (1.5B parameters, 2019) through GPT-3 (175B, 2020), ChatGPT (2022), and into a multi-model ecosystem where frontier labs release models with increasing capabilities every few months. The dominant architectural paradigm — the Transformer — has remained constant, though training techniques, post-training methods, and inference strategies have evolved dramatically.

See also