A neural network trained on massive text corpora to generate and understand natural language, forming the foundation of modern AI assistants.
From: LLM Wiki | URL: llm-wiki.pages.dev/concepts/large-language-model | Created: January 15, 2024 | Updated: December 20, 2024 | Read time: 2 min
A Large Language Model (LLM) is a type of neural network trained on vast corpora of text to predict the next token in a sequence. Through scale — both in parameters (often hundreds of billions) and training data (trillions of tokens) — these models develop emergent capabilities including reasoning, translation, code generation, and conversation.
It is a truth universally acknowledged, that a single transformer in possession of a good attention mechanism, must be in want of a context window.
— paraphrased (in the spirit of Jane Austen, about LLMs)
How they work
LLMs are built on the Transformer architecture, which uses self-attention to process sequences in parallel. Training proceeds in two broad phases:
Pre-training — the model learns statistical patterns of language by predicting the next token over a large corpus. This is self-supervised (the training targets come from the text itself, not from human labels) and extremely compute-intensive.
Post-training — techniques such as RLHF (Reinforcement Learning from Human Feedback) align the model with human preferences and improve its instruction following and safety.
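The next-token objective behind pre-training can be illustrated with a deliberately tiny stand-in. The sketch below is not a Transformer and not how any production LLM is trained: it assumes a count-based bigram model (each token predicted only from the one before it) and whitespace-split words as tokens. The function names `train_bigram`, `next_token_probs`, and `average_nll` are hypothetical, chosen for illustration. What it shares with real pre-training is the shape of the task: the targets are simply the next tokens of the corpus, and the loss is the average negative log-likelihood of those targets.

```python
import math
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_probs(counts, prev):
    """Normalize the counts observed after `prev` into probabilities."""
    followers = counts.get(prev, {})
    total = sum(followers.values())
    return {tok: n / total for tok, n in followers.items()}


def average_nll(counts, tokens):
    """Average negative log-likelihood of each actual next token.

    This plays the role of the training loss: lower means the model
    assigns more probability to the tokens that really come next.
    """
    losses = []
    for prev, nxt in zip(tokens, tokens[1:]):
        p = next_token_probs(counts, prev).get(nxt, 0.0)
        losses.append(-math.log(p) if p > 0 else float("inf"))
    return sum(losses) / len(losses)


# Toy corpus; whitespace words stand in for real tokens (an assumption).
corpus = "the cat sat on the mat the cat ran".split()
model = train_bigram(corpus)
probs = next_token_probs(model, "the")
most_likely = max(probs, key=probs.get)
loss = average_nll(model, corpus)
```

Here `probs` holds the model's distribution over what follows "the", and `most_likely` is its single best guess. A real LLM replaces the count table with a neural network and conditions on the whole preceding context rather than one token, but it is scored in the same way.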
Key properties
Tokenization — text is split into discrete units (tokens) that the model processes. A token is roughly 0.75 words in English.
Context window — the maximum amount of text the model can attend to at once. Modern models range from 8K to 2M tokens. See Context Window.
Embeddings — the internal vector representations the model uses. See Embedding.
Few-shot learning — at sufficient scale, models can perform new tasks from a handful of examples provided in the prompt.
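Several of the properties above interact in practice: a few-shot prompt spends tokens on its examples, and the whole prompt has to fit in the context window. The sketch below assumes the rough figure of 0.75 English words per token quoted above as a stand-in for a real tokenizer, so its counts are estimates only. The "Input:/Output:" prompt layout and the names `estimate_tokens`, `build_few_shot_prompt`, and `fits_in_context` are hypothetical, not any particular model's API.

```python
import math

# Rule of thumb from the text above; real tokenizers vary by model and language.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text):
    """Estimate a token count from a whitespace word count."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


def build_few_shot_prompt(examples, query):
    """Join (input, output) example pairs and a new query into one prompt."""
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    blocks.append(f"Input: {query}\nOutput:")  # left open for the model to complete
    return "\n\n".join(blocks)


def fits_in_context(prompt, context_window, reserved_for_output=0):
    """Check the estimated prompt size, plus room for the reply, against the window."""
    return estimate_tokens(prompt) + reserved_for_output <= context_window


# Two worked examples of an English-to-French task, then a new query.
examples = [("cheese", "fromage"), ("bread", "pain")]
prompt = build_few_shot_prompt(examples, "apple")
prompt_tokens = estimate_tokens(prompt)
ok_for_8k = fits_in_context(prompt, context_window=8_000, reserved_for_output=500)
```

For anything beyond a rough budget, the estimate should be replaced with the actual tokenizer of the target model, since token counts differ noticeably between models and between languages.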
Applications
LLMs power a wide range of applications:
Chat assistants — conversational interfaces such as ChatGPT, Claude, and Gemini.
AI Agents — systems that use LLMs as a reasoning engine to take actions in the world.
Code generation — tools like GitHub Copilot and Cursor.
Translation, summarization, content generation — replacing or augmenting traditional NLP pipelines.
Limitations
Despite their capabilities, LLMs have well-known limitations:
Hallucination — confidently generating plausible but incorrect information.
Knowledge cutoff — no awareness of events after training.
Context limitations — forgetting or confusing information in long contexts.
Lack of grounding — no direct connection to truth or the physical world.
Computational cost — training and inference require substantial energy and hardware.
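The hardware side of the cost can be made concrete with back-of-the-envelope arithmetic. The sketch below assumes weights stored at 16-bit precision (2 bytes per parameter) and counts the weights alone; activations, caches, and optimizer state during training all add to this, so the result is a lower bound. The function name `weight_memory_gb` is hypothetical, and the parameter counts are the GPT-2 and GPT-3 figures quoted elsewhere in this article.

```python
def weight_memory_gb(num_parameters, bytes_per_parameter=2):
    """Memory needed to hold the weights alone, in gigabytes (1 GB = 1e9 bytes).

    bytes_per_parameter=2 assumes 16-bit weights; this is an assumption,
    not a statement about how any specific model is deployed.
    """
    return num_parameters * bytes_per_parameter / 1e9


gpt2_gb = weight_memory_gb(1.5e9)   # GPT-2: 1.5B parameters
gpt3_gb = weight_memory_gb(175e9)   # GPT-3: 175B parameters
```

Under these assumptions the GPT-2 weights need about 3 GB and the GPT-3 weights about 350 GB, which is more than a single accelerator typically holds and is one reason large models are served across multiple devices.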
The trajectory
The field has moved rapidly from GPT-2 (1.5B parameters, 2019) through GPT-3 (175B, 2020), ChatGPT (2022), and into a multi-model ecosystem where frontier labs release models with increasing capabilities every few months. The dominant architectural paradigm — the Transformer — has remained constant, though training techniques, post-training methods, and inference strategies have evolved dramatically.
See also
Transformer — the architecture underlying most LLMs