Large Language Model

/larj lang-gwij mo-del/

A neural network trained on massive text corpora to generate and understand natural language, forming the foundation of modern AI assistants.

The dominant class of AI system in 2026 — transformer-based, token-predicting, instruction-tuned. Underpins every modern AI assistant.

Also known as: LLM, Large Language Models, language model

A Large Language Model (LLM) is a type of neural network trained on vast corpora of text to predict the next token in a sequence. Through scale — both in parameters (often hundreds of billions) and training data (trillions of tokens) — these models develop emergent capabilities including reasoning, translation, code generation, and conversation.
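As a rough illustration of what "predict the next token" means, the sketch below counts which token follows which in a tiny corpus and turns the counts into probabilities. It is a toy bigram model, not a neural network; the corpus and the function names `train_bigram` and `next_token_probs` are made up for this example.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for each token, which tokens follow it in the corpus."""
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts

def next_token_probs(counts, token):
    """Turn the follow-up counts for one token into a probability distribution."""
    followers = counts.get(token, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}
    return {tok: n / total for tok, n in followers.items()}

# Hypothetical ten-token corpus, split on whitespace instead of a real tokenizer.
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)
probs = next_token_probs(model, "the")
most_likely = max(probs, key=probs.get)
```

An LLM replaces the count table with a Transformer that conditions on the whole preceding context, but the training target is the same: a probability distribution over the next token.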

It is a truth universally acknowledged, that a single transformer in possession of a good attention mechanism, must be in want of a context window.

— paraphrased (in the spirit of Jane Austen, about LLMs)

How they work

LLMs are built on the Transformer architecture, which uses self-attention to process sequences in parallel. Training proceeds in two broad phases:

  1. Pre-training — the model learns statistical patterns of language by predicting the next token over a large corpus. This is self-supervised (the text itself supplies the labels) and extremely compute-intensive.
  2. Post-training — techniques like RLHF (Reinforcement Learning from Human Feedback) tune the model to follow instructions, match human preferences, and behave safely.
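The self-attention step mentioned above can be sketched in a few lines. This is a minimal single-head version of scaled dot-product attention in plain Python. It assumes the queries, keys, and values are the input vectors themselves; real Transformers apply learned projection matrices, use many heads, and (in decoder-only LLMs) mask out future positions, all of which are omitted here.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """x is a list of token vectors; returns one output vector per token.

    Simplification: queries, keys, and values are all x itself.
    """
    d = len(x[0])
    outputs = []
    for query in x:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in x]
        weights = softmax(scores)
        # Each output is a weighted average of all value vectors.
        outputs.append([sum(w * value[i] for w, value in zip(weights, x)) for i in range(d)])
    return outputs

# Hypothetical 2-dimensional "embeddings" for three tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens)
```

Each output row depends on every input row but not on the order in which the rows are computed, which is why the architecture can process a sequence in parallel.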

Key properties

  • Tokenization — text is split into discrete units (tokens) that the model processes. A token is roughly 0.75 words in English.
  • Context window — the maximum amount of text the model can attend to at once. Context windows in modern models range from roughly 8K to 2M tokens. See Context Window.
  • Embeddings — the internal vector representations the model uses. See Embedding.
  • Few-shot learning — at sufficient scale, models can perform new tasks from a handful of examples provided in the prompt.
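The tokenization and context-window figures above combine into a simple budget check. The sketch below uses the rough ratio of 0.75 English words per token to estimate whether a prompt fits in a given window. The helper names and the 8,192-token window are assumptions for illustration; an accurate count requires the model's actual tokenizer.

```python
import math

WORDS_PER_TOKEN = 0.75  # rough English average; real tokenizers vary by text and model

def estimate_tokens(text):
    """Estimate the token count from a whitespace word count."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)

def fits_in_context(text, context_window, reserved_for_output=0):
    """Check whether the prompt plus space reserved for the reply fits the window."""
    return estimate_tokens(text) + reserved_for_output <= context_window

# Hypothetical 6,000-word prompt checked against an assumed 8,192-token window.
prompt = "word " * 6000
estimated = estimate_tokens(prompt)
fits_alone = fits_in_context(prompt, 8192)
fits_with_reply = fits_in_context(prompt, 8192, reserved_for_output=500)
```

Here 6,000 words come to about 8,000 tokens, so the prompt fits on its own but not once 500 tokens are set aside for the model's reply.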

Applications

LLMs power a wide range of applications:

  • Chat assistants — conversational interfaces like ChatGPT, Claude, Gemini.
  • AI Agents — systems that use LLMs as a reasoning engine to take actions in the world.
  • Retrieval-Augmented Generation — combining LLMs with external knowledge bases.
  • Code generation — tools like GitHub Copilot and Cursor.
  • Translation, summarization, content generation — replacing or augmenting traditional NLP pipelines.
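To make the Retrieval-Augmented Generation item more concrete, here is a minimal sketch of its two steps: retrieve the most relevant passages, then place them in the prompt. Word overlap stands in for the embedding similarity a real system would likely use, the knowledge base is a made-up three-sentence list, and the call to the model itself is left out.

```python
def score(query, document):
    """Count shared lowercase words; a crude stand-in for embedding similarity."""
    return len(set(query.lower().split()) & set(document.lower().split()))

def retrieve(query, documents, k=2):
    """Return the k documents that share the most words with the query."""
    return sorted(documents, key=lambda doc: score(query, doc), reverse=True)[:k]

def build_prompt(query, documents, k=2):
    """Assemble a prompt that puts the retrieved passages ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

# Hypothetical knowledge base; a real one would be a document store or vector index.
knowledge_base = [
    "The context window is the maximum text a model can attend to at once.",
    "RLHF aligns a model with human preferences.",
    "Tokens are the discrete units a model processes.",
]
prompt = build_prompt("What is the context window?", knowledge_base, k=1)
```

The resulting string would then be sent to an LLM, so the model answers from the supplied passage instead of relying only on what it learned in training.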

Limitations

Despite their capabilities, LLMs have well-known limitations:

  • Hallucination — confidently generating plausible but incorrect information.
  • Knowledge cutoff — no awareness of events after the training data ends, unless that information is supplied in the prompt.
  • Context limitations — forgetting or confusing information in long contexts.
  • Lack of grounding — no direct connection to truth or the physical world.
  • Computational cost — training and inference require substantial energy and hardware.

The trajectory

The field has moved rapidly from GPT-2 (1.5B parameters, 2019) through GPT-3 (175B parameters, 2020) and ChatGPT (2022) into a multi-model ecosystem where frontier labs release increasingly capable models every few months. The dominant architectural paradigm, the Transformer, has remained constant, though training techniques, post-training methods, and inference strategies have evolved dramatically.

References

  1. Attention Is All You Need
    Vaswani et al.
    The original Transformer paper.
  2. Language Models are Few-Shot Learners
    Brown et al.
    GPT-3 paper demonstrating emergent capabilities.
