AI Agent

An LLM-powered system that can perceive its environment, reason about goals, and take actions through tools — going beyond single-prompt generation.

From: LLM Wiki URL: llm-wiki.pages.dev/concepts/ai-agent Created: March 10, 2024 Updated: December 15, 2024 Read time: 3 min

An AI Agent is a system that uses a Large Language Model as a reasoning engine to autonomously take actions in pursuit of a goal. Rather than responding to a single prompt with a single answer, an agent operates in a loop: it observes, thinks, acts, and re-observes.

The agent loop

Most agents follow a pattern like this:

Observe — receive input from the user, the environment, or the output of a previous action.
Reason — the LLM decides what to do next. Often expressed as a “thought” step.
Act — call a tool (search, code execution, file write, API call) or respond to the user.
Loop — feed the action’s result back into observation; continue until the goal is reached or a stopping condition triggers.

This loop is sometimes formalized as the ReAct pattern (Reason + Act) or the function-calling pattern exposed by OpenAI, Anthropic, and others.

Tools: the agent’s interface to the world

A bare LLM can only generate text. To be useful, agents need tools — typed functions the model can choose to call. Common categories:

Information — web search, document retrieval, RAG over a knowledge base.
Computation — code execution (Python via sandbox), calculator, SQL query.
File operations — read, write, edit files in a workspace.
External APIs — email, calendar, GitHub, Slack, custom services.
Browser automation — navigate, click, fill forms, scrape.

The LLM doesn’t execute these — it generates a structured call request, the host system runs it, and returns the result.

Agent architectures

Several patterns have emerged:

Single-agent with tools — one LLM loop, many tools. The simplest and most common.
Multi-agent — multiple specialized agents collaborate, often via a shared message bus. Examples: AutoGen, CrewAI, LangGraph.
Hierarchical — a “manager” agent delegates to worker agents. Used for complex multi-step tasks.
Reactive (no planning) — act on the immediate observation. Simple, fast, brittle.
Plan-and-execute — first generate a full plan, then execute step by step. Slower startup, more reliable.

Memory

Agents have a fundamental memory problem: the context window is finite. Solutions include:

Scratchpad — short notes the agent writes to itself within a session.
Episodic memory — log of past actions and outcomes, retrieved when relevant.
Semantic memory — embeddings of past events, queried like a small RAG system.
External store — files, databases, or vector stores that persist across sessions.

Failure modes

Agents are powerful but fragile:

Hallucinated tool calls — calling a function with arguments that don’t match the schema.
Looping — repeating the same failing action forever without progress.
Goal drift — losing sight of the original objective in a long trajectory.
Overconfidence — taking destructive actions (deleting files, sending emails) without confirmation.
Cost runaway — burning tokens in an infinite loop.

Production systems add guardrails: max steps, max cost, required user confirmation for high-impact actions, retry logic, and human-in-the-loop review.

The current state