Tools and Function Calling

Agents need tools because language models alone can only generate outputs from their internal parameters and current context. They do not automatically know the latest state of the world, the contents of a private database, the files in a repository, or the result of running a command.

A tool is an external capability exposed to the agent. It might search the web, query a database, send an email, retrieve a calendar event, run code, create a ticket, or update a document. Function calling is the mechanism that allows a model to request one of these capabilities using structured arguments.

A typical function-calling lifecycle has five steps:

The developer defines available tools and their schemas.
The user gives the agent a task.
The model decides whether a tool is needed.
The application executes the tool call and returns the result.
The model uses the result to continue or produce a final answer.

Common tool types include:

Search tools
Database tools
File system tools
Code execution tools
Messaging tools
Calendar tools
Browser tools
Retrieval tools
Domain-specific business APIs

A concrete tool schema

A tool schema tells the model what the tool does and what arguments it accepts. The schema must be clear enough for the model to choose the right tool and provide valid arguments.

Here is a simplified JSON schema for a weather lookup tool:

{
  "name": "get_weather",
  "description": "Get the current weather for a specific city and country.",
  "parameters": {
    "type": "object",
    "properties": {
      "city": {
        "type": "string",
        "description": "The city name, such as Boston or Tokyo."
      },
      "country": {
        "type": "string",
        "description": "The country name or ISO country code."
      },
      "units": {
        "type": "string",
        "enum": ["metric", "imperial"],
        "description": "Temperature units to use in the result."
      }
    },
    "required": ["city", "country"]
  }
}

A schema does two things at once. First, it constrains the shape of the model’s tool request. Second, it communicates intent. The descriptions are not decoration. They strongly influence whether the model selects the tool and how it fills the arguments.

Worked request-response cycle

Suppose the user asks:

What is the weather in Paris right now? Use Celsius.

The model may decide it needs fresh external data. Instead of answering directly, it emits a structured tool call:

{
  "tool_name": "get_weather",
  "arguments": {
    "city": "Paris",
    "country": "France",
    "units": "metric"
  }
}

The application receives this request, executes the actual weather API call, and returns a tool result:

{
  "city": "Paris",
  "country": "France",
  "temperature_c": 18,
  "condition": "Cloudy",
  "humidity_percent": 64,
  "observed_at": "2026-06-04T14:10:00Z"
}

The model then receives the tool result as additional context and produces the final answer:

The current weather in Paris is cloudy, about 18°C, with humidity around 64%.

Notice that the model did not perform the weather lookup itself. It selected the tool and supplied arguments. The application executed the trusted external operation. This separation is essential. Models decide; software executes.

Tool calls versus ordinary text

Function calling is safer and more reliable than asking the model to write informal instructions such as “call the weather API for Paris.” Structured calls allow the application to validate arguments before execution.

For example, the application can reject:

{
  "city": 123,
  "country": null,
  "units": "kelvin"
}

The schema says city must be a string, country is required, and units must be either metric or imperial. This gives developers a boundary between probabilistic model behavior and deterministic software validation.

Parallel tool calling

Some tasks require multiple independent tool calls. An agent preparing a travel briefing might need weather, hotel availability, flight prices, and calendar availability. If these calls do not depend on each other, the model or orchestration layer may request them in parallel.

Example:

[
  {
    "tool_name": "get_weather",
    "arguments": { "city": "Seattle", "country": "US", "units": "imperial" }
  },
  {
    "tool_name": "search_hotels",
    "arguments": { "city": "Seattle", "check_in": "2026-07-10", "nights": 2 }
  },
  {
    "tool_name": "check_calendar",
    "arguments": { "date": "2026-07-10" }
  }
]

Parallel calls can reduce latency, but they also introduce coordination problems. The agent must merge results, detect conflicts, and avoid taking irreversible actions too early. Reading data in parallel is usually safer than writing data in parallel.

Tool descriptions shape behavior

Models are sensitive to tool names, descriptions, and parameter descriptions. A vague tool description can produce poor behavior.

Weak description:

search: Search things.

Better description:

search_documents: Search the company knowledge base for policy documents, technical guides, and internal FAQs. Use this when the user asks about company-specific information that may not be in the model's general knowledge.

The better version tells the model when to use the tool and what kind of information it contains. In agent design, tool descriptions are part of the prompt surface. They should be specific, honest, and concise.

It is also important not to overstate tool capabilities. If a tool only searches titles and summaries, do not describe it as reading full documents. If a tool cannot modify records, do not imply that it can.

OpenAI tools versus Anthropic tools

Different AI APIs expose tool use in different ways, but the core pattern is similar: developers define tools, the model requests tool calls, the application executes them, and results are returned to the model.

In OpenAI-style tool calling, tools are commonly described with JSON-schema-like parameter definitions. The model may return one or more tool calls with structured arguments. The developer’s application is responsible for executing those calls and sending results back in the conversation or response flow.

Anthropic’s tool use follows a similar conceptual pattern, with tool definitions and structured tool-use blocks. The message format and naming differ, but the engineering responsibility is the same: define tools clearly, validate inputs, execute safely, and feed results back to the model.

The important lesson is not to memorize one vendor’s syntax. APIs evolve. The durable concept is that tool use creates a contract between a probabilistic model and deterministic software.

Design guidelines for tool use

Good tool design starts with minimality. Give the agent the tools it needs, not every tool available. A support agent may need to look up an order and create a refund request. It may not need permission to delete customer records.

Prefer tools that are:

Narrow in scope
Clearly named
Well validated
Observable in logs
Safe by default
Reversible when possible

For high-impact actions, use confirmation steps. An agent can draft an email before sending it. It can prepare a database update before applying it. It can create a pull request instead of pushing directly to production.

Function calling is one of the foundations of agentic AI because it gives models a controlled way to act. Without tools, an agent can only talk. With tools, it can participate in real workflows. That power is useful only when paired with careful schemas, validation, permissions, logging, and evaluation.

Learning objectives