Few-Shot Learning and Structured Output

Few-shot prompting gives a model examples of the task before asking it to complete a new case. Instead of only describing what you want, you show the model the pattern.

This is especially useful when the task has a custom format, subtle categories, domain-specific labels, or edge cases that are difficult to explain in abstract instructions.

For example, suppose you want to extract the topic and sentiment from product reviews. A few-shot prompt might look like this:

Extract the topic and sentiment from each review.
Sentiment must be Positive, Neutral, or Negative.

Review: "The battery lasts all day, but the screen scratches easily."
Topic: battery life, screen durability
Sentiment: Neutral

Review: "Customer support solved my issue in five minutes."
Topic: customer support
Sentiment: Positive

Review: "The app crashes every time I upload a file."
Topic: app stability, file upload
Sentiment: Negative

Review: "Setup took a while, but now it works as expected."
Topic:
Sentiment:
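
When the examples live in code rather than in a hand-edited string, the same layout can be assembled programmatically. The following is a minimal sketch, not a prescribed implementation; the names `EXAMPLES`, `INSTRUCTIONS`, and `build_few_shot_prompt` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# (review, topic, sentiment) triples copied from the prompt above.
# The variable and function names here are illustrative assumptions.
EXAMPLES: List[Tuple[str, str, str]] = [
    ("The battery lasts all day, but the screen scratches easily.",
     "battery life, screen durability", "Neutral"),
    ("Customer support solved my issue in five minutes.",
     "customer support", "Positive"),
    ("The app crashes every time I upload a file.",
     "app stability, file upload", "Negative"),
]

INSTRUCTIONS = (
    "Extract the topic and sentiment from each review.\n"
    "Sentiment must be Positive, Neutral, or Negative."
)


def build_few_shot_prompt(new_review: str) -> str:
    """Join the instructions, the worked examples, and the open case."""
    blocks = [INSTRUCTIONS]
    for review, topic, sentiment in EXAMPLES:
        blocks.append(f'Review: "{review}"\nTopic: {topic}\nSentiment: {sentiment}')
    # The final block leaves both fields empty for the model to complete.
    blocks.append(f'Review: "{new_review}"\nTopic:\nSentiment:')
    return "\n\n".join(blocks)


prompt = build_few_shot_prompt("Setup took a while, but now it works as expected.")
print(prompt)
```

Keeping the examples in a data structure makes it easier to add, remove, or reorder them without touching the formatting logic.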

Structured output means asking the model to return data in a predictable format, such as JSON, XML, Markdown tables, or tool-call arguments. In software applications, structured output is often more valuable than prose because it can be parsed, validated, stored, and passed to other systems.

Common structured output techniques include:

- Explicit format instructions
- JSON mode
- Function calling or tool use
- Schema validation
- Output parsers such as Pydantic models

Choosing good few-shot examples

Few-shot examples act as a small training set inside the prompt: the model infers the task from them at inference time, without any weight updates. Their quality matters.

Good examples should be:

- Representative: they match the real inputs the application will receive.
- Diverse: they cover different categories, lengths, tones, and formats.
- Edge-aware: they include ambiguous or difficult cases.
- Consistent: labels and formatting are applied the same way every time.
- Minimal: they teach the pattern without wasting tokens.

For classification tasks, include at least one example for each label. If the model must distinguish between Neutral and Mixed, include examples that show the difference. If the model must extract missing values as null, include an example where a field is absent.

Ordering can also matter. Models often give extra weight to the examples nearest the end of the prompt. Put the most important or most similar examples near the end, but avoid making the final example so dominant that the model copies it too closely.
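
One rough way to act on the coverage and ordering advice above is sketched here. It assumes examples are stored as dictionaries with `text` and `label` keys, and it uses plain word overlap (Jaccard similarity) as a stand-in for whatever similarity measure an application would really use, such as embeddings. All function names are hypothetical.

```python
import re
from typing import Dict, List, Set


def words(text: str) -> Set[str]:
    """Lowercase word set; a crude tokenizer chosen for illustration."""
    return set(re.findall(r"[a-z']+", text.lower()))


def overlap(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two texts."""
    wa, wb = words(a), words(b)
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def missing_labels(examples: List[Dict[str, str]], labels: List[str]) -> List[str]:
    """Labels that have no example at all."""
    return sorted(set(labels) - {e["label"] for e in examples})


def order_by_similarity(examples: List[Dict[str, str]], query: str) -> List[Dict[str, str]]:
    """Sort ascending by similarity so the closest example lands last."""
    return sorted(examples, key=lambda e: overlap(e["text"], query))


examples = [
    {"text": "The app crashes every time I upload a file.", "label": "Negative"},
    {"text": "Customer support solved my issue in five minutes.", "label": "Positive"},
]
query = "The app crashes when I open a file."

print(missing_labels(examples, ["Positive", "Neutral", "Negative"]))  # ['Neutral']
print(order_by_similarity(examples, query)[-1]["label"])  # Negative
```

Because the most similar example always ends up last, it is worth testing whether the model starts copying that example's label too often.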

Label bias and how to avoid it

Label bias occurs when the examples or prompt wording make the model prefer one label too often. For example, if your few-shot prompt contains five Positive examples and one Negative example, the model may over-predict Positive.

Bias can also come from label wording. A label like Critical may sound more urgent than High, so the model may lean on the everyday meaning of the words rather than on the definitions in your business rules.

To reduce label bias:

- Balance examples across labels.
- Use clear label definitions.
- Include borderline cases.
- Randomize or test example order.
- Evaluate with a labeled test set.
- Avoid emotionally loaded labels unless intended.
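
A quick check for the first and fourth points might look like the sketch below. It assumes examples are `(text, label)` tuples; the helper names are hypothetical, and what counts as an acceptable imbalance ratio is left to the application.

```python
import random
from collections import Counter
from typing import List, Tuple

Example = Tuple[str, str]  # (text, label)


def label_counts(examples: List[Example]) -> Counter:
    """How many examples each label has."""
    return Counter(label for _, label in examples)


def imbalance_ratio(examples: List[Example]) -> float:
    """Most frequent label count divided by least frequent label count.

    Labels with zero examples are invisible here; check coverage separately.
    """
    counts = label_counts(examples)
    if not counts:
        raise ValueError("no examples given")
    return max(counts.values()) / min(counts.values())


def shuffled(examples: List[Example], seed: int) -> List[Example]:
    """Return a reordered copy; a fixed seed keeps experiments repeatable."""
    copy = list(examples)
    random.Random(seed).shuffle(copy)
    return copy


examples = [(f"positive review {i}", "Positive") for i in range(5)]
examples.append(("negative review", "Negative"))

print(label_counts(examples))     # Counter({'Positive': 5, 'Negative': 1})
print(imbalance_ratio(examples))  # 5.0
```

Running the same evaluation over several seeds of `shuffled` gives a rough sense of how sensitive the prompt is to example order.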

For production classification, prompts alone are not enough. You should measure accuracy, confusion patterns, and failure cases.
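
As a minimal sketch of that measurement step, the snippet below computes accuracy and counts confusion pairs with the standard library only. The gold and predicted labels are made-up illustration data, not results from any real model.

```python
from collections import Counter
from typing import Dict, List, Tuple


def accuracy(gold: List[str], predicted: List[str]) -> float:
    """Fraction of predictions that match the gold labels."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal in length")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def confusion_counts(gold: List[str], predicted: List[str]) -> Counter:
    """Count each (gold label, predicted label) pair."""
    return Counter(zip(gold, predicted))


# Illustrative data only.
gold = ["Positive", "Negative", "Neutral", "Negative"]
predicted = ["Positive", "Positive", "Neutral", "Negative"]

print(accuracy(gold, predicted))  # 0.75

# Keep only the off-diagonal pairs, i.e. the actual mistakes.
errors: Dict[Tuple[str, str], int] = {
    pair: n for pair, n in confusion_counts(gold, predicted).items() if pair[0] != pair[1]
}
print(errors)  # {('Negative', 'Positive'): 1}
```

The off-diagonal pairs show which labels the model confuses, which is usually more actionable than the accuracy figure alone.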

Function calling for structured extraction

Function calling is often more robust than asking the model to write JSON as free text. You define a tool schema, and the model returns arguments structured according to it.

Example: extract a contact record from unstructured text. The request body below is illustrative; exact field names vary by provider.

{
  "model": "example-llm",
  "temperature": 0.1,
  "messages": [
    {
      "role": "system",
      "content": "Extract contact information from user-provided text. Use null for missing fields. Do not invent information."
    },
    {
      "role": "user",
      "content": "Maria Chen from Northstar Labs can be reached at maria.chen@example.com. She is interested in an enterprise demo next week."
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "create_contact_record",
        "description": "Create a structured contact record from unstructured text.",
        "parameters": {
          "type": "object",
          "properties": {
            "full_name": {
              "type": "string",
              "description": "The person's full name."
            },
            "company": {
              "type": ["string", "null"],
              "description": "The company or organization, if provided."
            },
            "email": {
              "type": ["string", "null"],
              "description": "The email address, if provided."
            },
            "interest": {
              "type": ["string", "null"],
              "description": "The person's stated interest or reason for contact."
            },
            "follow_up_timeframe": {
              "type": ["string", "null"],
              "description": "Any mentioned follow-up timing."
            }
          },
          "required": ["full_name", "company", "email", "interest", "follow_up_timeframe"]
        }
      }
    }
  ]
}

The model's tool call might carry these arguments:

{
  "full_name": "Maria Chen",
  "company": "Northstar Labs",
  "email": "maria.chen@example.com",
  "interest": "enterprise demo",
  "follow_up_timeframe": "next week"
}

The application can validate this result before saving it.
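
That validation step could be sketched as follows, using only the standard library. It assumes the tool-call arguments arrive as a JSON string, which is common but not universal, and it mirrors the schema above: all five fields must be present, and every field except `full_name` may be null. The function name is hypothetical.

```python
import json
from typing import Any, Dict

REQUIRED_FIELDS = ("full_name", "company", "email", "interest", "follow_up_timeframe")
NON_NULLABLE_FIELDS = {"full_name"}  # the schema types this field as a plain string


def validate_contact_arguments(raw_arguments: str) -> Dict[str, Any]:
    """Parse and check tool-call arguments; raise ValueError on any problem."""
    # json.JSONDecodeError subclasses ValueError, so callers can catch one type.
    data = json.loads(raw_arguments)
    if not isinstance(data, dict):
        raise ValueError("arguments must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    for name in REQUIRED_FIELDS:
        value = data[name]
        if value is None:
            if name in NON_NULLABLE_FIELDS:
                raise ValueError(f"{name} must not be null")
        elif not isinstance(value, str):
            raise ValueError(f"{name} must be a string or null")
    return data


raw = json.dumps({
    "full_name": "Maria Chen",
    "company": "Northstar Labs",
    "email": "maria.chen@example.com",
    "interest": "enterprise demo",
    "follow_up_timeframe": "next week",
})
record = validate_contact_arguments(raw)
print(record["full_name"])  # Maria Chen
```

A dedicated JSON Schema validator would cover more of the schema language; this hand-written check only covers the types used in the example.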

Pydantic-based output parsing

Frameworks such as LangChain and LlamaIndex often use Pydantic models to define expected output schemas in Python. Pydantic provides validation, type checking, and helpful error messages.

Example:

from typing import Optional

from pydantic import BaseModel, Field

class ContactRecord(BaseModel):
    # Only full_name is required; the other fields default to None when absent.
    full_name: str = Field(description="The person's full name")
    company: Optional[str] = Field(default=None, description="Company if provided")
    email: Optional[str] = Field(default=None, description="Email if provided")
    interest: Optional[str] = Field(default=None, description="Stated interest")
    follow_up_timeframe: Optional[str] = Field(default=None, description="Mentioned follow-up timing")

A framework can use this schema to generate format instructions, parse the model output, and raise validation errors if required fields are missing or types are wrong.

The key advantage is that the schema lives in code. This makes structured prompting easier to maintain as your application evolves.
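
Without any framework, the parse-and-validate step can be sketched directly against the model class. The example repeats `ContactRecord` so that it runs on its own, and it assumes the `pydantic` package is installed; `parse_contact` is a hypothetical helper name.

```python
import json
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class ContactRecord(BaseModel):
    full_name: str = Field(description="The person's full name")
    company: Optional[str] = Field(default=None, description="Company if provided")
    email: Optional[str] = Field(default=None, description="Email if provided")
    interest: Optional[str] = Field(default=None, description="Stated interest")
    follow_up_timeframe: Optional[str] = Field(default=None, description="Mentioned follow-up timing")


def parse_contact(raw_json: str) -> ContactRecord:
    """Decode model output and validate it against the schema."""
    return ContactRecord(**json.loads(raw_json))


good = parse_contact('{"full_name": "Maria Chen", "email": "maria.chen@example.com"}')
print(good.full_name)  # Maria Chen
print(good.company)    # None

try:
    parse_contact('{"company": "Northstar Labs"}')  # full_name is missing
except ValidationError as exc:
    # Each error entry names the offending field in its "loc" tuple.
    print(exc.errors()[0]["loc"])  # ('full_name',)
```

The error details from `exc.errors()` are the kind of message that can be fed back to the model in a retry.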

Format robustness

Models sometimes ignore format instructions, especially when prompts are long, conflicting, or overly complex. If you ask for JSON but also ask for an explanation, the model may include prose around the JSON.

Weak prompt:

Extract the fields and explain your reasoning. Return JSON.

Better prompt:

Return only valid JSON matching this schema.
Do not include Markdown, comments, or explanatory text.
Use null for missing fields.

If format reliability matters, add layers of defense:

1. Use JSON mode or function calling when available.
2. Validate output with a schema.
3. Retry with the validation error included.
4. Keep the prompt short and non-conflicting.
5. Avoid asking for reasoning inside machine-readable output.
6. Log failures and add test cases.

A repair prompt can be useful:

The previous output was invalid JSON because: missing comma after "email".
Return a corrected JSON object only. Do not add explanation.

However, repeated repair loops can increase cost and latency. Prefer stronger constraints up front.
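
A bounded repair loop along these lines might look like the sketch below. The `call_model` parameter stands for any function that takes a prompt string and returns the model's text; it is a placeholder, not a real API, and `fake_model` is a stub that fails once so the example runs offline.

```python
import json
from typing import Any, Callable, Dict, List


def generate_json_with_repair(
    call_model: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> Dict[str, Any]:
    """Call the model, and on invalid output retry with a repair prompt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    current_prompt = prompt
    last_error = ""
    for _ in range(max_attempts):
        output = call_model(current_prompt)
        try:
            parsed = json.loads(output)
        except json.JSONDecodeError as exc:
            last_error = exc.msg
        else:
            if isinstance(parsed, dict):
                return parsed
            last_error = "top-level value is not a JSON object"
        # Feed the concrete error back so the model knows what to fix.
        current_prompt = (
            f"The previous output was invalid JSON because: {last_error}.\n"
            "Return a corrected JSON object only. Do not add explanation.\n\n"
            f"Previous output:\n{output}"
        )
    raise ValueError(f"no valid JSON object after {max_attempts} attempts: {last_error}")


calls: List[str] = []


def fake_model(prompt_text: str) -> str:
    """Stub model: omits a comma on the first call, then corrects itself."""
    calls.append(prompt_text)
    if prompt_text.startswith("The previous output was invalid JSON"):
        return '{"email": "maria.chen@example.com", "company": "Northstar Labs"}'
    return '{"email": "maria.chen@example.com" "company": "Northstar Labs"}'


result = generate_json_with_repair(fake_model, "Extract the contact as JSON.")
print(result["company"])  # Northstar Labs
print(len(calls))         # 2
```

The `max_attempts` cap is what keeps cost and latency bounded; once it is reached the failure is surfaced to the caller instead of retried silently.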

JSON mode versus function calling versus explicit instructions

Explicit format instructions are the simplest approach. They work well for prototypes and low-risk tasks.

Return the result as JSON with keys: title, summary, tags.

Advantages: easy, flexible, no special API features required. Disadvantages: weaker guarantees, more parsing failures.

JSON mode constrains the model to produce valid JSON. This improves parseability, but you still need to ensure the JSON matches your expected schema.

Advantages: better JSON validity, useful for structured responses. Disadvantages: may not enforce business-level schema perfectly.
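
To illustrate the gap between valid JSON and schema-conforming JSON, the sketch below checks a response against the `title`, `summary`, and `tags` keys from the earlier instruction. The expected types (strings and a list) are assumptions, since the instruction names only the keys, and `schema_problems` is a hypothetical helper.

```python
import json
from typing import List

# Assumed types for the three keys; adjust to the real contract.
EXPECTED_KEYS = {"title": str, "summary": str, "tags": list}


def schema_problems(raw: str) -> List[str]:
    """Return a list of schema problems; an empty list means the shape is fine."""
    data = json.loads(raw)  # JSON mode is meant to make this step succeed
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    for key, expected_type in EXPECTED_KEYS.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif not isinstance(data[key], expected_type):
            problems.append(f"wrong type for {key}: expected {expected_type.__name__}")
    return problems


# Parses cleanly as JSON, yet does not match the expected shape.
response = '{"title": "Q3 report", "tags": "finance"}'
print(schema_problems(response))
# ['missing key: summary', 'wrong type for tags: expected list']
```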

Function calling defines a schema for tool arguments. It is often the best choice when the output will trigger code, update records, or enter a workflow.

Advantages: strong structure, clear contract, easier validation. Disadvantages: requires tool/schema setup and orchestration.

Decision guide

Use explicit format instructions when:

- You are prototyping.
- The output is read by humans.
- Minor formatting variation is acceptable.

Use JSON mode when:

- You need machine-readable JSON.
- The output is structured but not an external action.
- You can validate and retry if needed.

Use function calling when:

- The model output will trigger application logic.
- You need a clear schema contract.
- You want to separate natural language responses from structured actions.

Use Pydantic or schema validation when:

- Your application depends on typed fields.
- You need maintainable parsing logic.
- You want validation errors that can drive retries or debugging.

Final practical advice

Few-shot prompting teaches patterns. Structured output makes model responses usable by software. Together, they turn language models from conversational assistants into components of reliable applications.

For serious systems, do not rely on the prompt alone. Combine clear instructions, representative examples, low sampling randomness, schema constraints, validation, retries, and test cases. The goal is not to make the model perfect. The goal is to build a system that detects and handles imperfection gracefully.

Learning objectives