
Generate structured output from LLMs with Dottxt Outlines in AWS

This post is cowritten with Remi Louf, CEO and technical founder of Dottxt.

Structured output in AI applications refers to AI-generated responses conforming to formats that are predefined, validated, and often strictly typed. This can include the schema for the output or the way specific fields in the output should be mapped. Structured outputs are essential for applications that require consistency, validation, and seamless integration with downstream systems. For example, banking loan approval systems must generate JSON outputs with strict field validation, healthcare systems need to validate patient data formats and enforce medication dosage constraints, and ecommerce systems require standardized invoice generation for their accounting systems.

This post explores the implementation of .txt’s Outlines framework as a practical approach to producing structured outputs, using models deployed through AWS Marketplace on Amazon SageMaker.

Structured output: Use cases and business value

Structured outputs elevate generative AI from ad hoc text generation to dependable business infrastructure, enabling precise data exchange, automated decisioning, and end-to-end workflows across high-stakes, integration-heavy environments. By enforcing schemas and predictable formats, they unlock use cases where accuracy, traceability, and interoperability are non-negotiable, from financial reporting and healthcare operations to ecommerce logistics and enterprise workflow automation. This section explores where structured outputs create the most value and how they translate directly into reduced errors, lower operational risk, and measurable ROI.

What is structured output?

Structured output covers several kinds of requirements for how a model should constrain its responses. Common constraint mechanisms include output schemas (for example, JSON Schema or Pydantic models), regular expressions, and context-free grammars.

Critical components that benefit from structured output

In modern applications, AI models are integrated with non-AI processing and business systems. These integration points require consistency, type safety, and machine readability, because parsing ambiguities or format deviations can break workflows. These junction points, where LLMs must interoperate reliably with other infrastructure components, appear throughout common architectural patterns.

Business applications: Where structured output provides the most value

Across high-stakes, integration-heavy domains, structured outputs transform generative models from flexible text engines into reliable business infrastructure that delivers predictability, auditability, and end-to-end automation.

The common thread across these domains is operational complexity, integration requirements, and risk sensitivity. Predictability, auditability, and system interoperability drive measurable ROI through reduced errors, faster processing, and seamless automation.

Introducing .txt Outlines on AWS to produce structured outputs

Structured output can be achieved in several ways. Most frameworks focus, at their core, on validation: they check whether the output adheres to the requested rules and requirements. If the output doesn’t conform, the framework requests a new output, iterating until the model produces the required structure.
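This validate-and-retry loop can be sketched in a few lines of plain Python. Everything here is illustrative: `call_llm` is a hypothetical stand-in for any model invocation, and the invoice schema is a made-up example.

```python
import json

# Illustrative schema: required fields and their expected Python types
REQUIRED_FIELDS = {"invoice_id": str, "total": float}

def validate(raw):
    """Return the parsed dict if it conforms to the schema, else None."""
    try:
        data = json.loads(raw)
    except ValueError:
        return None
    if not isinstance(data, dict):
        return None
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            return None
    return data

def call_llm(prompt):
    # Hypothetical stand-in for a real model call (e.g., an API request)
    return '{"invoice_id": "INV-001", "total": 99.5}'

def generate_structured(prompt, max_retries=3):
    """Validate-and-retry: re-request output until it conforms."""
    for _ in range(max_retries):
        data = validate(call_llm(prompt))
        if data is not None:
            return data
    raise RuntimeError(f"no schema-valid output after {max_retries} attempts")
```

Generation-time approaches such as Outlines avoid this loop entirely by making structurally invalid outputs impossible to sample in the first place.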

Outlines offers an advanced approach called generation-time validation: validation happens as the model produces tokens, rather than after generation completes. While not integrated with Amazon Bedrock, understanding Outlines provides insight into cutting-edge structured output techniques that inform hybrid implementation strategies.

Outlines, developed by the .txt team, is a Python library designed to bring deterministic structure and reliability to language model outputs—addressing a key challenge in deploying LLMs for production applications. Unlike traditional free-form generation, developers can use Outlines to enforce strict output formats and constraints during generation, not just after the fact. This approach makes it possible to use LLMs for tasks where accuracy, predictability, and integration with downstream systems are required.

How Outlines works

Outlines enforces constraints through mechanisms such as JSON Schemas (including Pydantic models), regular expressions, and context-free grammars, each compiled into a finite-state machine that guides token selection.

During generation, Outlines follows a precise workflow:

  1. The language model processes the input sequence and produces token logits
  2. The Outlines logits processor sets the probability of illegal tokens to 0%
  3. A token is sampled only from the set of legal tokens according to the defined structure
  4. This process repeats until generation is complete, helping to ensure that the output conforms to the required format

For example, with a pattern like ^\d*(\.\d+)?$ for decimal numbers, Outlines converts this into an automaton that only allows valid numeric sequences to be generated. If 748 has been generated, the system knows the only valid next tokens are another digit, a decimal point, or the end-of-sequence token.
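To make the automaton idea concrete, here is a toy, hand-written DFA for this decimal pattern. It is a simplified sketch of the concept only, not the Outlines implementation, which operates on the model’s token vocabulary rather than single characters.

```python
# Toy DFA for the pattern ^\d*(\.\d+)?$ (decimal numbers).
# States: "int" (digits before the dot, accepting), "dot" (just saw '.',
# not accepting), "frac" (digits after the dot, accepting).
DIGITS = set("0123456789")
END = "<eos>"  # end-of-sequence marker

TRANSITIONS = {
    "int": {**{d: "int" for d in DIGITS}, ".": "dot"},
    "dot": {d: "frac" for d in DIGITS},
    "frac": {d: "frac" for d in DIGITS},
}
ACCEPTING = {"int", "frac"}

def run(prefix):
    """Return the DFA state after consuming prefix, or None if illegal."""
    state = "int"
    for ch in prefix:
        state = TRANSITIONS[state].get(ch)
        if state is None:
            return None
    return state

def legal_next(prefix):
    """Characters (plus <eos>) that keep the prefix on a valid path."""
    state = run(prefix)
    if state is None:
        return set()
    allowed = set(TRANSITIONS[state])
    if state in ACCEPTING:
        allowed.add(END)  # a complete match may stop here
    return allowed
```

In Outlines, equivalent machinery runs over token IDs, and the set returned by a function like `legal_next` becomes the mask applied to the logits at each step.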

Performance benefits

Enforcing structured output during generation offers significant advantages for reliability and performance in production environments. It helps guarantee the validity of the output’s structure and can significantly improve performance.

Benchmark advantages

Benchmarks of the Outlines library have demonstrated these reliability and performance benefits in practice.

Getting started with Outlines

Outlines can be seamlessly integrated into existing Python workflows:

from outlines import models, generate
from pydantic import BaseModel

# Define your data structure
class Patient(BaseModel):
    id: int
    name: str
    diagnosis: str
    age: int

# Load model and create structured generator
model = models.transformers("microsoft/DialoGPT-medium")
generator = generate.json(model, Patient)

# Generate structured output
prompt = "Create a patient record for John Smith, 45, with diabetes"
result = generator(prompt)  # Returns valid Patient instance
print(result.name)  # "John Smith"
print(result.age)   # 45

For more complex schemas:

from datetime import datetime
from enum import Enum

class Status(str, Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"
    PENDING = "pending"

class User(BaseModel):
    username: str
    email: str
    status: Status
    created_at: datetime

# Generator enforces enum values and datetime format
user_generator = generate.json(model, User)

Using .txt’s dotjson in Amazon SageMaker

You can generate structured output with .txt’s Amazon SageMaker real-time inference solution by deploying one of .txt’s models, such as DeepSeek-R1-Distill-Qwen-32B, through AWS Marketplace. The following code assumes that you have already deployed an endpoint in your AWS account.

A Jupyter Notebook that walks through deploying the endpoint end-to-end is available in the product repository.

import json
import boto3
# Set this based on your SageMaker endpoint
endpoint_name = "dotjson-with-DeepSeek-R1-Distill-Qwen-32B"
session = boto3.Session()
structured_data = {
    "patient_id": 12345,
    "first": "John",
    "last": "Adams",
    "appointment_date": "2025-01-27",
    "notes": "Patient presented with a headache and sore throat",
}
payload = {
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful, honest, and concise assistant.",
        },
        {
            "role": "user",
            "content": f"Create a medical record from the following visit data: {structured_data}",
        },
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "Medical Record",
            "schema": {
                "properties": {
                    "patient_id": {"title": "Patient Id", "type": "integer"},
                    "date": {"title": "Date", "type": "string", "format": "date-time"},
                    "diagnosis": {"title": "Diagnosis", "type": "string"},
                    "treatment": {"title": "Treatment", "type": "string"},
                },
                "required": ["patient_id", "diagnosis", "treatment"],
                "title": "MedicalRecord",
                "type": "object",
            },
        },
    },
    "max_tokens": 1000,
}
runtime = session.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Accept="application/json",
    Body=json.dumps(payload).encode(),
)
body = json.loads(response["Body"].read().decode("utf-8"))
# View the structured output produced by the model
msg = body["choices"][0]["message"]
medical_record = json.loads(msg["content"])
print(medical_record)

This generation-time approach removes the need for the retries required by post-completion validation.

Alternative structured output options on AWS

While Outlines offers generation-time consistency, several other approaches provide structured outputs with different trade-offs:

Alternative 1: LLM-based structured output strategies

Most modern LLMs, such as Amazon Nova, let users define output schemas directly in prompts or through tool configurations, supporting type constraints, enumerations, and structured templates within the AWS environment. The following example shows a schema-constrained extraction pattern for Amazon Nova.

# Example Nova structured output
import json
import boto3

bedrock = boto3.client('bedrock-runtime')

response = bedrock.invoke_model(
    modelId='amazon.nova-pro-v1:0',
    body=json.dumps({
        "messages": [{"role": "user", "content": "Extract customer info from this text..."}],
        "inferenceConfig": {"maxTokens": 500},
        "toolConfig": {
            "tools": [{
                "toolSpec": {
                    "name": "extract_customer",
                    "inputSchema": {
                        "json": {
                            "type": "object",
                            "properties": {
                                "name": {"type": "string"},
                                "email": {"type": "string"},
                                "phone": {"type": "string"}
                            },
                            "required": ["name", "email"]
                        }
                    }
                }
            }]
        }
    })
)

Alternative 2: Post-generation validation OSS frameworks

Post-generation validation open-source frameworks have emerged as a critical layer in modern generative AI systems, providing structured, repeatable mechanisms to evaluate and govern model outputs before they are consumed by users or downstream applications. By separating generation from validation, these frameworks enable teams to enforce safety, quality, and policy constraints without constantly retraining or fine-tuning underlying models.

LMQL

Language Models Query Language (LMQL) offers a SQL-like interface and provides a query language for LLMs, so that developers can specify constraints, type requirements, and value ranges directly in prompts. It is particularly effective for multiple-choice and type constraints.

Instructor

Instructor wraps LLM outputs with schema validation and automatic retry mechanisms. Its tight integration with Pydantic models makes it suitable for scenarios where post-generation validation and correction are acceptable.

import boto3
import instructor
from pydantic import BaseModel
# Create a Bedrock client for runtime interactions
bedrock_client = boto3.client('bedrock-runtime')
# Set up the instructor client with Bedrock runtime
client = instructor.from_bedrock(bedrock_client)
# Define the structured response model
class User(BaseModel):
    name: str
    age: int
# Invoke the Claude Haiku model with the correct message structure
user = client.chat.completions.create(
    modelId="global.anthropic.claude-haiku-4-5-20251001-v1:0",
    messages=[
        {"role": "user", "content": [{"text": "Extract: Jason is 25 years old"}]},
    ],
    response_model=User,
)
print(user)
# Expected output:
# name='Jason' age=25

Guidance

Guidance offers fine-grained, template-driven control over output structure and formatting, allowing token-level constraints. It is useful for consistent response formatting and conversational flows.

Decision factors and best practices

Selecting the right structured output approach depends on several key factors that directly impact implementation complexity and system performance.

Conclusion

Organizations can use the structured output paradigm in AI to reliably enforce schemas, integrate with a wide range of models and APIs, and balance post-generation validation versus direct generation methods for greater control and consistency. By understanding the trade-offs in performance, integration complexity, and schema enforcement, builders can tailor solutions to their technical and business requirements, facilitating scalable and efficient automation across diverse applications.

To learn more about implementing structured outputs with LLMs on AWS, explore the Outlines documentation and the AWS Marketplace listings for .txt’s models.


About the Authors

Clement Perrot

Clement Perrot is a Senior GenAI Strategist in the GenAI Innovation Center, where he helps early-stage startups build and use AI on the AWS platform. Prior to AWS, Clement was an entrepreneur, whose last two AI and consumer hardware startups were acquired.

Remi Louf

Remi Louf is the CEO and technical founder of Dottxt. Before founding Dottxt, Remi was a Senior Research Engineer at Normal Computing, a Research Engineer at Ampersand, an early Research Engineer at Hugging Face, a Research Fellow at Harvard, and Chief Science Officer at Vimies. Remi has a Doctorate in Statistical Physics, a Master’s in the Philosophy of Physics, an undergraduate degree in fundamental Physics, and is an Élève Normalien (French research degree) in Quantum Physics.

Max Elfrink

Max Elfrink is an Account Manager on the AWS Startups team, where he helps early-stage startups build, scale, and grow their AI and infrastructure on AWS. Prior to AWS, Max spent six years in startups, supporting early-stage companies in CDN, healthcare and life sciences technology, and at Flexport, a unicorn tech-enabled freight forwarder.
