Outlines

Outlines：结构化 JSON/正则表达式/Pydantic 大语言模型生成。

技能元数据


来源	可选 — 通过 `hermes skills install official/mlops/outlines` 安装
路径	`optional-skills/mlops/inference/outlines`
版本	`1.0.0`
作者	Orchestra Research
许可证	MIT
依赖	`outlines`, `transformers`, `vllm`, `pydantic`
平台	linux, macos, windows
标签	`提示工程`, `Outlines`, `结构化生成`, `JSON Schema`, `Pydantic`, `本地模型`, `基于语法的生成`, `vLLM`, `Transformers`, `类型安全`

---
title: Outlines：结构化文本生成
description: 了解如何使用 Outlines 库在推理过程中保证输出结构（如有效 JSON、XML、代码），并获得零开销结构化生成的优势。本指南涵盖从快速入门到高级模式（如 Pydantic 集成和批处理）的全部内容。
slug: outlines-structured-text-generation
---

:::info
以下是当此技能被触发时，Hermes 加载的完整技能定义。这是技能激活时智能体看到的指令。
:::

# Outlines：结构化文本生成

## 何时使用此技能

在以下情况使用 Outlines：
- 需要在生成过程中**保证有效的 JSON/XML/代码**结构
- 需要**使用 Pydantic 模型**来获得类型安全的输出
- 需要**支持本地模型**（Transformers、llama.cpp、vLLM）
- 希望**以零开销结构化生成来最大化推理速度**
- 需要**根据 JSON schema 自动生成**
- 需要在语法级别**控制令牌采样**

**GitHub 星标**：8,000+ | **来源**：dottxt.ai（前身为 .txt）

## 安装

```bash
# 基础安装
pip install outlines

# 搭配特定后端使用
pip install outlines transformers  # Hugging Face 模型
pip install outlines llama-cpp-python  # llama.cpp
pip install outlines vllm  # 用于高吞吐量的 vLLM

快速入门

基础示例：分类

import outlines
from typing import Literal

# 加载模型
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# 使用类型约束生成
prompt = "Sentiment of 'This product is amazing!': "
generator = outlines.generate.choice(model, ["positive", "negative", "neutral"])
sentiment = generator(prompt)

print(sentiment)  # "positive"（保证是其中之一）

使用 Pydantic 模型

from pydantic import BaseModel
import outlines

class User(BaseModel):
    name: str
    age: int
    email: str

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# 生成结构化输出
prompt = "Extract user: John Doe, 30 years old, john@example.com"
generator = outlines.generate.json(model, User)
user = generator(prompt)

print(user.name)   # "John Doe"
print(user.age)    # 30
print(user.email)  # "john@example.com"

核心概念

1. 约束令牌采样

Outlines 使用有限状态机（FSM）在 logit 级别约束令牌生成。

工作原理：

将模式（JSON/Pydantic/正则表达式）转换为上下文无关文法（CFG）
将 CFG 转换为有限状态机（FSM）
在生成过程的每一步过滤无效令牌
当只有一个有效令牌存在时，快速前进

优势：

零开销：过滤发生在令牌级别
速度提升：通过确定性路径快速前进
保证有效性：无效输出不可能出现

import outlines

# Pydantic 模型 -> JSON schema -> CFG -> FSM
class Person(BaseModel):
    name: str
    age: int

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# 幕后过程：
# 1. Person -> JSON schema
# 2. JSON schema -> CFG
# 3. CFG -> FSM
# 4. FSM 在生成过程中过滤令牌

generator = outlines.generate.json(model, Person)
result = generator("Generate person: Alice, 25")

2. 结构化生成器

Outlines 为不同的输出类型提供了专门的生成器。

选择生成器

# 多选一选择
generator = outlines.generate.choice(
    model,
    ["positive", "negative", "neutral"]
)

sentiment = generator("Review: This is great!")
# 结果：三个选项之一

JSON 生成器

from pydantic import BaseModel

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool

# 生成匹配 schema 的有效 JSON
generator = outlines.generate.json(model, Product)
product = generator("Extract: iPhone 15, $999, available")

# 保证是有效的 Product 实例
print(type(product))  # <class '__main__.Product'>

正则表达式生成器

# 生成匹配正则表达式的文本
generator = outlines.generate.regex(
    model,
    r"[0-9]{3}-[0-9]{3}-[0-9]{4}"  # 电话号码模式
)

phone = generator("Generate phone number:")
# 结果："555-123-4567"（保证匹配模式）

整数/浮点数生成器

# 生成特定数值类型
int_generator = outlines.generate.integer(model)
age = int_generator("Person's age:")  # 保证为整数

float_generator = outlines.generate.float(model)
price = float_generator("Product price:")  # 保证为浮点数

3. 模型后端

Outlines 支持多种本地和基于 API 的后端。

Transformers (Hugging Face)

import outlines

# 从 Hugging Face 加载
model = outlines.models.transformers(
    "microsoft/Phi-3-mini-4k-instruct",
    device="cuda"  # 或 "cpu"
)

# 与任何生成器一起使用
generator = outlines.generate.json(model, YourModel)

llama.cpp

# 加载 GGUF 模型
model = outlines.models.llamacpp(
    "./models/llama-3.1-8b-instruct.Q4_K_M.gguf",
    n_gpu_layers=35
)

generator = outlines.generate.json(model, YourModel)

vLLM（高吞吐量）

# 用于生产部署
model = outlines.models.vllm(
    "meta-llama/Llama-3.1-8B-Instruct",
    tensor_parallel_size=2  # 多 GPU
)

generator = outlines.generate.json(model, YourModel)

OpenAI（有限支持）

# 基本的 OpenAI 支持
model = outlines.models.openai(
    "gpt-4o-mini",
    api_key="your-api-key"
)

# 注意：API 模型某些功能受限
generator = outlines.generate.json(model, YourModel)

4. Pydantic 集成

Outlines 具有对 Pydantic 的一流支持，可自动进行 schema 转换。

基础模型

from pydantic import BaseModel, Field

class Article(BaseModel):
    title: str = Field(description="文章标题")
    author: str = Field(description="作者姓名")
    word_count: int = Field(description="字数", gt=0)
    tags: list[str] = Field(description="标签列表")

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, Article)

article = generator("Generate article about AI")
print(article.title)
print(article.word_count)  # 保证 > 0

嵌套模型

class Address(BaseModel):
    street: str
    city: str
    country: str

class Person(BaseModel):
    name: str
    age: int
    address: Address  # 嵌套模型

generator = outlines.generate.json(model, Person)
person = generator("Generate person in New York")

print(person.address.city)  # "New York"

枚举和字面量

from enum import Enum
from typing import Literal

class Status(str, Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"

class Application(BaseModel):
    applicant: str
    status: Status  # 必须是枚举值之一
    priority: Literal["low", "medium", "high"]  # 必须是字面量之一

generator = outlines.generate.json(model, Application)
app = generator("Generate application")

print(app.status)  # Status.PENDING（或 APPROVED/REJECTED）

常见模式

模式 1：数据提取

from pydantic import BaseModel
import outlines

class CompanyInfo(BaseModel):
    name: str
    founded_year: int
    industry: str
    employees: int

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, CompanyInfo)

text = """
Apple Inc. was founded in 1976 in the technology industry.
The company employs approximately 164,000 people worldwide.
"""

prompt = f"Extract company information:\n{text}\n\nCompany:"
company = generator(prompt)

print(f"Name: {company.name}")
print(f"Founded: {company.founded_year}")
print(f"Industry: {company.industry}")
print(f"Employees: {company.employees}")

模式 2：分类

from typing import Literal
import outlines

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# 二元分类
generator = outlines.generate.choice(model, ["spam", "not_spam"])
result = generator("Email: Buy now! 50% off!")

# 多类分类
categories = ["technology", "business", "sports", "entertainment"]
category_gen = outlines.generate.choice(model, categories)
category = category_gen("Article: Apple announces new iPhone...")

# 带置信度
class Classification(BaseModel):
    label: Literal["positive", "negative", "neutral"]
    confidence: float

classifier = outlines.generate.json(model, Classification)
result = classifier("Review: This product is okay, nothing special")

模式 3：结构化表单

class UserProfile(BaseModel):
    full_name: str
    age: int
    email: str
    phone: str
    country: str
    interests: list[str]

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, UserProfile)

prompt = """
Extract user profile from:
Name: Alice Johnson
Age: 28
Email: alice@example.com
Phone: 555-0123
Country: USA
Interests: hiking, photography, cooking
"""

profile = generator(prompt)
print(profile.full_name)
print(profile.interests)  # ["hiking", "photography", "cooking"]

模式 4：多实体提取

class Entity(BaseModel):
    name: str
    type: Literal["PERSON", "ORGANIZATION", "LOCATION"]

class DocumentEntities(BaseModel):
    entities: list[Entity]

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, DocumentEntities)

text = "Tim Cook met with Satya Nadella at Microsoft headquarters in Redmond."
prompt = f"Extract entities from: {text}"

result = generator(prompt)
for entity in result.entities:
    print(f"{entity.name} ({entity.type})")

模式 5：代码生成

class PythonFunction(BaseModel):
    function_name: str
    parameters: list[str]
    docstring: str
    body: str

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, PythonFunction)

prompt = "Generate a Python function to calculate factorial"
func = generator(prompt)

print(f"def {func.function_name}({', '.join(func.parameters)}):")
print(f'    """{func.docstring}"""')
print(f"    {func.body}")

模式 6：批处理

def batch_extract(texts: list[str], schema: type[BaseModel]):
    """从多个文本中提取结构化数据。"""
    model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
    generator = outlines.generate.json(model, schema)

    results = []
    for text in texts:
        result = generator(f"Extract from: {text}")
        results.append(result)

    return results

class Person(BaseModel):
    name: str
    age: int

texts = [
    "John is 30 years old",
    "Alice is 25 years old",
    "Bob is 40 years old"
]

people = batch_extract(texts, Person)
for person in people:
    print(f"{person.name}: {person.age}")

后端配置

Transformers

import outlines

# 基本用法
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# GPU 配置
model = outlines.models.transformers(
    "microsoft/Phi-3-mini-4k-instruct",
    device="cuda",
    model_kwargs={"torch_dtype": "float16"}
)

# 流行模型
model = outlines.models.transformers("meta-llama/Llama-3.1-8B-Instruct")
model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")
model = outlines.models.transformers("Qwen/Qwen2.5-7B-Instruct")

llama.cpp

# 加载 GGUF 模型
model = outlines.models.llamacpp(
    "./models/llama-3.1-8b.Q4_K_M.gguf",
    n_ctx=4096,         # 上下文窗口
    n_gpu_layers=35,    # GPU 层数
    n_threads=8         # CPU 线程数
)

# 完整 GPU 卸载
model = outlines.models.llamacpp(
    "./models/model.gguf",
    n_gpu_layers=-1  # 所有层置于 GPU
)

vLLM (生产环境)

# 单 GPU
model = outlines.models.vllm("meta-llama/Llama-3.1-8B-Instruct")

# 多 GPU
model = outlines.models.vllm(
    "meta-llama/Llama-3.1-70B-Instruct",
    tensor_parallel_size=4  # 4 张 GPU
)

# 带量化
model = outlines.models.vllm(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization="awq"  # 或 "gptq"
)

最佳实践

1. 使用具体类型

# ✅ 好：使用具体类型
class Product(BaseModel):
    name: str
    price: float  # 不是 str
    quantity: int  # 不是 str
    in_stock: bool  # 不是 str

# ❌ 差：全部使用字符串
class Product(BaseModel):
    name: str
    price: str  # 应该是 float
    quantity: str  # 应该是 int

2. 添加约束

from pydantic import Field

# ✅ 好：带约束
class User(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    age: int = Field(ge=0, le=120)
    email: str = Field(pattern=r"^[\w\.-]+@[\w\.-]+\.\w+$")

# ❌ 差：无约束
class User(BaseModel):
    name: str
    age: int
    email: str

3. 为类别使用枚举

# ✅ 好：使用枚举处理固定集合
class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

class Task(BaseModel):
    title: str
    priority: Priority

# ❌ 差：使用自由格式的字符串
class Task(BaseModel):
    title: str
    priority: str  # 可以是任何值

4. 在提示词中提供上下文

# ✅ 好：清晰的上下文
prompt = """
从以下文本中提取产品信息。
文本：iPhone 15 Pro 售价 $999，目前有货。
产品：
"""

# ❌ 差：最小上下文
prompt = "iPhone 15 Pro 售价 $999，目前有货。"

5. 处理可选字段

from typing import Optional

# ✅ 好：为不完整数据使用可选字段
class Article(BaseModel):
    title: str  # 必需
    author: Optional[str] = None  # 可选
    date: Optional[str] = None  # 可选
    tags: list[str] = []  # 默认空列表

# 即使缺少作者/日期也能成功

与其他替代方案的比较

功能	Outlines	Instructor	Guidance	LMQL
Pydantic 支持	✅ 原生支持	✅ 原生支持	❌ 不支持	❌ 不支持
JSON Schema	✅ 支持	✅ 支持	⚠️ 有限支持	✅ 支持
正则约束	✅ 支持	❌ 不支持	✅ 支持	✅ 支持
本地模型	✅ 完整支持	⚠️ 有限支持	✅ 完整支持	✅ 完整支持
API 模型	⚠️ 有限支持	✅ 完整支持	✅ 完整支持	✅ 完整支持
零开销	✅ 是	❌ 否	⚠️ 部分	✅ 是
自动重试	❌ 否	✅ 是	❌ 否	❌ 否
学习曲线	低	低	低	高

何时选择 Outlines：

使用本地模型（Transformers、llama.cpp、vLLM）
需要最大推理速度
想要 Pydantic 模型支持
需要零开销的结构化生成
控制令牌采样过程

何时选择其他方案：

Instructor：需要带有自动重试的 API 模型
Guidance：需要令牌修复和复杂工作流
LMQL：偏好声明式查询语法

性能特征

速度：

零开销：结构化生成速度与无约束生成一样快
快进优化：跳过确定性的令牌
比生成后验证方法快 1.2-2 倍

内存：

每个模式只编译一次有限状态机 (FSM)（已缓存）
运行时开销最小
与 vLLM 结合使用时高效，可实现高吞吐量

准确性：

100% 有效输出（由有限状态机 (FSM) 保证）
无需重试循环
确定性的令牌过滤

资源

文档：https://outlines-dev.github.io/outlines
GitHub：https://github.com/outlines-dev/outlines (8k+ stars)
Discord：https://discord.gg/R9DSu34mGd
博客：https://blog.dottxt.co

另请参阅

references/json_generation.md - 全面的 JSON 和 Pydantic 模式
references/backends.md - 后端特定的配置
references/examples.md - 生产就绪的示例

技能元数据​

快速入门​

基础示例：分类​

使用 Pydantic 模型​

核心概念​

1. 约束令牌采样​

2. 结构化生成器​

选择生成器​

JSON 生成器​

正则表达式生成器​

整数/浮点数生成器​

3. 模型后端​

Transformers (Hugging Face)​

llama.cpp​

vLLM（高吞吐量）​

OpenAI（有限支持）​

4. Pydantic 集成​

基础模型​

嵌套模型​

枚举和字面量​

常见模式​

模式 1：数据提取​

模式 2：分类​

模式 3：结构化表单​

模式 4：多实体提取​

模式 5：代码生成​

模式 6：批处理​

后端配置​

Transformers​

llama.cpp​

vLLM (生产环境)​

最佳实践​

1. 使用具体类型​

2. 添加约束​

3. 为类别使用枚举​

4. 在提示词中提供上下文​

5. 处理可选字段​

与其他替代方案的比较​

性能特征​

资源​

另请参阅​

技能元数据

快速入门

基础示例：分类

使用 Pydantic 模型

核心概念

1. 约束令牌采样

2. 结构化生成器

选择生成器

JSON 生成器

正则表达式生成器

整数/浮点数生成器

3. 模型后端

Transformers (Hugging Face)

llama.cpp

vLLM（高吞吐量）

OpenAI（有限支持）

4. Pydantic 集成

基础模型

嵌套模型

枚举和字面量

常见模式

模式 1：数据提取

模式 2：分类

模式 3：结构化表单

模式 4：多实体提取

模式 5：代码生成

模式 6：批处理

后端配置

Transformers

llama.cpp

vLLM (生产环境)

最佳实践

1. 使用具体类型

2. 添加约束

3. 为类别使用枚举

4. 在提示词中提供上下文

5. 处理可选字段

与其他替代方案的比较

性能特征

资源

另请参阅