TokenLimiterProcessor

The TokenLimiterProcessor limits the number of tokens in messages. It can be used as both an input and an output processor:

  • Input processor: filters historical messages to fit within the context window, prioritizing the most recent messages
  • Output processor: limits the tokens in generated responses, streaming or non-streaming, with configurable strategies for handling the excess

Usage example

import { TokenLimiterProcessor } from "@mastra/core/processors";

const processor = new TokenLimiterProcessor({
  limit: 1000,
  strategy: "truncate",
  countMode: "cumulative"
});

Constructor parameters

options:

number | Options
Either a simple number for token limit, or configuration options object
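Since the constructor accepts either form, a bare number is treated as the token limit. A minimal sketch of both equivalent forms (the values are illustrative):

```typescript
import { TokenLimiterProcessor } from "@mastra/core/processors";

// Shorthand: a bare number is interpreted as the token limit.
const simple = new TokenLimiterProcessor(1000);

// Equivalent options object, with the remaining fields available.
const configured = new TokenLimiterProcessor({
  limit: 1000,
  strategy: "truncate"
});
```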

Options

limit:

number
Maximum number of tokens to allow in the response

encoding?:

TiktokenBPE
Optional encoding to use. Defaults to o200k_base, which is used by gpt-5.1

strategy?:

'truncate' | 'abort'
Strategy when token limit is reached: 'truncate' stops emitting chunks, 'abort' calls abort() to stop the stream

countMode?:

'cumulative' | 'part'
Whether to count tokens from the beginning of the stream or just the current part: 'cumulative' counts all tokens from start, 'part' only counts tokens in current part
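The difference between the two modes can be sketched with a simplified token counter. This is illustrative only: `countTokens` stands in for real BPE tokenization (here, roughly one token per word), and `exceedsLimit` is a hypothetical helper, not part of the library:

```typescript
// Stand-in for BPE tokenization: approximate one token per word.
const countTokens = (text: string): number =>
  text.split(/\s+/).filter(Boolean).length;

type CountMode = "cumulative" | "part";

// Decide whether emitting `part` would exceed `limit`, given the
// parts already emitted earlier in the stream.
function exceedsLimit(
  emitted: string[],
  part: string,
  limit: number,
  mode: CountMode
): boolean {
  const partTokens = countTokens(part);
  if (mode === "part") {
    // Only the current part is measured against the limit.
    return partTokens > limit;
  }
  // "cumulative": every token since the start of the stream counts.
  const soFar = emitted.reduce((sum, p) => sum + countTokens(p), 0);
  return soFar + partTokens > limit;
}
```

With a limit of 4, a stream that has already emitted 3 tokens exceeds the limit on a 2-token part in "cumulative" mode, but not in "part" mode.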

Returns

id:

string
Processor identifier set to 'token-limiter'

name?:

string
Optional processor display name

processInput:

(args: { messages: MastraDBMessage[]; abort: (reason?: string) => never }) => Promise<MastraDBMessage[]>
Filters input messages to fit within token limit, prioritizing recent messages while preserving system messages

processOutputStream:

(args: { part: ChunkType; streamParts: ChunkType[]; state: Record<string, any>; abort: (reason?: string) => never }) => Promise<ChunkType | null>
Processes streaming output parts to limit token count during streaming

processOutputResult:

(args: { messages: MastraDBMessage[]; abort: (reason?: string) => never }) => Promise<MastraDBMessage[]>
Processes final output results to limit token count in non-streaming scenarios

getMaxTokens:

() => number
Get the maximum token limit
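For instance, getMaxTokens can be used to read back the configured limit after construction (a minimal sketch):

```typescript
import { TokenLimiterProcessor } from "@mastra/core/processors";

const processor = new TokenLimiterProcessor({ limit: 1000 });

// Read back the configured maximum token limit, e.g. for logging.
const max = processor.getMaxTokens(); // 1000
```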

Error behavior

When used as an input processor, TokenLimiterProcessor throws a TripWire error in the following cases:

  • Empty messages: if there are no messages to process, a TripWire is triggered, since an LLM request cannot be sent without messages.
  • System messages exceed the limit: if the system messages alone exceed the token limit, a TripWire is triggered, since an LLM request cannot be made with only system messages and no user/assistant messages.
import { TripWire } from "@mastra/core/agent";

try {
  await agent.generate("Hello");
} catch (error) {
  if (error instanceof TripWire) {
    console.log("Token limit error:", error.message);
  }
}

Extended usage example

As an input processor (limit the context window)

Use inputProcessors to limit the historical messages sent to the model, which helps stay within context window limits:

src/mastra/agents/context-limited-agent.ts
import { Agent } from "@mastra/core/agent";
import { Memory } from "@mastra/memory";
import { TokenLimiterProcessor } from "@mastra/core/processors";

export const agent = new Agent({
  name: "context-limited-agent",
  instructions: "You are a helpful assistant",
  model: "openai/gpt-4o",
  memory: new Memory({ /* ... */ }),
  inputProcessors: [
    new TokenLimiterProcessor({ limit: 4000 }) // Limits historical messages to ~4000 tokens
  ]
});

As an output processor (limit response length)

Use outputProcessors to limit the length of generated responses:

src/mastra/agents/response-limited-agent.ts
import { Agent } from "@mastra/core/agent";
import { TokenLimiterProcessor } from "@mastra/core/processors";

export const agent = new Agent({
  name: "response-limited-agent",
  instructions: "You are a helpful assistant",
  model: "openai/gpt-4o",
  outputProcessors: [
    new TokenLimiterProcessor({
      limit: 1000,
      strategy: "truncate",
      countMode: "cumulative"
    })
  ]
});
