TokenLimiterProcessor
The TokenLimiterProcessor limits the number of tokens in messages. It can be used as both an input processor and an output processor:
- As an input processor: filters historical messages to fit within the context window, prioritizing the most recent messages
- As an output processor: limits the tokens of generated responses in both streaming and non-streaming modes, with configurable strategies for handling content that exceeds the limit
Usage example
import { TokenLimiterProcessor } from "@mastra/core/processors";

const processor = new TokenLimiterProcessor({
  limit: 1000,
  strategy: "truncate",
  countMode: "cumulative"
});
Constructor parameters
- options: configuration object for the processor (see Options below)
Options
- limit: maximum number of tokens allowed
- encoding?: optional override for the encoding used to count tokens
- strategy?: how content that exceeds the limit is handled (the usage example above passes "truncate")
- countMode?: how the token count is accumulated (the usage example above passes "cumulative")
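Since only limit is required, a minimal configuration can omit the optional fields. A sketch (the behavior of the unset options falls back to defaults that are not documented here):
import { TokenLimiterProcessor } from "@mastra/core/processors";

// Minimal configuration: only the required `limit` is set; `encoding`,
// `strategy`, and `countMode` use their defaults.
const limiter = new TokenLimiterProcessor({ limit: 4000 });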
Returns
- id: identifier for the processor
- name?: optional processor name
- processInput: filters incoming messages when the processor is used as an input processor
- processOutputStream: limits tokens of streaming output when used as an output processor
- processOutputResult: limits tokens of non-streaming results when used as an output processor
- getMaxTokens: returns the configured token limit
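processInput runs when the processor is registered in an agent's inputProcessors, while processOutputStream and processOutputResult run when it is registered in outputProcessors (see the extended examples below). A small sketch, assuming getMaxTokens() simply reports the configured limit:
import { TokenLimiterProcessor } from "@mastra/core/processors";

const limiter = new TokenLimiterProcessor({ limit: 1000 });
// Assumption based on the method name: getMaxTokens() reports the configured limit.
console.log(limiter.getMaxTokens()); // expected: 1000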
Error behavior
When used as an input processor, TokenLimiterProcessor throws a TripWire error in the following cases:
- Empty messages: if there are no messages to process, a TripWire is triggered, because an LLM request cannot be sent without any messages.
- System message exceeds the limit: if the system message alone exceeds the token limit, a TripWire is triggered, because you cannot make an LLM request with only a system message and no user/assistant messages.
import { TripWire } from "@mastra/core/agent";

// `agent` is assumed to have TokenLimiterProcessor configured as an input
// processor (see the extended usage example below).
try {
  await agent.generate("Hello");
} catch (error) {
  if (error instanceof TripWire) {
    console.log("Token limit error:", error.message);
  }
}
Extended usage example
As an input processor (limit context window)
Use inputProcessors to limit the historical messages sent to the model, which helps stay within context window limits:
import { Agent } from "@mastra/core/agent";
import { Memory } from "@mastra/memory";
import { TokenLimiterProcessor } from "@mastra/core/processors";

export const agent = new Agent({
  name: "context-limited-agent",
  instructions: "You are a helpful assistant",
  model: "openai/gpt-4o",
  memory: new Memory({ /* ... */ }),
  inputProcessors: [
    new TokenLimiterProcessor({ limit: 4000 }) // Limits historical messages to ~4000 tokens
  ]
});
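Continuing with the agent defined above, the stored history plus the new message pass through the processor before each model call. A minimal call sketch (Memory thread/resource options are omitted):
// History is filtered to roughly the most recent ~4000 tokens before the request is sent.
const result = await agent.generate("Summarize our conversation so far");
console.log(result.text);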
As an output processor (limit response length)
Use outputProcessors to limit the length of generated responses:
import { Agent } from "@mastra/core/agent";
import { TokenLimiterProcessor } from "@mastra/core/processors";

export const agent = new Agent({
  name: "response-limited-agent",
  instructions: "You are a helpful assistant",
  model: "openai/gpt-4o",
  outputProcessors: [
    new TokenLimiterProcessor({
      limit: 1000,
      strategy: "truncate",
      countMode: "cumulative"
    })
  ]
});
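Continuing with the agent defined above, responses longer than the limit are truncated rather than rejected. A minimal call sketch:
// The response is capped at roughly 1000 tokens; with strategy: "truncate"
// the overflow is simply dropped.
const result = await agent.generate("Write a detailed explanation of TCP/IP");
console.log(result.text);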
Related