Scorer Utils

Mastra provides utility functions that extract and process data from scorer run inputs and outputs. They are particularly useful in the preprocess step of custom scorers.

Import

import {
  getAssistantMessageFromRunOutput,
  getReasoningFromRunOutput,
  getUserMessageFromRunInput,
  getSystemMessagesFromRunInput,
  getCombinedSystemPrompt,
  extractToolCalls,
  extractInputMessages,
  extractAgentResponseMessages,
} from "@mastra/evals/scorers/utils";

Message Extraction

getAssistantMessageFromRunOutput

Extracts the text content from the first assistant message in the run output.

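import { createScorer } from "@mastra/core/evals";

const scorer = createScorer({
  id: "my-scorer",
  description: "My scorer",
  type: "agent",
})
  .preprocess(({ run }) => {
    // Pull the first assistant message text out of the run output
    const response = getAssistantMessageFromRunOutput(run.output);
    return { response };
  })
  .generateScore(({ results }) => {
    // Score 1 when a response was found, 0 otherwise
    return results.preprocessStepResult?.response ? 1 : 0;
  });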

Parameters:

output?: ScorerRunOutputForAgent - The scorer run output (array of MastraDBMessage)

Returns: string | undefined - The assistant message text, or undefined if no assistant message is found.
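
To make the "first assistant message, or undefined" contract concrete, here is a minimal sketch over a simplified message shape. `SketchMessage` and `firstAssistantText` are hypothetical names used only for illustration. They are not part of Mastra, and the real utility works on MastraDBMessage objects and may handle more cases.

```typescript
// Hypothetical, simplified message shape (the real MastraDBMessage has more fields).
type SketchMessage = {
  role: "user" | "assistant" | "system";
  text: string;
};

// Returns the text of the first assistant message, or undefined if there is none.
function firstAssistantText(output: SketchMessage[] = []): string | undefined {
  for (const message of output) {
    if (message.role === "assistant") {
      return message.text;
    }
  }
  return undefined;
}

const sketchOutput: SketchMessage[] = [
  { role: "assistant", text: "The weather is sunny." },
  { role: "assistant", text: "Anything else?" },
];

// Only the first assistant message is used; later ones are ignored.
const firstText = firstAssistantText(sketchOutput);

// With no assistant message, the result is undefined rather than an empty string.
const missingText = firstAssistantText([]);
```

Because the result can be undefined, a preprocess step that relies on it should guard for the missing case, as the `? 1 : 0` check in the example above does.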

getUserMessageFromRunInput

Extracts the text content from the first user message in the run input.

.preprocess(({ run }) => {
  const userMessage = getUserMessageFromRunInput(run.input);
  return { userMessage };
})

Parameters:

input?: ScorerRunInputForAgent - The scorer run input containing input messages

Returns: string | undefined - The user message text, or undefined if no user message is found.

extractInputMessages

Extracts the text content of every input message as an array.

.preprocess(({ run }) => {
  const allUserMessages = extractInputMessages(run.input);
  return { conversationHistory: allUserMessages.join("\n") };
})

Returns: string[] - An array of text strings, one per input message.

extractAgentResponseMessages

Extracts the text content of every assistant response message as an array.

.preprocess(({ run }) => {
  const allResponses = extractAgentResponseMessages(run.output);
  return { allResponses };
})

Returns: string[] - An array of text strings, one per assistant message.

Reasoning Extraction

getReasoningFromRunOutput

Extracts reasoning text from the run output. This is particularly useful when evaluating responses from reasoning models such as deepseek-reasoner that produce chain-of-thought reasoning.

Reasoning can be stored in two places:

  1. content.reasoning - a string field on the message content
  2. content.parts - as a part with type: 'reasoning' containing details

import { createScorer } from "@mastra/core/evals";
import {
  getReasoningFromRunOutput,
  getAssistantMessageFromRunOutput,
} from "@mastra/evals/scorers/utils";

const reasoningQualityScorer = createScorer({
  id: "reasoning-quality",
  name: "Reasoning Quality",
  description: "Evaluates the quality of model reasoning",
  type: "agent",
})
  .preprocess(({ run }) => {
    const reasoning = getReasoningFromRunOutput(run.output);
    const response = getAssistantMessageFromRunOutput(run.output);
    return { reasoning, response };
  })
  .analyze(({ results }) => {
    const { reasoning } = results.preprocessStepResult || {};
    return {
      hasReasoning: !!reasoning,
      reasoningLength: reasoning?.length || 0,
      hasStepByStep: reasoning?.includes("step") || false,
    };
  })
  .generateScore(({ results }) => {
    // Default the length to 0 so the division below never sees undefined
    const { hasReasoning, reasoningLength = 0 } = results.analyzeStepResult || {};
    if (!hasReasoning) return 0;
    // Score based on reasoning length (normalized to 0-1, capped at 500 characters)
    return Math.min(reasoningLength / 500, 1);
  })
  .generateReason(({ results, score }) => {
    const { hasReasoning, reasoningLength = 0 } = results.analyzeStepResult || {};
    if (!hasReasoning) {
      return "No reasoning was provided by the model.";
    }
    return `Model provided ${reasoningLength} characters of reasoning. Score: ${score}`;
  });

Parameters:

output?: ScorerRunOutputForAgent - The scorer run output (array of MastraDBMessage)

Returns: string | undefined - The reasoning text, or undefined if no reasoning is present.
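
The two storage locations listed above can be illustrated with a small lookup over simplified shapes. This is a sketch under stated assumptions, not Mastra's implementation: the `Sketch*` types and `findReasoning` are hypothetical names, and the lookup order, the `text` field on each detail, and joining details with a newline are all assumptions made for illustration.

```typescript
// Hypothetical, simplified shapes; the real MastraDBMessage content has more fields.
type SketchReasoningDetail = { type: string; text?: string };
type SketchPart = { type: string; details?: SketchReasoningDetail[] };
type SketchReasoningMessage = {
  role: string;
  content: { reasoning?: string; parts?: SketchPart[] };
};

function findReasoning(output: SketchReasoningMessage[] = []): string | undefined {
  for (const message of output) {
    if (message.role !== "assistant") continue;

    // Location 1: a string field on the message content.
    if (message.content.reasoning) {
      return message.content.reasoning;
    }

    // Location 2: parts with type 'reasoning' that carry details.
    const texts: string[] = [];
    for (const part of message.content.parts ?? []) {
      if (part.type !== "reasoning") continue;
      for (const detail of part.details ?? []) {
        if (detail.text) texts.push(detail.text);
      }
    }
    // Joining with a newline is an assumption of this sketch.
    if (texts.length > 0) return texts.join("\n");
  }
  return undefined;
}

const reasoningFromField = findReasoning([
  { role: "assistant", content: { reasoning: "Compare both options." } },
]);

const reasoningFromParts = findReasoning([
  {
    role: "assistant",
    content: {
      parts: [
        { type: "text" },
        {
          type: "reasoning",
          details: [
            { type: "text", text: "Step 1" },
            { type: "text", text: "Step 2" },
          ],
        },
      ],
    },
  },
]);

const noReasoning = findReasoning([
  { role: "assistant", content: { parts: [{ type: "text" }] } },
]);
```

A scorer that depends on reasoning should treat the undefined case as "no reasoning available" rather than as an error, since many models do not emit reasoning at all.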

System Message Extraction

getSystemMessagesFromRunInput

Extracts all system messages from the run input, including both standard system messages and tagged system messages (specialized prompts such as memory instructions).

.preprocess(({ run }) => {
  const systemMessages = getSystemMessagesFromRunInput(run.input);
  return {
    systemPromptCount: systemMessages.length,
    systemPrompts: systemMessages,
  };
})

Returns: string[] - An array of system message strings.

getCombinedSystemPrompt

Combines all system messages into a single prompt string, joined with double newlines.

.preprocess(({ run }) => {
  const fullSystemPrompt = getCombinedSystemPrompt(run.input);
  return { fullSystemPrompt };
})

Returns: string - The combined system prompt string.
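
As a rough illustration of how the two system message utilities relate, the sketch below collects standard and tagged system messages and then joins them with double newlines. The input shape, the `systemMessages` and `taggedSystemMessages` field names, the ordering (standard first, then tagged), and the helper names are assumptions made for this example and may differ from the real ScorerRunInputForAgent.

```typescript
// Hypothetical, simplified input shape; field names are assumptions of this sketch.
type SketchSystemMessage = { role: "system"; content: string };
type SketchRunInput = {
  systemMessages: SketchSystemMessage[];
  taggedSystemMessages: Record<string, SketchSystemMessage[]>;
};

// Collects standard system messages first, then tagged ones (ordering is an assumption).
function collectSystemMessages(input: SketchRunInput): string[] {
  const collected: string[] = [];
  for (const message of input.systemMessages) {
    collected.push(message.content);
  }
  for (const tag of Object.keys(input.taggedSystemMessages)) {
    for (const message of input.taggedSystemMessages[tag] ?? []) {
      collected.push(message.content);
    }
  }
  return collected;
}

// Joins every collected system message with a blank line between them.
function combineSystemPrompt(input: SketchRunInput): string {
  return collectSystemMessages(input).join("\n\n");
}

const sketchInput: SketchRunInput = {
  systemMessages: [{ role: "system", content: "You are a helpful assistant." }],
  taggedSystemMessages: {
    memory: [{ role: "system", content: "Remember the user's name." }],
  },
};

const collectedPrompts = collectSystemMessages(sketchInput);
const combinedPrompt = combineSystemPrompt(sketchInput);
```

The array form suits per-prompt checks such as counting, while the combined string is convenient when a single prompt is passed to a judge model.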

Tool Call Extraction

extractToolCalls

Extracts information about all tool calls from the run output, including tool names, call IDs, and their positions in the message array.

const toolUsageScorer = createScorer({
  id: "tool-usage",
  description: "Evaluates tool usage patterns",
  type: "agent",
})
  .preprocess(({ run }) => {
    const { tools, toolCallInfos } = extractToolCalls(run.output);
    return {
      toolsUsed: tools,
      toolCount: tools.length,
      toolDetails: toolCallInfos,
    };
  })
  .generateScore(({ results }) => {
    // Default to 0 so the comparison below never sees undefined
    const { toolCount = 0 } = results.preprocessStepResult || {};
    // Score 1 if the agent called at least one tool, 0 otherwise
    return toolCount > 0 ? 1 : 0;
  });

Returns:

{
  tools: string[]; // Array of tool names
  toolCallInfos: ToolCallInfo[]; // Detailed tool call information
}

Where ToolCallInfo is:

type ToolCallInfo = {
  toolName: string; // Name of the tool
  toolCallId: string; // Unique call identifier
  messageIndex: number; // Index in the output array
  invocationIndex: number; // Index within message's tool invocations
};
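
To show how messageIndex and invocationIndex relate, here is a minimal sketch that walks a simplified output array and records both positions for each tool call. The `Sketch*` types and `collectToolCalls` are hypothetical names for illustration. The sketch assumes one `tools` entry per call without de-duplication, and the real utility may filter or order calls differently.

```typescript
// Hypothetical, simplified shapes; the real MastraDBMessage nests tool invocations differently.
type SketchToolInvocation = { toolCallId: string; toolName: string };
type SketchToolMessage = { role: string; toolInvocations?: SketchToolInvocation[] };
type SketchToolCallInfo = {
  toolName: string;
  toolCallId: string;
  messageIndex: number; // position of the message in the output array
  invocationIndex: number; // position of the call within that message
};
type SketchToolCalls = { tools: string[]; toolCallInfos: SketchToolCallInfo[] };

function collectToolCalls(output: SketchToolMessage[] = []): SketchToolCalls {
  const tools: string[] = [];
  const toolCallInfos: SketchToolCallInfo[] = [];

  output.forEach((message, messageIndex) => {
    (message.toolInvocations ?? []).forEach((invocation, invocationIndex) => {
      // Assumption: one entry per call, so repeated tools appear more than once.
      tools.push(invocation.toolName);
      toolCallInfos.push({
        toolName: invocation.toolName,
        toolCallId: invocation.toolCallId,
        messageIndex,
        invocationIndex,
      });
    });
  });

  return { tools, toolCallInfos };
}

const sketchToolOutput: SketchToolMessage[] = [
  { role: "assistant" }, // message 0: no tool calls
  {
    role: "assistant",
    toolInvocations: [
      { toolCallId: "call-1", toolName: "weatherTool" },
      { toolCallId: "call-2", toolName: "calendarTool" },
    ],
  },
];

const sketchToolCalls = collectToolCalls(sketchToolOutput);
```

In this example the second call has messageIndex 1 and invocationIndex 1, which is the kind of positional data a scorer can use to check tool ordering.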

Test Utilities

These utilities help create test data for scorer development.

createTestMessage

Creates a MastraDBMessage object for testing purposes.

import { createTestMessage } from "@mastra/evals/scorers/utils";

const userMessage = createTestMessage({
  content: "What is the weather?",
  role: "user",
});

const assistantMessage = createTestMessage({
  content: "The weather is sunny.",
  role: "assistant",
  toolInvocations: [
    {
      toolCallId: "call-1",
      toolName: "weatherTool",
      args: { location: "London" },
      result: { temp: 20 },
      state: "result",
    },
  ],
});

createAgentTestRun

Creates a complete test run object for testing scorers.

import { createAgentTestRun, createTestMessage } from "@mastra/evals/scorers/utils";

const testRun = createAgentTestRun({
  inputMessages: [
    createTestMessage({ content: "Hello", role: "user" }),
  ],
  output: [
    createTestMessage({ content: "Hi there!", role: "assistant" }),
  ],
});

// Run your scorer with the test data (myScorer stands for a scorer you have defined elsewhere)
const result = await myScorer.run({
  input: testRun.input,
  output: testRun.output,
});

Complete Example

Here's a complete example showing how to use multiple utilities together:

import { createScorer } from "@mastra/core/evals";
import {
  getAssistantMessageFromRunOutput,
  getReasoningFromRunOutput,
  getUserMessageFromRunInput,
  getCombinedSystemPrompt,
  extractToolCalls,
} from "@mastra/evals/scorers/utils";

const comprehensiveScorer = createScorer({
  id: "comprehensive-analysis",
  name: "Comprehensive Analysis",
  description: "Analyzes all aspects of an agent response",
  type: "agent",
})
  .preprocess(({ run }) => {
    // Extract all relevant data
    const userMessage = getUserMessageFromRunInput(run.input);
    const response = getAssistantMessageFromRunOutput(run.output);
    const reasoning = getReasoningFromRunOutput(run.output);
    const systemPrompt = getCombinedSystemPrompt(run.input);
    const { tools } = extractToolCalls(run.output);

    return {
      userMessage,
      response,
      reasoning,
      systemPrompt,
      toolsUsed: tools,
      toolCount: tools.length,
    };
  })
  .generateScore(({ results }) => {
    // Default toolCount to 0 so the comparison below never sees undefined
    const { response, reasoning, toolCount = 0 } = results.preprocessStepResult || {};

    // Weighted sum: response 0.4, reasoning 0.3, tool usage 0.3
    let score = 0;
    if (response && response.length > 0) score += 0.4;
    if (reasoning) score += 0.3;
    if (toolCount > 0) score += 0.3;

    return score;
  })
  .generateReason(({ results, score }) => {
    const { response, reasoning, toolCount = 0 } = results.preprocessStepResult || {};

    const parts: string[] = [];
    if (response) parts.push("provided a response");
    if (reasoning) parts.push("included reasoning");
    if (toolCount > 0) parts.push(`used ${toolCount} tool(s)`);

    // Avoid an empty sentence when none of the checks passed
    const summary = parts.length > 0 ? parts.join(", ") : "met none of the checks";
    return `Score: ${score}. The agent ${summary}.`;
  });
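
The weighting used in generateScore above can also be factored into a pure function, which makes it testable without running an agent. The sketch below is one possible arrangement, not part of Mastra: `SketchSignals` and `scoreFromSignals` are hypothetical names, and summing integer tenths before dividing once is a design choice of this sketch to keep the result free of accumulated floating-point rounding.

```typescript
// Hypothetical signal shape derived from the preprocess result above.
type SketchSignals = {
  hasResponse: boolean;
  hasReasoning: boolean;
  toolCount: number;
};

// Same weights as the complete example (0.4 / 0.3 / 0.3), expressed in tenths.
function scoreFromSignals(signals: SketchSignals): number {
  let tenths = 0;
  if (signals.hasResponse) tenths += 4;
  if (signals.hasReasoning) tenths += 3;
  if (signals.toolCount > 0) tenths += 3;
  return tenths / 10;
}

const fullScore = scoreFromSignals({ hasResponse: true, hasReasoning: true, toolCount: 2 });
const responseOnlyScore = scoreFromSignals({ hasResponse: true, hasReasoning: false, toolCount: 0 });
const emptyScore = scoreFromSignals({ hasResponse: false, hasReasoning: false, toolCount: 0 });
```

A generateScore step could then call such a function with values taken from results.preprocessStepResult, keeping the scorer definition itself thin.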