
Custom scorers

Mastra provides a unified createScorer factory that lets you build custom evaluation logic, using either a JavaScript function or an LLM-based prompt object for each step. This flexibility lets you choose the best approach for each part of your evaluation pipeline.

The Four-Step Pipeline

All scorers in Mastra follow a consistent four-step evaluation pipeline:

  1. preprocess (optional): Prepare or transform input/output data
  2. analyze (optional): Perform evaluation analysis and gather insights
  3. generateScore (required): Convert the analysis into a numerical score
  4. generateReason (optional): Generate a human-readable explanation

Each step can use either functions or prompt objects (LLM-based evaluation), so you can combine deterministic algorithms with AI judgment as needed.
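The data flow between the four steps can be sketched in plain TypeScript. This is an illustrative model only, not Mastra's implementation: the names Run, Results, Steps, and runPipeline are hypothetical, and every step here is a synchronous function so that no LLM is involved.

```typescript
// Illustrative model of the four-step flow; NOT Mastra's implementation.
// All names here (Run, Results, Steps, runPipeline) are hypothetical.
type Run = { input: string; output: string };

type Results = {
  preprocessStepResult?: unknown;
  analyzeStepResult?: unknown;
};

type Steps = {
  preprocess?: (ctx: { run: Run; results: Results }) => unknown;
  analyze?: (ctx: { run: Run; results: Results }) => unknown;
  generateScore: (ctx: { run: Run; results: Results }) => number; // the only required step
  generateReason?: (ctx: { run: Run; results: Results; score: number }) => string;
};

function runPipeline(steps: Steps, run: Run) {
  const results: Results = {};
  // Each optional step stores its return value for the steps that follow.
  if (steps.preprocess) {
    results.preprocessStepResult = steps.preprocess({ run, results });
  }
  if (steps.analyze) {
    results.analyzeStepResult = steps.analyze({ run, results });
  }
  const score = steps.generateScore({ run, results });
  // generateReason additionally receives the computed score.
  const reason = steps.generateReason?.({ run, results, score });
  return { ...results, score, reason };
}

// Usage: a scorer made only of functions.
const pipelineResult = runPipeline(
  {
    preprocess: ({ run }) => run.output.toLowerCase(),
    analyze: ({ results }) => ({
      hasFlour: String(results.preprocessStepResult).indexOf("flour") !== -1,
    }),
    generateScore: ({ results }) =>
      (results.analyzeStepResult as { hasFlour: boolean }).hasFlour ? 0 : 1,
    generateReason: ({ score }) => `Score: ${score}`,
  },
  { input: "Give me a recipe", output: "Mix FLOUR and water" },
);

console.log(pipelineResult.score); // 0
console.log(pipelineResult.reason); // "Score: 0"
```

Only generateScore is mandatory in this model, which mirrors the list above: omitting the other three steps still yields a score, with reason left undefined.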

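Functions vs Prompt Objects

Functions use JavaScript for deterministic logic. They are ideal for:

  • Algorithmic evaluations with clear criteria
  • Performance-critical scenarios
  • Integration with existing libraries
  • Consistent, reproducible results

Prompt objects use LLMs as judges for evaluation. They are ideal for:

  • Subjective evaluations that require human-like judgment
  • Complex criteria that are difficult to code algorithmically
  • Natural language understanding tasks
  • Nuanced contextual evaluation

What "prompt object" means: the step is not a function but an object containing description and createPrompt (plus outputSchema for preprocess and analyze). The object tells Mastra to run the judge LLM for that step and to store the structured output in results.<step>StepResult.

You can mix and match approaches within a single scorer. For example, use a function to preprocess data and an LLM to analyze quality.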
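As a rough illustration of the difference, the sketch below dispatches on whether a step is a function or a prompt object. It models the idea only and is not how Mastra is implemented: StepContext, FunctionStep, PromptObjectStep, Judge, and runStep are hypothetical names, the judge is a stub that returns a canned answer instead of calling a model, and schema validation is left out.

```typescript
// Hypothetical sketch: telling function steps apart from prompt-object steps.
// This models the idea only; it is not how Mastra is implemented.
type StepContext = { run: { output: string } };

type FunctionStep<T> = (ctx: StepContext) => T;

type PromptObjectStep = {
  description: string;
  createPrompt: (ctx: StepContext) => string;
};

// Stand-in for a judge LLM: takes a prompt, returns structured output.
type Judge<T> = (prompt: string) => T;

function runStep<T>(
  step: FunctionStep<T> | PromptObjectStep,
  ctx: StepContext,
  judge?: Judge<T>,
): T {
  if (typeof step === "function") {
    // Function step: deterministic, the judge is never called.
    return step(ctx);
  }
  // Prompt-object step: build the prompt and hand it to the judge.
  if (!judge) {
    throw new Error(`Step "${step.description}" needs a judge`);
  }
  return judge(step.createPrompt(ctx));
}

const stepCtx: StepContext = { run: { output: "Mix flour and water" } };

// Function step: counts words without any model involved.
const wordCount = runStep((c: StepContext) => c.run.output.split(" ").length, stepCtx);

// Prompt-object step with a stub judge that records the prompt it receives.
let seenPrompt = "";
const stubJudge: Judge<{ isGlutenFree: boolean }> = (prompt) => {
  seenPrompt = prompt;
  return { isGlutenFree: false }; // canned answer, no real model involved
};
const analysis = runStep(
  {
    description: "Analyze recipe for gluten",
    createPrompt: (c: StepContext) => `Is this recipe gluten-free? ${c.run.output}`,
  },
  stepCtx,
  stubJudge,
);

console.log(wordCount); // 4
console.log(analysis.isGlutenFree); // false
```

The stub judge also shows why a judge configuration matters only for prompt-object steps: the function step above runs without one.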

Initializing a Scorer

Every scorer starts with the createScorer factory function, which requires an id and a description, and optionally accepts a type specification and a judge configuration.

import { createScorer } from '@mastra/core/evals';

const glutenCheckerScorer = createScorer({
  id: 'gluten-checker',
  description: 'Check if recipes contain gluten ingredients',
  judge: { // Optional: for prompt object steps
    model: 'openai/gpt-5.1',
    instructions: 'You are a Chef that identifies if recipes contain gluten.'
  }
})
  // Chain step methods here
  .preprocess(...)
  .analyze(...)
  .generateScore(...)
  .generateReason(...)

The judge configuration is only needed if you plan to use prompt objects in any step. Individual steps can override this default configuration with their own judge settings.

If all steps are function-based, the judge is never called and there is no judge output. To see LLM output, define at least one step as a prompt object and read the corresponding step result (for example, results.analyzeStepResult).

Minimal judge example (prompt object)

This example uses a prompt object in analyze, so the judge runs and its structured output is available as results.analyzeStepResult.

import { createScorer } from "@mastra/core/evals";
import { z } from "zod";

const quoteSourcesScorer = createScorer({
  id: "quote-sources",
  description: "Check if the response includes sources",
  judge: {
    model: "openai/gpt-4.1-nano",
    instructions: "You are a strict evaluator.",
  },
})
  .analyze({
    description: "Detect whether sources are present",
    outputSchema: z.object({
      hasSources: z.boolean(),
      sources: z.array(z.string()),
    }),
    createPrompt: ({ run }) => `
      Does the response contain sources? Extract them as a list.

      Response:
      ${run.output}
    `,
  })
  .generateScore(({ results }) => (results.analyzeStepResult.hasSources ? 1 : 0));

// Run the scorer and inspect judge output
const result = await quoteSourcesScorer.run({
  input: "What is the capital of France?",
  output: "Paris is the capital of France [1]. Source: [1] Wikipedia",
});

console.log(result.score); // 1
console.log(result.analyzeStepResult); // { hasSources: true, sources: ["Wikipedia"] }

Agent Type for Agent Evaluation

For type safety and compatibility with both live agent scoring and trace scoring, use type: 'agent' when creating scorers for agent evaluation. The same scorer can then be attached to an agent and also used to score traces:

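const myScorer = createScorer({
  id: "my-scorer", // placeholder id
  description: "Placeholder description for an agent scorer",
  type: "agent", // Automatically handles agent input/output types
}).generateScore(({ run, results }) => {
  // run.output is automatically typed as ScorerRunOutputForAgent
  // run.input is automatically typed as ScorerRunInputForAgent
  return 1; // placeholder: replace with your scoring logic
});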

Step-by-Step Breakdown

preprocess Step (Optional)

Prepares input/output data when you need to extract specific elements, filter content, or transform complex data structures.

Functions: ({ run, results }) => any

const glutenCheckerScorer = createScorer(...)
  .preprocess(({ run }) => {
    // Extract and clean recipe text
    const recipeText = run.output.text.toLowerCase();
    const wordCount = recipeText.split(' ').length;

    return {
      recipeText,
      wordCount,
      hasCommonGlutenWords: /flour|wheat|bread|pasta/.test(recipeText)
    };
  })

Prompt objects: Use description, outputSchema, and createPrompt to structure LLM-based preprocessing.

const glutenCheckerScorer = createScorer(...)
  .preprocess({
    description: 'Extract ingredients from the recipe',
    outputSchema: z.object({
      ingredients: z.array(z.string()),
      cookingMethods: z.array(z.string())
    }),
    createPrompt: ({ run }) => `
      Extract all ingredients and cooking methods from this recipe:
      ${run.output.text}

      Return JSON with ingredients and cookingMethods arrays.
    `
  })

Data flow: Results are available to subsequent steps as results.preprocessStepResult
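Because a function step is an ordinary function, its logic can be pulled into a named helper and tested without running a scorer. The sketch below is one hedged variant of the function-based preprocessing shown earlier. The helper name extractRecipeFeatures and the RecipeFeatures type are hypothetical. Unlike the earlier example, it matches whole words instead of substrings and splits on any whitespace, which are design choices of this sketch.

```typescript
// Hypothetical helper; the name and the whole-word matching are assumptions of this sketch.
type RecipeFeatures = {
  recipeText: string;
  wordCount: number;
  hasCommonGlutenWords: boolean;
};

function extractRecipeFeatures(text: string): RecipeFeatures {
  const recipeText = text.toLowerCase().trim();
  // Split on runs of whitespace so double spaces or newlines do not inflate the count.
  const wordCount = recipeText === "" ? 0 : recipeText.split(/\s+/).length;
  return {
    recipeText,
    wordCount,
    // \b keeps "bread" from matching inside words such as "breadfruit".
    hasCommonGlutenWords: /\b(flour|wheat|bread|pasta)\b/.test(recipeText),
  };
}

const features = extractRecipeFeatures("Mix  flour and\nwater");
console.log(features.wordCount); // 4
console.log(features.hasCommonGlutenWords); // true
console.log(extractRecipeFeatures("Roast the breadfruit").hasCommonGlutenWords); // false
```

Whole-word matching is a trade-off: it avoids false positives such as "breadfruit" but also misses compounds such as "breadcrumbs", so the keyword list would need to grow accordingly. Inside a scorer, the helper could be passed as .preprocess(({ run }) => extractRecipeFeatures(run.output.text)), assuming run.output has a text field as in the example above.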

analyze Step (Optional)

Performs the core evaluation analysis, gathering insights that will inform the scoring decision.

Functions: ({ run, results }) => any

const glutenCheckerScorer = createScorer({...})
  .preprocess(...)
  .analyze(({ run, results }) => {
    const { recipeText, hasCommonGlutenWords } = results.preprocessStepResult;

    // Simple gluten detection algorithm
    const glutenKeywords = ['wheat', 'flour', 'barley', 'rye', 'bread'];
    const foundGlutenWords = glutenKeywords.filter(word =>
      recipeText.includes(word)
    );

    return {
      isGlutenFree: foundGlutenWords.length === 0,
      detectedGlutenSources: foundGlutenWords,
      confidence: hasCommonGlutenWords ? 0.9 : 0.7
    };
  })

Prompt objects: Use description, outputSchema, and createPrompt for LLM-based analysis.

const glutenCheckerScorer = createScorer({...})
  .preprocess(...)
  .analyze({
    description: 'Analyze recipe for gluten content',
    outputSchema: z.object({
      isGlutenFree: z.boolean(),
      glutenSources: z.array(z.string()),
      confidence: z.number().min(0).max(1)
    }),
    createPrompt: ({ run, results }) => `
      Analyze this recipe for gluten content:
      "${results.preprocessStepResult.recipeText}"

      Look for wheat, barley, rye, and hidden sources like soy sauce.
      Return JSON with isGlutenFree, glutenSources array, and confidence (0-1).
    `
  })

Data flow: Results are available to subsequent steps as results.analyzeStepResult

generateScore Step (Required)

Converts analysis results into a numerical score. This is the only required step in the pipeline.

Functions: ({ run, results }) => number

const glutenCheckerScorer = createScorer({...})
  .preprocess(...)
  .analyze(...)
  .generateScore(({ results }) => {
    const { isGlutenFree, confidence } = results.analyzeStepResult;

    // Return the confidence level (0-1) when gluten-free,
    // and 0 when the recipe contains gluten
    return isGlutenFree ? confidence : 0;
  })

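Prompt objects: See the createScorer API reference for details on using prompt objects with generateScore, including the required calculateScore function.

Data flow: The score is available to generateReason as the score parameter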
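The function-based scoring logic above can also be written as a standalone helper, which makes its edge cases easy to check. This is a minimal sketch: scoreFromAnalysis and the GlutenAnalysis type are hypothetical names, and clamping the confidence to the 0 to 1 range is an addition of this sketch in case an analysis step returns an out-of-range value.

```typescript
// Hypothetical helper mirroring the generateScore logic above.
type GlutenAnalysis = { isGlutenFree: boolean; confidence: number };

function scoreFromAnalysis({ isGlutenFree, confidence }: GlutenAnalysis): number {
  // Any detected gluten means a score of 0, regardless of confidence.
  if (!isGlutenFree) return 0;
  // Treat a missing or invalid confidence as zero. (Assumption of this sketch.)
  if (isNaN(confidence)) return 0;
  // Clamp to [0, 1] so the score stays in a predictable range.
  return Math.min(1, Math.max(0, confidence));
}

console.log(scoreFromAnalysis({ isGlutenFree: true, confidence: 0.9 })); // 0.9
console.log(scoreFromAnalysis({ isGlutenFree: false, confidence: 0.9 })); // 0
console.log(scoreFromAnalysis({ isGlutenFree: true, confidence: 1.4 })); // 1
```

Inside a scorer, such a helper could be called from the step as .generateScore(({ results }) => scoreFromAnalysis(results.analyzeStepResult)), assuming the analyze step returns both fields.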

generateReason Step (Optional)

Generates a human-readable explanation for the score, useful for debugging, transparency, or user feedback.

Functions: ({ run, results, score }) => string

const glutenCheckerScorer = createScorer({...})
  .preprocess(...)
  .analyze(...)
  .generateScore(...)
  .generateReason(({ results, score }) => {
    const { isGlutenFree, glutenSources } = results.analyzeStepResult;

    if (isGlutenFree) {
      return `Score: ${score}. This recipe is gluten-free with no harmful ingredients detected.`;
    } else {
      return `Score: ${score}. Contains gluten from: ${glutenSources.join(', ')}`;
    }
  })

Prompt objects: Use description and createPrompt for LLM-generated explanations.

const glutenCheckerScorer = createScorer({...})
  .preprocess(...)
  .analyze(...)
  .generateScore(...)
  .generateReason({
    description: 'Explain the gluten assessment',
    createPrompt: ({ results, score }) => `
      Explain why this recipe received a score of ${score}.
      Analysis: ${JSON.stringify(results.analyzeStepResult)}

      Provide a clear explanation for someone with dietary restrictions.
    `
  })

Example: Create a custom scorer

A custom scorer in Mastra uses createScorer with four core components:

  1. Judge configuration
  2. Analysis step
  3. Score generation
  4. Reason generation

Together, these components let you define custom evaluation logic that uses LLMs as judges.

Info: Visit createScorer for the full API and configuration options.

src/mastra/scorers/gluten-checker.ts
import { createScorer } from "@mastra/core/evals";
import { z } from "zod";

export const GLUTEN_INSTRUCTIONS = `You are a Chef that identifies if recipes contain gluten.`;

export const generateGlutenPrompt = ({
  output,
}: {
  output: string;
}) => `Check if this recipe is gluten-free.

Check for:
- Wheat
- Barley
- Rye
- Common sources like flour, pasta, bread

Example with gluten:
"Mix flour and water to make dough"
Response: {
  "isGlutenFree": false,
  "glutenSources": ["flour"]
}

Example gluten-free:
"Mix rice, beans, and vegetables"
Response: {
  "isGlutenFree": true,
  "glutenSources": []
}

Recipe to analyze:
${output}

Return your response in this format:
{
  "isGlutenFree": boolean,
  "glutenSources": ["list ingredients containing gluten"]
}`;

export const generateReasonPrompt = ({
  isGlutenFree,
  glutenSources,
}: {
  isGlutenFree: boolean;
  glutenSources: string[];
}) => `Explain why this recipe is${isGlutenFree ? "" : " not"} gluten-free.

${glutenSources.length > 0 ? `Sources of gluten: ${glutenSources.join(", ")}` : "No gluten-containing ingredients found"}

Return your response in this format:
"This recipe is [gluten-free/contains gluten] because [explanation]"`;

export const glutenCheckerScorer = createScorer({
  id: "gluten-checker",
  description: "Check if the output contains any gluten",
  judge: {
    model: "openai/gpt-4.1-nano",
    instructions: GLUTEN_INSTRUCTIONS,
  },
})
  .analyze({
    description: "Analyze the output for gluten",
    outputSchema: z.object({
      isGlutenFree: z.boolean(),
      glutenSources: z.array(z.string()),
    }),
    createPrompt: ({ run }) => {
      const { output } = run;
      return generateGlutenPrompt({ output: output.text });
    },
  })
  .generateScore(({ results }) => {
    return results.analyzeStepResult.isGlutenFree ? 1 : 0;
  })
  .generateReason({
    description: "Generate a reason for the score",
    createPrompt: ({ results }) => {
      return generateReasonPrompt({
        glutenSources: results.analyzeStepResult.glutenSources,
        isGlutenFree: results.analyzeStepResult.isGlutenFree,
      });
    },
  });

Judge Configuration

Sets up the LLM model and defines its role as a domain expert.

judge: {
  model: 'openai/gpt-4.1-nano',
  instructions: GLUTEN_INSTRUCTIONS,
}

Analysis Step

Defines how the LLM should analyze the input and what structured output it should return.

.analyze({
  description: 'Analyze the output for gluten',
  outputSchema: z.object({
    isGlutenFree: z.boolean(),
    glutenSources: z.array(z.string()),
  }),
  createPrompt: ({ run }) => {
    const { output } = run;
    return generateGlutenPrompt({ output: output.text });
  },
})

The analysis step uses a prompt object to:

  • Provide a clear description of the analysis task
  • Define the expected output structure with a Zod schema (a boolean result plus a list of gluten sources)
  • Generate a dynamic prompt based on the input content

Score Generation

Converts the LLM's structured analysis into a numerical score.

.generateScore(({ results }) => {
  return results.analyzeStepResult.isGlutenFree ? 1 : 0;
})

The score generation function takes the analysis results and applies business logic to produce a score. In this case, the LLM directly determines whether the recipe is gluten-free, so we use that boolean result: 1 for gluten-free, 0 for contains gluten.
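The business logic can also guard against an inconsistent judge answer. The sketch below is a hypothetical, more conservative variant and not part of the example scorer: conservativeGlutenScore and GlutenAnalysisResult are illustrative names, and the rule that a non-empty glutenSources list overrides isGlutenFree is an assumption of this sketch.

```typescript
// Hypothetical variant: cross-check the boolean against the list of sources.
type GlutenAnalysisResult = { isGlutenFree: boolean; glutenSources: string[] };

function conservativeGlutenScore(analysis: GlutenAnalysisResult): number {
  // Score 1 only when the boolean and the list agree that nothing was found.
  return analysis.isGlutenFree && analysis.glutenSources.length === 0 ? 1 : 0;
}

console.log(conservativeGlutenScore({ isGlutenFree: true, glutenSources: [] })); // 1
console.log(conservativeGlutenScore({ isGlutenFree: false, glutenSources: ["flour"] })); // 0
// Contradictory answer: treated as "contains gluten" to stay on the safe side.
console.log(conservativeGlutenScore({ isGlutenFree: true, glutenSources: ["soy sauce"] })); // 0
```

For a dietary check, erring toward 0 on a contradictory answer is the safer default, at the cost of occasionally under-scoring a recipe that is in fact gluten-free.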

Reason Generation

Provides a human-readable explanation for the score using another LLM call.

.generateReason({
  description: 'Generate a reason for the score',
  createPrompt: ({ results }) => {
    return generateReasonPrompt({
      glutenSources: results.analyzeStepResult.glutenSources,
      isGlutenFree: results.analyzeStepResult.isGlutenFree,
    });
  },
})

The reason generation step creates explanations that help users understand why a particular score was assigned, using both the boolean result and the specific gluten sources identified by the analysis step.

High gluten-free example

src/example-high-gluten-free.ts
const result = await glutenCheckerScorer.run({
  input: [{ role: 'user', content: 'Mix rice, beans, and vegetables' }],
  output: { text: 'Mix rice, beans, and vegetables' },
});

console.log('Score:', result.score);
console.log('Gluten sources:', result.analyzeStepResult.glutenSources);
console.log('Reason:', result.reason);

High gluten-free output

{
  score: 1,
  analyzeStepResult: {
    isGlutenFree: true,
    glutenSources: []
  },
  reason: 'This recipe is gluten-free because rice, beans, and vegetables are naturally gluten-free ingredients that are safe for people with celiac disease.'
}

Partial gluten example

src/example-partial-gluten.ts
const result = await glutenCheckerScorer.run({
  input: [{ role: "user", content: "Mix flour and water to make dough" }],
  output: { text: "Mix flour and water to make dough" },
});

console.log("Score:", result.score);
console.log("Gluten sources:", result.analyzeStepResult.glutenSources);
console.log("Reason:", result.reason);

Partial gluten output

{
  score: 0,
  analyzeStepResult: {
    isGlutenFree: false,
    glutenSources: ['flour']
  },
  reason: 'This recipe is not gluten-free because it contains flour. Regular flour is made from wheat and contains gluten, making it unsafe for people with celiac disease or gluten sensitivity.'
}

Low gluten-free example

src/example-low-gluten-free.ts
const result = await glutenCheckerScorer.run({
  input: [{ role: "user", content: "Add soy sauce and noodles" }],
  output: { text: "Add soy sauce and noodles" },
});

console.log("Score:", result.score);
console.log("Gluten sources:", result.analyzeStepResult.glutenSources);
console.log("Reason:", result.reason);

Low gluten-free output

{
  score: 0,
  analyzeStepResult: {
    isGlutenFree: false,
    glutenSources: ['soy sauce', 'noodles']
  },
  reason: 'This recipe is not gluten-free because it contains soy sauce, noodles. Regular soy sauce contains wheat and most noodles are made from wheat flour, both of which contain gluten and are unsafe for people with gluten sensitivity.'
}

Examples and resources: