
createScorer

Mastra provides a unified createScorer factory that allows you to define custom scorers for evaluating input/output pairs. You can use either native JavaScript functions or LLM-based prompt objects for each evaluation step. Custom scorers can be added to Agents and Workflow steps.

How to Create a Custom Scorer

Use the createScorer factory to define your scorer with a name, description, and optional judge configuration. Then chain step methods to build your evaluation pipeline. You must provide at least a generateScore step.

Prompt-object steps are step configurations expressed as objects, containing description + createPrompt (and, for preprocess/analyze, outputSchema). These steps call the judge LLM. Function steps are plain functions and never call the judge.

import { createScorer } from "@mastra/core/evals";

const scorer = createScorer({
  id: "my-custom-scorer",
  name: "My Custom Scorer", // Optional, defaults to id
  description: "Evaluates responses based on custom criteria",
  type: "agent", // Optional: for agent evaluation with automatic typing
  judge: {
    model: myModel,
    instructions: "You are an expert evaluator...",
  },
})
  .preprocess({
    /* step config */
  })
  .analyze({
    /* step config */
  })
  .generateScore(({ run, results }) => {
    // Return a number
  })
  .generateReason({
    /* step config */
  });

createScorer Options

id: string
Unique identifier for the scorer. Used as the name if `name` is not provided.

name?: string
Name of the scorer. Defaults to `id` if not provided.

description: string
Description of what the scorer does.

judge?: object
Optional judge configuration for LLM-based steps. See the Judge Object section below.

type?: string
Type specification for input/output. Use 'agent' for automatic agent types. For custom types, use the generic approach instead.

This function returns a scorer builder that you can chain step methods onto. See the MastraScorer reference for details on the .run() method and its input/output.

Judge Object

model: LanguageModel
The LLM model instance to use for evaluation.

instructions: string
System prompt/instructions for the LLM.

The judge only runs for steps defined as prompt objects (preprocess, analyze, generateScore, generateReason in prompt mode). If you use function steps only, the judge is never called and there is no LLM output to inspect. In that case, any score/reason must be produced by your functions.
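To make this concrete, here is a function-only sketch in plain TypeScript: no judge is configured, no LLM is called, and the score comes entirely from code. The run shape and length heuristic are illustrative, not Mastra's actual types or a recommended metric.

```typescript
// Function-only scoring logic: with no prompt-object steps, the judge is never
// invoked, so the score must come from plain code like this.
function scoreByLength({ run }: { run: { output: string } }): number {
  // Illustrative heuristic: longer answers score higher.
  return run.output.length > 10 ? 1 : 0.5;
}

console.log(scoreByLength({ run: { output: "A sufficiently long answer." } })); // 1
```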

When a prompt-object step runs, its structured LLM output is stored in the corresponding result field (preprocessStepResult, analyzeStepResult, or the value consumed by calculateScore in generateScore).

Type Safety

You can specify input/output types when creating scorers for better type inference and IntelliSense support:

Agent Type Shortcut

For evaluating agents, use type: 'agent' to automatically get the correct types for agent input/output:

import { createScorer } from "@mastra/core/evals";

// Agent scorer with automatic typing
const agentScorer = createScorer({
  id: "agent-response-quality",
  description: "Evaluates agent responses",
  type: "agent", // Automatically provides ScorerRunInputForAgent/ScorerRunOutputForAgent
})
  .preprocess(({ run }) => {
    // The run input is automatically typed as ScorerRunInputForAgent
    const userMessage = run.inputData.inputMessages[0]?.content;
    return { userMessage };
  })
  .generateScore(({ run, results }) => {
    // run.output is automatically typed as ScorerRunOutputForAgent
    const response = run.output[0]?.content;
    return (response?.length ?? 0) > 10 ? 1.0 : 0.5;
  });

Custom Types with Generics

For custom input/output types, use the generic approach:

import { createScorer } from "@mastra/core/evals";

type CustomInput = { query: string; context: string[] };
type CustomOutput = { answer: string; confidence: number };

const customScorer = createScorer<CustomInput, CustomOutput>({
  id: "custom-scorer",
  description: "Evaluates custom data",
}).generateScore(({ run }) => {
  // run.input is typed as CustomInput
  // run.output is typed as CustomOutput
  return run.output.confidence;
});

Built-in Agent Types

  • ScorerRunInputForAgent - contains inputMessages, rememberedMessages, systemMessages, and taggedSystemMessages for agent evaluation
  • ScorerRunOutputForAgent - array of agent response messages

Using these types provides autocomplete, compile-time validation, and better documentation for your scoring logic.

Trace Scoring with Agent Types

When you use type: 'agent', your scorer can both be added directly to agents and score traces from agent interactions. The scorer automatically transforms trace data into the proper agent input/output format:

import { Mastra } from "@mastra/core";
import { createScorer } from "@mastra/core/evals";

const agentTraceScorer = createScorer({
  id: "agent-trace-length",
  description: "Evaluates agent response length",
  type: "agent",
}).generateScore(({ run }) => {
  // Trace data is automatically transformed to agent format
  const userMessages = run.inputData.inputMessages;
  const agentResponse = run.output[0]?.content;

  // Score based on response length
  return (agentResponse?.length ?? 0) > 50 ? 0 : 1;
});

// Register with Mastra for trace scoring
const mastra = new Mastra({
  scorers: {
    agentTraceScorer,
  },
});

Step Method Signatures

preprocess

Optional preprocessing step that can extract or transform data before analysis.

Function mode: ({ run, results }) => any

run.input: any
Input records provided to the scorer. If the scorer is added to an agent, this will be an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this will be the input of the workflow.

run.output: any
Output record provided to the scorer. For agents, this is usually the agent's response. For workflows, this is the workflow's output.

run.runId: string
Unique identifier for this scoring run.

run.requestContext?: object
Request Context from the agent or workflow step being evaluated (optional).

results: object
Empty object (no previous steps).

Returns: any
The method can return any value. The returned value will be available to subsequent steps as preprocessStepResult.
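A function-mode preprocess step might look like the following plain-TypeScript sketch; the run shape and the normalization logic are illustrative assumptions, not Mastra's actual types.

```typescript
// Sketch of a function-mode preprocess step: extract and normalize the first
// user message so later steps can work with clean data.
type Message = { role: string; content: string };
type Run = { input: Message[]; output: Message };

function preprocess({ run }: { run: Run }) {
  const userMessage = run.input.find((m) => m.role === "user")?.content ?? "";
  // The returned object becomes preprocessStepResult for subsequent steps.
  return { userMessage: userMessage.toLowerCase().trim() };
}

const result = preprocess({
  run: {
    input: [{ role: "user", content: "  Hello World  " }],
    output: { role: "assistant", content: "Hi!" },
  },
});
console.log(result.userMessage); // "hello world"
```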

Prompt object mode:

description: string
Description of what this preprocessing step does.

outputSchema: ZodSchema
Zod schema for the expected output of the preprocess step.

createPrompt: function
Function: ({ run, results }) => string. Returns the prompt for the LLM.

judge?: object
(Optional) LLM judge for this step (can override the main judge). See the Judge Object section.
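Putting these options together, a prompt-object preprocess configuration might look like the following fragment. This is a sketch assuming zod is installed; the prompt wording and schema fields are illustrative.

```typescript
import { z } from "zod";

// Hypothetical prompt-object preprocess step config (not a complete scorer).
const preprocessConfig = {
  description: "Extract the factual claims made in the response",
  // The judge's structured output must match this schema.
  outputSchema: z.object({ claims: z.array(z.string()) }),
  createPrompt: ({ run }: { run: { output: string } }) =>
    `List every factual claim in the following response as JSON ` +
    `matching { "claims": string[] }:\n\n${run.output}`,
};
```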

analyze

Optional analysis step that processes the input/output and any preprocessed data.

Function mode: ({ run, results }) => any

run.input: any
Input records provided to the scorer. If the scorer is added to an agent, this will be an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this will be the input of the workflow.

run.output: any
Output record provided to the scorer. For agents, this is usually the agent's response. For workflows, this is the workflow's output.

run.runId: string
Unique identifier for this scoring run.

run.requestContext?: object
Request Context from the agent or workflow step being evaluated (optional).

results.preprocessStepResult?: any
Result from the preprocess step, if defined (optional).

Returns: any
The method can return any value. The returned value will be available to subsequent steps as analyzeStepResult.
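A function-mode analyze step often combines the preprocessed data with the output, as in this plain-TypeScript sketch; the shapes and the term-coverage metric are illustrative assumptions.

```typescript
// Sketch of a function-mode analyze step: measure how many query terms from the
// preprocess step appear in the output. The result becomes analyzeStepResult.
type Results = { preprocessStepResult?: { userMessage: string } };

function analyze({ run, results }: { run: { output: string }; results: Results }) {
  const query = results.preprocessStepResult?.userMessage ?? "";
  const queryTerms = query.split(/\s+/).filter(Boolean);
  const covered = queryTerms.filter((t) => run.output.toLowerCase().includes(t));
  return { coverage: queryTerms.length ? covered.length / queryTerms.length : 0 };
}

const { coverage } = analyze({
  run: { output: "Paris is the capital of France." },
  results: { preprocessStepResult: { userMessage: "capital france" } },
});
console.log(coverage); // 1
```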

Prompt object mode:

description: string
Description of what this analysis step does.

outputSchema: ZodSchema
Zod schema for the expected output of the analyze step.

createPrompt: function
Function: ({ run, results }) => string. Returns the prompt for the LLM.

judge?: object
(Optional) LLM judge for this step (can override the main judge). See the Judge Object section.

generateScore

Required step that computes the final numerical score.

Function mode: ({ run, results }) => number

run.input: any
Input records provided to the scorer. If the scorer is added to an agent, this will be an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this will be the input of the workflow.

run.output: any
Output record provided to the scorer. For agents, this is usually the agent's response. For workflows, this is the workflow's output.

run.runId: string
Unique identifier for this scoring run.

run.requestContext?: object
Request Context from the agent or workflow step being evaluated (optional).

results.preprocessStepResult?: any
Result from the preprocess step, if defined (optional).

results.analyzeStepResult?: any
Result from the analyze step, if defined (optional).

Returns: number
The method must return a numerical score.
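A function-mode generateScore step typically turns an upstream analyze result into a bounded number, as in this sketch; the analyzeStepResult shape and the [0, 1] clamping are illustrative assumptions.

```typescript
// Sketch of a function-mode generateScore step: read the analyze result and
// clamp it into [0, 1] so the scorer always returns a bounded number.
function generateScore({ results }: { results: { analyzeStepResult?: { coverage: number } } }): number {
  return Math.min(1, Math.max(0, results.analyzeStepResult?.coverage ?? 0));
}

console.log(generateScore({ results: { analyzeStepResult: { coverage: 0.8 } } })); // 0.8
```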

Prompt object mode:

description: string
Description of what this scoring step does.

outputSchema: ZodSchema
Zod schema for the expected output of the generateScore step.

createPrompt: function
Function: ({ run, results }) => string. Returns the prompt for the LLM.

judge?: object
(Optional) LLM judge for this step (can override the main judge). See the Judge Object section.

When using prompt object mode, you must also provide a calculateScore function to convert the LLM output to a numerical score:

calculateScore: function
Function: ({ run, results, analyzeStepResult }) => number. Converts the LLM's structured output into a numerical score.
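For example, calculateScore might map structured judge output to a ratio. In this sketch the verdicts shape is a hypothetical example of what a judge's outputSchema could produce; it is not a fixed Mastra format.

```typescript
// Sketch of a calculateScore function: convert a hypothetical list of judge
// verdicts into the fraction that passed.
type Verdict = { verdict: "yes" | "no" };

const calculateScore = ({ results }: { results: { analyzeStepResult: { verdicts: Verdict[] } } }): number => {
  const verdicts = results.analyzeStepResult.verdicts;
  if (verdicts.length === 0) return 0;
  return verdicts.filter((v) => v.verdict === "yes").length / verdicts.length;
};

console.log(
  calculateScore({ results: { analyzeStepResult: { verdicts: [{ verdict: "yes" }, { verdict: "no" }] } } }),
); // 0.5
```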

generateReason

Optional step that provides an explanation for the score.

Function mode: ({ run, results, score }) => string

run.input: any
Input records provided to the scorer. If the scorer is added to an agent, this will be an array of user messages, e.g. `[{ role: 'user', content: 'hello world' }]`. If the scorer is used in a workflow, this will be the input of the workflow.

run.output: any
Output record provided to the scorer. For agents, this is usually the agent's response. For workflows, this is the workflow's output.

run.runId: string
Unique identifier for this scoring run.

run.requestContext?: object
Request Context from the agent or workflow step being evaluated (optional).

results.preprocessStepResult?: any
Result from the preprocess step, if defined (optional).

results.analyzeStepResult?: any
Result from the analyze step, if defined (optional).

score: number
Score computed by the generateScore step.

Returns: string
The method must return a string explaining the score.
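A function-mode generateReason step can build the explanation directly from the score, as in this sketch; the threshold and message wording are illustrative.

```typescript
// Sketch of a function-mode generateReason step: turn the computed score into
// a human-readable explanation string.
function generateReason({ score }: { score: number }): string {
  return score >= 0.8
    ? `Score ${score}: the response met the expected criteria.`
    : `Score ${score}: the response missed one or more expected criteria.`;
}

console.log(generateReason({ score: 0.5 }));
```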

Prompt object mode:

description: string
Description of what this reasoning step does.

createPrompt: function
Function: ({ run, results, score }) => string. Returns the prompt for the LLM.

judge?: object
(Optional) LLM judge for this step (can override the main judge). See the Judge Object section.

All step functions can be async.
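For instance, a generateScore step can await an external check before returning its number. In this sketch the awaited call is a stand-in Promise; a real implementation might await a moderation or retrieval service instead.

```typescript
// Sketch of an async step function: await a (stand-in) external check, then
// return a numerical score as usual.
async function generateScore({ run }: { run: { output: string } }): Promise<number> {
  // Stand-in for an awaited external call (e.g. a moderation service).
  const passed = await Promise.resolve(run.output.trim().length > 0);
  return passed ? 1 : 0;
}

generateScore({ run: { output: "hello" } }).then((score) => console.log(score)); // 1
```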