
runEvals

The runEvals function enables batch evaluation of agents and workflows by running multiple test cases against scorers concurrently. It is intended for systematic testing, performance analysis, and validation of AI systems.

Usage Example

import { runEvals } from "@mastra/core/evals";
import { myAgent } from "./agents/my-agent";
import { myScorer1, myScorer2 } from "./scorers";

const result = await runEvals({
  target: myAgent,
  data: [
    { input: "What is machine learning?" },
    { input: "Explain neural networks" },
    { input: "How does AI work?" },
  ],
  scorers: [myScorer1, myScorer2],
  concurrency: 2,
  onItemComplete: ({ item, targetResult, scorerResults }) => {
    console.log(`Completed: ${item.input}`);
    console.log(`Scores:`, scorerResults);
  },
});

console.log(`Average scores:`, result.scores);
console.log(`Processed ${result.summary.totalItems} items`);

Parameters

target:

Agent | Workflow
The agent or workflow to evaluate.

data:

RunEvalsDataItem[]
Array of test cases with input data and optional ground truth.

scorers:

MastraScorer[] | WorkflowScorerConfig
Array of scorers for agents, or configuration object for workflows specifying scorers for the workflow and individual steps.

concurrency?:

number (default: 1)
Number of test cases to run concurrently.

onItemComplete?:

function
Callback function called after each test case completes. Receives item, target result, and scorer results.
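
To make the concurrency and onItemComplete parameters more concrete, here is a minimal sketch of a concurrency-limited runner with a per-item callback. It is not Mastra's implementation: runWithConcurrency and its signature are hypothetical names introduced only for illustration, and the real callback receives an object with item, targetResult, and scorerResults rather than two positional arguments.

```typescript
// Hypothetical helper, NOT part of @mastra/core. It only illustrates what a
// concurrency limit and a per-item completion callback mean in practice.
async function runWithConcurrency<T, R>(
  items: T[],
  concurrency: number,
  worker: (item: T) => Promise<R>,
  onItemComplete?: (item: T, result: R) => void,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each lane repeatedly claims the next item until none remain,
  // so at most `laneCount` workers are in flight at any time.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      const result = await worker(items[index]);
      results[index] = result; // keep results in input order
      onItemComplete?.(items[index], result);
    }
  }

  const laneCount = Math.max(1, Math.min(concurrency, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

With a limit of 1 the items run strictly one after another; raising the limit trades ordering of completion callbacks for throughput, while the returned array stays in input order in this sketch.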

Data Item Structure

input:

string | string[] | CoreMessage[] | any
Input data for the target. For agents: messages or strings. For workflows: workflow input data.

groundTruth?:

any
Expected or reference output for comparison during scoring.

requestContext?:

RequestContext
Request context to pass to the target during execution.

tracingContext?:

TracingContext
Tracing context for observability and debugging.
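
As an illustration of the data item shape described above, the sketch below builds an array of items from question and expected-answer pairs. EvalDataItem and toDataItems are hypothetical local names, not exports of @mastra/core; the real RunEvalsDataItem type may carry more fields (such as requestContext and tracingContext) than this stand-in models.

```typescript
// Local stand-in for the documented data item shape. The real
// RunEvalsDataItem type comes from @mastra/core and may differ.
interface EvalDataItem {
  input: unknown;
  groundTruth?: unknown;
}

// Hypothetical helper: turn [question, expectedAnswer?] pairs into data items,
// omitting groundTruth entirely when no expected answer is given.
function toDataItems(pairs: Array<[string, string?]>): EvalDataItem[] {
  return pairs.map(([input, groundTruth]) =>
    groundTruth === undefined ? { input } : { input, groundTruth },
  );
}

const data = toDataItems([
  ["What is AI?", "AI is a field of computer science that creates intelligent machines."],
  ["Explain neural networks"],
]);
```

An array built this way has the same outline as the data arrays passed to runEvals in the examples on this page.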

Workflow Scorer Configuration

For workflows, you can specify scorers at different levels using WorkflowScorerConfig:

workflow?:

MastraScorer[]
Array of scorers to evaluate the entire workflow output.

steps?:

Record<string, MastraScorer[]>
Object mapping step IDs to arrays of scorers for evaluating individual step outputs.
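
The two accepted forms of scorers (a plain array for agents, or a configuration object for workflows) can be told apart as in the following sketch. ScorerLike, WorkflowScorerConfigLike, and listScorers are hypothetical stand-ins written for this example, and the assumption that a scorer exposes an id field is taken from the createScorer call shown on this page rather than from the MastraScorer type itself.

```typescript
// Stand-ins for the documented types; the real ones come from @mastra/core.
type ScorerLike = { id: string };
type WorkflowScorerConfigLike = {
  workflow?: ScorerLike[];
  steps?: Record<string, ScorerLike[]>;
};

// Hypothetical helper: flatten either `scorers` form into labelled entries.
function listScorers(scorers: ScorerLike[] | WorkflowScorerConfigLike): string[] {
  // Agent form: a plain array of scorers.
  if (Array.isArray(scorers)) {
    return scorers.map((s) => `agent:${s.id}`);
  }
  // Workflow form: optional workflow-level scorers plus per-step scorers.
  const labels = (scorers.workflow ?? []).map((s) => `workflow:${s.id}`);
  for (const [stepId, stepScorers] of Object.entries(scorers.steps ?? {})) {
    for (const s of stepScorers) labels.push(`step:${stepId}:${s.id}`);
  }
  return labels;
}
```

Both workflow and steps are optional in the documented configuration, which is why the sketch falls back to empty collections for each.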

Returns

scores:

Record<string, any>
Average scores across all test cases, organized by scorer name.

summary:

object
Summary information about the experiment execution.

summary.totalItems:

number
Total number of test cases processed.
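
One way to picture "average scores organized by scorer name" is the sketch below, which averages per-item numeric scores for each scorer. This is an assumption about the general idea, not the library's actual aggregation code: ItemScores and averageByScorer are hypothetical, and the real scores object is typed Record<string, any>, so its values may not be plain numbers.

```typescript
// Hypothetical per-item results: scorer name -> numeric score.
type ItemScores = Record<string, number>;

// Average each scorer's values over the items in which it appears.
function averageByScorer(items: ItemScores[]): Record<string, number> {
  const totals: Record<string, { sum: number; count: number }> = {};
  for (const item of items) {
    for (const [name, score] of Object.entries(item)) {
      const entry = (totals[name] ??= { sum: 0, count: 0 });
      entry.sum += score;
      entry.count += 1;
    }
  }
  const averages: Record<string, number> = {};
  for (const [name, { sum, count }] of Object.entries(totals)) {
    averages[name] = sum / count;
  }
  return averages;
}
```

For example, two items scored { a: 1, b: 0 } and { a: 0, b: 1 } would average to 0.5 for both a and b under this scheme.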

Examples

Agent Evaluation

import { createScorer, runEvals } from "@mastra/core/evals";

// Scores 1 when the agent's first output message contains the ground truth, else 0.
const myScorer = createScorer({
  id: "my-scorer",
  description: "Check if Agent's response contains ground truth",
  type: "agent",
}).generateScore(({ run }) => {
  const response = run.output[0]?.content || "";
  const expectedResponse = run.groundTruth;
  return response.includes(expectedResponse) ? 1 : 0;
});

// chatAgent is assumed to be an Agent defined elsewhere in your project.
const result = await runEvals({
  target: chatAgent,
  data: [
    {
      input: "What is AI?",
      groundTruth:
        "AI is a field of computer science that creates intelligent machines.",
    },
    {
      input: "How does machine learning work?",
      groundTruth:
        "Machine learning uses algorithms to learn patterns from data.",
    },
  ],
  scorers: [myScorer],
  concurrency: 3,
});

Workflow Evaluation

// myWorkflow and the scorers below are assumed to be defined elsewhere in your project.
const workflowResult = await runEvals({
  target: myWorkflow,
  data: [
    { input: { query: "Process this data", priority: "high" } },
    { input: { query: "Another task", priority: "low" } },
  ],
  scorers: {
    // Scorers applied to the final workflow output.
    workflow: [outputQualityScorer],
    // Scorers applied to individual step outputs, keyed by step ID.
    steps: {
      "validation-step": [validationScorer],
      "processing-step": [processingScorer],
    },
  },
  onItemComplete: ({ item, targetResult, scorerResults }) => {
    // Data items carry their payload under `input`.
    console.log(`Workflow completed for: ${item.input.query}`);
    if (scorerResults.workflow) {
      console.log("Workflow scores:", scorerResults.workflow);
    }
    if (scorerResults.steps) {
      console.log("Step scores:", scorerResults.steps);
    }
  },
});

Related