
Answer Similarity Scorer

The createAnswerSimilarityScorer() function creates a scorer that evaluates how similar an agent's output is to a ground truth answer. This scorer is specifically designed for CI/CD testing scenarios where you have expected answers and want to ensure consistency over time.

Parameters

- `model` (`LanguageModel`): The language model used to evaluate semantic similarity between outputs and ground truth.
- `options` (`AnswerSimilarityOptions`): Configuration options for the scorer.

AnswerSimilarityOptions

- `requireGroundTruth` (`boolean`, default: `true`): Whether to require ground truth for evaluation. If `false`, missing ground truth returns a score of 0.
- `semanticThreshold` (`number`, default: `0.8`): Weight for semantic matches vs. exact matches (0-1).
- `exactMatchBonus` (`number`, default: `0.2`): Additional score bonus for exact matches (0-1).
- `missingPenalty` (`number`, default: `0.15`): Penalty per missing key concept from the ground truth.
- `contradictionPenalty` (`number`, default: `1.0`): Penalty for contradictory information. The high default ensures wrong answers score near 0.
- `extraInfoPenalty` (`number`, default: `0.05`): Mild penalty for extra information not present in the ground truth (capped at 0.2).
- `scale` (`number`, default: `1`): Score scaling factor.

This function returns an instance of the MastraScorer class. The .run() method accepts the same input as other scorers (see the MastraScorer reference), but requires ground truth to be provided in the run object.
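A minimal sketch of a direct `.run()` call. The exact run-object shape is defined in the MastraScorer reference; the `input`/`output` values below are illustrative assumptions, while `groundTruth` is required for this scorer:

```typescript
import { createAnswerSimilarityScorer } from "@mastra/evals/scorers/prebuilt";

const scorer = createAnswerSimilarityScorer({ model: "openai/gpt-4o" });

// Illustrative sketch: the run-object shape follows the MastraScorer
// reference; groundTruth must be present for this scorer to score.
const runResult = await scorer.run({
  input: [{ role: "user", content: "What is the capital of France?" }],
  output: { text: "Paris is the capital of France." },
  groundTruth: "The capital of France is Paris",
});

console.log(runResult.score);  // 0-1 (or 0-scale with a custom scale)
console.log(runResult.reason); // human-readable explanation
```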

.run() Returns

- `runId` (`string`, optional): The id of the run.
- `score` (`number`): Similarity score between 0 and 1 (or 0 to `scale` if a custom scale is used). Higher scores indicate closer similarity to the ground truth.
- `reason` (`string`): Human-readable explanation of the score with actionable feedback.
- `preprocessStepResult` (`object`): Extracted semantic units from the output and the ground truth.
- `analyzeStepResult` (`object`): Detailed analysis of matches, contradictions, and extra information.
- `preprocessPrompt` (`string`): The prompt used for semantic unit extraction.
- `analyzePrompt` (`string`): The prompt used for similarity analysis.
- `generateReasonPrompt` (`string`): The prompt used for generating the explanation.

Scoring Details

The scorer uses a multi-step process:

  1. Extract: break the output and the ground truth down into semantic units
  2. Analyze: compare the units and identify matches, contradictions, and gaps
  3. Score: compute a weighted similarity with contradiction penalties applied
  4. Reason: generate a human-readable explanation

Score calculation: `max(0, base_score - contradiction_penalty - missing_penalty - extra_info_penalty) × scale`
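To make the formula concrete, a hypothetical worked example with the default options (`scale = 1`): a response with a base score of 0.9, no contradictions, one missing key concept, and a little extra information would score 0.7.

```typescript
// Hypothetical numbers plugged into the documented formula:
// base 0.9, no contradiction, one missing concept (0.15), extra info (0.05).
const score = Math.max(0, 0.9 - 0 - 0.15 - 0.05) * 1; // 0.7
```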

Example

Evaluate agent responses for similarity to ground truth across different scenarios:

src/example-answer-similarity.ts

```typescript
import { runEvals } from "@mastra/core/evals";
import { createAnswerSimilarityScorer } from "@mastra/evals/scorers/prebuilt";
import { myAgent } from "./agent";

const scorer = createAnswerSimilarityScorer({ model: "openai/gpt-4o" });

const result = await runEvals({
  data: [
    {
      input: "What is 2+2?",
      groundTruth: "4",
    },
    {
      input: "What is the capital of France?",
      groundTruth: "The capital of France is Paris",
    },
    {
      input: "What are the primary colors?",
      groundTruth: "The primary colors are red, blue, and yellow",
    },
  ],
  scorers: [scorer],
  target: myAgent,
  onItemComplete: ({ scorerResults }) => {
    console.log({
      score: scorerResults[scorer.id].score,
      reason: scorerResults[scorer.id].reason,
    });
  },
});

console.log(result.scores);
```

For more details on runEvals, see the runEvals reference.

To add this scorer to an agent, see the Scorers overview guide.