
Answer Relevancy Scorer

The createAnswerRelevancyScorer() function accepts a single options object with the following properties:

Parameters

  • `model` (`LanguageModel`): Configuration for the model used to evaluate relevancy.
  • `uncertaintyWeight` (`number`, default: `0.3`): Weight given to 'unsure' verdicts in scoring (0-1).
  • `scale` (`number`, default: `1`): Maximum score value.

This function returns an instance of the MastraScorer class. The .run() method accepts the same input as other scorers (see the MastraScorer reference), but the return value includes LLM-specific fields as documented below.

.run() Returns

  • `runId` (`string`, optional): The id of the run.
  • `score` (`number`): Relevancy score (0 to `scale`, default 0-1).
  • `preprocessPrompt` (`string`, optional): The prompt sent to the LLM for the preprocess step.
  • `preprocessStepResult` (`object`): Object with extracted statements: `{ statements: string[] }`.
  • `analyzePrompt` (`string`, optional): The prompt sent to the LLM for the analyze step.
  • `analyzeStepResult` (`object`): Object with results: `{ results: Array<{ result: 'yes' | 'unsure' | 'no', reason: string }> }`.
  • `generateReasonPrompt` (`string`, optional): The prompt sent to the LLM for the reason step.
  • `reason` (`string`): Explanation of the score.
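Taken together, the fields above can be sketched as a TypeScript type. This is an illustration derived from this reference, not the library's exported type, and the sample values are invented:

```typescript
// Illustrative shape of the .run() result, based on the fields
// documented in this reference. The library's actual exported
// type may differ in name and detail.
type AnswerRelevancyRunResult = {
  runId?: string;
  score: number; // 0 to `scale`, default 0-1
  preprocessPrompt?: string;
  preprocessStepResult: { statements: string[] };
  analyzePrompt?: string;
  analyzeStepResult: {
    results: Array<{ result: "yes" | "unsure" | "no"; reason: string }>;
  };
  generateReasonPrompt?: string;
  reason: string;
};

// A made-up value conforming to the sketch:
const sample: AnswerRelevancyRunResult = {
  score: 0.65,
  preprocessStepResult: { statements: ["Exercise improves heart health."] },
  analyzeStepResult: {
    results: [{ result: "yes", reason: "Directly addresses the query." }],
  },
  reason: "The answer is mostly relevant to the query.",
};

console.log(sample.analyzeStepResult.results[0].result);
```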

Scoring Details

The scorer evaluates relevancy through query-answer alignment, considering completeness and detail level, but not factual correctness.

Scoring Process

  1. Statement preprocessing:
    • Breaks the output into meaningful statements while preserving context.
  2. Relevancy analysis:
    • Each statement is evaluated as:
      • "yes": full weight for direct matches
      • "unsure": partial weight for approximate matches (default: 0.3)
      • "no": zero weight for irrelevant content
  3. Score calculation:
    • ((direct + uncertainty * partial) / total_statements) * scale
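The calculation step can be sketched as a small self-contained function. This is a hypothetical helper illustrating the formula above, not part of the library:

```typescript
type Verdict = "yes" | "unsure" | "no";

// Hypothetical implementation of the documented formula:
// ((direct + uncertaintyWeight * partial) / total_statements) * scale
function computeRelevancyScore(
  verdicts: Verdict[],
  uncertaintyWeight = 0.3,
  scale = 1,
): number {
  if (verdicts.length === 0) return 0;
  const direct = verdicts.filter((v) => v === "yes").length;
  const partial = verdicts.filter((v) => v === "unsure").length;
  return ((direct + uncertaintyWeight * partial) / verdicts.length) * scale;
}

// Two direct matches, one near match, one miss:
// (2 + 0.3 * 1) / 4 = 0.575
console.log(computeRelevancyScore(["yes", "yes", "unsure", "no"]));
```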

Score Interpretation

A relevancy score between 0 and 1:

  • 1.0: The response fully answers the query with relevant, focused information.
  • 0.7–0.9: The response mostly answers the query but may include minor irrelevant content.
  • 0.4–0.6: The response partially addresses the query with a mix of relevant and irrelevant information.
  • 0.1–0.3: The response contains little relevant content and largely misses the intent of the query.
  • 0.0: The response is entirely irrelevant and does not answer the query.
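The bands above could be mapped to labels with a small helper. This is a hypothetical sketch for readability, not a library API:

```typescript
// Hypothetical helper mapping a relevancy score (0-1) to the
// interpretation bands documented above.
function interpretRelevancy(score: number): string {
  if (score >= 1.0) return "fully relevant";
  if (score >= 0.7) return "mostly relevant";
  if (score >= 0.4) return "partially relevant";
  if (score > 0.0) return "marginally relevant";
  return "irrelevant";
}

console.log(interpretRelevancy(0.575)); // falls in the 0.4-0.6 band
```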

Example

Evaluate agent responses for relevancy across different scenarios:

src/example-answer-relevancy.ts

```typescript
import { runEvals } from "@mastra/core/evals";
import { createAnswerRelevancyScorer } from "@mastra/evals/scorers/prebuilt";

import { myAgent } from "./agent";

const scorer = createAnswerRelevancyScorer({ model: "openai/gpt-4o" });

const result = await runEvals({
  data: [
    { input: "What are the health benefits of regular exercise?" },
    { input: "What should a healthy breakfast include?" },
    { input: "What are the benefits of meditation?" },
  ],
  scorers: [scorer],
  target: myAgent,
  onItemComplete: ({ scorerResults }) => {
    console.log({
      score: scorerResults[scorer.id].score,
      reason: scorerResults[scorer.id].reason,
    });
  },
});

console.log(result.scores);
```

For more details on runEvals, see the runEvals reference.

To add this scorer to an agent, see the Scorers overview guide.
