
Completeness Scorer

The createCompletenessScorer() function evaluates how thoroughly an LLM's output covers the key elements present in the input. It analyzes nouns, verbs, topics, and terms to determine coverage and provides a detailed completeness score.

Parameters

The createCompletenessScorer() function does not take any options.

This function returns an instance of the MastraScorer class. See the MastraScorer reference for details on the .run() method and its input/output.

.run() Returns

  • runId (string): The id of the run (optional).
  • preprocessStepResult (object): Object with extracted elements and coverage details: { inputElements: string[], outputElements: string[], missingElements: string[], elementCounts: { input: number, output: number } }
  • score (number): Completeness score (0-1) representing the proportion of input elements covered in the output.

The .run() method returns a result in the following shape:

{
  runId: string,
  extractStepResult: {
    inputElements: string[],
    outputElements: string[],
    missingElements: string[],
    elementCounts: { input: number, output: number }
  },
  score: number
}
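
For typed code, the shape above can be written as a TypeScript interface. This is a minimal sketch: the names CompletenessRunResult and summarizeRun are hypothetical and are not exported by Mastra, the field names are copied from the shape above, and the sample object is hand-written rather than real scorer output.

```typescript
// Hypothetical type mirroring the result shape shown above.
// It is not part of the Mastra API; the name is for illustration only.
interface CompletenessRunResult {
  runId?: string; // documented as optional
  extractStepResult: {
    inputElements: string[];
    outputElements: string[];
    missingElements: string[];
    elementCounts: { input: number; output: number };
  };
  score: number; // expected to lie between 0 and 1
}

// Hypothetical helper: renders a result as a one-line summary.
function summarizeRun(result: CompletenessRunResult): string {
  const { missingElements, elementCounts } = result.extractStepResult;
  const missing =
    missingElements.length > 0 ? missingElements.join(", ") : "none";
  return (
    `score=${result.score.toFixed(2)} ` +
    `input=${elementCounts.input} output=${elementCounts.output} ` +
    `missing=${missing}`
  );
}

// Hand-written sample data, not real scorer output.
const sampleResult: CompletenessRunResult = {
  runId: "example-run",
  extractStepResult: {
    inputElements: ["photosynthesis", "stages"],
    outputElements: ["photosynthesis"],
    missingElements: ["stages"],
    elementCounts: { input: 2, output: 1 },
  },
  score: 0.5,
};

console.log(summarizeRun(sampleResult));
// prints "score=0.50 input=2 output=1 missing=stages"
```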

Element Extraction Details

The scorer extracts and analyzes several types of elements:

  • Nouns: Key objects, concepts, and entities
  • Verbs: Actions and states (converted to infinitive form)
  • Topics: Main subjects and themes
  • Terms: Individual significant words

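The extraction process includes:

  • Text normalization (removing diacritics, converting to lowercase)
  • Splitting camelCase words
  • Handling of word boundaries
  • Special handling of short words (3 characters or less)
  • Deduplication of elements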
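
The normalization steps listed above can be approximated in a few lines. This is a rough sketch and not the scorer's actual implementation: normalizeText and extractElements are hypothetical names, splitting on non-alphanumeric characters stands in for word-boundary handling, and no part-of-speech analysis (nouns, verbs, topics) is attempted.

```typescript
// Hypothetical helper: strips diacritics, splits camelCase, lowercases.
function normalizeText(text: string): string {
  return text
    .normalize("NFD") // decompose accented characters
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .replace(/([a-z])([A-Z])/g, "$1 $2") // camelCase -> camel Case
    .toLowerCase()
    .trim();
}

// Hypothetical helper: tokenizes on word boundaries and deduplicates.
function extractElements(text: string): string[] {
  const words = normalizeText(text)
    .split(/[^a-z0-9]+/) // assumption: ASCII letters and digits form words
    .filter((word) => word.length > 0);
  return [...new Set(words)]; // a Set keeps first-seen order
}

console.log(extractElements("Café camelCase café"));
// prints [ 'cafe', 'camel', 'case' ]
```

The camelCase split runs before lowercasing because the case change is the only signal marking the word boundary.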

extractStepResult

From the .run() method, you can get the extractStepResult object with the following properties:

  • inputElements: Key elements found in the input (e.g., nouns, verbs, topics, terms).
  • outputElements: Key elements found in the output.
  • missingElements: Input elements not found in the output.
  • elementCounts: The number of elements in the input and output.
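
To make the relationship between these four properties concrete, the sketch below assembles such an object from two element lists. It assumes that an element counts as missing when it has no exact match in the output. That is a simplification, and buildExtractResult and ExtractResultSketch are hypothetical names rather than part of the library.

```typescript
// Hypothetical type with the four properties described above.
interface ExtractResultSketch {
  inputElements: string[];
  outputElements: string[];
  missingElements: string[];
  elementCounts: { input: number; output: number };
}

// Hypothetical helper: derives missingElements and elementCounts.
// Assumption: "missing" means no exact match in the output elements.
function buildExtractResult(
  inputElements: string[],
  outputElements: string[],
): ExtractResultSketch {
  const outputSet = new Set(outputElements);
  const missingElements = inputElements.filter(
    (element) => !outputSet.has(element),
  );
  return {
    inputElements,
    outputElements,
    missingElements,
    elementCounts: {
      input: inputElements.length,
      output: outputElements.length,
    },
  };
}

const extractExample = buildExtractResult(
  ["remote", "work", "benefits"],
  ["remote", "work"],
);
console.log(extractExample.missingElements); // prints [ 'benefits' ]
```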

Scoring Details

The scorer evaluates completeness through linguistic element coverage analysis.

Scoring Process

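  1. Extracts key elements:
    • Nouns and named entities
    • Action verbs
    • Topic-specific terms
    • Normalized word forms
  2. Calculates coverage of input elements:
    • Exact matches for short terms (≤3 characters)
    • Substantial overlap (>60%) for longer terms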

Final score: (covered_elements / total_input_elements) * scale
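
The coverage rules and the final-score formula can be sketched as follows. Several choices here are assumptions rather than documented behavior: "overlap" is read as one term containing the other with a length ratio above 0.6, scale defaults to 1, and an empty input list scores 0. The names isCovered and completenessScore are hypothetical.

```typescript
// Hypothetical matcher for a single input/output element pair.
function isCovered(inputElement: string, outputElement: string): boolean {
  // Short terms (3 characters or fewer) must match exactly.
  if (inputElement.length <= 3) {
    return inputElement === outputElement;
  }
  // Assumption: for longer terms, "overlap" means the shorter string is
  // contained in the longer one and covers more than 60% of its length.
  const inputIsShorter = inputElement.length <= outputElement.length;
  const shorter = inputIsShorter ? inputElement : outputElement;
  const longer = inputIsShorter ? outputElement : inputElement;
  return longer.includes(shorter) && shorter.length / longer.length > 0.6;
}

// (covered_elements / total_input_elements) * scale
function completenessScore(
  inputElements: string[],
  outputElements: string[],
  scale = 1, // assumption: the source does not define a default
): number {
  if (inputElements.length === 0) {
    return 0; // assumption: avoids division by zero
  }
  const covered = inputElements.filter((input) =>
    outputElements.some((output) => isCovered(input, output)),
  );
  return (covered.length / inputElements.length) * scale;
}

// "sun" matches exactly, "photosynthesis" overlaps 13/14 with
// "photosynthesi", and "plants" has no match: 2 of 3 covered.
const coverage = completenessScore(
  ["sun", "photosynthesis", "plants"],
  ["sun", "photosynthesi", "animal"],
);
console.log(coverage.toFixed(2)); // prints "0.67"
```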

Score interpretation

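A completeness score between 0 and 1:

  • 1.0: Thoroughly addresses all aspects of the query with comprehensive detail.
  • 0.7–0.9: Covers most important aspects with good detail, with only minor gaps.
  • 0.4–0.6: Addresses some key points but misses important aspects or lacks detail.
  • 0.1–0.3: Only partially addresses the query, with significant gaps.
  • 0.0: Fails to address the query or provides irrelevant information.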
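
If these bands need to be applied in code, a small lookup along the following lines may help. The bands above leave gaps (for example between 0.9 and 1.0), so this sketch assumes each band starts at its lower bound and runs up to the next one. The function name and the labels are hypothetical.

```typescript
// Hypothetical helper: maps a 0-1 score to a coarse label.
// Assumption: each band starts at its lower bound and extends up to
// the next band, since the listed ranges leave gaps between them.
function interpretScore(score: number): string {
  if (!Number.isFinite(score) || score < 0 || score > 1) {
    throw new RangeError(`score must be between 0 and 1, got ${score}`);
  }
  if (score >= 1) return "complete";
  if (score >= 0.7) return "mostly complete";
  if (score >= 0.4) return "partial";
  if (score > 0) return "minimal";
  return "none";
}

console.log(interpretScore(0.85)); // prints "mostly complete"
console.log(interpretScore(0.5)); // prints "partial"
```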

Example

Evaluate agent responses for completeness across different query complexities:

src/example-completeness.ts
import { runEvals } from "@mastra/core/evals";
import { createCompletenessScorer } from "@mastra/evals/scorers/prebuilt";
import { myAgent } from "./agent";

const scorer = createCompletenessScorer();

const result = await runEvals({
  data: [
    {
      input:
        "Explain the process of photosynthesis, including the inputs, outputs, and stages involved.",
    },
    {
      input:
        "What are the benefits and drawbacks of remote work for both employees and employers?",
    },
    {
      input:
        "Compare renewable and non-renewable energy sources in terms of cost, environmental impact, and sustainability.",
    },
  ],
  scorers: [scorer],
  target: myAgent,
  onItemComplete: ({ scorerResults }) => {
    console.log({
      score: scorerResults[scorer.id].score,
    });
  },
});

console.log(result.scores);

For more details on runEvals, see the runEvals reference.

To add this scorer to an agent, see the Scorers overview guide.

Related