Completeness Scorer
The createCompletenessScorer() function evaluates how thoroughly an LLM's output covers the key elements present in the input. It extracts nouns, verbs, topics, and terms from both texts, checks which input elements appear in the output, and returns a completeness score together with the extracted elements.
Parameters
The createCompletenessScorer() function does not take any options.
This function returns an instance of the MastraScorer class. See the MastraScorer reference for details on the .run() method and its input/output.
.run() Returns
- runId
- preprocessStepResult
- score
The .run() method returns a result in the following shape:
{
  runId: string,
  extractStepResult: {
    inputElements: string[],
    outputElements: string[],
    missingElements: string[],
    elementCounts: { input: number, output: number }
  },
  score: number
}
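Element Extraction Details
The scorer extracts and analyzes several types of elements:
- Nouns: key objects, concepts, and entities
- Verbs: actions and states (converted to infinitive form)
- Topics: main subjects and themes
- Terms: individual significant words
The extraction process includes:
- Text normalization (removing diacritics, converting to lowercase)
- Splitting camelCase words
- Handling of word boundaries
- Special handling of short words (3 characters or fewer)
- Deduplication of elements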
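The normalization steps listed above can be sketched roughly as follows. This is an illustrative approximation, not the scorer's actual implementation: normalizeElements is a hypothetical helper, it treats every non-alphanumeric character as a word boundary, and it leaves out the part-of-speech analysis and the special handling of short words.

```typescript
// Hypothetical helper for illustration only; it is not exported by @mastra/evals.
function normalizeElements(text: string): string[] {
  const words = text
    // Decompose accented characters, then strip the combining marks ("é" -> "e")
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    // Split camelCase boundaries: "getUserName" -> "get User Name"
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .toLowerCase()
    // Assumption: anything that is not a letter or digit is a word boundary
    .split(/[^a-z0-9]+/)
    .filter((word) => word.length > 0);

  // Deduplicate while preserving first-seen order
  return Array.from(new Set(words));
}

console.log(normalizeElements("Café getUserName loads the café menu"));
// [ 'cafe', 'get', 'user', 'name', 'loads', 'the', 'menu' ]
```

Diacritics are stripped before the camelCase split so that an accented lowercase letter followed by a capital is still recognized as a boundary.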
extractStepResult
From the .run() method, you can get the extractStepResult object with the following properties:
- inputElements: key elements found in the input (e.g., nouns, verbs, topics, terms).
- outputElements: key elements found in the output.
- missingElements: input elements that were not found in the output.
- elementCounts: the number of elements in the input and in the output.
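As a sketch of how these properties might be consumed, the types below mirror the result shape documented on this page, and describeGaps is a hypothetical helper that turns missingElements into a readable message. The interface names and the sample values are made up for illustration; they are not produced by a real run.

```typescript
// Types mirroring the documented result shape (names are illustrative).
interface ExtractStepResult {
  inputElements: string[];
  outputElements: string[];
  missingElements: string[];
  elementCounts: { input: number; output: number };
}

interface CompletenessRunResult {
  runId: string;
  extractStepResult: ExtractStepResult;
  score: number;
}

// Hypothetical helper: summarizes which input elements the output left out.
function describeGaps(result: CompletenessRunResult): string {
  const { missingElements, elementCounts } = result.extractStepResult;
  if (missingElements.length === 0) {
    return `All ${elementCounts.input} input elements were covered.`;
  }
  return `Missing ${missingElements.length} of ${elementCounts.input} input elements: ${missingElements.join(", ")}`;
}

// Hand-written sample data, not the output of an actual scorer run.
const sample: CompletenessRunResult = {
  runId: "example-run",
  extractStepResult: {
    inputElements: ["photosynthesis", "input", "output", "stage"],
    outputElements: ["photosynthesis", "input", "sunlight"],
    missingElements: ["output", "stage"],
    elementCounts: { input: 4, output: 3 },
  },
  score: 0.5,
};

console.log(describeGaps(sample));
```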
Scoring Details
The scorer evaluates completeness through linguistic element coverage analysis.
Scoring Process
- Extracts key elements:
  - Nouns and named entities
  - Action verbs
  - Topic-specific terms
  - Normalized word forms
- Calculates coverage of input elements:
  - Exact matches for short terms (≤3 characters)
  - Substantial overlap (>60%) for longer terms
Final score: (covered_elements / total_input_elements) * scale
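The coverage rule and the final-score formula can be approximated as below. This is a minimal sketch under stated assumptions, not the scorer's source: "overlap" is interpreted here as one term containing the other with a length ratio above 0.6, scale defaults to 1, and an empty input is treated as fully covered. isCovered and completenessScore are hypothetical names.

```typescript
// Hypothetical sketch of the coverage check; the overlap definition is an assumption.
function isCovered(inputEl: string, outputEls: string[]): boolean {
  return outputEls.some((outputEl) => {
    if (inputEl === outputEl) return true;
    // Short terms (3 characters or fewer) only count on an exact match
    if (inputEl.length <= 3) return false;
    const shorter = inputEl.length <= outputEl.length ? inputEl : outputEl;
    const longer = inputEl.length <= outputEl.length ? outputEl : inputEl;
    // Assumption: "overlap" means containment with a length ratio above 60%
    return longer.includes(shorter) && shorter.length / longer.length > 0.6;
  });
}

// (covered_elements / total_input_elements) * scale
function completenessScore(inputEls: string[], outputEls: string[], scale = 1): number {
  // Assumption: with nothing to cover, the output counts as complete
  if (inputEls.length === 0) return scale;
  const covered = inputEls.filter((el) => isCovered(el, outputEls)).length;
  return (covered / inputEls.length) * scale;
}

const score = completenessScore(
  ["photosynthesis", "input", "output", "stage"],
  ["photosynthesis", "inputs", "sun"],
);
console.log(score); // 0.5: "photosynthesis" matches exactly, "input" overlaps "inputs"
```

The exact-match rule for short terms keeps a word like "run" from being counted as covered by "running", while longer terms tolerate small inflectional differences.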
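Score interpretation
A completeness score between 0 and 1:
- 1.0: Thoroughly addresses all aspects of the query with comprehensive detail.
- 0.7–0.9: Covers most important aspects with good detail and only minor gaps.
- 0.4–0.6: Addresses some key points but is missing important aspects or lacks detail.
- 0.1–0.3: Only partially addresses the query, with significant gaps.
- 0.0: Fails to address the query or provides irrelevant information.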
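For reporting, the bands above could be mapped to labels with a small helper. The sketch below is hypothetical: the listed ranges leave gaps (for example between 0.6 and 0.7), so it assumes each band's lower bound acts as a threshold, and the label names are invented for illustration.

```typescript
type CompletenessBand =
  | "complete"
  | "mostly complete"
  | "partial"
  | "minimal"
  | "not addressed";

// Hypothetical helper; thresholds are an assumption that closes the gaps between bands.
function interpretCompleteness(score: number): CompletenessBand {
  if (Number.isNaN(score) || score < 0 || score > 1) {
    throw new RangeError(`score must be between 0 and 1, got ${score}`);
  }
  if (score === 1) return "complete";
  if (score >= 0.7) return "mostly complete";
  if (score >= 0.4) return "partial";
  if (score > 0) return "minimal";
  return "not addressed";
}

console.log(interpretCompleteness(0.8)); // "mostly complete"
```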
Example
Evaluate agent responses for completeness across different query complexities:
import { runEvals } from "@mastra/core/evals";
import { createCompletenessScorer } from "@mastra/evals/scorers/prebuilt";
import { myAgent } from "./agent";

const scorer = createCompletenessScorer();

// Run the agent on each input and score every response for completeness
const result = await runEvals({
  data: [
    {
      input:
        "Explain the process of photosynthesis, including the inputs, outputs, and stages involved.",
    },
    {
      input:
        "What are the benefits and drawbacks of remote work for both employees and employers?",
    },
    {
      input:
        "Compare renewable and non-renewable energy sources in terms of cost, environmental impact, and sustainability.",
    },
  ],
  scorers: [scorer],
  target: myAgent,
  // Log the completeness score as each item finishes
  onItemComplete: ({ scorerResults }) => {
    console.log({
      score: scorerResults[scorer.id].score,
    });
  },
});

// Scores collected across all items
console.log(result.scores);
For more details on runEvals, see the runEvals reference.
To add this scorer to an agent, see the Scorers overview guide.
Related