
Keyword Coverage Scorer

The createKeywordCoverageScorer() function evaluates how well an LLM's output covers the important keywords from the input. It checks which input keywords appear in the output, ignoring common words and stop words.

Parameters

The createKeywordCoverageScorer() function does not take any options.

This function returns an instance of the MastraScorer class. See the MastraScorer reference for details on the .run() method and its input/output.

.run() Returns

runId:

string
The id of the run (optional).

preprocessStepResult:

object
Object with extracted keywords: { referenceKeywords: Set<string>, responseKeywords: Set<string> }

analyzeStepResult:

object
Object with keyword coverage: { totalKeywords: number, matchedKeywords: number }

score:

number
Coverage score (0-1) representing the proportion of matched keywords.

.run() returns a result in the following shape:

{
  runId: string,
  preprocessStepResult: {
    referenceKeywords: Set<string>,
    responseKeywords: Set<string>
  },
  analyzeStepResult: {
    totalKeywords: number,
    matchedKeywords: number
  },
  score: number
}
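
As a rough illustration, the result can be described with a TypeScript interface that follows the field list above. The names KeywordCoverageResult and summarizeCoverage are hypothetical and are not exported by Mastra; this is only a sketch of how the fields might be read.

```typescript
// Hypothetical type mirroring the documented fields; not part of the Mastra API.
interface KeywordCoverageResult {
  runId?: string; // documented as optional
  preprocessStepResult: {
    referenceKeywords: Set<string>; // keywords extracted from the input
    responseKeywords: Set<string>; // keywords extracted from the output
  };
  analyzeStepResult: {
    totalKeywords: number;
    matchedKeywords: number;
  };
  score: number; // coverage between 0 and 1
}

// Hypothetical helper that turns a result into a one-line summary.
function summarizeCoverage(result: KeywordCoverageResult): string {
  const { matchedKeywords, totalKeywords } = result.analyzeStepResult;
  return `${matchedKeywords}/${totalKeywords} keywords matched (score ${result.score.toFixed(2)})`;
}
```

A helper like this can be handy when logging results from several runs side by side.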

Scoring Details

The scorer evaluates keyword coverage by matching keywords with the following features:

  • Common word and stop word filtering (e.g., "the", "a", "and")
  • Case-insensitive matching
  • Word form variation handling
  • Special handling of technical terms and compound words

Scoring Process

  1. Processes keywords from the input and output:
    • Filters out common words and stop words
    • Normalizes case and word forms
    • Handles special terms and compound words
  2. Calculates keyword coverage:
    • Matches keywords between the two texts
    • Counts successful matches
    • Computes the coverage ratio

Final score: (matched_keywords / total_keywords) * scale
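
The process above can be approximated in a few lines. The sketch below is not Mastra's implementation: the stop-word list, the extractKeywords and keywordCoverage names, and the whitespace tokenization are all assumptions made for illustration, it skips word-form normalization, handles ASCII text only, and assumes a scale of 1.

```typescript
// Hypothetical stop-word list; the list used by the real scorer is not shown here.
const STOP_WORDS = new Set(["the", "a", "an", "and", "or", "of", "to", "in", "like", "is", "are"]);

// Lowercase the text, split on whitespace, trim punctuation at word edges,
// and drop stop words. Inner characters (the dot in "react.js") are kept.
function extractKeywords(text: string): Set<string> {
  const keywords = new Set<string>();
  for (const raw of text.toLowerCase().split(/\s+/)) {
    const word = raw.replace(/^[^a-z0-9]+|[^a-z0-9]+$/g, "");
    if (word.length > 0 && !STOP_WORDS.has(word)) {
      keywords.add(word);
    }
  }
  return keywords;
}

// Proportion of input keywords that also appear in the output.
function keywordCoverage(input: string, output: string): number {
  const reference = extractKeywords(input);
  const response = extractKeywords(output);
  if (reference.size === 0) {
    // No input keywords: treat two empty sides as full coverage, otherwise none.
    return response.size === 0 ? 1 : 0;
  }
  let matched = 0;
  reference.forEach((keyword) => {
    if (response.has(keyword)) matched++;
  });
  return matched / reference.size;
}

const input = "JavaScript frameworks like React and Vue";
console.log(keywordCoverage(input, "React and Vue are popular javascript frameworks"));
console.log(keywordCoverage(input, "I prefer React"));
```

With this stop-word list the input yields four keywords (javascript, frameworks, react, vue), so the first response covers all of them and the second covers one of four.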

Score interpretation

A coverage score between 0 and 1:

  • 1.0: Complete coverage. All keywords are present.
  • 0.7–0.9: High coverage. Most keywords are included.
  • 0.4–0.6: Partial coverage. Some keywords are present.
  • 0.1–0.3: Low coverage. Few keywords are matched.
  • 0.0: No coverage. No keywords are found.
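
If a label is more convenient than a raw number, the bands above can be mapped with a small helper. The describeCoverage function and its band names are hypothetical, and the listed ranges leave gaps (for example 0.65 or 0.95), so this sketch assumes such values fall into the next lower band.

```typescript
type CoverageBand = "complete" | "high" | "partial" | "low" | "none";

// Hypothetical mapping from a coverage score to the bands listed above.
// Assumption: a score between two listed ranges belongs to the lower band.
function describeCoverage(score: number): CoverageBand {
  if (score >= 1) return "complete";
  if (score >= 0.7) return "high";
  if (score >= 0.4) return "partial";
  if (score > 0) return "low";
  return "none";
}

console.log(describeCoverage(0.8));
```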

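Special Cases

The scorer handles several special cases:

  • Empty input/output: returns a score of 1.0 if both are empty, and 0.0 if only one is empty
  • Single word: treated as a single keyword
  • Technical terms: compound technical terms are preserved (e.g., "React.js", "machine learning")
  • Case differences: "JavaScript" matches "javascript"
  • Common words: ignored in scoring so that it focuses on meaningful keywords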

Example

Evaluate keyword coverage between input queries and agent responses:

src/example-keyword-coverage.ts
import { runEvals } from "@mastra/core/evals";
import { createKeywordCoverageScorer } from "@mastra/evals/scorers/prebuilt";
import { myAgent } from "./agent";

const scorer = createKeywordCoverageScorer();

const result = await runEvals({
  data: [
    {
      input: "JavaScript frameworks like React and Vue",
    },
    {
      input: "TypeScript offers interfaces, generics, and type inference",
    },
    {
      input:
        "Machine learning models require data preprocessing, feature engineering, and hyperparameter tuning",
    },
  ],
  scorers: [scorer],
  target: myAgent,
  onItemComplete: ({ scorerResults }) => {
    console.log({
      score: scorerResults[scorer.id].score,
    });
  },
});

console.log(result.scores);

For more details on runEvals, see the runEvals reference.

To add this scorer to an agent, see the Scorers overview guide.

Related