Context Relevance Scorer
The createContextRelevanceScorerLLM() function creates a scorer that evaluates how relevant and useful provided context was for generating agent responses. It uses weighted relevance levels and applies penalties for unused high-relevance context and missing information.
It is especially useful for these use cases:
Content generation evaluation
Best for evaluating context quality in:
- Chat systems where context usage matters
- RAG pipelines that need nuanced relevance assessment
- Systems where missing context affects response quality
Context selection optimization
Use when optimizing for:
- Comprehensive context coverage
- Effective context utilization
- Identifying context gaps
Parameters
- `model`: The model used to evaluate context relevance.
- `options`: Scorer configuration: a static `context` array or a `contextExtractor` function, plus optional `penalties` and `scale`.
Note: Either context or contextExtractor must be provided. If both are provided, contextExtractor takes precedence.
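For example (a minimal sketch; the context strings are illustrative), a configuration that passes both will use the extractor and ignore the static list:
import { createContextRelevanceScorerLLM } from "@mastra/evals";
const scorer = createContextRelevanceScorerLLM({
  model: "openai/gpt-5.1",
  options: {
    // Ignored: contextExtractor takes precedence when both are provided
    context: ["Static context entry"],
    contextExtractor: (input, output) => ["Context resolved at run time"],
  },
});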
.run() Returns
- `score`: A relevance score between 0 and `scale` (0-1 by default).
- `reason`: A human-readable explanation of the score.
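A minimal sketch of reading both fields from a run (the scorer here is assumed to be configured as in the examples below; the messages are illustrative):
const { score, reason } = await scorer.run({
  input: {
    inputMessages: [
      { id: "1", role: "user", content: "What causes tides?" },
    ],
  },
  output: [
    { id: "2", role: "assistant", content: "Tides are caused mainly by the Moon's gravity." },
  ],
});
console.log(score);  // e.g. 0.85 with the default scale of 1
console.log(reason); // covers relevance levels, usage, and missing context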
Scoring Details
Weighted Relevance Scoring
Context Relevance uses a sophisticated scoring algorithm that considers:
- Relevance levels: each context piece is classified with a weighted value:
  - `high` = 1.0 (directly addresses the query)
  - `medium` = 0.7 (supporting information)
  - `low` = 0.3 (tangentially related)
  - `none` = 0.0 (completely irrelevant)
- Usage detection: tracks whether relevant context was actually used in the response
- Penalty application (configurable via the `penalties` option):
  - Unused high-relevance context: `unusedHighRelevanceContext` (default: 0.1) per unused high-relevance piece
  - Missing context: penalized per identified missing item, capped at `maxMissingContextPenalty` (default: 0.5)
Scoring Formula
Base Score = Σ(relevance_weights) / (num_contexts × 1.0)
Usage Penalty = count(unused_high_relevance) × unusedHighRelevanceContext
Missing Penalty = min(count(missing_context) × missingContextPerItem, maxMissingContextPenalty)
Final Score = max(0, Base Score - Usage Penalty - Missing Penalty) × scale
Default values:
- `unusedHighRelevanceContext` = 0.1 (10% penalty per unused high-relevance context)
- `missingContextPerItem` = 0.15 (15% penalty per missing context item)
- `maxMissingContextPenalty` = 0.5 (50% maximum penalty for missing context)
- `scale` = 1
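A worked example with the default values (the relevance classification here is hypothetical): suppose four context pieces are rated high, high, high, and none, one high-relevance piece goes unused, and one missing item is identified:
Base Score = (1.0 + 1.0 + 1.0 + 0.0) / (4 × 1.0) = 0.75
Usage Penalty = 1 × 0.1 = 0.1
Missing Penalty = min(1 × 0.15, 0.5) = 0.15
Final Score = max(0, 0.75 - 0.1 - 0.15) × 1 = 0.5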
Score interpretation
- 0.9-1.0: Excellent - all context highly relevant and used
- 0.7-0.8: Good - mostly relevant with minor gaps
- 0.4-0.6: Mixed - significant irrelevant or unused context
- 0.2-0.3: Poor - mostly irrelevant context
- 0.0-0.1: Very poor - no relevant content found
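If you want these bands in code, a small helper (a sketch; the function name is ours, not part of the API) could map scores to labels:
// Map a 0-1 score to the bands above (assumes the default scale of 1;
// scores between bands, e.g. 0.85, fall into the nearest lower band here)
function interpretScore(score: number): string {
  if (score >= 0.9) return "Excellent";
  if (score >= 0.7) return "Good";
  if (score >= 0.4) return "Mixed";
  if (score >= 0.2) return "Poor";
  return "Very poor";
}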
Reason analysis
The reason field provides insights on:
- The relevance level of each context piece (high/medium/low/none)
- Which context was actually used in the response
- Penalties applied for unused highly relevant context (configurable via `unusedHighRelevanceContext`)
- Missing context that could have improved the response (penalized via `missingContextPerItem`, up to `maxMissingContextPenalty`)
Optimization strategies
Use results to improve your system:
- Filter irrelevant content: remove low-relevance or irrelevant information before processing
- Ensure context usage: make sure highly relevant context is incorporated into responses
- Fill context gaps: add the missing information the scorer identifies
- Balance context size: find the amount of context that yields the best relevance
- Tune penalty sensitivity: adjust `unusedHighRelevanceContext`, `missingContextPerItem`, and `maxMissingContextPenalty` to match your application's tolerance for unused or missing context (a minimal gating sketch follows this list)
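As a starting point for acting on results, a minimal gating sketch (the 0.7 threshold is an assumption, not part of the API):
// Flag runs whose context relevance falls below a chosen threshold
const result = await scorer.run(testRun); // testRun shaped as in the examples below
if (result.score < 0.7) {
  // Low score: revisit context filtering, usage, or retrieval coverage
  console.warn("Low context relevance:", result.reason);
}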
Difference from Context Precision
| Aspect | Context Relevance | Context Precision |
|---|---|---|
| Algorithm | Weighted levels with penalties | Mean Average Precision (MAP) |
| Relevance | Multi-level (high/medium/low/none) | Binary (yes/no) |
| Position | Not considered | Critical (rewards early placement) |
| Usage | Tracks and penalizes unused context | Not considered |
| Missing | Identifies and penalizes gaps | Not evaluated |
Scorer configuration
Custom penalty configuration
Control how penalties are applied for unused and missing context:
import { createContextRelevanceScorerLLM } from "@mastra/evals";
// Stricter penalty configuration
const strictScorer = createContextRelevanceScorerLLM({
model: "openai/gpt-5.1",
options: {
context: [
"Einstein won the Nobel Prize for photoelectric effect",
"He developed the theory of relativity",
"Einstein was born in Germany",
],
penalties: {
unusedHighRelevanceContext: 0.2, // 20% penalty per unused high-relevance context
missingContextPerItem: 0.25, // 25% penalty per missing context item
maxMissingContextPenalty: 0.6, // Maximum 60% penalty for missing context
},
scale: 1,
},
});
// Lenient penalty configuration
const lenientScorer = createContextRelevanceScorerLLM({
model: "openai/gpt-5.1",
options: {
context: [
"Einstein won the Nobel Prize for photoelectric effect",
"He developed the theory of relativity",
"Einstein was born in Germany",
],
penalties: {
unusedHighRelevanceContext: 0.05, // 5% penalty per unused high-relevance context
missingContextPerItem: 0.1, // 10% penalty per missing context item
maxMissingContextPenalty: 0.3, // Maximum 30% penalty for missing context
},
scale: 1,
},
});
const testRun = {
input: {
inputMessages: [
{
id: "1",
role: "user",
content: "What did Einstein achieve in physics?",
},
],
},
output: [
{
id: "2",
role: "assistant",
content:
"Einstein won the Nobel Prize for his work on the photoelectric effect.",
},
],
};
const strictResult = await strictScorer.run(testRun);
const lenientResult = await lenientScorer.run(testRun);
console.log("Strict penalties:", strictResult.score); // Lower score due to unused context
console.log("Lenient penalties:", lenientResult.score); // Higher score, less penalty
Dynamic Context Extraction
const scorer = createContextRelevanceScorerLLM({
model: "openai/gpt-5.1",
options: {
contextExtractor: (input, output) => {
// Extract context based on the query
const userQuery = input?.inputMessages?.[0]?.content || "";
if (userQuery.includes("Einstein")) {
return [
"Einstein won the Nobel Prize for the photoelectric effect",
"He developed the theory of relativity",
];
}
return ["General physics information"];
},
penalties: {
unusedHighRelevanceContext: 0.15,
},
},
});
Custom scale factor
const scorer = createContextRelevanceScorerLLM({
model: "openai/gpt-5.1",
options: {
context: ["Relevant information...", "Supporting details..."],
scale: 100, // Scale scores from 0-100 instead of 0-1
},
});
// Result will be scaled: score: 85 instead of 0.85
Combining multiple context sources
const scorer = createContextRelevanceScorerLLM({
model: "openai/gpt-5.1",
options: {
contextExtractor: (input, output) => {
const query = input?.inputMessages?.[0]?.content || "";
      // Combine context from multiple sources
      // (knowledgeBase, documentStore, and contextCache are placeholders for
      // your own retrieval clients; each is assumed to return string arrays)
      const kbContext = knowledgeBase.search(query);
      const docContext = documentStore.retrieve(query);
      const cacheContext = contextCache.get(query);
return [...kbContext, ...docContext, ...cacheContext];
},
scale: 1,
},
});
Examples
High relevance example
This example shows excellent context relevance where all context directly supports the response:
import { createContextRelevanceScorerLLM } from "@mastra/evals";
const scorer = createContextRelevanceScorerLLM({
model: "openai/gpt-5.1",
options: {
context: [
"Einstein won the Nobel Prize for his discovery of the photoelectric effect in 1921.",
"He published his theory of special relativity in 1905.",
"His general relativity theory, published in 1915, revolutionized our understanding of gravity.",
],
scale: 1,
},
});
const result = await scorer.run({
input: {
inputMessages: [
{
id: "1",
role: "user",
content: "What were Einstein's major scientific achievements?",
},
],
},
output: [
{
id: "2",
role: "assistant",
content:
"Einstein's major achievements include the Nobel Prize for the photoelectric effect, special relativity in 1905, and general relativity in 1915.",
},
],
});
console.log(result);
// Output:
// {
// score: 1.0,
// reason: "The score is 1.0 because all context pieces are highly relevant to Einstein's achievements and were effectively used in generating the comprehensive response."
// }
Mixed relevance example
This example shows moderate relevance with some context being irrelevant or unused:
import { createContextRelevanceScorerLLM } from "@mastra/evals";
const scorer = createContextRelevanceScorerLLM({
model: "openai/gpt-5.1",
options: {
context: [
"Solar eclipses occur when the Moon blocks the Sun.",
"The Moon moves between the Earth and Sun during eclipses.",
"The Moon is visible at night.",
"Stars twinkle due to atmospheric interference.",
"Total eclipses can last up to 7.5 minutes.",
],
scale: 1,
},
});
const result = await scorer.run({
input: {
inputMessages: [
{
id: "1",
role: "user",
content: "What causes solar eclipses?",
},
],
},
output: [
{
id: "2",
role: "assistant",
content:
"Solar eclipses happen when the Moon moves between Earth and the Sun, blocking sunlight.",
},
],
});
console.log(result);
// Output with default penalties:
// {
// score: 0.64,
// reason: "The score is 0.64 because contexts 1 and 2 are highly relevant and used, context 5 is relevant but unused (10% penalty), while contexts 3 and 4 are irrelevant."
// }
// With custom penalty configuration
const customScorer = createContextRelevanceScorerLLM({
model: "openai/gpt-5.1",
options: {
context: [
"Solar eclipses occur when the Moon blocks the Sun.",
"The Moon moves between the Earth and Sun during eclipses.",
"The Moon is visible at night.",
"Stars twinkle due to atmospheric interference.",
"Total eclipses can last up to 7.5 minutes.",
],
penalties: {
unusedHighRelevanceContext: 0.05, // Lower penalty for unused context
missingContextPerItem: 0.1,
maxMissingContextPenalty: 0.3,
},
},
});
const customResult = await customScorer.run({
input: {
inputMessages: [
{ id: "1", role: "user", content: "What causes solar eclipses?" },
],
},
output: [
{
id: "2",
role: "assistant",
content:
"Solar eclipses happen when the Moon moves between Earth and the Sun, blocking sunlight.",
},
],
});
console.log(customResult);
// Output with lenient penalties:
// {
// score: 0.69, // Higher score due to reduced penalty for unused context
// reason: "The score is 0.69 because contexts 1 and 2 are highly relevant and used, context 5 is relevant but unused (5% penalty), while contexts 3 and 4 are irrelevant."
// }
Low relevance example
This example shows poor context relevance with mostly irrelevant information:
import { createContextRelevanceScorerLLM } from "@mastra/evals";
const scorer = createContextRelevanceScorerLLM({
model: "openai/gpt-5.1",
options: {
context: [
"The Great Barrier Reef is located in Australia.",
"Coral reefs need warm water to survive.",
"Many fish species live in coral reefs.",
"Australia has six states and two territories.",
"The capital of Australia is Canberra.",
],
scale: 1,
},
});
const result = await scorer.run({
input: {
inputMessages: [
{
id: "1",
role: "user",
content: "What is the capital of Australia?",
},
],
},
output: [
{
id: "2",
role: "assistant",
content: "The capital of Australia is Canberra.",
},
],
});
console.log(result);
// Output:
// {
// score: 0.26,
// reason: "The score is 0.26 because only context 5 is relevant to the query about Australia's capital, while the other contexts about reefs are completely irrelevant."
// }
Dynamic context extraction
Extract context dynamically based on the run input:
import { createContextRelevanceScorerLLM } from "@mastra/evals";
const scorer = createContextRelevanceScorerLLM({
model: "openai/gpt-5.1",
options: {
contextExtractor: (input, output) => {
// Extract query from input
const query = input?.inputMessages?.[0]?.content || "";
// Dynamically retrieve context based on query
if (query.toLowerCase().includes("einstein")) {
return [
"Einstein developed E=mc²",
"He won the Nobel Prize in 1921",
"His theories revolutionized physics",
];
}
if (query.toLowerCase().includes("climate")) {
return [
"Global temperatures are rising",
"CO2 levels affect climate",
"Renewable energy reduces emissions",
];
}
return ["General knowledge base entry"];
},
penalties: {
unusedHighRelevanceContext: 0.15, // 15% penalty for unused relevant context
missingContextPerItem: 0.2, // 20% penalty per missing context item
maxMissingContextPenalty: 0.4, // Cap at 40% total missing context penalty
},
scale: 1,
},
});
RAG system integration
Integrate with RAG pipelines to evaluate retrieved context:
import { createContextRelevanceScorerLLM } from "@mastra/evals";
const scorer = createContextRelevanceScorerLLM({
model: "openai/gpt-5.1",
options: {
contextExtractor: (input, output) => {
      // Extract from the RAG retrieval results
      // (assumes retrieved documents were attached to the run input's metadata)
      const ragResults = input?.metadata?.ragResults || [];
// Return the text content of retrieved documents
return ragResults
.filter((doc) => doc.relevanceScore > 0.5)
.map((doc) => doc.content);
},
penalties: {
unusedHighRelevanceContext: 0.12, // Moderate penalty for unused RAG context
missingContextPerItem: 0.18, // Higher penalty for missing information in RAG
maxMissingContextPenalty: 0.45, // Slightly higher cap for RAG systems
},
scale: 1,
},
});
// Evaluate RAG system performance
const evaluateRAG = async (testCases) => {
const results = [];
for (const testCase of testCases) {
const score = await scorer.run(testCase);
results.push({
      query: testCase.input.inputMessages[0].content,
relevanceScore: score.score,
feedback: score.reason,
unusedContext: score.reason.includes("unused"),
missingContext: score.reason.includes("missing"),
});
}
return results;
};
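Example invocation (a sketch; the test case and document contents are illustrative):
const report = await evaluateRAG([
  {
    input: {
      inputMessages: [
        { id: "1", role: "user", content: "How do solar panels work?" },
      ],
      // Retrieved documents attached where the extractor above expects them
      metadata: {
        ragResults: [
          {
            content: "Solar panels convert sunlight into electricity.",
            relevanceScore: 0.9,
          },
        ],
      },
    },
    output: [
      {
        id: "2",
        role: "assistant",
        content: "Solar panels convert sunlight into electricity.",
      },
    ],
  },
]);
console.table(report);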
Comparison with Context Precision
Choose the right scorer for your needs:
| Use case | Context Relevance | Context Precision |
|---|---|---|
| RAG evaluation | When usage matters | When ranking matters |
| Context quality | Nuanced levels | Binary relevance |
| Missing detection | ✓ Identifies gaps | ✗ Not evaluated |
| Usage tracking | ✓ Tracks usage | ✗ Not considered |
| Position sensitivity | ✗ Position-agnostic | ✓ Rewards early placement |
Related