Context Relevance Scorer
The createContextRelevanceScorerLLM() function creates a scorer that evaluates how relevant and useful provided context was for generating agent responses. It uses weighted relevance levels and applies penalties for unused high-relevance context and missing information.
It is especially useful for these use cases:
Content generation evaluation
Best for evaluating context quality in:
- Chat systems where context usage matters
- RAG pipelines that need nuanced relevance assessment
- Systems where missing context affects response quality
Context selection optimization
Use when optimizing for:
- Comprehensive context coverage
- Effective context utilization
- Identifying context gaps
Parameters
- `model`: The model used to evaluate context relevance.
- `options`: Scorer configuration: a static `context` array or a `contextExtractor` function, plus optional `penalties` and `scale`.
Note: Either context or contextExtractor must be provided. If both are provided, contextExtractor takes precedence.
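For example (a minimal sketch; the context strings are illustrative), a configuration that passes both will use the extractor and ignore the static list:
import { createContextRelevanceScorerLLM } from "@mastra/evals";
const scorer = createContextRelevanceScorerLLM({
  model: "openai/gpt-5.1",
  options: {
    // Ignored: contextExtractor takes precedence when both are provided
    context: ["Static context entry"],
    contextExtractor: (input, output) => ["Context resolved at run time"],
  },
});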
.run() Returns
- `score`: A relevance score between 0 and `scale` (0-1 by default).
- `reason`: A human-readable explanation of the score.
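A minimal sketch of reading both fields from a run (the scorer here is assumed to be configured as in the examples below; the messages are illustrative):
const { score, reason } = await scorer.run({
  input: {
    inputMessages: [
      { id: "1", role: "user", content: "What causes tides?" },
    ],
  },
  output: [
    { id: "2", role: "assistant", content: "Tides are caused mainly by the Moon's gravity." },
  ],
});
console.log(score);  // e.g. 0.85 with the default scale of 1
console.log(reason); // covers relevance levels, usage, and missing context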
Scoring Details
Weighted Relevance Scoring
Context Relevance uses a sophisticated scoring algorithm that considers:
- Relevance levels: each context piece is classified with a weighted value:
  - `high` = 1.0 (directly addresses the query)
  - `medium` = 0.7 (supporting information)
  - `low` = 0.3 (tangentially related)
  - `none` = 0.0 (completely irrelevant)
- Usage detection: tracks whether relevant context was actually used in the response
- Penalty application (configurable via the `penalties` option):
  - Unused high-relevance context: `unusedHighRelevanceContext` (default: 0.1) per unused high-relevance piece
  - Missing context: penalized per identified missing item, capped at `maxMissingContextPenalty` (default: 0.5)
Scoring Formula
Base Score = Σ(relevance_weights) / (num_contexts × 1.0)
Usage Penalty = count(unused_high_relevance) × unusedHighRelevanceContext
Missing Penalty = min(count(missing_context) × missingContextPerItem, maxMissingContextPenalty)
Final Score = max(0, Base Score - Usage Penalty - Missing Penalty) × scale
Default values:
- `unusedHighRelevanceContext` = 0.1 (10% penalty per unused high-relevance context)
- `missingContextPerItem` = 0.15 (15% penalty per missing context item)
- `maxMissingContextPenalty` = 0.5 (50% maximum penalty for missing context)
- `scale` = 1
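A worked example with the default values (the relevance classification here is hypothetical): suppose four context pieces are rated high, high, high, and none, one high-relevance piece goes unused, and one missing item is identified:
Base Score = (1.0 + 1.0 + 1.0 + 0.0) / (4 × 1.0) = 0.75
Usage Penalty = 1 × 0.1 = 0.1
Missing Penalty = min(1 × 0.15, 0.5) = 0.15
Final Score = max(0, 0.75 - 0.1 - 0.15) × 1 = 0.5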
Score interpretation
- 0.9-1.0: Excellent - all context highly relevant and used
- 0.7-0.8: Good - mostly relevant with minor gaps
- 0.4-0.6: Mixed - significant irrelevant or unused context
- 0.2-0.3: Poor - mostly irrelevant context
- 0.0-0.1: Very poor - no relevant content found
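If you want these bands in code, a small helper (a sketch; the function name is ours, not part of the API) could map scores to labels:
// Map a 0-1 score to the bands above (assumes the default scale of 1;
// scores between bands, e.g. 0.85, fall into the nearest lower band here)
function interpretScore(score: number): string {
  if (score >= 0.9) return "Excellent";
  if (score >= 0.7) return "Good";
  if (score >= 0.4) return "Mixed";
  if (score >= 0.2) return "Poor";
  return "Very poor";
}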
Reason analysis
The reason field provides insights on:
- The relevance level of each context piece (high/medium/low/none)
- Which context was actually used in the response
- Penalties applied for unused highly relevant context (configurable via `unusedHighRelevanceContext`)
- Missing context that could have improved the response (penalized via `missingContextPerItem`, up to `maxMissingContextPenalty`)
Optimization strategies
Use results to improve your system:
- Filter irrelevant content: remove low-relevance or irrelevant information before processing
- Ensure context usage: make sure highly relevant context is incorporated into responses
- Fill context gaps: add the missing information the scorer identifies
- Balance context size: find the amount of context that yields the best relevance
- Tune penalty sensitivity: adjust `unusedHighRelevanceContext`, `missingContextPerItem`, and `maxMissingContextPenalty` to match your application's tolerance for unused or missing context (a minimal gating sketch follows this list)
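As a starting point for acting on results, a minimal gating sketch (the 0.7 threshold is an assumption, not part of the API):
// Flag runs whose context relevance falls below a chosen threshold
const result = await scorer.run(testRun); // testRun shaped as in the examples below
if (result.score < 0.7) {
  // Low score: revisit context filtering, usage, or retrieval coverage
  console.warn("Low context relevance:", result.reason);
}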
Difference from Context Precision
| Aspect | Context Relevance | Context Precision |
|---|---|---|
| Algorithm | Weighted levels with penalties | Mean Average Precision (MAP) |
| Relevance | Multi-level (high/medium/low/none) | Binary (yes/no) |
| Position | Not considered | Critical (rewards early placement) |
| Usage | Tracks and penalizes unused context | Not considered |
| Missing | Identifies and penalizes gaps | Not evaluated |
Scorer configuration
Custom penalty configuration
Control how penalties are applied for unused and missing context:
import { createContextRelevanceScorerLLM } from "@mastra/evals";
// Stricter penalty configuration
const strictScorer = createContextRelevanceScorerLLM({
model: "openai/gpt-5.1",
options: {
context: [
"Einstein won the Nobel Prize for photoelectric effect",
"He developed the theory of relativity",
"Einstein was born in Germany",
],
penalties: {
unusedHighRelevanceContext: 0.2, // 20% penalty per unused high-relevance context
missingContextPerItem: 0.25, // 25% penalty per missing context item
maxMissingContextPenalty: 0.6, // Maximum 60% penalty for missing context
},
scale: 1,
},
});
// Lenient penalty configuration
const lenientScorer = createContextRelevanceScorerLLM({
model: "openai/gpt-5.1",
options: {
context: [
"Einstein won the Nobel Prize for photoelectric effect",
"He developed the theory of relativity",
"Einstein was born in Germany",
],
penalties: {
unusedHighRelevanceContext: 0.05, // 5% penalty per unused high-relevance context
missingContextPerItem: 0.1, // 10% penalty per missing context item
maxMissingContextPenalty: 0.3, // Maximum 30% penalty for missing context
},
scale: 1,
},
});
const testRun = {
input: {
inputMessages: [
{
id: "1",
role: "user",
content: "What did Einstein achieve in physics?",
},
],
},
output: [
{
id: "2",
role: "assistant",
content:
"Einstein won the Nobel Prize for his work on the photoelectric effect.",
},
],
};
const strictResult = await strictScorer.run(testRun);
const lenientResult = await lenientScorer.run(testRun);
console.log("Strict penalties:", strictResult.score); // Lower score due to unused context
console.log("Lenient penalties:", lenientResult.score); // Higher score, less penalty
Dynamic Context Extraction
const scorer = createContextRelevanceScorerLLM({
model: "openai/gpt-5.1",
options: {
contextExtractor: (input, output) => {
// Extract context based on the query
const userQuery = input?.inputMessages?.[0]?.content || "";
if (userQuery.includes("Einstein")) {
return [
"Einstein won the Nobel Prize for the photoelectric effect",
"He developed the theory of relativity",
];
}
return ["General physics information"];
},
penalties: {
unusedHighRelevanceContext: 0.15,
},
},
});
Custom scale factor
const scorer = createContextRelevanceScorerLLM({
model: "openai/gpt-5.1",
options: {
context: ["Relevant information...", "Supporting details..."],
scale: 100, // Scale scores from 0-100 instead of 0-1
},
});
// Result will be scaled: score: 85 instead of 0.85
Combining multiple context sources
const scorer = createContextRelevanceScorerLLM({
model: "openai/gpt-5.1",
options: {
contextExtractor: (input, output) => {
const query = input?.inputMessages?.[0]?.content || "";
      // Combine context from multiple sources
      // (knowledgeBase, documentStore, and contextCache are placeholders for
      // your own retrieval clients; each is assumed to return string arrays)
      const kbContext = knowledgeBase.search(query);
      const docContext = documentStore.retrieve(query);
      const cacheContext = contextCache.get(query);
return [...kbContext, ...docContext, ...cacheContext];
},
scale: 1,
},
});
Examples
High relevance example
This example shows excellent context relevance where all context directly supports the response:
import { createContextRelevanceScorerLLM } from "@mastra/evals";
const scorer = createContextRelevanceScorerLLM({
model: "openai/gpt-5.1",
options: {
context: [
"Einstein won the Nobel Prize for his discovery of the photoelectric effect in 1921.",
"He published his theory of special relativity in 1905.",
"His general relativity theory, published in 1915, revolutionized our understanding of gravity.",
],
scale: 1,
},
});
const result = await scorer.run({
input: {
inputMessages: [
{
id: "1",
role: "user",
content: "What were Einstein's major scientific achievements?",
},
],
},
output: [
{
id: "2",
role: "assistant",
content:
"Einstein's major achievements include the Nobel Prize for the photoelectric effect, special relativity in 1905, and general relativity in 1915.",
},
],
});
console.log(result);
// Output:
// {
// score: 1.0,
// reason: "The score is 1.0 because all context pieces are highly relevant to Einstein's achievements and were effectively used in generating the comprehensive response."
// }
Mixed relevance example
This example shows moderate relevance with some context being irrelevant or unused:
import { createContextRelevanceScorerLLM } from "@mastra/evals";
const scorer = createContextRelevanceScorerLLM({
model: "openai/gpt-5.1",
options: {
context: [
"Solar eclipses occur when the Moon blocks the Sun.",
"The Moon moves between the Earth and Sun during eclipses.",
"The Moon is visible at night.",
"Stars twinkle due to atmospheric interference.",
"Total eclipses can last up to 7.5 minutes.",
],
scale: 1,
},
});
const result = await scorer.run({
input: {
inputMessages: [
{
id: "1",
role: "user",
content: "What causes solar eclipses?",
},
],
},
output: [
{
id: "2",
role: "assistant",
content:
"Solar eclipses happen when the Moon moves between Earth and the Sun, blocking sunlight.",
},
],
});
console.log(result);
// Output with default penalties:
// {
// score: 0.64,
// reason: "The score is 0.64 because contexts 1 and 2 are highly relevant and used, context 5 is relevant but unused (10% penalty), while contexts 3 and 4 are irrelevant."
// }
// With custom penalty configuration
const customScorer = createContextRelevanceScorerLLM({
model: "openai/gpt-5.1",
options: {
context: [
"Solar eclipses occur when the Moon blocks the Sun.",
"The Moon moves between the Earth and Sun during eclipses.",
"The Moon is visible at night.",
"Stars twinkle due to atmospheric interference.",
"Total eclipses can last up to 7.5 minutes.",
],
penalties: {
unusedHighRelevanceContext: 0.05, // Lower penalty for unused context
missingContextPerItem: 0.1,
maxMissingContextPenalty: 0.3,
},
},
});
const customResult = await customScorer.run({
input: {
inputMessages: [
{ id: "1", role: "user", content: "What causes solar eclipses?" },
],
},
output: [
{
id: "2",
role: "assistant",
content:
"Solar eclipses happen when the Moon moves between Earth and the Sun, blocking sunlight.",
},
],
});
console.log(customResult);
// Output with lenient penalties:
// {
// score: 0.69, // Higher score due to reduced penalty for unused context
// reason: "The score is 0.69 because contexts 1 and 2 are highly relevant and used, context 5 is relevant but unused (5% penalty), while contexts 3 and 4 are irrelevant."
// }
Low relevance example
This example shows poor context relevance with mostly irrelevant information:
import { createContextRelevanceScorerLLM } from "@mastra/evals";
const scorer = createContextRelevanceScorerLLM({
model: "openai/gpt-5.1",
options: {
context: [
"The Great Barrier Reef is located in Australia.",
"Coral reefs need warm water to survive.",
"Many fish species live in coral reefs.",
"Australia has six states and two territories.",
"The capital of Australia is Canberra.",
],
scale: 1,
},
});
const result = await scorer.run({
input: {
inputMessages: [
{
id: "1",
role: "user",
content: "What is the capital of Australia?",
},
],
},
output: [
{
id: "2",
role: "assistant",
content: "The capital of Australia is Canberra.",
},
],
});
console.log(result);
// Output:
// {
// score: 0.26,
// reason: "The score is 0.26 because only context 5 is relevant to the query about Australia's capital, while the other contexts about reefs are completely irrelevant."
// }
Dynamic context extraction
Extract context dynamically based on the run input:
import { createContextRelevanceScorerLLM } from "@mastra/evals";
const scorer = createContextRelevanceScorerLLM({
model: "openai/gpt-5.1",
options: {
contextExtractor: (input, output) => {
// Extract query from input
const query = input?.inputMessages?.[0]?.content || "";
// Dynamically retrieve context based on query
if (query.toLowerCase().includes("einstein")) {
return [
"Einstein developed E=mc²",
"He won the Nobel Prize in 1921",
"His theories revolutionized physics",
];
}
if (query.toLowerCase().includes("climate")) {
return [
"Global temperatures are rising",
"CO2 levels affect climate",
"Renewable energy reduces emissions",
];
}
return ["General knowledge base entry"];
},
penalties: {
unusedHighRelevanceContext: 0.15, // 15% penalty for unused relevant context
missingContextPerItem: 0.2, // 20% penalty per missing context item
maxMissingContextPenalty: 0.4, // Cap at 40% total missing context penalty
},
scale: 1,
},
});
RAG system integration
Integrate with RAG pipelines to evaluate retrieved context:
import { createContextRelevanceScorerLLM } from "@mastra/evals";
const scorer = createContextRelevanceScorerLLM({
model: "openai/gpt-5.1",
options: {
contextExtractor: (input, output) => {
      // Extract from the RAG retrieval results
      // (assumes retrieved documents were attached to the run input's metadata)
      const ragResults = input?.metadata?.ragResults || [];
// Return the text content of retrieved documents
return ragResults
.filter((doc) => doc.relevanceScore > 0.5)
.map((doc) => doc.content);
},
penalties: {
unusedHighRelevanceContext: 0.12, // Moderate penalty for unused RAG context
missingContextPerItem: 0.18, // Higher penalty for missing information in RAG
maxMissingContextPenalty: 0.45, // Slightly higher cap for RAG systems
},
scale: 1,
},
});
// Evaluate RAG system performance
const evaluateRAG = async (testCases) => {
const results = [];
for (const testCase of testCases) {
const score = await scorer.run(testCase);
results.push({
      query: testCase.input.inputMessages[0].content,
relevanceScore: score.score,
feedback: score.reason,
unusedContext: score.reason.includes("unused"),
missingContext: score.reason.includes("missing"),
});
}
return results;
};
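Example invocation (a sketch; the test case and document contents are illustrative):
const report = await evaluateRAG([
  {
    input: {
      inputMessages: [
        { id: "1", role: "user", content: "How do solar panels work?" },
      ],
      // Retrieved documents attached where the extractor above expects them
      metadata: {
        ragResults: [
          {
            content: "Solar panels convert sunlight into electricity.",
            relevanceScore: 0.9,
          },
        ],
      },
    },
    output: [
      {
        id: "2",
        role: "assistant",
        content: "Solar panels convert sunlight into electricity.",
      },
    ],
  },
]);
console.table(report);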
Comparison with Context Precision
Choose the right scorer for your needs:
| Use case | Context Relevance | Context Precision |
|---|---|---|
| RAG evaluation | When usage matters | When ranking matters |
| Context quality | Nuanced levels | Binary relevance |
| Missing detection | ✓ Identifies gaps | ✗ Not evaluated |
| Usage tracking | ✓ Tracks usage | ✗ Not considered |
| Position sensitivity | ✗ Position-agnostic | ✓ Rewards early placement |
Related