Context Precision Scorer

The createContextPrecisionScorer() function creates a scorer that evaluates how relevant and well-positioned retrieved context pieces are for generating the expected output. It uses Mean Average Precision (MAP) to reward systems that place relevant context earlier in the sequence.

It is especially useful for these use cases:

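RAG system evaluation

Ideal for evaluating retrieved context in RAG pipelines where:

- Context ordering matters for model performance
- You need to measure retrieval quality beyond simple relevance
- Relevant context that appears early is more valuable than relevant context that appears late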

Context window optimization

Use when optimizing context selection for:

- Limited context windows
- Token budget constraints
- Multi-step reasoning tasks

Parameters

model: MastraModelConfig
The language model to use for evaluating context relevance.

options: ContextPrecisionMetricOptions
Configuration options for the scorer.

Note: Either context or contextExtractor must be provided. If both are provided, contextExtractor takes precedence.
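The precedence rule in the note can be illustrated with a minimal sketch. The `ContextOptions` type and the `resolveContext` helper below are hypothetical names introduced only for this illustration; they are not part of the Mastra API, and the real option types may differ.

```typescript
// Hypothetical shapes for illustration only; the real option types may differ.
type ContextExtractor<I, O> = (input: I, output: O) => string[];

interface ContextOptions<I, O> {
  context?: string[];
  contextExtractor?: ContextExtractor<I, O>;
}

// Hypothetical helper: picks the context source following the documented rule.
function resolveContext<I, O>(
  options: ContextOptions<I, O>,
  input: I,
  output: O,
): string[] {
  // contextExtractor takes precedence when both are provided.
  if (options.contextExtractor) return options.contextExtractor(input, output);
  if (options.context) return options.context;
  // Neither source was supplied, which the note says is not allowed.
  throw new Error("Either context or contextExtractor must be provided");
}

const resolved = resolveContext(
  { context: ["static"], contextExtractor: () => ["dynamic"] },
  null,
  null,
);
console.log(resolved); // [ 'dynamic' ]
```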

.run() Returns

score: number
Mean Average Precision score between 0 and scale (default 0-1).

reason: string
Human-readable explanation of the context precision evaluation.

Scoring Details

Mean Average Precision (MAP)

Context Precision uses Mean Average Precision to evaluate both relevance and positioning:

1. Context evaluation: Each context piece is classified as relevant or irrelevant for generating the expected output
2. Precision calculation: For each relevant context at position i (0-based), precision = relevant_items_so_far / (i + 1)
3. Average precision: Sum all precision values and divide by the total number of relevant items
4. Final score: Multiply by the scale factor and round to two decimal places
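The steps above can be sketched in a few lines of TypeScript. This is a minimal illustration that takes the relevance verdicts as given; `contextPrecision` is a hypothetical helper, not the scorer's actual implementation, and the step that has the model classify each context piece is not shown.

```typescript
// Hypothetical helper: not part of the Mastra API.
// Computes a MAP-style score from per-position relevance verdicts.
function contextPrecision(relevance: boolean[], scale = 1): number {
  let relevantSoFar = 0;
  let precisionSum = 0;

  relevance.forEach((isRelevant, i) => {
    if (!isRelevant) return; // irrelevant positions contribute nothing
    relevantSoFar += 1;
    precisionSum += relevantSoFar / (i + 1); // precision at this position
  });

  if (relevantSoFar === 0) return 0; // no relevant context found
  const map = precisionSum / relevantSoFar;
  return Math.round(map * scale * 100) / 100; // scale, then round to two decimals
}

console.log(contextPrecision([true, false, true, false])); // 0.83
```

Passing a larger `scale` (for example 10) stretches the same result onto a 0-10 range before rounding.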

Scoring Formula

MAP = (Σ Precision@k) / R

Where:
- Precision@k = (relevant items in positions 1...k) / k
- R = total number of relevant items
- Only calculated at positions where relevant items appear
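For readers who prefer explicit notation, the same formula can be written with an indicator function. The symbols N (number of retrieved context pieces) and rel(k) (1 if the item at position k is relevant, 0 otherwise) are notation assumed here for illustration; positions are 1-based, so k corresponds to i + 1 in the 0-based step description.

```latex
\mathrm{MAP} = \frac{1}{R} \sum_{k=1}^{N} \mathrm{rel}(k) \cdot \mathrm{Precision@}k,
\qquad
\mathrm{Precision@}k = \frac{1}{k} \sum_{j=1}^{k} \mathrm{rel}(j),
\qquad
R = \sum_{k=1}^{N} \mathrm{rel}(k)
```

The factor rel(k) is what restricts the sum to positions where a relevant item appears. When R = 0, the score is 0.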

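Score Interpretation

- 0.9-1.0: Excellent precision, with all relevant context early in the sequence
- 0.7-0.8: Good precision, with most relevant context well-positioned
- 0.4-0.6: Moderate precision, with relevant context mixed with irrelevant content
- 0.1-0.3: Poor precision, with little relevant context or poor positioning
- 0.0: No relevant context found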
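If you want to turn a score into one of these labels programmatically, a small helper along the following lines could work. Note that the listed bands leave gaps (for example, 0.85 falls between two bands), so this sketch assumes each band's lower bound acts as its threshold and that any non-zero score below 0.4 counts as poor. `precisionBand` is a hypothetical name, not a Mastra API.

```typescript
type PrecisionBand = "excellent" | "good" | "moderate" | "poor" | "none";

// Hypothetical helper; thresholds are an assumption based on each band's lower bound.
function precisionBand(score: number, scale = 1): PrecisionBand {
  const normalized = score / scale; // map back to the 0-1 range
  if (normalized >= 0.9) return "excellent";
  if (normalized >= 0.7) return "good";
  if (normalized >= 0.4) return "moderate";
  if (normalized > 0) return "poor";
  return "none";
}

console.log(precisionBand(0.83)); // good
```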

Reason analysis

The reason field explains:

- Which context pieces were deemed relevant or irrelevant
- How positioning affected the MAP calculation
- The specific relevance criteria used in the evaluation

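Optimization insights

Use results to:

- Improve retrieval: Filter out irrelevant context before ranking
- Optimize ranking: Ensure relevant context appears early
- Tune chunk size: Balance context detail against relevance precision
- Evaluate embeddings: Test different embedding models for better retrieval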
Example Calculation

Given context: [relevant, irrelevant, relevant, irrelevant]

Position 0: Relevant → Precision = 1/1 = 1.0
Position 1: Skip (irrelevant)
Position 2: Relevant → Precision = 2/3 ≈ 0.67
Position 3: Skip (irrelevant)

MAP = (1 + 2/3) / 2 = 0.8333... ≈ 0.83
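To see how much ordering alone changes the result, the sketch below scores the same two relevant and two irrelevant items in three different orders. `averagePrecision` is a hypothetical helper that mirrors the calculation above with a scale of 1; it is not a Mastra API.

```typescript
// Hypothetical helper mirroring the worked calculation; not a Mastra API.
const averagePrecision = (relevance: boolean[]): number => {
  let hits = 0;
  let sum = 0;
  relevance.forEach((isRelevant, i) => {
    if (isRelevant) {
      hits += 1;
      sum += hits / (i + 1); // precision at this relevant position
    }
  });
  return hits === 0 ? 0 : Math.round((sum / hits) * 100) / 100;
};

const orderings: Record<string, boolean[]> = {
  asGiven: [true, false, true, false],
  relevantFirst: [true, true, false, false],
  relevantLast: [false, false, true, true],
};

for (const [name, relevance] of Object.entries(orderings)) {
  console.log(name, averagePrecision(relevance));
}
// asGiven 0.83
// relevantFirst 1
// relevantLast 0.42
```

The set of retrieved items is identical in all three cases; only the ranking differs, which is the property this scorer rewards.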

Scorer configuration

Dynamic context extraction

import { createContextPrecisionScorer } from "@mastra/evals/scorers/prebuilt";

// `vectorDB` is a placeholder for your own vector store client; it is not defined here.
const scorer = createContextPrecisionScorer({
  model: "openai/gpt-5.1",
  options: {
    contextExtractor: (input, output) => {
      // Extract context dynamically based on the query
      const query = input?.inputMessages?.[0]?.content || "";

      // Example: Retrieve from a vector database
      const searchResults = vectorDB.search(query, { limit: 10 });
      return searchResults.map((result) => result.content);
    },
    scale: 1,
  },
});

Large context evaluation

const scorer = createContextPrecisionScorer({
  model: "openai/gpt-5.1",
  options: {
    context: [
      // Simulate retrieved documents from vector database
      "Document 1: Highly relevant content...",
      "Document 2: Somewhat related content...",
      "Document 3: Tangentially related...",
      "Document 4: Not relevant...",
      "Document 5: Highly relevant content...",
      // ... up to dozens of context pieces
    ],
  },
});

Example

Evaluate RAG system context retrieval precision for different queries:

src/example-context-precision.ts
import { runEvals } from "@mastra/core/evals";
import { createContextPrecisionScorer } from "@mastra/evals/scorers/prebuilt";
import { myAgent } from "./agent";

const scorer = createContextPrecisionScorer({
  model: "openai/gpt-4o",
  options: {
    contextExtractor: (input, output) => {
      // Extract context from agent's retrieved documents
      return output.metadata?.retrievedContext || [];
    },
  },
});

const result = await runEvals({
  data: [
    {
      input: "How does photosynthesis work in plants?",
    },
    {
      input: "What are the mental and physical benefits of exercise?",
    },
  ],
  scorers: [scorer],
  target: myAgent,
  onItemComplete: ({ scorerResults }) => {
    console.log({
      score: scorerResults[scorer.id].score,
      reason: scorerResults[scorer.id].reason,
    });
  },
});

console.log(result.scores);

For more details on runEvals, see the runEvals reference.

To add this scorer to an agent, see the Scorers overview guide.

Comparison with Context Relevance

Choose the right scorer for your needs:

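| Use case | Context Relevance | Context Precision |
| --- | --- | --- |
| RAG evaluation | When usage matters | When ranking matters |
| Context quality | Nuanced levels | Binary relevance |
| Missing detection | ✓ Identifies gaps | ✗ Not evaluated |
| Usage tracking | ✓ Tracks usage | ✗ Not considered |
| Position sensitivity | ✗ Position-agnostic | ✓ Rewards early placement |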

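Related

- Answer Relevancy Scorer - Evaluates whether answers address the question
- Faithfulness Scorer - Measures how well answers are grounded in the context
- Custom Scorers - Create your own evaluation metrics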
