Faithfulness Scorer
The createFaithfulnessScorer() function evaluates how factually accurate an LLM's output is relative to the provided context. It extracts claims from the output and verifies them against that context, making it essential for measuring the reliability of RAG pipeline responses.
Parameters
The createFaithfulnessScorer() function accepts a single options object with the following properties:
- model: The model used by the judge to extract claims from the output and verify them against the context (for example, "openai/gpt-4o").
- context: An array of context strings that the output's claims are verified against.
- scale: The factor the supported-claims ratio is multiplied by to produce the final score (see Scoring Details below).
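A minimal construction sketch, assuming context takes an array of context strings and scale takes a numeric multiplier for the final score; the context values below are hypothetical:

import { createFaithfulnessScorer } from "@mastra/evals/scorers/prebuilt";

// Hypothetical context strings for illustration only
const scorer = createFaithfulnessScorer({
  model: "openai/gpt-4o",
  context: [
    "The Tesla Model 3 was launched in 2017.",
    "It has an EPA-estimated range of up to 358 miles.",
  ],
  scale: 1,
});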
This function returns an instance of the MastraScorer class. The .run() method accepts the same input as other scorers (see the MastraScorer reference), but the return value includes the LLM-specific fields documented below.
.run() Returns
- runId: The id of the scorer run.
- preprocessStepResult: The claims extracted from the output during the preprocess step.
- preprocessPrompt: The prompt sent to the model for claim extraction.
- analyzeStepResult: The per-claim verdicts ("yes", "no", or "unsure") from verification against the context.
- analyzePrompt: The prompt sent to the model for claim verification.
- score: The faithfulness score, between 0 and the configured scale.
- reason: An LLM-generated explanation of the score.
- generateReasonPrompt: The prompt sent to the model to generate the reason.
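A minimal sketch of reading these fields from a completed run; the exact .run() input shape is defined in the MastraScorer reference and is elided here:

// The run input is passed as described in the MastraScorer reference (elided).
const runResult = await scorer.run(/* ...run input... */);

console.log(runResult.score);  // faithfulness score between 0 and the scale
console.log(runResult.reason); // LLM-generated explanation of the score
console.log(runResult.runId);  // identifier for this scorer run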
Scoring Details
The scorer evaluates faithfulness by verifying claims against the provided context.
Scoring Process
- Analyze claims and context:
  - Extract all claims (factual and speculative)
  - Verify each claim against the context
  - Assign one of three verdicts:
    - "Yes" – the claim is supported by the context
    - "No" – the claim contradicts the context
    - "Unsure" – the claim cannot be verified
- Calculate the faithfulness score:
  - Count the supported claims
  - Divide by the total number of claims
  - Scale to the configured range
Final score: (supported_claims / total_claims) * scale
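To make the arithmetic concrete (illustration only, not library code): if three of four extracted claims are supported and the scale is 1, the score is 0.75.

// Hypothetical claim counts for illustration
const supportedClaims = 3;
const totalClaims = 4;
const scale = 1;
const score = (supportedClaims / totalClaims) * scale; // 0.75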
Score interpretation
A faithfulness score between 0 and 1:
- 1.0: All claims are accurate and directly supported by the context.
- 0.7–0.9: Most claims are correct, with minor additions or omissions.
- 0.4–0.6: Some claims are supported, but others cannot be verified.
- 0.1–0.3: Most of the content is inaccurate or unsupported.
- 0.0: All claims are false or contradict the context.
Example
Evaluate agent responses for faithfulness to the provided context:
import { runEvals } from "@mastra/core/evals";
import { createFaithfulnessScorer } from "@mastra/evals/scorers/prebuilt";
import { myAgent } from "./agent";

// Context is typically populated from agent tool calls or RAG retrieval
const scorer = createFaithfulnessScorer({
  model: "openai/gpt-4o",
});

const result = await runEvals({
  data: [
    {
      input: "Tell me about the Tesla Model 3.",
    },
    {
      input: "What are the key features of this electric vehicle?",
    },
  ],
  scorers: [scorer],
  target: myAgent,
  onItemComplete: ({ scorerResults }) => {
    console.log({
      score: scorerResults[scorer.id].score,
      reason: scorerResults[scorer.id].reason,
    });
  },
});

console.log(result.scores);
For more details on runEvals, see the runEvals reference.
To add this scorer to an agent, see the Scorers overview guide.
Related