
Retrieval in RAG Systems

After storing embeddings, you need to retrieve relevant chunks to answer user queries. Mastra provides flexible retrieval options with support for semantic search, filtering, and re-ranking.

How Retrieval Works

  1. The user's query is converted into an embedding using the same model used for the document embeddings
  2. This embedding is compared against the stored embeddings using vector similarity
  3. The most similar chunks are retrieved and can optionally be:
  • Filtered by metadata
  • Re-ranked for relevance
  • Processed through a knowledge graph

Basic Retrieval

The simplest approach is direct semantic search. This method uses vector similarity to find chunks that are semantically similar to the query:

import { embed } from "ai";
import { PgVector } from "@mastra/pg";
import { ModelRouterEmbeddingModel } from "@mastra/core/llm";

// Convert query to embedding
const { embedding } = await embed({
  value: "What are the main points in the article?",
  model: new ModelRouterEmbeddingModel("openai/text-embedding-3-small"),
});

// Query vector store
const pgVector = new PgVector({
  id: "pg-vector",
  connectionString: process.env.POSTGRES_CONNECTION_STRING,
});
const results = await pgVector.query({
  indexName: "embeddings",
  queryVector: embedding,
  topK: 10,
});

// Display results
console.log(results);

The topK parameter specifies the maximum number of most similar results to return from the vector search.

Results include both the text content and a similarity score:

[
  {
    text: "Climate change poses significant challenges...",
    score: 0.89,
    metadata: { source: "article1.txt" },
  },
  {
    text: "Rising temperatures affect crop yields...",
    score: 0.82,
    metadata: { source: "article1.txt" },
  },
];

Advanced Retrieval Options

Metadata Filtering

Filter results based on metadata fields to narrow the search space. This approach, which combines vector similarity search with metadata filters, is sometimes called hybrid vector search because it merges semantic search with structured filtering criteria.

This is useful when you have documents from different sources, time periods, or with specific attributes. Mastra provides a unified MongoDB-style query syntax that works across all supported vector stores.

For detailed information about available operators and syntax, see the Metadata Filters Reference.

Basic filtering examples:

// Simple equality filter
const results = await pgVector.query({
  indexName: "embeddings",
  queryVector: embedding,
  topK: 10,
  filter: {
    source: "article1.txt",
  },
});

// Numeric comparison
const results = await pgVector.query({
  indexName: "embeddings",
  queryVector: embedding,
  topK: 10,
  filter: {
    price: { $gt: 100 },
  },
});

// Multiple conditions
const results = await pgVector.query({
  indexName: "embeddings",
  queryVector: embedding,
  topK: 10,
  filter: {
    category: "electronics",
    price: { $lt: 1000 },
    inStock: true,
  },
});

// Array operations
const results = await pgVector.query({
  indexName: "embeddings",
  queryVector: embedding,
  topK: 10,
  filter: {
    tags: { $in: ["sale", "new"] },
  },
});

// Logical operators
const results = await pgVector.query({
  indexName: "embeddings",
  queryVector: embedding,
  topK: 10,
  filter: {
    $or: [{ category: "electronics" }, { category: "accessories" }],
    $and: [{ price: { $gt: 50 } }, { price: { $lt: 200 } }],
  },
});

Common use cases for metadata filtering:

  • Filtering by document source or type
  • Filtering by date ranges
  • Filtering by specific categories or tags
  • Filtering by numeric ranges (e.g., price, rating)
  • Combining multiple conditions for precise queries
  • Filtering by document attributes (e.g., language, author)
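The date-range case above can use the same comparison operators as the numeric examples. A minimal sketch, assuming a hypothetical publishedAt metadata field that was stored as an ISO 8601 date string at ingestion time:

```typescript
// Hypothetical `publishedAt` field, stored as an ISO 8601 date string.
// Lexicographic comparison on ISO dates matches chronological order,
// so $gte / $lt bound a date range.
const dateRangeFilter = {
  publishedAt: { $gte: "2024-01-01", $lt: "2025-01-01" },
};

// Passed like any other filter:
// const results = await pgVector.query({
//   indexName: "embeddings",
//   queryVector: embedding,
//   topK: 10,
//   filter: dateRangeFilter,
// });
```

Whether range operators apply to date strings depends on how the metadata was stored; numeric timestamps work the same way with $gt/$lt.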

Vector Query Tool

Sometimes you want to give your agent the ability to query a vector database directly. The Vector Query Tool puts your agent in charge of retrieval decisions, combining semantic search with optional filtering and re-ranking based on the agent's understanding of the user's needs.

import { createVectorQueryTool } from "@mastra/rag";
import { ModelRouterEmbeddingModel } from "@mastra/core/llm";

const vectorQueryTool = createVectorQueryTool({
  vectorStoreName: "pgVector",
  indexName: "embeddings",
  model: new ModelRouterEmbeddingModel("openai/text-embedding-3-small"),
});

When creating the tool, pay special attention to its name and description: these help the agent understand when and how to use the retrieval capabilities. For example, you might name it "SearchKnowledgeBase" and describe it as "Search through our documentation to find relevant information about X topic."

This is particularly useful when:

  • Your agent needs to decide dynamically what information to retrieve
  • The retrieval process requires complex decision-making
  • You want the agent to combine multiple retrieval strategies based on context

Database-Specific Configurations

The Vector Query Tool supports database-specific configurations that let you leverage the unique features and optimizations of different vector stores.

note

These configurations are for query-time options such as namespaces, performance tuning, and filtering, not for database connection settings. Connection credentials (URLs, auth tokens) are configured when you instantiate the vector store class (e.g., new LibSQLVector({ url: '...' })).

import { createVectorQueryTool } from "@mastra/rag";
import { ModelRouterEmbeddingModel } from "@mastra/core/llm";

// Pinecone with namespace
const pineconeQueryTool = createVectorQueryTool({
  vectorStoreName: "pinecone",
  indexName: "docs",
  model: new ModelRouterEmbeddingModel("openai/text-embedding-3-small"),
  databaseConfig: {
    pinecone: {
      namespace: "production", // Isolate data by environment
    },
  },
});

// pgVector with performance tuning
const pgVectorQueryTool = createVectorQueryTool({
  vectorStoreName: "postgres",
  indexName: "embeddings",
  model: new ModelRouterEmbeddingModel("openai/text-embedding-3-small"),
  databaseConfig: {
    pgvector: {
      minScore: 0.7, // Filter low-quality results
      ef: 200, // HNSW search parameter
      probes: 10, // IVFFlat probe parameter
    },
  },
});

// Chroma with advanced filtering
const chromaQueryTool = createVectorQueryTool({
  vectorStoreName: "chroma",
  indexName: "documents",
  model: new ModelRouterEmbeddingModel("openai/text-embedding-3-small"),
  databaseConfig: {
    chroma: {
      where: { category: "technical" },
      whereDocument: { $contains: "API" },
    },
  },
});

// LanceDB with table specificity
const lanceQueryTool = createVectorQueryTool({
  vectorStoreName: "lance",
  indexName: "documents",
  model: new ModelRouterEmbeddingModel("openai/text-embedding-3-small"),
  databaseConfig: {
    lance: {
      tableName: "myVectors", // Specify which table to query
      includeAllColumns: true, // Include all metadata columns in results
    },
  },
});

Key benefits:

  • Pinecone namespaces: organize vectors by tenant, environment, or data type
  • pgVector optimization: control search accuracy and speed via the ef/probes parameters
  • Quality filtering: set minimum similarity thresholds to improve result relevance
  • LanceDB tables: separate data into tables for better organization and performance
  • Runtime flexibility: override configurations dynamically based on context

Common use cases:

  • Multi-tenant applications using Pinecone namespaces
  • Performance optimization in high-load scenarios
  • Environment-specific configurations (dev/staging/production)
  • Quality-filtered search results
  • Embedded, file-based vector storage with LanceDB for edge deployment scenarios

You can also override these configurations at runtime using the request context:

import { RequestContext } from "@mastra/core/request-context";

const requestContext = new RequestContext();
requestContext.set("databaseConfig", {
  pinecone: {
    namespace: "runtime-namespace",
  },
});

await pineconeQueryTool.execute(
  { queryText: "search query" },
  { mastra, requestContext }
);

For detailed configuration options and advanced usage, see the Vector Query Tool Reference.

Vector Store Prompts

Vector store prompts define the query patterns and filtering capabilities of each vector database implementation. When implementing filtering, these prompts are required in the agent's instructions to specify the valid operators and syntax for each vector store implementation.

import { Agent } from "@mastra/core/agent";
import { PGVECTOR_PROMPT } from "@mastra/pg";

export const ragAgent = new Agent({
  id: "rag-agent",
  name: "RAG Agent",
  model: "openai/gpt-5.1",
  instructions: `
  Process queries using the provided context. Structure responses to be concise and relevant.
  ${PGVECTOR_PROMPT}
  `,
  tools: { vectorQueryTool },
});

Re-ranking

Initial vector similarity search can sometimes miss nuanced relevance. Re-ranking is a more computationally expensive but more accurate process that improves results by:

  • Considering word order and exact matches
  • Applying more sophisticated relevance scoring
  • Using a method called cross-attention between the query and documents

Here's how to use re-ranking:

import {
  rerankWithScorer as rerank,
  MastraAgentRelevanceScorer,
} from "@mastra/rag";

// Get initial results from vector search
const initialResults = await pgVector.query({
  indexName: "embeddings",
  queryVector: queryEmbedding,
  topK: 10,
});

// Create a relevance scorer
const relevanceProvider = new MastraAgentRelevanceScorer("relevance-scorer", "openai/gpt-5.1");

// Re-rank the results
const rerankedResults = await rerank({
  results: initialResults,
  query,
  scorer: relevanceProvider,
  options: {
    weights: {
      semantic: 0.5, // How well the content matches the query semantically
      vector: 0.3, // Original vector similarity score
      position: 0.2, // Preserves original result ordering
    },
    topK: 10,
  },
});

The weights control how different factors influence the final ranking:

  • semantic: higher values prioritize semantic understanding and relevance to the query
  • vector: higher values favor the original vector similarity scores
  • position: higher values help preserve the original ordering of results
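To make the weights concrete, here is an illustrative weighted sum over the three scores. This is only a sketch: the actual combination formula is internal to rerankWithScorer, and the score values are made up.

```typescript
// Illustrative only; not Mastra's actual internal formula.
const weights = { semantic: 0.5, vector: 0.3, position: 0.2 };
const scores = { semantic: 0.9, vector: 0.82, position: 1.0 };

const combined =
  weights.semantic * scores.semantic +
  weights.vector * scores.vector +
  weights.position * scores.position;
// 0.5 * 0.9 + 0.3 * 0.82 + 0.2 * 1.0 = 0.896
```

Raising one weight at the expense of the others shifts the final ordering toward that factor.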
note

For semantic scoring to work during re-ranking, each result must include its text content in the metadata.text field.
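Concretely, each result passed to re-ranking should look roughly like this. A sketch only; apart from metadata.text, the field names follow the query results shown earlier:

```typescript
// Minimal shape a result needs for semantic scoring:
// the chunk text must live at metadata.text.
const resultForRerank = {
  score: 0.89, // original vector similarity
  metadata: {
    text: "Climate change poses significant challenges...",
    source: "article1.txt",
  },
};
```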

You can also use other relevance score providers, such as Cohere or ZeroEntropy:

import { CohereRelevanceScorer, ZeroEntropyRelevanceScorer } from "@mastra/rag";

const relevanceProvider = new CohereRelevanceScorer("rerank-v3.5");
// or
const relevanceProvider = new ZeroEntropyRelevanceScorer("zerank-1");

The re-ranked results combine vector similarity with semantic understanding to improve retrieval quality.

For more details about re-ranking, see the rerank() method.

For graph-based retrieval that follows connections between chunks, see the GraphRAG documentation.