Search and Indexing

Search lets agents find relevant content in indexed workspace files. When an agent needs to answer a question or find information, it can search the indexed content instead of reading every file.

How it works

Workspace search has two phases: indexing and querying.

Indexing

Content must be indexed before it can be searched. When you index a document:

The content is tokenized (split into searchable terms)
For BM25: term frequencies and document statistics are computed
For vector: the content is embedded with your embedder function and stored in the vector store
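
To make the tokenization and term-frequency steps concrete, here is a minimal sketch. The `tokenize` and `termFrequencies` helpers are hypothetical illustrations, not the tokenizer the workspace uses internally, which may handle punctuation, casing, and stop words differently.

```typescript
// Hypothetical helper: lowercases text and splits it into alphanumeric terms.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Counts how often each term occurs in a single document.
function termFrequencies(terms: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const term of terms) {
    counts.set(term, (counts.get(term) ?? 0) + 1);
  }
  return counts;
}

const tokens = tokenize('Reset your password. Password reset links expire.');
const frequencies = termFrequencies(tokens);

console.log(tokens);
console.log(frequencies.get('password'));
```

A BM25 index would keep counts like these per document, along with corpus-level statistics such as document lengths and how many documents contain each term.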

Each indexed document has:

id - Unique identifier (typically the file path)
content - The text content
metadata - Optional key-value data stored with the document

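Querying

When you search:

The query is processed with the same tokenization or embedding used for indexing
Documents are scored by their relevance to the query
Results are sorted by score and returned with the matching content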
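
The three query steps can be sketched as a small pipeline. This is an illustration only: `overlapScore` is a deliberately naive stand-in for the real BM25 or vector scoring, and all names here are hypothetical.

```typescript
interface Doc {
  id: string;
  content: string;
}

interface Scored {
  id: string;
  score: number;
}

// Step 1: the query and the documents go through the same term extraction.
function toTerms(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Step 2 (stand-in scorer): fraction of query terms present in the document.
function overlapScore(queryTerms: string[], docTerms: Set<string>): number {
  if (queryTerms.length === 0) return 0;
  const hits = queryTerms.filter((term) => docTerms.has(term)).length;
  return hits / queryTerms.length;
}

// Step 3: drop non-matching documents and sort by descending score.
function rank(query: string, docs: Doc[]): Scored[] {
  const queryTerms = toTerms(query);
  return docs
    .map((doc) => ({
      id: doc.id,
      score: overlapScore(queryTerms, new Set(toTerms(doc.content))),
    }))
    .filter((result) => result.score > 0)
    .sort((a, b) => b.score - a.score);
}

const ranked = rank('password reset', [
  { id: '/docs/billing.md', content: 'Update your billing details.' },
  { id: '/docs/login.md', content: 'Forgot your password? Use the login page.' },
  { id: '/docs/reset.md', content: 'How to reset a password.' },
]);

console.log(ranked);
```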

Workspaces support three search modes: BM25 keyword search, vector semantic search, and hybrid search that combines both.

BM25 keyword search

BM25 scores documents based on term frequency and document length. It works well for exact matches and specific terminology.

src/mastra/workspaces.ts
import { Workspace, LocalFilesystem } from '@mastra/core/workspace';

const workspace = new Workspace({
  filesystem: new LocalFilesystem({ basePath: './workspace' }),
  bm25: true,
});

For custom BM25 parameters (k1 is term frequency saturation, b is document length normalization):

src/mastra/workspaces.ts
const workspace = new Workspace({
  filesystem: new LocalFilesystem({ basePath: './workspace' }),
  bm25: {
    k1: 1.5,
    b: 0.75,
  },
});
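
To show what `k1` and `b` control, the sketch below implements the textbook Okapi BM25 formula (with the common `ln(1 + ...)` form of inverse document frequency). It is an assumption that the workspace follows this exact variant; the `bm25Score` function and `IndexedDoc` type are hypothetical and exist only for illustration.

```typescript
interface IndexedDoc {
  id: string;
  terms: string[];
}

// Textbook Okapi BM25 for one document against a small in-memory corpus.
function bm25Score(
  queryTerms: string[],
  doc: IndexedDoc,
  corpus: IndexedDoc[],
  k1 = 1.5,
  b = 0.75,
): number {
  const avgLength =
    corpus.reduce((sum, d) => sum + d.terms.length, 0) / corpus.length;
  let score = 0;
  for (const term of queryTerms) {
    const tf = doc.terms.filter((t) => t === term).length;
    if (tf === 0) continue;
    const docsWithTerm = corpus.filter((d) => d.terms.includes(term)).length;
    // Rare terms get a higher inverse document frequency.
    const idf = Math.log(
      1 + (corpus.length - docsWithTerm + 0.5) / (docsWithTerm + 0.5),
    );
    // b scales how strongly longer-than-average documents are penalized.
    const lengthNorm = 1 - b + b * (doc.terms.length / avgLength);
    // k1 controls how quickly repeated occurrences stop adding score.
    score += (idf * (tf * (k1 + 1))) / (tf + k1 * lengthNorm);
  }
  return score;
}

const shortDoc: IndexedDoc = { id: 'short', terms: ['password', 'reset'] };
const longDoc: IndexedDoc = {
  id: 'long',
  terms: ['password', 'login', 'help', 'page'],
};
const otherDoc: IndexedDoc = { id: 'other', terms: ['billing', 'details'] };
const corpus = [shortDoc, longDoc, otherDoc];

console.log(bm25Score(['password'], shortDoc, corpus));
console.log(bm25Score(['password'], longDoc, corpus));
```

With `b` set to 0, length normalization is switched off and both matching documents score the same. Raising `k1` lets repeated occurrences of a term keep adding to the score for longer before saturating.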

Vector search

Vector search uses embeddings to find semantically similar content. It requires a vector store and an embedder function.

src/mastra/workspaces.ts
import { Workspace, LocalFilesystem } from '@mastra/core/workspace';
import { PineconeVector } from '@mastra/pinecone';
import { embed } from 'ai';
import { openai } from '@ai-sdk/openai';

const workspace = new Workspace({
  filesystem: new LocalFilesystem({ basePath: './workspace' }),
  vectorStore: new PineconeVector({
    apiKey: process.env.PINECONE_API_KEY,
    index: 'workspace-index',
  }),
  embedder: async (text: string) => {
    const { embedding } = await embed({
      model: openai.embedding('text-embedding-3-small'),
      value: text,
    });
    return embedding;
  },
});

Hybrid search

Configure both BM25 and vector search to enable hybrid mode, which combines keyword matching with semantic understanding.

src/mastra/workspaces.ts
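// pineconeVector and embedderFn stand for the vector store and embedder
// function configured in the vector search example.
const workspace = new Workspace({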
  filesystem: new LocalFilesystem({ basePath: './workspace' }),
  bm25: true,
  vectorStore: pineconeVector,
  embedder: embedderFn,
});

Indexing content

Manual indexing

Use workspace.index() to add content to the search index programmatically. The file path becomes the document ID. You can also pass metadata for each document.

// Basic indexing
await workspace.index('/docs/guide.md', 'Content of the guide...');

// Index with metadata for filtering or context
await workspace.index('/docs/api.md', apiDocContent, {
  metadata: {
    category: 'api',
    version: '2.0',
  },
});

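Manual indexing is useful when:

You are indexing content that does not come from files (for example, database records or API responses)
You want to preprocess or chunk content before indexing it
You need to add custom metadata to documents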
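
As an illustration of chunking before indexing, the sketch below splits text into paragraph-based chunks and indexes each one with metadata. The `chunkByParagraph` and `indexChunks` helpers, the `#chunk-N` ID scheme, and the `ContentIndexer` interface are all hypothetical. The interface only mirrors the `index(id, content, { metadata })` call shape shown above, on the assumption that a workspace can be passed in its place.

```typescript
// Minimal shape of the index() call used in the examples above (assumed).
interface ContentIndexer {
  index(
    id: string,
    content: string,
    options?: { metadata?: Record<string, unknown> },
  ): Promise<unknown>;
}

// Greedily merges paragraphs into chunks of at most maxChars characters.
// A single paragraph longer than maxChars is kept whole as its own chunk.
function chunkByParagraph(text: string, maxChars: number): string[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  const chunks: string[] = [];
  let current = '';
  for (const paragraph of paragraphs) {
    const candidate = current ? `${current}\n\n${paragraph}` : paragraph;
    if (current && candidate.length > maxChars) {
      chunks.push(current);
      current = paragraph;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Indexes each chunk under a derived ID and records its origin as metadata.
async function indexChunks(
  indexer: ContentIndexer,
  path: string,
  text: string,
  maxChars = 1000,
): Promise<number> {
  let position = 0;
  for (const chunk of chunkByParagraph(text, maxChars)) {
    await indexer.index(`${path}#chunk-${position}`, chunk, {
      metadata: { source: path, chunk: position },
    });
    position += 1;
  }
  return position;
}

// Usage (inside an async context):
// await indexChunks(workspace, '/docs/guide.md', guideText);
```

Smaller chunks tend to give more focused matches, at the cost of more documents in the index. The right chunk size depends on your content.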

Auto-indexing

Configure autoIndexPaths to automatically index files when the workspace initializes. Each path specifies a directory to index recursively.

const workspace = new Workspace({
  filesystem: new LocalFilesystem({ basePath: './workspace' }),
  bm25: true,
  autoIndexPaths: ['/docs', '/support/faq'],
});

await workspace.init();

When init() is called, all files in the specified directories are read and indexed for search. The file path becomes the document ID.

Note

Paths must be directories, not glob patterns. Use /docs to recursively index all files in the docs directory. Glob patterns such as **/*.md are not supported.

Searching

Use workspace.search() to find relevant content. Results are ranked by relevance score.

const results = await workspace.search('password reset');

for (const result of results) {
  console.log(`${result.id}: ${result.score}`);
  console.log(result.content);
}

Search options

You can customize search behavior with options:

const results = await workspace.search('authentication flow', {
  topK: 10,
  mode: 'hybrid',
  minScore: 0.5,
  vectorWeight: 0.5,
});

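Option	Description
`topK`	Maximum number of results to return. Default: 5
`mode`	Search mode: `'bm25'`, `'vector'`, or `'hybrid'`. Defaults to the best available mode based on the configuration.
`minScore`	Filters out results below this score threshold (0-1).
`vectorWeight`	In hybrid mode, the weight of vector scores relative to BM25. 0 = all BM25, 1 = all vector, 0.5 = equal weighting.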
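
The sketch below shows one plausible reading of how these options interact. It assumes the hybrid score is a linear weighted sum, which matches the 0, 1, and 0.5 cases described for `vectorWeight`, but the actual combination used by the workspace is not confirmed here. `combineScores`, `rankHybrid`, and the candidate type are hypothetical names.

```typescript
interface HybridCandidate {
  id: string;
  vector: number; // vector similarity score, assumed to be in 0-1
  bm25: number; // normalized BM25 score, assumed to be in 0-1
}

interface RankedResult {
  id: string;
  score: number;
}

// Assumed combination: a linear blend controlled by vectorWeight.
function combineScores(
  vector: number,
  bm25: number,
  vectorWeight: number,
): number {
  return vectorWeight * vector + (1 - vectorWeight) * bm25;
}

// Applies vectorWeight, then minScore, then sorts and keeps the topK results.
function rankHybrid(
  candidates: HybridCandidate[],
  options: { topK: number; minScore: number; vectorWeight: number },
): RankedResult[] {
  return candidates
    .map((c) => ({
      id: c.id,
      score: combineScores(c.vector, c.bm25, options.vectorWeight),
    }))
    .filter((result) => result.score >= options.minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, options.topK);
}

const candidates: HybridCandidate[] = [
  { id: 'a', vector: 0.9, bm25: 0.2 },
  { id: 'b', vector: 0.3, bm25: 1.0 },
  { id: 'c', vector: 0.1, bm25: 0.1 },
];

const balanced = rankHybrid(candidates, { topK: 10, minScore: 0.5, vectorWeight: 0.5 });
const vectorOnly = rankHybrid(candidates, { topK: 10, minScore: 0.5, vectorWeight: 1 });

console.log(balanced, vectorOnly);
```

Under this assumption, raising `vectorWeight` favors documents that are semantically close even when they share few exact terms with the query, while lowering it favors exact keyword matches.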

Search results

Each result contains:

interface SearchResult {
  id: string; // Document ID (typically file path)
  content: string; // The matching content
  score: number; // Relevance score (0-1)
  lineRange?: { // Lines where the match was found
    start: number;
    end: number;
  };
  metadata?: Record<string, unknown>; // Metadata stored with the document
  scoreDetails?: { // Score breakdown (hybrid mode only)
    vector?: number;
    bm25?: number;
  };
}
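
Because several fields are optional, code that consumes results should guard for their absence. A minimal sketch follows, repeating the interface above so the block is self-contained; `formatResult` is a hypothetical helper, not part of the workspace API.

```typescript
interface SearchResult {
  id: string;
  content: string;
  score: number;
  lineRange?: { start: number; end: number };
  metadata?: Record<string, unknown>;
  scoreDetails?: { vector?: number; bm25?: number };
}

// Builds a one-line summary, including optional fields only when present.
function formatResult(result: SearchResult): string {
  const location = result.lineRange
    ? `${result.id}:${result.lineRange.start}-${result.lineRange.end}`
    : result.id;
  const parts = [`${location} (score ${result.score.toFixed(2)})`];
  if (result.scoreDetails) {
    const { vector, bm25 } = result.scoreDetails;
    if (vector !== undefined) parts.push(`vector ${vector.toFixed(2)}`);
    if (bm25 !== undefined) parts.push(`bm25 ${bm25.toFixed(2)}`);
  }
  return parts.join(', ');
}

console.log(
  formatResult({
    id: '/docs/a.md',
    content: 'Reset your password from the login page.',
    score: 0.8123,
    lineRange: { start: 10, end: 12 },
  }),
);
```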

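Understanding scores:

Scores range from 0 to 1, where 1 is a perfect match
BM25 scores are normalized relative to the best match in the result set
Vector scores are the cosine similarity between the query and document embeddings
In hybrid mode, scores are combined using the vectorWeight parameter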
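
The two score definitions above can be sketched as follows. The function names are hypothetical, and dividing by the top score is one straightforward interpretation of "normalized relative to the best match". The workspace's exact normalization may differ.

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Embeddings must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scales raw BM25 scores so the best match in the result set becomes 1.
function normalizeByBest(scores: number[]): number[] {
  const best = scores.length > 0 ? Math.max(...scores) : 0;
  if (best <= 0) return scores.map(() => 0);
  return scores.map((score) => score / best);
}

console.log(cosineSimilarity([1, 0], [1, 0]));
console.log(normalizeByBest([2, 1, 4]));
```

One consequence of normalizing against the best match is that a BM25 score of 1 means "best in this result set", not "perfect match in absolute terms", which is worth keeping in mind when choosing a `minScore` threshold.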

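When to use each mode

Mode	Best for	Example queries
`bm25`	Exact terms, technical queries, code	"useState hook", "404 error", "config.yaml"
`vector`	Conceptual queries, natural language	"how to handle user authentication", "error handling best practices"
`hybrid`	General search, unknown query types	Most agent use cases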

Agent tools

When you configure search on a workspace, agents receive tools for searching and indexing content. See the workspace class reference for details.
