
createDocumentChunkerTool()

The createDocumentChunkerTool() function creates a tool that splits documents into smaller chunks for efficient processing and retrieval. It supports different chunking strategies and configurable parameters.

Basic Usage

import { createDocumentChunkerTool, MDocument } from "@mastra/rag";

const document = new MDocument({
  text: "Your document content here...",
  metadata: { source: "user-manual" },
});

const chunker = createDocumentChunkerTool({
  doc: document,
  params: {
    strategy: "recursive",
    size: 512,
    overlap: 50,
    separator: "\n",
  },
});

const { chunks } = await chunker.execute();

Parameters

doc:

MDocument
The document to be chunked

params?:

ChunkParams
= { strategy: 'recursive', size: 512, overlap: 50, separator: '\n' }
Configuration parameters for chunking

ChunkParams

strategy?:

'recursive'
= 'recursive'
The chunking strategy to use

size?:

number
= 512
Target size of each chunk in tokens/characters

overlap?:

number
= 50
Number of overlapping tokens/characters between chunks

separator?:

string
= '\n'
Character(s) to use as chunk separator
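
The interaction between `size` and `overlap` is easier to picture with a concrete case. The sketch below is a hypothetical fixed-size sliding window over characters; the function name `slidingWindowChunks` is made up for illustration. It is not the `recursive` strategy, which splits on separators, but it shows the basic arithmetic: each chunk starts `size - overlap` characters after the previous one. With the defaults of 512 and 50, that gives a stride of 462.

```typescript
// Hypothetical illustration only: a fixed-size sliding-window chunker over characters.
// This is NOT Mastra's recursive strategy; it just shows how `size` and `overlap` relate.
function slidingWindowChunks(text: string, size: number, overlap: number): string[] {
  if (size <= 0) throw new Error("size must be positive");
  if (overlap < 0 || overlap >= size) throw new Error("overlap must be in [0, size)");
  const step = size - overlap; // each new chunk starts `step` characters after the previous one
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this window already reaches the end of the text
  }
  return chunks;
}

const chunks = slidingWindowChunks("abcdefghij", 4, 1);
console.log(chunks); // [ 'abcd', 'defg', 'ghij' ]
```

Adjacent chunks share exactly `overlap` characters ("d" and "g" above). This is the point of overlap: text that straddles a boundary still appears intact in at least one chunk.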

Returns

chunks:

DocumentChunk[]
Array of document chunks with their content and metadata

Example with Custom Parameters

import { createDocumentChunkerTool, MDocument } from "@mastra/rag";

// Assumes longDocumentContent is a string holding your document text
const technicalDoc = new MDocument({
  text: longDocumentContent,
  metadata: {
    type: "technical",
    version: "1.0",
  },
});

const chunker = createDocumentChunkerTool({
  doc: technicalDoc,
  params: {
    strategy: "recursive",
    size: 1024, // Larger chunks
    overlap: 100, // More overlap
    separator: "\n\n", // Split on double newlines
  },
});

const { chunks } = await chunker.execute();

// Process the chunks
chunks.forEach((chunk, index) => {
  console.log(`Chunk ${index + 1} length: ${chunk.content.length}`);
});
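
Setting `separator` to "\n\n" makes paragraph breaks the preferred split points. The sketch below is a simplified, hypothetical version of that idea (`splitAndPack` is an illustrative name, not part of `@mastra/rag`). It splits on the separator, then greedily packs pieces into chunks of at most `size` characters. A real recursive splitter typically also falls back to finer separators for oversized pieces and applies overlap; both are omitted here.

```typescript
// Hypothetical sketch: split on a separator, then greedily pack pieces into chunks
// of at most `size` characters. Overlap and finer-grained fallback splitting are omitted.
function splitAndPack(text: string, separator: string, size: number): string[] {
  const pieces = text.split(separator).filter((p) => p.length > 0);
  const chunks: string[] = [];
  let current = "";
  for (const piece of pieces) {
    // What the current chunk would become if we appended this piece (separator re-inserted).
    const candidate = current === "" ? piece : current + separator + piece;
    if (candidate.length <= size || current === "") {
      // Fits, or starts a new chunk (an oversized single piece is kept whole in this sketch).
      current = candidate;
    } else {
      chunks.push(current);
      current = piece;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

const sample = "Intro paragraph.\n\nSecond paragraph.\n\nThird.";
console.log(splitAndPack(sample, "\n\n", 40).length); // 2
```

In this example the first two paragraphs fit together within 40 characters (35 including the separator), while adding "Third." would exceed the limit, so it starts a second chunk.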

Tool Details

The chunker is created as a Mastra tool with the following properties:

  • Tool ID: Document Chunker {strategy} {size}
  • Description: Chunks document using {strategy} strategy with size {size} and {overlap} overlap
  • Input schema: empty object (no additional input required)
  • Output schema: object containing the chunks array
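
To make the templates above concrete, here is a small hypothetical helper that fills them in for a given set of parameters. The names `describeChunkerTool` and `ChunkerParamsSketch` are illustrative only and are not exported by `@mastra/rag`.

```typescript
// Hypothetical helper: fills in the tool ID and description templates listed above.
interface ChunkerParamsSketch {
  strategy: string;
  size: number;
  overlap: number;
}

function describeChunkerTool(params: ChunkerParamsSketch): { id: string; description: string } {
  return {
    id: `Document Chunker ${params.strategy} ${params.size}`,
    description: `Chunks document using ${params.strategy} strategy with size ${params.size} and ${params.overlap} overlap`,
  };
}

const info = describeChunkerTool({ strategy: "recursive", size: 512, overlap: 50 });
console.log(info.id); // Document Chunker recursive 512
console.log(info.description); // Chunks document using recursive strategy with size 512 and 50 overlap
```

Because the ID template includes only `strategy` and `size`, two chunker tools that differ only in `overlap` or `separator` end up with the same ID. Keep this in mind if you register several chunkers side by side.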
