MDocument

The MDocument class processes documents for RAG applications. Its main methods are .chunk() and .extractMetadata().

Constructor

docs: Array<{ text: string, metadata?: Record<string, any> }>
Array of document chunks, each with its text content and optional metadata.

type: 'text' | 'html' | 'markdown' | 'json' | 'latex'
Type of document content.
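As a rough illustration of the two constructor parameters, the sketch below spells them out as local TypeScript types. The names `DocInput`, `DocType`, and `MDocumentArgs` are hypothetical labels for this example, not exports of the library, and grouping both fields into one object is an assumption this page does not confirm.

```typescript
// Hypothetical local types mirroring the documented constructor parameters.
type DocInput = { text: string; metadata?: Record<string, any> };
type DocType = "text" | "html" | "markdown" | "json" | "latex";

// Assumption: the two parameters travel together as a single object.
interface MDocumentArgs {
  docs: DocInput[];
  type: DocType;
}

const args: MDocumentArgs = {
  docs: [
    // An entry with optional metadata attached.
    { text: "# Intro\nRAG pipelines start with documents.", metadata: { source: "intro.md" } },
    // An entry without metadata; the field is simply omitted.
    { text: "## Details\nMetadata is optional." },
  ],
  type: "markdown",
};
```

Typing `type` as a string-literal union means a misspelled content type is rejected at compile time rather than at runtime.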

Static Methods

fromText()

Creates a document from plain text content.

static fromText(text: string, metadata?: Record<string, any>): MDocument

fromHTML()

Creates a document from HTML content.

static fromHTML(html: string, metadata?: Record<string, any>): MDocument

fromMarkdown()

Creates a document from Markdown content.

static fromMarkdown(markdown: string, metadata?: Record<string, any>): MDocument

fromJSON()

Creates a document from JSON content.

static fromJSON(json: string, metadata?: Record<string, any>): MDocument
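One way to picture how the four factories relate to the constructor is the stand-in class below. It assumes each factory wraps its input string as a single docs entry and tags it with the matching type. `SketchDocument` and its `wrap` helper are hypothetical and written only for this illustration; they are not the @mastra/rag implementation.

```typescript
type DocInput = { text: string; metadata?: Record<string, any> };
type DocType = "text" | "html" | "markdown" | "json" | "latex";

// Stand-in class for illustration only; NOT the library's MDocument.
class SketchDocument {
  readonly docs: DocInput[];
  readonly type: DocType;

  constructor(docs: DocInput[], type: DocType) {
    this.docs = docs;
    this.type = type;
  }

  // Assumption: every factory produces exactly one docs entry with the given type.
  private static wrap(content: string, type: DocType, metadata?: Record<string, any>): SketchDocument {
    return new SketchDocument([{ text: content, metadata }], type);
  }

  static fromText(text: string, metadata?: Record<string, any>): SketchDocument {
    return SketchDocument.wrap(text, "text", metadata);
  }

  static fromHTML(html: string, metadata?: Record<string, any>): SketchDocument {
    return SketchDocument.wrap(html, "html", metadata);
  }

  static fromMarkdown(markdown: string, metadata?: Record<string, any>): SketchDocument {
    return SketchDocument.wrap(markdown, "markdown", metadata);
  }

  static fromJSON(json: string, metadata?: Record<string, any>): SketchDocument {
    return SketchDocument.wrap(json, "json", metadata);
  }
}

// fromJSON takes a string, so objects are serialized before being passed in.
const md = SketchDocument.fromMarkdown("# Title\nBody", { source: "readme" });
const js = SketchDocument.fromJSON(JSON.stringify({ a: 1 }));
```

Under this reading, the factories exist mainly to save callers from spelling out the type by hand.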

Instance Methods

chunk()

Splits the document into chunks and optionally extracts metadata.

async chunk(params?: ChunkParams): Promise<Chunk[]>

See the chunk() reference for detailed options.

getDocs()

Returns an array of processed document chunks.

getDocs(): Chunk[]

getText()

Returns an array of text strings from the chunks.

getText(): string[]

getMetadata()

Returns an array of metadata objects from the chunks.

getMetadata(): Record<string, any>[]
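To make the relationship between the three getters concrete, here is a sketch that assumes each chunk carries a `text` string and a `metadata` object, so that getText() and getMetadata() behave like projections of what getDocs() returns. `ChunkLike`, `textsOf`, and `metadataOf` are hypothetical stand-ins; the library's actual Chunk type may hold more fields.

```typescript
// Hypothetical minimal chunk shape; the real Chunk type is not defined on this page.
type ChunkLike = { text: string; metadata: Record<string, any> };

// Projection analogous to getText(): one string per chunk, in order.
const textsOf = (chunks: ChunkLike[]): string[] => chunks.map((c) => c.text);

// Projection analogous to getMetadata(): one metadata object per chunk, in order.
const metadataOf = (chunks: ChunkLike[]): Record<string, any>[] =>
  chunks.map((c) => c.metadata);

const chunks: ChunkLike[] = [
  { text: "First chunk", metadata: { section: "intro" } },
  { text: "Second chunk", metadata: { section: "details" } },
];

const texts = textsOf(chunks);
const metadata = metadataOf(chunks);

// Index i in both arrays refers to the same chunk, so they can be zipped back together.
const rejoined = texts.map((text, i) => ({ text, metadata: metadata[i] }));
```

If the arrays are index-aligned as assumed here, the texts can go to an embedding step while the metadata is stored alongside the resulting vectors at the same positions.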

extractMetadata()

Extracts metadata using the specified extractors. See the ExtractParams reference for details.

async extractMetadata(params: ExtractParams): Promise<MDocument>

Examples

import { MDocument } from "@mastra/rag";

// Create document from text
const doc = MDocument.fromText("Your content here");

// Split into chunks with metadata extraction
const chunks = await doc.chunk({
  strategy: "markdown",
  headers: [
    ["#", "title"],
    ["##", "section"],
  ],
  extract: {
    summary: true, // Extract summaries with default settings
    keywords: true, // Extract keywords with default settings
  },
});

// Get processed chunks
const docs = doc.getDocs();
const texts = doc.getText();
const metadata = doc.getMetadata();