MDocument
The MDocument class processes documents for RAG applications. Its main methods are .chunk() and .extractMetadata().
Constructor
docs:
type:
Static Methods
fromText()
Creates a document from plain text content.
static fromText(text: string, metadata?: Record<string, any>): MDocument
fromHTML()
Creates a document from HTML content.
static fromHTML(html: string, metadata?: Record<string, any>): MDocument
fromMarkdown()
Creates a document from Markdown content.
static fromMarkdown(markdown: string, metadata?: Record<string, any>): MDocument
fromJSON()
Creates a document from JSON content.
static fromJSON(json: string, metadata?: Record<string, any>): MDocument
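The four factory methods share the same shape: a content string plus an optional metadata object. The following is a minimal sketch of calling each one; the sample strings and the `source` metadata object are made up for illustration. It also assumes that, before chunk() is called, getText() returns the original content as a single entry, which this page does not state explicitly.

```typescript
import { MDocument } from "@mastra/rag";

// Hypothetical metadata attached to every document in this sketch.
const source = { source: "example", lang: "en" };

// Each factory takes the raw content as a string plus optional metadata.
const textDoc = MDocument.fromText("Plain text content.", source);
const htmlDoc = MDocument.fromHTML("<h1>Title</h1><p>Body</p>", source);
const markdownDoc = MDocument.fromMarkdown("# Title\n\nBody", source);

// fromJSON() expects a JSON string, so serialize objects first.
const jsonDoc = MDocument.fromJSON(
  JSON.stringify({ title: "Title", body: "Body" }),
  source,
);

// Assumption: an unchunked document exposes its content as one text entry.
const initialText = textDoc.getText();
```

Passing metadata at creation time is optional; the second argument can be omitted, as in the Examples section.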
Instance Methods
chunk()
Splits the document into chunks and optionally extracts metadata.
async chunk(params?: ChunkParams): Promise<Chunk[]>
See the chunk() reference for detailed options.
getDocs()
Returns an array of processed document chunks.
getDocs(): Chunk[]
getText()
Returns an array of text strings from the chunks.
getText(): string[]
getMetadata()
Returns an array of metadata objects from the chunks.
getMetadata(): Record<string, any>[]
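getDocs(), getText(), and getMetadata() can be read as three views over the same list of chunks. The sketch below models that relationship with a simplified stand-in type, `SketchChunk`, which is hypothetical and not the library's Chunk type. It assumes the text and metadata arrays keep the same order as the chunk array, which the descriptions above suggest but do not guarantee.

```typescript
// Simplified stand-in for a chunk; the real Chunk type may carry more fields.
interface SketchChunk {
  text: string;
  metadata: Record<string, any>;
}

// Hypothetical chunk list, playing the role of getDocs().
const sketchChunks: SketchChunk[] = [
  { text: "First chunk.", metadata: { section: "Intro" } },
  { text: "Second chunk.", metadata: { section: "Usage" } },
];

// The role of getText(): one string per chunk.
const sketchTexts: string[] = sketchChunks.map((chunk) => chunk.text);

// The role of getMetadata(): one metadata object per chunk.
const sketchMetadata: Record<string, any>[] = sketchChunks.map(
  (chunk) => chunk.metadata,
);

// If order is preserved, index i in both arrays refers to the same chunk,
// so the two arrays can be zipped back into chunk-like records.
const rebuilt: SketchChunk[] = sketchTexts.map((text, i) => ({
  text,
  metadata: sketchMetadata[i] ?? {},
}));
```

This pairing is useful when an embedding step consumes only the text array and the metadata has to be reattached to each vector afterwards.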
extractMetadata()
Extracts metadata using the specified extractors. See the ExtractParams reference for details.
async extractMetadata(params: ExtractParams): Promise<MDocument>
Examples
import { MDocument } from "@mastra/rag";

// Create document from text
const doc = MDocument.fromText("Your content here");

// Split into chunks with metadata extraction
const chunks = await doc.chunk({
  strategy: "markdown",
  headers: [
    ["#", "title"],
    ["##", "section"],
  ],
  extract: {
    summary: true, // Extract summaries with default settings
    keywords: true, // Extract keywords with default settings
  },
});

// Get processed chunks
const docs = doc.getDocs();
const texts = doc.getText();
const metadata = doc.getMetadata();