DuckDB Vector Store
The DuckDB storage implementation provides an embedded, high-performance vector search solution built on DuckDB, an in-process analytical database. It uses the VSS extension for vector similarity search with HNSW indexing, giving you a lightweight vector database that requires no external server.
It is part of the @mastra/duckdb package and supports vector similarity search with metadata filtering.
Installation
- npm: npm install @mastra/duckdb@latest
- pnpm: pnpm add @mastra/duckdb@latest
- Yarn: yarn add @mastra/duckdb@latest
- Bun: bun add @mastra/duckdb@latest
Usage
import { DuckDBVector } from "@mastra/duckdb";

// Create a new vector store instance
const store = new DuckDBVector({
  id: "duckdb-vector",
  path: ":memory:", // or './vectors.duckdb' for file persistence
});

// Create an index
await store.createIndex({
  indexName: "myCollection",
  dimension: 1536,
  metric: "cosine",
});

// Add vectors with metadata
// (values elided; each vector's length must match the index dimension)
const vectors = [[0.1, 0.2, ...], [0.3, 0.4, ...]];
const metadata = [
  { text: "first document", category: "A" },
  { text: "second document", category: "B" },
];
await store.upsert({
  indexName: "myCollection",
  vectors,
  metadata,
});

// Query similar vectors
const queryVector = [0.1, 0.2, ...];
const results = await store.query({
  indexName: "myCollection",
  queryVector,
  topK: 10,
  filter: { category: "A" },
});

// Clean up
await store.close();
Constructor Options
id:
path?:
dimensions?:
metric?:
Methods
createIndex()
Creates a new vector collection, optionally with an HNSW index for fast approximate nearest neighbor search.
indexName:
dimension:
metric?:
upsert()
Adds or updates vectors and their metadata in the index.
indexName:
vectors:
metadata?:
ids?:
query()
Searches for similar vectors with optional metadata filtering.
indexName:
queryVector:
topK?:
filter?:
includeVector?:
describeIndex()
Gets information about an index.
indexName:
Returns:
interface IndexStats {
  dimension: number;
  count: number;
  metric: "cosine" | "euclidean" | "dotproduct";
}
deleteIndex()
Deletes an index and all its data.
indexName:
listIndexes()
Lists all vector indexes in the database.
Returns: Promise<string[]>
updateVector()
Updates a single vector by ID or by metadata filter. Exactly one of id or filter must be provided.
indexName:
id?:
filter?:
update:
update.vector?:
update.metadata?:
deleteVector()
Deletes a specific vector entry from an index by its ID.
indexName:
id:
deleteVectors()
Deletes multiple vectors by IDs or by metadata filter. Exactly one of ids or filter must be provided.
indexName:
ids?:
filter?:
close()
Closes the database connection and releases resources.
await store.close();
Response Types
Query results are returned in this format:
interface QueryResult {
  id: string;
  score: number;
  metadata: Record<string, any>;
  vector?: number[]; // Only included if includeVector is true
}
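As a rough illustration of consuming results in this shape, the sketch below keeps only matches at or above a score threshold and orders them best first. The `topMatches` helper, the `minScore` threshold, and the sample data are hypothetical and not part of @mastra/duckdb; it also assumes a metric where a higher score means a closer match.

```typescript
// Shape copied from the QueryResult interface documented above.
interface QueryResult {
  id: string;
  score: number;
  metadata: Record<string, any>;
  vector?: number[];
}

// Hypothetical helper (not a library API): drop weak matches, sort best first.
// Assumes higher score = more similar.
function topMatches(results: QueryResult[], minScore: number): QueryResult[] {
  return results
    .filter((r) => r.score >= minScore)
    .sort((a, b) => b.score - a.score);
}

// Made-up sample data standing in for the output of store.query().
const sampleResults: QueryResult[] = [
  { id: "a", score: 0.42, metadata: { text: "first document" } },
  { id: "b", score: 0.91, metadata: { text: "second document" } },
  { id: "c", score: 0.77, metadata: { text: "third document" } },
];

const keptIds = topMatches(sampleResults, 0.5).map((r) => r.id);
console.log(keptIds); // [ 'b', 'c' ]
```

A threshold like this only makes sense for metrics where larger scores are better; for a distance-style score the comparison would need to be reversed.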
Filter Operators
The DuckDB vector store supports MongoDB-like filter operators:
| Category | Operators |
|---|---|
| Comparison | $eq, $ne, $gt, $gte, $lt, $lte |
| Logical | $and, $or, $not, $nor |
| Array | $in, $nin |
| Element | $exists |
| Text | $contains |
Filter Examples
// Logical and comparison operators
const results = await store.query({
  indexName: "docs",
  queryVector: [...],
  filter: {
    $and: [
      { category: "electronics" },
      { price: { $gte: 100, $lte: 500 } },
    ],
  },
});

// Nested field access
const nestedResults = await store.query({
  indexName: "docs",
  queryVector: [...],
  filter: { "user.profile.tier": "premium" },
});
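To make the filter semantics above concrete, here is a minimal in-memory matcher covering a subset of the operators ($and, $or, the comparison operators, $in) plus dotted-path access. It is only a sketch of what such filters mean; it is not how @mastra/duckdb evaluates them, and every name in it (`VectorMetadata`, `MetadataFilter`, `getPath`, `matchesFilter`) is hypothetical.

```typescript
type VectorMetadata = Record<string, unknown>;
type MetadataFilter = Record<string, unknown>;

// Resolve a dotted path such as "user.profile.tier" against nested metadata.
function getPath(obj: unknown, path: string): unknown {
  let current: unknown = obj;
  for (const key of path.split(".")) {
    if (typeof current !== "object" || current === null) return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Numeric comparison that fails (returns false) on non-numeric operands.
function compare(value: unknown, operand: unknown, test: (a: number, b: number) => boolean): boolean {
  return typeof value === "number" && typeof operand === "number" && test(value, operand);
}

// A bare value means equality; an object of $-operators means all must hold.
function matchesField(value: unknown, condition: unknown): boolean {
  if (typeof condition !== "object" || condition === null) return value === condition;
  return Object.entries(condition as Record<string, unknown>).every(([op, operand]) => {
    switch (op) {
      case "$eq": return value === operand;
      case "$ne": return value !== operand;
      case "$gt": return compare(value, operand, (a, b) => a > b);
      case "$gte": return compare(value, operand, (a, b) => a >= b);
      case "$lt": return compare(value, operand, (a, b) => a < b);
      case "$lte": return compare(value, operand, (a, b) => a <= b);
      case "$in": return Array.isArray(operand) && operand.includes(value);
      default: throw new Error(`Operator not covered by this sketch: ${op}`);
    }
  });
}

// Top-level keys are implicitly AND-ed; $and / $or take arrays of sub-filters.
function matchesFilter(metadata: VectorMetadata, filter: MetadataFilter): boolean {
  return Object.entries(filter).every(([key, condition]) => {
    if (key === "$and") return (condition as MetadataFilter[]).every((f) => matchesFilter(metadata, f));
    if (key === "$or") return (condition as MetadataFilter[]).some((f) => matchesFilter(metadata, f));
    return matchesField(getPath(metadata, key), condition);
  });
}

const priceFilter: MetadataFilter = {
  $and: [{ category: "electronics" }, { price: { $gte: 100, $lte: 500 } }],
};
console.log(matchesFilter({ category: "electronics", price: 250 }, priceFilter)); // true
console.log(matchesFilter({ category: "electronics", price: 50 }, priceFilter)); // false
```

A matcher like this can be handy in unit tests for checking which documents a filter should select before running it against a real index.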
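Distance Metrics
| Metric | Description | Score interpretation | Best for |
|---|---|---|---|
| cosine | Cosine similarity | 0-1 (1 = most similar) | Text embeddings, normalized vectors |
| euclidean | L2 distance | 0-∞ (0 = most similar) | Image embeddings, spatial data |
| dotproduct | Inner product | Larger = more similar | When vector magnitude matters |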
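For reference, the three metrics in the table correspond to the following textbook formulas, sketched here in plain TypeScript. This shows the underlying math only; how the store maps these raw values to the reported score is not specified here (for example, raw cosine similarity of arbitrary vectors ranges from -1 to 1). The function names are illustrative and not part of the package.

```typescript
// Inner product: sum of element-wise products.
function dotProduct(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same dimension");
  // Lengths are checked above, so the ?? fallback never applies.
  return a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
}

// L2 norm (vector magnitude).
function norm(a: number[]): number {
  return Math.sqrt(dotProduct(a, a));
}

// Cosine similarity: inner product of the two vectors after normalization.
function cosineSimilarity(a: number[], b: number[]): number {
  return dotProduct(a, b) / (norm(a) * norm(b));
}

// Euclidean (L2) distance: 0 means identical vectors.
function euclideanDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same dimension");
  return Math.sqrt(a.reduce((sum, x, i) => sum + (x - (b[i] ?? 0)) ** 2, 0));
}

const u = [3, 4];
const v = [6, 8]; // same direction as u, twice the magnitude

console.log(cosineSimilarity(u, v)); // direction only: ~1
console.log(euclideanDistance(u, v)); // 5
console.log(dotProduct(u, v)); // 50
```

The example pair points in the same direction at different magnitudes, which is why cosine treats the vectors as near-identical while the Euclidean distance and the inner product both reflect the difference in length.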
Error Handling
The store throws specific errors for different failure cases:
try {
  await store.query({
    indexName: "my-collection",
    queryVector: queryVector,
  });
} catch (error) {
  // The caught value is typed `unknown` in strict TypeScript, so narrow it first
  const message = error instanceof Error ? error.message : String(error);
  if (message.includes("not found")) {
    console.error("The specified index does not exist");
  } else if (message.includes("Invalid identifier")) {
    console.error("Index name contains invalid characters");
  } else {
    console.error("Vector store error:", message);
  }
}
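Common error cases include:
- Invalid index name format
- Index/table not found
- Dimension mismatch between the query vector and the index
- Empty filter or ID array in delete/update operations
- Mutual exclusivity violations (providing both id and filter)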
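Several of the cases above can be caught before calling the store at all. The sketch below shows one possible pre-flight check for the ids/filter rules and for vector dimension. The helper names, the `DeleteVectorsArgs` type, and the error messages are hypothetical; they do not reproduce the errors @mastra/duckdb itself throws.

```typescript
// Argument shape assumed from the deleteVectors() parameter list.
interface DeleteVectorsArgs {
  indexName: string;
  ids?: string[];
  filter?: Record<string, unknown>;
}

// Hypothetical pre-flight check: exactly one of ids/filter, and neither empty.
function validateDeleteArgs(args: DeleteVectorsArgs): void {
  const hasIds = args.ids !== undefined;
  const hasFilter = args.filter !== undefined;
  if (hasIds && hasFilter) {
    throw new Error("Provide either ids or filter, not both");
  }
  if (!hasIds && !hasFilter) {
    throw new Error("Provide one of ids or filter");
  }
  if (args.ids !== undefined && args.ids.length === 0) {
    throw new Error("ids must not be empty");
  }
  if (args.filter !== undefined && Object.keys(args.filter).length === 0) {
    throw new Error("filter must not be empty");
  }
}

// Hypothetical check that a vector matches the dimension the index was created with.
function validateDimension(vector: number[], expected: number): void {
  if (vector.length !== expected) {
    throw new Error(`Expected ${expected} dimensions, got ${vector.length}`);
  }
}

// Passes silently: only ids is provided and it is non-empty.
validateDeleteArgs({ indexName: "docs", ids: ["a", "b"] });

try {
  validateDeleteArgs({ indexName: "docs", ids: ["a"], filter: { category: "A" } });
} catch (error) {
  console.error(error instanceof Error ? error.message : String(error));
}
```

Validating up front gives callers one predictable error message per case instead of relying on substring matches against the store's own messages.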
Use Cases
Embedded Semantic Search
Build offline-capable AI applications with semantic search that runs entirely in-process:
const store = new DuckDBVector({
  id: "offline-search",
  path: "./search.duckdb",
});
Local RAG Pipelines
Process sensitive documents locally without sending data to cloud vector databases:
const store = new DuckDBVector({
  id: "private-rag",
  path: "./confidential.duckdb",
  dimensions: 1536,
});
Development and Testing
Rapidly prototype vector search features with zero infrastructure:
const store = new DuckDBVector({
  id: "dev-store",
  path: ":memory:", // Fast in-memory for tests
});
Related