
DuckDBVector Store

The DuckDB storage implementation provides an embedded, high-performance vector search solution built on DuckDB, an in-process analytical database. It uses the VSS extension for vector similarity search with HNSW indexing, giving you a lightweight vector database that requires no external server.

It is part of the @mastra/duckdb package and supports vector similarity search with metadata filtering.

Installation

npm install @mastra/duckdb@latest

Usage

import { DuckDBVector } from "@mastra/duckdb";

// Create a new vector store instance
const store = new DuckDBVector({
  id: "duckdb-vector",
  path: ":memory:", // or './vectors.duckdb' for file persistence
});

// Create an index
await store.createIndex({
  indexName: "myCollection",
  dimension: 1536,
  metric: "cosine",
});

// Add vectors with metadata
const vectors = [[0.1, 0.2, ...], [0.3, 0.4, ...]];
const metadata = [
  { text: "first document", category: "A" },
  { text: "second document", category: "B" },
];
await store.upsert({
  indexName: "myCollection",
  vectors,
  metadata,
});

// Query similar vectors
const queryVector = [0.1, 0.2, ...];
const results = await store.query({
  indexName: "myCollection",
  queryVector,
  topK: 10,
  filter: { category: "A" },
});

// Clean up
await store.close();

Constructor Options

id:

string
Unique identifier for the vector store instance

path?:

string
= ':memory:'
Database file path. Use ':memory:' for in-memory database, or a file path like './vectors.duckdb' for persistence.

dimensions?:

number
= 1536
Default dimension for vector embeddings

metric?:

'cosine' | 'euclidean' | 'dotproduct'
= 'cosine'
Default distance metric for similarity search

Methods

createIndex()

Creates a new vector collection, optionally with an HNSW index for fast approximate nearest neighbor search.

indexName:

string
Name of the index to create

dimension:

number
Vector dimension size (must match your embedding model)

metric?:

'cosine' | 'euclidean' | 'dotproduct'
= 'cosine'
Distance metric for similarity search

upsert()

Adds or updates vectors and their metadata in the index.

indexName:

string
Name of the index to insert into

vectors:

number[][]
Array of embedding vectors

metadata?:

Record<string, any>[]
Metadata for each vector

ids?:

string[]
Optional vector IDs (auto-generated UUIDs if not provided)
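
The metadata and ids arrays read as positional companions to vectors, so misaligned input is an easy mistake to make. The following is a minimal sketch of a pre-flight check you might run before calling upsert(). The helper name checkUpsertInput and the positional-alignment rule are assumptions drawn from the parameter descriptions above, not part of @mastra/duckdb.

```typescript
// Hypothetical helper (not part of @mastra/duckdb): collects problems with
// upsert input before it is sent to the store.
interface UpsertInput {
  vectors: number[][];
  metadata?: Record<string, unknown>[];
  ids?: string[];
}

function checkUpsertInput(input: UpsertInput): string[] {
  const problems: string[] = [];
  const count = input.vectors.length;

  // Assumption: metadata[i] and ids[i] describe vectors[i].
  if (input.metadata !== undefined && input.metadata.length !== count) {
    problems.push(`metadata has ${input.metadata.length} entries, expected ${count}`);
  }
  if (input.ids !== undefined && input.ids.length !== count) {
    problems.push(`ids has ${input.ids.length} entries, expected ${count}`);
  }

  // Every vector in one batch should share the same dimension.
  const dimension = count > 0 ? input.vectors[0].length : 0;
  input.vectors.forEach((vector, i) => {
    if (vector.length !== dimension) {
      problems.push(`vectors[${i}] has ${vector.length} dimensions, expected ${dimension}`);
    }
  });

  return problems;
}
```

An empty result means the batch is at least internally consistent; whether it matches the index dimension is a separate check.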

query()

Searches for similar vectors with optional metadata filtering.

indexName:

string
Name of the index to search in

queryVector:

number[]
Query vector to find similar vectors for

topK?:

number
= 10
Number of results to return

filter?:

Filter
Metadata filters using MongoDB-like query syntax

includeVector?:

boolean
= false
Whether to include vector data in results

describeIndex()

Gets information about an index.

indexName:

string
Name of the index to describe

Returns:

interface IndexStats {
  dimension: number;
  count: number;
  metric: "cosine" | "euclidean" | "dotproduct";
}
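
One practical use of these stats is checking an embedding against the index before querying, since a dimension mismatch is listed among the common error cases. The sketch below assumes the IndexStats shape shown above; the guard function assertMatchesIndex is a hypothetical helper, and the commented call shape for describeIndex() is inferred from the parameter list rather than confirmed.

```typescript
interface IndexStats {
  dimension: number;
  count: number;
  metric: "cosine" | "euclidean" | "dotproduct";
}

// Hypothetical guard (not part of @mastra/duckdb): throws if the vector
// cannot belong to an index with the given stats.
function assertMatchesIndex(stats: IndexStats, vector: number[]): void {
  if (vector.length !== stats.dimension) {
    throw new Error(
      `Vector has ${vector.length} dimensions but the index expects ${stats.dimension}`,
    );
  }
}

// Usage sketch, assuming `store` and `queryVector` from the Usage section:
// const stats = await store.describeIndex({ indexName: "myCollection" });
// assertMatchesIndex(stats, queryVector);
```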

deleteIndex()

Deletes an index and all its data.

indexName:

string
Name of the index to delete

listIndexes()

Lists all vector indexes in the database.

Returns: Promise<string[]>

updateVector()

Updates a single vector by ID, or vectors matching a metadata filter. Provide either id or filter, but not both.

indexName:

string
Name of the index containing the vector

id?:

string
ID of the vector entry to update (mutually exclusive with filter)

filter?:

Record<string, any>
Metadata filter to identify vector(s) to update (mutually exclusive with id)

update:

object
Update data containing vector and/or metadata

update.vector?:

number[]
New vector data to update

update.metadata?:

Record<string, any>
New metadata to update
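
The "either id or filter, but not both" rule can be checked on the caller's side before the request reaches the store. This is a minimal sketch under stated assumptions: validateUpdateArgs is a hypothetical helper, the message strings are invented for illustration, and rejecting an update with neither vector nor metadata is an assumption rather than documented behavior.

```typescript
interface UpdateVectorArgs {
  indexName: string;
  id?: string;
  filter?: Record<string, unknown>;
  update: { vector?: number[]; metadata?: Record<string, unknown> };
}

// Hypothetical pre-flight check (not part of @mastra/duckdb).
// Returns a description of the first problem found, or null if none.
function validateUpdateArgs(args: UpdateVectorArgs): string | null {
  const hasId = args.id !== undefined;
  const hasFilter = args.filter !== undefined;

  if (hasId && hasFilter) return "Provide either id or filter, not both";
  if (!hasId && !hasFilter) return "Provide one of id or filter";
  if (args.filter !== undefined && Object.keys(args.filter).length === 0) {
    return "filter must not be empty";
  }
  // Assumption: an update with neither field is treated as a mistake.
  if (args.update.vector === undefined && args.update.metadata === undefined) {
    return "update must contain vector and/or metadata";
  }
  return null;
}

// Usage sketch, assuming `store` from the Usage section:
// const args = { indexName: "myCollection", id: "doc-1", update: { metadata: { category: "B" } } };
// if (validateUpdateArgs(args) === null) await store.updateVector(args);
```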

deleteVector()

Deletes a specific vector entry from an index by its ID.

indexName:

string
Name of the index containing the vector

id:

string
ID of the vector entry to delete

deleteVectors()

Deletes multiple vectors by IDs or by metadata filter. Provide either ids or filter, but not both.

indexName:

string
Name of the index containing the vectors to delete

ids?:

string[]
Array of vector IDs to delete (mutually exclusive with filter)

filter?:

Record<string, any>
Metadata filter to identify vectors to delete (mutually exclusive with ids)

close()

Closes the database connection and releases resources.

await store.close();

Response Types

Query results are returned in this format:

interface QueryResult {
  id: string;
  score: number;
  metadata: Record<string, any>;
  vector?: number[]; // Only included if includeVector is true
}
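
Results in this shape are easy to post-process in plain TypeScript. As a rough sketch, the hypothetical helper below keeps only results at or above a score threshold and orders them best-first. It assumes a higher score means a closer match, so it is not suitable for metrics where a lower score is better.

```typescript
interface QueryResult {
  id: string;
  score: number;
  metadata: Record<string, any>;
  vector?: number[];
}

// Hypothetical helper (not part of @mastra/duckdb): drop weak matches and
// sort the rest from highest to lowest score without mutating the input.
function keepAboveScore(results: QueryResult[], minScore: number): QueryResult[] {
  return results
    .filter((result) => result.score >= minScore)
    .sort((a, b) => b.score - a.score);
}

// Made-up sample data for illustration only.
const sampleResults: QueryResult[] = [
  { id: "a", score: 0.42, metadata: { category: "A" } },
  { id: "b", score: 0.91, metadata: { category: "A" } },
  { id: "c", score: 0.77, metadata: { category: "B" } },
];
const strongMatches = keepAboveScore(sampleResults, 0.5);
```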

Filter Operators

The DuckDB vector store supports MongoDB-like filter operators:

| Category | Operators |
|---|---|
| Comparison | $eq, $ne, $gt, $gte, $lt, $lte |
| Logical | $and, $or, $not, $nor |
| Array | $in, $nin |
| Element | $exists |
| Text | $contains |

Filter Examples

// Comparison operators combined with $and
const results = await store.query({
  indexName: "docs",
  queryVector: [...],
  filter: {
    $and: [
      { category: "electronics" },
      { price: { $gte: 100, $lte: 500 } },
    ],
  },
});

// Nested field access
const premiumResults = await store.query({
  indexName: "docs",
  queryVector: [...],
  filter: { "user.profile.tier": "premium" },
});
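
To make the filter semantics concrete, here is a small in-memory matcher that evaluates the same kind of filter object against a single metadata record. It is only a sketch of one plausible reading of the operators listed above: the function names are hypothetical, and the store's own evaluation may differ in edge cases such as type coercion or missing fields.

```typescript
type Filter = Record<string, unknown>;

// Resolve a dotted path such as "user.profile.tier" in nested metadata.
function getPath(obj: unknown, path: string): unknown {
  let current: unknown = obj;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// True for objects like { $gte: 100, $lte: 500 } whose keys are all operators.
function isOperatorObject(value: unknown): value is Record<string, unknown> {
  if (typeof value !== "object" || value === null || Array.isArray(value)) return false;
  const keys = Object.keys(value);
  return keys.length > 0 && keys.every((k) => k.charAt(0) === "$");
}

// Apply one field-level operator to a metadata value.
function applyOperator(op: string, actual: unknown, expected: unknown): boolean {
  switch (op) {
    case "$eq": return actual === expected;
    case "$ne": return actual !== expected;
    case "$gt": return typeof actual === "number" && typeof expected === "number" && actual > expected;
    case "$gte": return typeof actual === "number" && typeof expected === "number" && actual >= expected;
    case "$lt": return typeof actual === "number" && typeof expected === "number" && actual < expected;
    case "$lte": return typeof actual === "number" && typeof expected === "number" && actual <= expected;
    case "$in": return Array.isArray(expected) && expected.indexOf(actual) !== -1;
    case "$nin": return Array.isArray(expected) && expected.indexOf(actual) === -1;
    case "$exists": return (actual !== undefined) === Boolean(expected);
    case "$contains": return typeof actual === "string" && typeof expected === "string" && actual.indexOf(expected) !== -1;
    default: throw new Error(`Unsupported operator: ${op}`);
  }
}

// Evaluate a filter against one metadata record (illustrative semantics only).
function matchesFilter(metadata: Record<string, unknown>, filter: Filter): boolean {
  return Object.keys(filter).every((key) => {
    const condition = filter[key];
    if (key === "$and") return (condition as Filter[]).every((f) => matchesFilter(metadata, f));
    if (key === "$or") return (condition as Filter[]).some((f) => matchesFilter(metadata, f));
    if (key === "$nor") return !(condition as Filter[]).some((f) => matchesFilter(metadata, f));
    if (key === "$not") return !matchesFilter(metadata, condition as Filter);

    const actual = getPath(metadata, key);
    if (isOperatorObject(condition)) {
      return Object.keys(condition).every((op) => applyOperator(op, actual, condition[op]));
    }
    return actual === condition; // a bare value means implicit equality
  });
}
```

A matcher like this can be handy in unit tests for checking that a filter object selects the records you expect before running it against a real index.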

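Distance Metrics

| Metric | Description | Score interpretation | Best for |
|---|---|---|---|
| cosine | Cosine similarity | 0-1 (1 = most similar) | Text embeddings, normalized vectors |
| euclidean | L2 distance | 0-∞ (0 = most similar) | Image embeddings, spatial data |
| dotproduct | Inner product | Higher = more similar | When vector magnitude matters |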
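
For intuition about what the three metrics measure, here are plain TypeScript versions of the underlying formulas. This is a sketch for illustration, not code from @mastra/duckdb, and it makes no attempt to reproduce how the store normalizes or reports scores. In particular, raw cosine similarity can be negative for vectors pointing in opposite directions.

```typescript
// Inner product of two equal-length vectors.
function dotProduct(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

// Cosine of the angle between two vectors; ignores magnitude.
function cosineSimilarity(a: number[], b: number[]): number {
  const norms = Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b));
  if (norms === 0) {
    throw new Error("Cosine similarity is undefined for zero vectors");
  }
  return dotProduct(a, b) / norms;
}

// Straight-line (L2) distance; 0 means the vectors are identical.
function euclideanDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}
```

Scaling a vector changes its dot product and Euclidean distance but leaves its cosine similarity unchanged, which is why cosine is the usual choice for normalized text embeddings.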

Error Handling

The store throws specific errors for different failure cases:

try {
  await store.query({
    indexName: "my-collection",
    queryVector,
  });
} catch (error) {
  // A caught value is typed `unknown` in strict TypeScript, so narrow it first
  const message = error instanceof Error ? error.message : String(error);
  if (message.includes("not found")) {
    console.error("The specified index does not exist");
  } else if (message.includes("Invalid identifier")) {
    console.error("Index name contains invalid characters");
  } else {
    console.error("Vector store error:", message);
  }
}

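Common error cases include:

  • Invalid index name format
  • Index/table not found
  • Dimension mismatch between the query vector and the index
  • Empty filter or ID array in delete/update operations
  • Mutual exclusivity violations (providing both id and filter)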
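
If several call sites need the same handling, the message checks from the example above can be collected into one function. The sketch below is a hypothetical helper, not a library API. It relies only on the two message fragments shown in the error-handling example ("not found" and "Invalid identifier") and treats everything else as a generic failure.

```typescript
type StoreErrorKind = "index-not-found" | "invalid-identifier" | "other";

// Hypothetical helper: map a caught value to a coarse error category
// using the message fragments from the error-handling example.
function classifyStoreError(error: unknown): StoreErrorKind {
  const message = error instanceof Error ? error.message : String(error);
  if (message.indexOf("not found") !== -1) return "index-not-found";
  if (message.indexOf("Invalid identifier") !== -1) return "invalid-identifier";
  return "other";
}

// Usage sketch, assuming `store` and `queryVector` from the Usage section:
// try {
//   await store.query({ indexName: "myCollection", queryVector });
// } catch (error) {
//   if (classifyStoreError(error) === "index-not-found") { /* create the index */ }
// }
```

Matching on message text is brittle, so treat the categories as a convenience rather than a contract.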

Use Cases

Embedded Semantic Search

Build offline-capable AI applications with semantic search that runs entirely in-process:

const store = new DuckDBVector({
  id: "offline-search",
  path: "./search.duckdb",
});

Local RAG Pipelines

Process sensitive documents locally without sending data to cloud vector databases:

const store = new DuckDBVector({
  id: "private-rag",
  path: "./confidential.duckdb",
  dimensions: 1536,
});

Development and Testing

Rapidly prototype vector search features with zero infrastructure:

const store = new DuckDBVector({
  id: "dev-store",
  path: ":memory:", // Fast in-memory for tests
});
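
One way to combine the in-memory and file-backed setups is to pick the path from the current environment. The helper below is a hypothetical sketch: the environment names, the default file name, and the function itself are assumptions for illustration, and only the ':memory:' value and the idea of a file path come from the constructor options above.

```typescript
type Environment = "test" | "development" | "production";

// Hypothetical helper: use an in-memory database for tests and a
// file-backed one everywhere else.
function resolveDbPath(env: Environment, dataDir: string = "."): string {
  return env === "test" ? ":memory:" : `${dataDir}/vectors.duckdb`;
}

// Usage sketch, assuming DuckDBVector is imported as in the Usage section:
// const store = new DuckDBVector({ id: "app-store", path: resolveDbPath("test") });
```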

Related