
Amazon S3 Vectors Store

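⚠️ Amazon S3 Vectors is a Preview service. Preview features may change or be removed without notice and are not covered by the AWS SLA. Behavior, limits, and regional availability may change at any time. This library may introduce breaking changes to stay aligned with AWS.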

The S3Vectors class provides vector search using Amazon S3 Vectors (Preview). It stores vectors in vector buckets and performs similarity search in vector indexes, with JSON-based metadata filters.

Installation

npm install @mastra/s3vectors@latest

Usage Example

import { S3Vectors } from "@mastra/s3vectors";

const store = new S3Vectors({
  vectorBucketName: process.env.S3_VECTORS_BUCKET_NAME!, // e.g. "my-vector-bucket"
  clientConfig: {
    region: process.env.AWS_REGION!, // credentials use the default AWS provider chain
  },
  // Optional: mark large/long-text fields as non-filterable at index creation time
  nonFilterableMetadataKeys: ["content"],
});

// Create an index (names are normalized: "_" → "-" and lowercased)
await store.createIndex({
  indexName: "my_index",
  dimension: 1536,
  metric: "cosine", // "euclidean" also supported; "dotproduct" is NOT supported
});

// Upsert vectors (ids auto-generated if omitted). Date values in metadata are serialized to epoch ms.
const ids = await store.upsert({
  indexName: "my_index",
  vectors: [
    [0.1, 0.2 /* … */],
    [0.3, 0.4 /* … */],
  ],
  metadata: [
    {
      text: "doc1",
      genre: "documentary",
      year: 2023,
      createdAt: new Date("2024-01-01"),
    },
    { text: "doc2", genre: "comedy", year: 2021 },
  ],
});

// Query with metadata filters (implicit AND is canonicalized).
// "my-index" is the normalized form of "my_index", so both names refer to the same index.
const results = await store.query({
  indexName: "my-index",
  queryVector: [0.1, 0.2 /* … */],
  topK: 10, // Service-side limits may apply (commonly 30)
  filter: { genre: { $in: ["documentary", "comedy"] }, year: { $gte: 2020 } },
  includeVector: false, // set true to include raw vectors (may trigger a secondary fetch)
});

// Clean up resources (closes the underlying HTTP handler)
await store.disconnect();

Constructor Options

vectorBucketName:

string
Target S3 Vectors vector bucket name.

clientConfig?:

S3VectorsClientConfig
AWS SDK v3 client options (e.g., `region`, `credentials`).

nonFilterableMetadataKeys?:

string[]
Metadata keys that should NOT be filterable (applied to the index at creation time). Use this for large text fields like `content`.

Methods

createIndex()

Creates a new vector index in the configured vector bucket. If the index already exists, the call validates the schema and becomes a no-op (the existing metric and dimension are preserved).

indexName:

string
Logical index name. Normalized internally: underscores are replaced with hyphens and the name is lowercased.

dimension:

number
Vector dimension (must match your embedding model)

metric?:

'cosine' | 'euclidean'
= cosine
Distance metric for similarity search. `dotproduct` is not supported by S3 Vectors.
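The indexName normalization rule can be expressed as a one-line function. This is an illustrative reimplementation of the documented rule only; normalizeIndexName is a hypothetical name, not an export of @mastra/s3vectors. It shows why "my_index" and "my-index" resolve to the same index.

```typescript
// Hypothetical helper mirroring the documented rule:
// underscores become hyphens and the whole name is lowercased.
function normalizeIndexName(indexName: string): string {
  return indexName.replace(/_/g, "-").toLowerCase();
}

console.log(normalizeIndexName("my_index")); // my-index
console.log(normalizeIndexName("Docs_2024_V1")); // docs-2024-v1
```

Because the rule is idempotent, passing an already-normalized name returns it unchanged.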

upsert()

Adds or replaces vectors (full-record put). If ids are not provided, UUIDs are generated.

indexName:

string
Name of the index to upsert into

vectors:

number[][]
Array of embedding vectors

metadata?:

Record<string, any>[]
Metadata for each vector

ids?:

string[]
Optional vector IDs (auto-generated if not provided)
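The two documented upsert behaviors (UUIDs for missing ids, Date metadata serialized to epoch ms) can be sketched as a record-preparation step. This is a hypothetical illustration, not the library's code: prepareRecords and PreparedRecord are invented names, only top-level Date values are converted, and rejecting mismatched id/vector lengths is an assumption of this sketch.

```typescript
import { randomUUID } from "node:crypto";

type Metadata = Record<string, unknown>;

interface PreparedRecord {
  id: string;
  vector: number[];
  metadata: Metadata;
}

// Hypothetical helper: pairs each vector with an id and serialized metadata.
function prepareRecords(
  vectors: number[][],
  metadata: Metadata[] = [],
  ids?: string[],
): PreparedRecord[] {
  // Assumption for this sketch: ids, when given, must line up with vectors.
  if (ids && ids.length !== vectors.length) {
    throw new Error("ids must have the same length as vectors");
  }
  return vectors.map((vector, i) => {
    const serialized: Metadata = {};
    for (const [key, value] of Object.entries(metadata[i] ?? {})) {
      // Date -> milliseconds since the Unix epoch; other values pass through
      serialized[key] = value instanceof Date ? value.getTime() : value;
    }
    // Use the caller's id when present, otherwise generate a UUID
    return { id: ids?.[i] ?? randomUUID(), vector, metadata: serialized };
  });
}

const records = prepareRecords(
  [[0.1, 0.2], [0.3, 0.4]],
  [{ createdAt: new Date("2024-01-01T00:00:00Z") }, { genre: "comedy" }],
);
console.log(records[0].metadata.createdAt); // 1704067200000
```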

query()

Searches for nearest neighbors with optional metadata filtering.

indexName:

string
Name of the index to query

queryVector:

number[]
Query vector to find similar vectors

topK?:

number
= 10
Number of results to return

filter?:

S3VectorsFilter
JSON-based metadata filter supporting `$and`, `$or`, `$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`, `$in`, `$nin`, `$exists`.

includeVector?:

boolean
= false
Whether to include vectors in the results

Scoring: results include score = 1/(1 + distance), so higher scores are better while the original distance ordering is preserved.
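The score transform is easy to check numerically. distanceToScore is a hypothetical name used only for illustration, and the sketch assumes the distances returned by the service are non-negative.

```typescript
// score = 1 / (1 + distance): distance 0 maps to 1, and larger
// distances map to smaller scores, so the ranking order is unchanged.
function distanceToScore(distance: number): number {
  return 1 / (1 + distance);
}

const distances = [0, 0.25, 1, 3];
const scores = distances.map(distanceToScore);
console.log(scores); // [ 1, 0.8, 0.5, 0.25 ]
```

Since the function is strictly decreasing for non-negative inputs, sorting by ascending distance and by descending score yields the same order.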

describeIndex()

Returns information about the index.

indexName:

string
Index name to describe.

Returns:

interface IndexStats {
dimension: number;
count: number; // computed via ListVectors pagination (O(n))
metric: "cosine" | "euclidean";
}
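The O(n) note on count follows from pagination: every page has to be fetched before the keys can be totalled. A minimal sketch, assuming a hypothetical listPage callback that returns one page of keys plus an optional nextToken; this is a simplification for illustration, not the actual ListVectors response shape.

```typescript
// Simplified page shape for this sketch only.
interface Page {
  keys: string[];
  nextToken?: string;
}

// Walks every page until there is no continuation token.
// The number of requests grows linearly with the number of vectors.
async function countByPagination(
  listPage: (nextToken?: string) => Promise<Page>,
): Promise<number> {
  let count = 0;
  let token: string | undefined;
  do {
    const page = await listPage(token);
    count += page.keys.length;
    token = page.nextToken;
  } while (token);
  return count;
}

// Two in-memory pages standing in for real responses.
const pages: Record<string, Page> = {
  start: { keys: ["a", "b"], nextToken: "p2" },
  p2: { keys: ["c"] },
};
countByPagination(async (token) => pages[token ?? "start"]).then((n) => console.log(n)); // 3
```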

deleteIndex()

Deletes an index and its data.

indexName:

string
Index to delete.

listIndexes()

Lists all indexes in the configured vector bucket.

Returns: Promise<string[]>

updateVector()

Updates the vector and/or metadata for a specific ID within an index.

indexName:

string
Index containing the vector.

id:

string
ID to update.

update:

object
Update data containing vector and/or metadata

update.vector?:

number[]
New vector data to update

update.metadata?:

Record<string, any>
New metadata to update

deleteVector()

Deletes a specific vector by ID.

indexName:

string
Index containing the vector.

id:

string
ID to delete.

disconnect()

Closes the underlying AWS SDK HTTP handler to free sockets.
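One way to guarantee disconnect() runs even when a call fails is to scope the store's lifetime with try/finally. The sketch below is hypothetical: withStore is not part of @mastra/s3vectors, and it assumes only that the store exposes disconnect(): Promise<void>, consistent with the awaited call in the usage example.

```typescript
// Structural type: the only method this sketch relies on.
interface Disconnectable {
  disconnect(): Promise<void>;
}

// Runs `work` against the store and always releases its sockets afterwards,
// whether `work` resolves or rejects.
async function withStore<S extends Disconnectable, T>(
  store: S,
  work: (store: S) => Promise<T>,
): Promise<T> {
  try {
    return await work(store);
  } finally {
    await store.disconnect();
  }
}
```

With an S3Vectors instance, the query from the usage example could be wrapped as withStore(store, (s) => s.query({ ... })), and the original rejection still propagates to the caller after the handler is closed.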

Response Types

Query results are returned in this format:

interface QueryResult {
id: string;
score: number; // 1/(1 + distance)
metadata: Record<string, any>;
vector?: number[]; // Only included if includeVector is true
}

Filter Syntax

S3 Vectors supports a strict subset of operators and value types. The Mastra filter translator:

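  • Canonicalizes implicit AND: {a:1,b:2}{ $and: [{a:1},{b:2}] }.
  • Normalizes Date values to epoch ms in numeric comparisons and array elements.
  • Disallows Date in equality positions (field: value or $eq/$ne); equality values must be string | number | boolean.
  • Rejects null/undefined in equality; array equality is not supported (use $in/$nin).
  • Allows only $and and $or as top-level logical operators.
  • Requires logical operators to contain field conditions, not bare operators.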

Supported operators:

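  • Logical: $and, $or (non-empty arrays)
  • Basic: $eq, $ne (string | number | boolean)
  • Numeric: $gt, $gte, $lt, $lte (number, or Date → epoch ms)
  • Array: $in, $nin (non-empty arrays of string | number | boolean; Date → epoch ms)
  • Element: $exists (boolean)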

Unsupported / disallowed (rejected): $not, $nor, $regex, $all, $elemMatch, $size, $text, etc.
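The translator rules can be illustrated with a compact sketch. This is not Mastra's implementation: translateFilter, translateCondition, and the error messages are hypothetical, and the sketch covers only the rules listed on this page (implicit-AND canonicalization, Date → epoch ms, scalar-only equality, rejection of unsupported operators). Nested metadata is assumed to be addressed with dotted keys, so any object value is treated as an operator map.

```typescript
type Scalar = string | number | boolean;
type Filter = Record<string, unknown>;

const EQ_OPS = new Set(["$eq", "$ne"]);
const RANGE_OPS = new Set(["$gt", "$gte", "$lt", "$lte"]);
const ARRAY_OPS = new Set(["$in", "$nin"]);

// Date -> epoch ms; other values pass through unchanged.
const toEpoch = (v: unknown): unknown => (v instanceof Date ? v.getTime() : v);

// Equality values must be string | number | boolean (no Date, null, or arrays).
function assertScalar(v: unknown): Scalar {
  if (typeof v === "string" || typeof v === "number" || typeof v === "boolean") return v;
  throw new Error(`expected string | number | boolean, got ${String(v)}`);
}

// Translates one field condition: a bare equality value or an operator map.
function translateCondition(cond: unknown): unknown {
  if (cond === null || typeof cond !== "object" || cond instanceof Date || Array.isArray(cond)) {
    return assertScalar(cond); // field: value is an equality position
  }
  const out: Record<string, unknown> = {};
  for (const [op, value] of Object.entries(cond)) {
    if (EQ_OPS.has(op)) {
      out[op] = assertScalar(value);
    } else if (RANGE_OPS.has(op)) {
      const n = toEpoch(value);
      if (typeof n !== "number") throw new Error(`${op} requires a number or Date`);
      out[op] = n;
    } else if (ARRAY_OPS.has(op)) {
      if (!Array.isArray(value) || value.length === 0) throw new Error(`${op} requires a non-empty array`);
      out[op] = value.map((v) => assertScalar(toEpoch(v)));
    } else if (op === "$exists") {
      if (typeof value !== "boolean") throw new Error("$exists requires a boolean");
      out[op] = value;
    } else {
      throw new Error(`unsupported operator: ${op}`);
    }
  }
  return out;
}

// Translates a whole filter; several top-level keys become an explicit $and.
function translateFilter(filter: Filter): Filter {
  const clauses: Filter[] = [];
  for (const [key, value] of Object.entries(filter)) {
    if (key === "$and" || key === "$or") {
      if (!Array.isArray(value) || value.length === 0) throw new Error(`${key} requires a non-empty array`);
      clauses.push({ [key]: value.map((f) => translateFilter(f as Filter)) });
    } else if (key.startsWith("$")) {
      throw new Error(`unsupported logical operator: ${key}`); // e.g. $not, $nor, or a bare $gt
    } else {
      clauses.push({ [key]: translateCondition(value) });
    }
  }
  if (clauses.length === 0) throw new Error("empty filter");
  return clauses.length === 1 ? clauses[0] : { $and: clauses };
}

console.log(JSON.stringify(translateFilter({ genre: { $in: ["documentary", "comedy"] }, year: { $gte: 2020 } })));
// {"$and":[{"genre":{"$in":["documentary","comedy"]}},{"year":{"$gte":2020}}]}
```

Because a bare operator inside $and/$or is routed back through translateFilter, a clause such as { $gt: 1 } is rejected, matching the rule that logical operators must contain field conditions.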

Examples:

// Implicit AND
{ genre: { $in: ["documentary", "comedy"] }, year: { $gte: 2020 } }

// Explicit logicals and ranges
{
$and: [
{ price: { $gte: 100, $lte: 1000 } },
{ $or: [{ stock: { $gt: 0 } }, { preorder: true }] }
]
}

// Dates in range (converted to epoch ms)
{ timestamp: { $gt: new Date("2024-01-01T00:00:00Z") } }

Non-filterable keys: If you set nonFilterableMetadataKeys at index creation, those keys are stored but cannot be used in filters.

Error Handling

The store throws typed errors that can be caught:

try {
  await store.query({
    indexName: "index-name",
    queryVector: queryVector, // number[] produced by your embedding model
  });
} catch (error) {
  // VectorStoreError must be in scope; its import path is not shown on this page.
  if (error instanceof VectorStoreError) {
    console.log(error.code); // 'connection_failed' | 'invalid_dimension' | etc.
    console.log(error.details); // Additional error context
  }
}

Environment Variables

Typical environment variables when wiring your app:

  • S3_VECTORS_BUCKET_NAME: your S3 Vectors bucket name (used to populate vectorBucketName).
  • AWS_REGION: the AWS Region of the S3 Vectors bucket.
  • AWS credentials: supplied via the standard AWS SDK provider chain (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_PROFILE, etc.).
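Failing fast on missing configuration avoids confusing errors from the first AWS call. A minimal sketch, assuming only the two variables listed above; loadS3VectorsEnv and S3VectorsEnvConfig are hypothetical names, and credentials are deliberately left to the AWS SDK provider chain.

```typescript
interface S3VectorsEnvConfig {
  vectorBucketName: string;
  region: string;
}

// Reads the bucket name and region; throws listing every missing variable.
// Credentials are not read here: the AWS SDK default provider chain resolves them.
function loadS3VectorsEnv(env: Record<string, string | undefined>): S3VectorsEnvConfig {
  const missing = ["S3_VECTORS_BUCKET_NAME", "AWS_REGION"].filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return { vectorBucketName: env.S3_VECTORS_BUCKET_NAME!, region: env.AWS_REGION! };
}
```

In Node.js, pass process.env and map the result onto the constructor as vectorBucketName and clientConfig.region. Taking the environment as a parameter keeps the function easy to test.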

Best Practices

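  • Choose the metric that matches your embedding model (cosine or euclidean); dotproduct is not supported.
  • Keep filterable metadata small and structured (string/number/boolean). Store large text (e.g. content) as non-filterable.
  • Use dotted paths for nested metadata and explicit $and/$or for complex logic.
  • Avoid calling describeIndex() on hot paths: count is computed via paginated ListVectors (O(n)).
  • Use includeVector: true only when you need the raw vectors.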

Related