
Chroma Vector Store

The ChromaVector class provides vector search using Chroma, an open-source embedding database. It offers efficient vector search with metadata filtering and hybrid search capabilities.

info
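Chroma Cloud

Chroma Cloud powers serverless vector and full-text search. It is extremely fast, cost-effective, scalable, and painless. Create a database and try it out in under 30 seconds with $5 of free credits.

Get started with Chroma Cloud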

Constructor Options

host?:

string
The host address of the Chroma server. Defaults to 'localhost'

port?:

number
The port number of the Chroma server. Defaults to 8000

ssl?:

boolean
Whether to use SSL/HTTPS for connections. Defaults to false

apiKey?:

string
A Chroma Cloud API key

tenant?:

string
The tenant name in the Chroma server to connect to. Defaults to 'default_tenant' for single-node Chroma. Auto-resolved for Chroma Cloud users based on the provided API key

database?:

string
The database name to connect to. Defaults to 'default_database' for single-node Chroma. Auto-resolved for Chroma Cloud users based on the provided API key

headers?:

Record<string, any>
Additional HTTP headers to send with requests

fetchOptions?:

RequestInit
Additional fetch options for HTTP requests
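
To see how the connection options relate to each other, the sketch below mirrors a subset of the documented options in a local type and derives a server URL from host, port, and ssl using the documented defaults. `ChromaConnectionOptions` and `resolveBaseUrl` are hypothetical names used only for illustration. They are not part of @mastra/chroma, and the library may assemble its URL differently.

```typescript
// Hypothetical local mirror of some of the documented constructor options.
interface ChromaConnectionOptions {
  host?: string;
  port?: number;
  ssl?: boolean;
  apiKey?: string;
  tenant?: string;
  database?: string;
  headers?: Record<string, any>;
}

// Illustration only: combine host, port, and ssl with the documented defaults
// ('localhost', 8000, and false).
function resolveBaseUrl(options: ChromaConnectionOptions = {}): string {
  const host = options.host ?? "localhost";
  const port = options.port ?? 8000;
  const scheme = options.ssl ? "https" : "http";
  return `${scheme}://${host}:${port}`;
}

// A local single-node server needs no options at all.
const localOptions: ChromaConnectionOptions = {};

// A self-hosted deployment behind TLS (the host name is a placeholder).
const remoteOptions: ChromaConnectionOptions = {
  host: "chroma.example.com",
  port: 443,
  ssl: true,
};

console.log(resolveBaseUrl(localOptions));
console.log(resolveBaseUrl(remoteOptions));
```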

Running a Chroma Server

If you are a Chroma Cloud user, pass your API key, tenant, and database name to the ChromaVector constructor.

Installing the @mastra/chroma package also gives you access to the Chroma CLI, which can set these values as environment variables for you: chroma db connect [DB-NAME] --env-file.

Otherwise, you have several options for setting up a single-node Chroma server:

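  • Run an instance locally with the Chroma CLI: chroma run. You can find more configuration options in the Chroma docs.
  • Run on Docker using the official Chroma image.
  • Deploy your own Chroma server on the provider of your choice. Chroma offers example templates for AWS, Azure, and GCP.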
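
Whichever setup you choose, you end up with a set of constructor options. The following is a minimal sketch of collecting them from an environment-like object. The helper `optionsFromEnv` is hypothetical, and the variable names (`CHROMA_API_KEY`, `CHROMA_TENANT`, `CHROMA_DATABASE`, `CHROMA_HOST`, `CHROMA_PORT`) are assumptions; check the env file written by the Chroma CLI for the names it actually uses.

```typescript
// Environment-like input, for example process.env in Node.js.
type Env = Record<string, string | undefined>;

// Subset of the documented constructor options that this helper fills in.
interface ChromaOptionsFromEnv {
  apiKey?: string;
  tenant?: string;
  database?: string;
  host?: string;
  port?: number;
}

// Hypothetical helper; all variable names below are assumptions.
function optionsFromEnv(env: Env): ChromaOptionsFromEnv {
  if (env.CHROMA_API_KEY) {
    // Chroma Cloud: tenant and database can stay undefined, since they are
    // documented as auto-resolved from the API key.
    return {
      apiKey: env.CHROMA_API_KEY,
      tenant: env.CHROMA_TENANT,
      database: env.CHROMA_DATABASE,
    };
  }
  // Single-node server: unset values stay undefined so that the documented
  // defaults ('localhost' and 8000) apply.
  return {
    host: env.CHROMA_HOST,
    port: env.CHROMA_PORT ? Number(env.CHROMA_PORT) : undefined,
  };
}

const cloudOptions = optionsFromEnv({
  CHROMA_API_KEY: "ck-example",
  CHROMA_TENANT: "t1",
  CHROMA_DATABASE: "docs",
});
const singleNodeOptions = optionsFromEnv({ CHROMA_PORT: "8001" });
```

The result could then be passed to the constructor, for example `new ChromaVector(optionsFromEnv(process.env))`, assuming the constructor accepts these options as a single object.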

Methods

createIndex()

indexName:

string
Name of the index to create

dimension:

number
Vector dimension (must match your embedding model)

metric?:

'cosine' | 'euclidean' | 'dotproduct'
= cosine
Distance metric for similarity search
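
Because the index dimension must match the embedding model, it can help to verify vector lengths before creating an index or writing to it. The helper below is a hypothetical pre-flight check, not part of @mastra/chroma, and the 3-element vectors stand in for real embeddings.

```typescript
// Return the positions of all vectors whose length differs from the index dimension.
function findDimensionMismatches(vectors: number[][], dimension: number): number[] {
  const mismatched: number[] = [];
  vectors.forEach((vector, position) => {
    if (vector.length !== dimension) mismatched.push(position);
  });
  return mismatched;
}

const indexDimension = 3; // stands in for a real embedding size
const candidateVectors = [
  [0.1, 0.2, 0.3],
  [0.4, 0.5], // too short for this index
  [0.7, 0.8, 0.9],
];

const mismatches = findDimensionMismatches(candidateVectors, indexDimension);
if (mismatches.length > 0) {
  console.warn(
    `Vectors at positions ${mismatches.join(", ")} do not match dimension ${indexDimension}`,
  );
}
```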

forkIndex()

Note: Forking is only supported on Chroma Cloud, or if you deploy your own OSS distributed Chroma.

forkIndex lets you fork an existing Chroma index instantly. Operations on the forked index do not affect the original index. See the Chroma docs for more information.

indexName:

string
Name of the index to fork

newIndexName:

string
The name of the forked index

upsert()

indexName:

string
Name of the index to upsert into

vectors:

number[][]
Array of embedding vectors

metadata?:

Record<string, any>[]
Metadata for each vector

ids?:

string[]
Optional vector IDs (auto-generated if not provided)

documents?:

string[]
Chroma-specific: Original text documents associated with the vectors
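
The upsert() options are parallel arrays. The sketch below builds them from a list of chunks so that position i in each array describes the same chunk. It assumes that metadata, ids, and documents are matched to vectors by position. The `Chunk` type and `toUpsertParams` helper are hypothetical, and `UpsertParams` is a local mirror of the options listed above.

```typescript
// Hypothetical chunk shape used only for this example.
interface Chunk {
  id: string;
  text: string;
  embedding: number[];
  source: string;
}

// Local mirror of the upsert() options listed above.
interface UpsertParams {
  indexName: string;
  vectors: number[][];
  metadata?: Record<string, any>[];
  ids?: string[];
  documents?: string[];
}

// Build the parallel arrays in a single pass so they cannot drift out of step.
function toUpsertParams(indexName: string, chunks: Chunk[]): UpsertParams {
  return {
    indexName,
    vectors: chunks.map((chunk) => chunk.embedding),
    metadata: chunks.map((chunk) => ({ source_id: chunk.source })),
    ids: chunks.map((chunk) => chunk.id),
    documents: chunks.map((chunk) => chunk.text),
  };
}

const upsertParams = toUpsertParams("docs", [
  { id: "a", text: "Install the package.", embedding: [0.1, 0.2, 0.3], source: "manual.pdf" },
  { id: "b", text: "Start the server.", embedding: [0.4, 0.5, 0.6], source: "manual.pdf" },
]);

// With a ChromaVector instance in scope: await vectorStore.upsert(upsertParams);
```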

query()

Query an index using a queryVector. Returns an array of semantically similar records, ordered by distance from the queryVector. Each record has the shape:

{
  id: string;
  score: number;
  document?: string;
  metadata?: Record<string, string | number | boolean>;
  embedding?: number[];
}

You can also provide the shape of your metadata to a query call for type inference: query<T>().

indexName:

string
Name of the index to query

queryVector:

number[]
Query vector to find similar vectors

topK?:

number
= 10
Number of results to return

filter?:

Record<string, any>
Metadata filters for the query

includeVector?:

boolean
= false
Whether to include vectors in the results

documentFilter?:

Record<string, any>
Chroma-specific: Filter to apply on the document content
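
As a sketch of how these options combine, the function below restricts a search to one source document and to chunks whose text contains a keyword. It is written against a minimal structural interface (`QueryableStore`, hypothetical) rather than the real class so that it stays self-contained. It assumes that the type argument of query<T>() describes the metadata shape and that Chroma's `$contains` document operator is accepted in documentFilter.

```typescript
// Hypothetical metadata shape for this example.
type DocMeta = { source_id: string; page: number };

// Record shape as described for query(), with metadata typed by T (assumption).
interface QueryRecord<T> {
  id: string;
  score: number;
  document?: string;
  metadata?: T;
}

// Minimal structural view of the store: only the query() options used here.
interface QueryableStore {
  query<T>(params: {
    indexName: string;
    queryVector: number[];
    topK?: number;
    filter?: Record<string, any>;
    documentFilter?: Record<string, any>;
  }): Promise<QueryRecord<T>[]>;
}

// Search a single source document for chunks whose text contains a keyword.
async function searchSource(
  store: QueryableStore,
  queryVector: number[],
  sourceId: string,
  keyword: string,
): Promise<QueryRecord<DocMeta>[]> {
  return store.query<DocMeta>({
    indexName: "docs",
    queryVector,
    topK: 5,
    // Metadata filter: only chunks from the given source.
    filter: { source_id: sourceId },
    // Document filter: assumes `$contains` is passed through to Chroma unchanged.
    documentFilter: { $contains: keyword },
  });
}
```

A ChromaVector instance should be usable as the `store` argument if its query() signature is compatible with this interface.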

get()

Get records from your Chroma index by IDs, metadata filters, and document filters. Returns an array of records of the shape:

{
  id: string;
  document?: string;
  metadata?: Record<string, string | number | boolean>;
  embedding?: number[];
}

You can also provide the shape of your metadata to a get call for type inference: get<T>().

indexName:

string
Name of the index to query

ids?:

string[]
A list of record IDs to return. If not provided, all records are returned.

filter?:

Record<string, any>
Metadata filters.

includeVector?:

boolean
= false
Whether to include vectors in the results

documentFilter?:

Record<string, any>
Chroma-specific: Filter to apply on the document content

limit?:

number
= 100
The maximum number of records to return

offset?:

number
= 0
Offset for returning records. Use with `limit` to paginate results.
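
Pagination with limit and offset can be sketched as a loop that requests pages until a short page comes back. The helper below is hypothetical and is written against a minimal structural interface (`PagedStore`) that includes only the get() options it uses, so it does not depend on the real class.

```typescript
// Record shape as described for get().
interface StoredRecord {
  id: string;
  document?: string;
  metadata?: Record<string, string | number | boolean>;
}

// Minimal structural view of the store: only the get() options used here.
interface PagedStore {
  get(params: { indexName: string; limit?: number; offset?: number }): Promise<StoredRecord[]>;
}

// Fetch every record in an index, one page at a time.
async function getAllRecords(
  store: PagedStore,
  indexName: string,
  pageSize = 100,
): Promise<StoredRecord[]> {
  if (pageSize <= 0) throw new Error("pageSize must be positive");
  const all: StoredRecord[] = [];
  for (let offset = 0; ; offset += pageSize) {
    const page = await store.get({ indexName, limit: pageSize, offset });
    all.push(...page);
    // A page shorter than pageSize (including an empty one) means nothing is left.
    if (page.length < pageSize) break;
  }
  return all;
}
```

For large indexes it may be preferable to process each page as it arrives instead of accumulating every record in memory.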

listIndexes()

Returns an array of index names as strings.

describeIndex()

indexName:

string
Name of the index to describe

Returns:

interface IndexStats {
  dimension: number;
  count: number;
  metric: "cosine" | "euclidean" | "dotproduct";
}

deleteIndex()

indexName:

string
Name of the index to delete

updateVector()

Update a single vector by ID or by metadata filter. Provide either id or filter, but not both.

indexName:

string
Name of the index containing the vector to update

id?:

string
ID of the vector to update (mutually exclusive with filter)

filter?:

Record<string, any>
Metadata filter to identify vector(s) to update (mutually exclusive with id)

update:

object
Update parameters

The update object can contain:

vector?:

number[]
New vector to replace the existing one

metadata?:

Record<string, any>
New metadata to replace the existing metadata

Example:

// Update by ID
await vectorStore.updateVector({
  indexName: 'docs',
  id: 'vec_123',
  update: { metadata: { status: 'reviewed' } }
});

// Update by filter
await vectorStore.updateVector({
  indexName: 'docs',
  filter: { source_id: 'manual.pdf' },
  update: { metadata: { version: 2 } }
});

deleteVector()

indexName:

string
Name of the index containing the vector to delete

id:

string
ID of the vector to delete

deleteVectors()

Delete multiple vectors by IDs or by metadata filter. This method enables bulk deletion and source-based vector management. Provide either ids or filter, but not both.

indexName:

string
Name of the index containing the vectors to delete

ids?:

string[]
Array of vector IDs to delete (mutually exclusive with filter)

filter?:

Record<string, any>
Metadata filter to identify vectors to delete (mutually exclusive with ids)

Example:

// Delete all chunks from a document
await vectorStore.deleteVectors({
  indexName: 'docs',
  filter: { source_id: 'manual.pdf' }
});

// Delete multiple vectors by ID
await vectorStore.deleteVectors({
  indexName: 'docs',
  ids: ['vec_1', 'vec_2', 'vec_3']
});

// Delete old temporary documents
await vectorStore.deleteVectors({
  indexName: 'docs',
  filter: {
    $and: [
      { bucket: 'temp' },
      { indexed_at: { $lt: '2025-01-01' } }
    ]
  }
});

Response Types

Query results are returned in this format:

interface QueryResult {
  id: string;
  score: number;
  metadata: Record<string, any>;
  document?: string; // Chroma-specific: Original document if it was stored
  vector?: number[]; // Only included if includeVector is true
}

Error Handling

The store throws typed errors that can be caught:

try {
  await store.query({
    indexName: "index_name",
    queryVector: queryVector,
  });
} catch (error) {
  if (error instanceof VectorStoreError) {
    console.log(error.code); // 'connection_failed' | 'invalid_dimension' | etc
    console.log(error.details); // Additional error context
  }
}

Related