Google Gemini Live Voice
The GeminiLiveVoice class provides real-time voice interaction using Google's Gemini Live API. It supports bidirectional audio streaming, tool calling, session management, and authentication through either the standard Gemini API or Vertex AI.
Usage Example
import { GeminiLiveVoice } from "@mastra/voice-google-gemini-live";
import { playAudio, getMicrophoneStream } from "@mastra/node-audio";
// Initialize with Gemini API (using API key)
const voice = new GeminiLiveVoice({
  apiKey: process.env.GOOGLE_API_KEY, // Required for Gemini API
  model: "gemini-2.0-flash-exp",
  speaker: "Puck", // Default voice
  debug: true,
});
// Or initialize with Vertex AI (using OAuth)
const voiceWithVertexAI = new GeminiLiveVoice({
  vertexAI: true,
  project: "your-gcp-project",
  location: "us-central1",
  serviceAccountKeyFile: "/path/to/service-account.json",
  model: "gemini-2.0-flash-exp",
  speaker: "Puck",
});
// Or use the VoiceConfig pattern (recommended for consistency with other providers)
const voiceWithConfig = new GeminiLiveVoice({
  speechModel: {
    name: "gemini-2.0-flash-exp",
    apiKey: process.env.GOOGLE_API_KEY,
  },
  speaker: "Puck",
  realtimeConfig: {
    model: "gemini-2.0-flash-exp",
    apiKey: process.env.GOOGLE_API_KEY,
    options: {
      debug: true,
      sessionConfig: {
        interrupts: { enabled: true },
      },
    },
  },
});
// Establish connection (required before using other methods)
await voice.connect();
// Set up event listeners
voice.on("speaker", (audioStream) => {
  // Handle audio stream (NodeJS.ReadableStream)
  playAudio(audioStream);
});
voice.on("writing", ({ text, role }) => {
  // Handle transcribed text
  console.log(`${role}: ${text}`);
});
voice.on("turnComplete", ({ timestamp }) => {
  // Handle turn completion
  console.log("Turn completed at:", timestamp);
});
// Convert text to speech
await voice.speak("Hello, how can I help you today?", {
  speaker: "Charon", // Override default voice
  responseModalities: ["AUDIO", "TEXT"],
});
// Process audio input
const microphoneStream = getMicrophoneStream();
await voice.send(microphoneStream);
// Update session configuration
await voice.updateSessionConfig({
  speaker: "Kore",
  instructions: "Be more concise in your responses",
});
// When done, disconnect
await voice.disconnect();
// Or use the synchronous wrapper
voice.close();
Configuration
Constructor Options
apiKey?:
model?:
speaker?:
vertexAI?:
project?:
location?:
serviceAccountKeyFile?:
serviceAccountEmail?:
instructions?:
sessionConfig?:
debug?:
Session Configuration
interrupts?:
interrupts.enabled?:
interrupts.allowUserInterruption?:
contextCompression?:
Methods
connect()
Establishes a connection to the Gemini Live API. Must be called before using the speak, listen, or send methods.
requestContext?:
returns:
speak()
Converts text to speech and sends it to the model. Accepts either a string or a readable stream as input.
input:
options?:
options.speaker?:
options.languageCode?:
options.responseModalities?:
Returns: Promise<void> (responses are emitted via the speaker and writing events)
listen()
Processes audio input for speech recognition. Takes a readable stream of audio data and returns the transcribed text.
audioStream:
options?:
Returns: Promise<string> - The transcribed text
send()
Streams audio data to the Gemini service in real time, for continuous streaming scenarios such as live microphone input.
audioData:
Returns: Promise<void>
updateSessionConfig()
Updates the session configuration dynamically, for example to change voice settings, speaker selection, or other runtime options.
config:
Returns: Promise<void>
addTools()
Adds a set of tools to the voice instance. Tools allow the model to perform additional actions during conversations. When GeminiLiveVoice is added to an Agent, any tools configured for the Agent are automatically available to the voice interface.
tools:
Returns: void
addInstructions()
Adds or updates system instructions for the model.
instructions?:
Returns: void
answer()
Triggers a response from the model. This method is primarily used internally when integrated with an Agent.
options?:
Returns: Promise<void>
getSpeakers()
Returns a list of available voice speakers for the Gemini Live API.
Returns: Promise<Array<{ voiceId: string; description?: string }>>
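The documented return shape can be used to check a voice ID before passing it as a speaker. The following is a minimal sketch: `findSpeaker` and `sampleSpeakers` are hypothetical names, and the sample data only imitates what `await voice.getSpeakers()` might return.

```typescript
// Shape taken from the documented return type of getSpeakers().
type Speaker = { voiceId: string; description?: string };

/** Look up a speaker by voice ID; returns undefined when the ID is not in the list. */
function findSpeaker(speakers: Speaker[], voiceId: string): Speaker | undefined {
  return speakers.find((s) => s.voiceId === voiceId);
}

// Illustrative data only; a real list would come from `await voice.getSpeakers()`.
const sampleSpeakers: Speaker[] = [
  { voiceId: "Puck", description: "Conversational, friendly" },
  { voiceId: "Kore" },
];

const kore = findSpeaker(sampleSpeakers, "Kore"); // found
const missing = findSpeaker(sampleSpeakers, "NotAVoice"); // undefined
```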
disconnect()
Disconnects from the Gemini Live session and cleans up resources. This is the asynchronous method that properly handles cleanup.
Returns: Promise<void>
close()
Synchronous wrapper for disconnect(). Calls disconnect() internally without awaiting it.
Returns: void
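The difference between the two methods matters when later code depends on cleanup having finished. The sketch below uses a stub class (`StubVoice` is hypothetical and does not talk to any API) that mirrors the described behavior: `close()` starts `disconnect()` but returns before it completes.

```typescript
// Stub for illustration only; the real class manages a live API session.
class StubVoice {
  connected = true;

  async disconnect(): Promise<void> {
    await Promise.resolve(); // stands in for asynchronous cleanup work
    this.connected = false;
  }

  close(): void {
    // Fire-and-forget: the promise is intentionally not awaited.
    void this.disconnect();
  }
}

const stub = new StubVoice();
stub.close();
// Cleanup has been started but has not finished at this point.
const connectedRightAfterClose = stub.connected;
```

If cleanup must be complete before continuing (for example, before the process exits), awaiting `disconnect()` directly is the safer choice under this reading.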
on()
Registers an event listener for voice events.
event:
callback:
Returns: void
off()
Removes a previously registered event listener.
event:
callback:
Returns: void
Events
The GeminiLiveVoice class emits the following events:
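Because off() takes both the event name and the callback, the handler usually needs to be kept in a variable so the same function can be passed back. The sketch below illustrates this with Node's built-in EventEmitter as a stand-in; it assumes, without confirmation from the source, that GeminiLiveVoice matches listeners by function reference in the same way.

```typescript
import { EventEmitter } from "node:events";

// Stand-in emitter for illustration; not the real GeminiLiveVoice class.
const emitter = new EventEmitter();
const seen: string[] = [];

// Keep a reference to the handler so the same function can be passed to off().
const onWriting = ({ text, role }: { text: string; role: string }): void => {
  seen.push(`${role}: ${text}`);
};

emitter.on("writing", onWriting);
emitter.emit("writing", { text: "hello", role: "assistant" }); // handled

emitter.off("writing", onWriting);
emitter.emit("writing", { text: "ignored", role: "assistant" }); // no listener left
```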
speaker:
speaking:
writing:
session:
turnComplete:
toolCall:
usage:
error:
interrupt:
Available Models
The following Gemini Live models are available:
- gemini-2.0-flash-exp (default)
- gemini-2.0-flash-exp-image-generation
- gemini-2.0-flash-live-001
- gemini-live-2.5-flash-preview-native-audio
- gemini-2.5-flash-exp-native-audio-thinking-dialog
- gemini-live-2.5-flash-preview
- gemini-2.6.flash-preview-tts
Available Voices
The following voice options are available:
- Puck (default): Conversational, friendly
- Charon: Deep, authoritative
- Kore: Neutral, professional
- Fenrir: Warm, approachable
Authentication Methods
Gemini API (Development)
The simplest method, using an API key from Google AI Studio:
const voice = new GeminiLiveVoice({
  apiKey: "your-api-key", // Required for Gemini API
  model: "gemini-2.0-flash-exp",
});
Vertex AI (Production)
For production use with OAuth authentication on Google Cloud Platform:
// Using service account key file
const voiceWithKeyFile = new GeminiLiveVoice({
  vertexAI: true,
  project: "your-gcp-project",
  location: "us-central1",
  serviceAccountKeyFile: "/path/to/service-account.json",
});
// Using Application Default Credentials
const voiceWithADC = new GeminiLiveVoice({
  vertexAI: true,
  project: "your-gcp-project",
  location: "us-central1",
});
// Using service account impersonation
const voiceWithImpersonation = new GeminiLiveVoice({
  vertexAI: true,
  project: "your-gcp-project",
  location: "us-central1",
  serviceAccountEmail: "service-account@project.iam.gserviceaccount.com",
});
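An application that runs in both development and production may want to pick the authentication method from its environment. The following is a minimal sketch of that idea, not part of the library: `resolveAuthOptions` and the `GCP_PROJECT`, `GCP_LOCATION`, and `GCP_KEY_FILE` variable names are hypothetical, and the returned object only uses constructor options shown above.

```typescript
// Subset of the constructor options shown above, modeled locally for the sketch.
type AuthOptions =
  | { apiKey: string }
  | { vertexAI: true; project: string; location: string; serviceAccountKeyFile?: string };

/** Choose Gemini API key auth when a key is set, otherwise fall back to Vertex AI. */
function resolveAuthOptions(env: Record<string, string | undefined>): AuthOptions {
  const apiKey = env["GOOGLE_API_KEY"];
  if (apiKey) return { apiKey };

  const project = env["GCP_PROJECT"]; // hypothetical variable name
  if (project) {
    const keyFile = env["GCP_KEY_FILE"]; // hypothetical variable name
    return {
      vertexAI: true,
      project,
      location: env["GCP_LOCATION"] ?? "us-central1", // hypothetical variable name
      // Without a key file, Application Default Credentials would apply.
      ...(keyFile ? { serviceAccountKeyFile: keyFile } : {}),
    };
  }
  throw new Error("Set GOOGLE_API_KEY or GCP_PROJECT");
}
```

The result could then be spread into the constructor, for example `new GeminiLiveVoice({ ...resolveAuthOptions(process.env), model: "gemini-2.0-flash-exp" })`.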
Advanced Features
Session Management
The Gemini Live API supports session resumption for handling network interruptions:
// saveSessionHandle is an application-defined helper, not part of the library
voice.on("sessionHandle", ({ handle, expiresAt }) => {
  // Store session handle for resumption
  saveSessionHandle(handle, expiresAt);
});
// Resume a previous session
const resumableVoice = new GeminiLiveVoice({
  sessionConfig: {
    enableResumption: true,
    maxDuration: "2h",
  },
});
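The snippet above calls `saveSessionHandle` without defining it. One possible in-memory version is sketched below. It assumes `expiresAt` arrives as a Date or an ISO date string, which the source does not specify, and `loadSessionHandle` is a hypothetical companion helper. How a stored handle is passed back when reconnecting is not shown in the source, so it is left out here.

```typescript
interface StoredSession {
  handle: string;
  expiresAt: Date;
}

// In-memory storage for the sketch; a real application might persist this.
let storedSession: StoredSession | undefined;

/** Remember the latest session handle together with its expiry time. */
function saveSessionHandle(handle: string, expiresAt: Date | string): void {
  storedSession = { handle, expiresAt: new Date(expiresAt) };
}

/** Return the stored handle, or undefined if none is stored or it has expired. */
function loadSessionHandle(now: Date = new Date()): string | undefined {
  if (!storedSession || storedSession.expiresAt.getTime() <= now.getTime()) {
    return undefined;
  }
  return storedSession.handle;
}
```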
Tool Calling
Enable the model to call functions during conversations:
import { z } from "zod";
voice.addTools({
  weather: {
    description: "Get weather information",
    parameters: z.object({
      location: z.string(),
    }),
    execute: async ({ location }) => {
      // getWeather is an application-defined helper, not part of the library
      const weather = await getWeather(location);
      return weather;
    },
  },
});
// Log each tool invocation requested by the model
voice.on("toolCall", ({ name, args, id }) => {
  console.log(`Tool called: ${name} with args:`, args);
});
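Notes
- The Gemini Live API uses WebSockets for real-time communication
- Audio is processed as 16kHz PCM16 for input and 24kHz PCM16 for output
- The voice instance must be connected with connect() before using other methods
- Always call close() when done to properly clean up resources
- Vertex AI authentication requires appropriate IAM permissions (aiplatform.user role)
- Session resumption allows recovery from network interruptions
- The API supports real-time interactions with text and audio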
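The audio formats in the notes above imply that input from other sources may need converting to 16-bit PCM first. The following is a minimal sketch of that conversion and of the buffer sizes involved. It assumes mono audio and float samples in the range [-1, 1], and the function names are hypothetical; resampling to 16kHz is not covered.

```typescript
// Sample rates stated in the notes above.
const INPUT_SAMPLE_RATE = 16_000;
const OUTPUT_SAMPLE_RATE = 24_000;
const BYTES_PER_SAMPLE = 2; // PCM16 uses 16 bits per sample

/** Convert float samples in [-1, 1] to signed 16-bit PCM, clamping out-of-range values. */
function floatToPcm16(samples: Float32Array): Int16Array {
  return Int16Array.from(samples, (sample) => {
    const clamped = Math.max(-1, Math.min(1, sample));
    // Negative values scale to -32768, positive values to 32767.
    return Math.round(clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff);
  });
}

/** Bytes needed for `ms` milliseconds of mono PCM16 audio at the given sample rate. */
function pcm16ByteLength(ms: number, sampleRate: number): number {
  return Math.round((ms / 1000) * sampleRate) * BYTES_PER_SAMPLE;
}

const pcm = floatToPcm16(new Float32Array([0, 1, -1, 2])); // 2 is clamped to 1
const inputChunkBytes = pcm16ByteLength(100, INPUT_SAMPLE_RATE); // 100 ms of input audio
const outputChunkBytes = pcm16ByteLength(100, OUTPUT_SAMPLE_RATE); // 100 ms of output audio
```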