OpenAI Realtime Voice
The OpenAIRealtimeVoice class provides real-time voice interaction using OpenAI's WebSocket-based API. It supports real-time speech-to-speech conversion, voice activity detection, and event-based audio streaming.
Usage Example
import { OpenAIRealtimeVoice } from "@mastra/voice-openai-realtime";
import { playAudio, getMicrophoneStream } from "@mastra/node-audio";
// Initialize with default configuration using environment variables
const voice = new OpenAIRealtimeVoice();
// Or initialize with specific configuration
const voiceWithConfig = new OpenAIRealtimeVoice({
  apiKey: "your-openai-api-key",
  model: "gpt-4o-realtime-preview-2024-12-17",
  speaker: "alloy", // Default voice
});
// Adjust turn detection (see updateConfig() under Methods)
voiceWithConfig.updateConfig({
  turn_detection: {
    type: "server_vad",
    threshold: 0.6,
    silence_duration_ms: 1200,
  },
});
// Establish connection
await voice.connect();
// Set up event listeners
voice.on("speaking", ({ audio }) => {
  // Handle audio data (Int16Array), PCM format by default
  playAudio(audio);
});
voice.on("writing", ({ text, role }) => {
  // Handle transcribed text
  console.log(`${role}: ${text}`);
});
// Convert text to speech
await voice.speak("Hello, how can I help you today?", {
  speaker: "echo", // Override default voice
});
// Process audio input
const microphoneStream = getMicrophoneStream();
await voice.send(microphoneStream);
// When done, disconnect
voice.close();
Configuration
Constructor Options
model?:
apiKey?:
speaker?:
Voice Activity Detection (VAD) Configuration
type?:
threshold?:
prefix_padding_ms?:
silence_duration_ms?:
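As a rough illustration of how these four fields fit together, the sketch below checks a turn-detection object before it is passed on. The `TurnDetectionConfig` type and `checkTurnDetection` helper are hypothetical names defined locally, not part of the package, and the validity rules (threshold between 0 and 1, non-negative millisecond values) are assumptions rather than documented constraints.

```typescript
// Hypothetical local type: mirrors the four VAD fields listed above.
// It is not imported from @mastra/voice-openai-realtime.
interface TurnDetectionConfig {
  type?: string;
  threshold?: number;
  prefix_padding_ms?: number;
  silence_duration_ms?: number;
}

// Returns a list of human-readable problems; an empty list means the
// config passed these (assumed) sanity checks.
function checkTurnDetection(config: TurnDetectionConfig): string[] {
  const problems: string[] = [];
  const { threshold, prefix_padding_ms, silence_duration_ms } = config;

  // Assumption: threshold is a sensitivity value between 0 and 1.
  if (
    threshold !== undefined &&
    (!Number.isFinite(threshold) || threshold < 0 || threshold > 1)
  ) {
    problems.push(`threshold ${threshold} is outside 0..1`);
  }

  // Assumption: both durations are non-negative millisecond counts.
  const durations: [string, number | undefined][] = [
    ["prefix_padding_ms", prefix_padding_ms],
    ["silence_duration_ms", silence_duration_ms],
  ];
  for (const [name, value] of durations) {
    if (value !== undefined && (!Number.isFinite(value) || value < 0)) {
      problems.push(`${name} must be a non-negative number`);
    }
  }
  return problems;
}

// Same values as the usage example.
const vad: TurnDetectionConfig = {
  type: "server_vad",
  threshold: 0.6,
  silence_duration_ms: 1200,
};
console.log(checkTurnDetection(vad)); // prints []
```

Returning a list of problems instead of throwing lets the caller decide whether to log, reject, or fall back to defaults.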
Methods
connect()
Establishes a connection to the OpenAI realtime service. Must be called before using the speak, listen, or send functions.
returns:
speak()
Emits a speaking event using the configured voice model. Accepts either a string or a readable stream as input.
input:
options.speaker?:
Returns: Promise<void>
listen()
Processes audio input for speech recognition. Takes a readable stream of audio data and emits a 'listening' event with the transcribed text.
audioData:
Returns: Promise<void>
send()
Streams audio data in real time to the OpenAI service for continuous audio streaming scenarios such as live microphone input.
audioData:
Returns: Promise<void>
updateConfig()
Updates the session configuration for the voice instance. This can be used to modify voice settings, turn detection, and other parameters.
sessionConfig:
Returns: void
addTools()
Adds a set of tools to the voice instance. Tools allow the model to perform additional actions during conversations. When OpenAIRealtimeVoice is added to an Agent, any tools configured for the Agent are automatically available to the voice interface.
tools?:
Returns: void
close()
Disconnects from the OpenAI realtime session and cleans up resources. Call this when you are done with the voice instance.
Returns: void
getSpeakers()
Returns a list of available voice speakers.
Returns: Promise<Array<{ voiceId: string; [key: string]: any }>>
on()
Registers an event listener for voice events.
event:
callback:
Returns: void
off()
Removes a previously registered event listener.
event:
callback:
Returns: void
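The pairing of on() and off() can be sketched without a live connection. The `TinyEmitter` class below is a hypothetical stand-in for the voice instance, and it assumes that off() matches listeners by function reference, as Node's EventEmitter does. Under that assumption, the handler has to be kept in a variable so the same function can be removed later.

```typescript
type Listener = (payload: unknown) => void;

// Hypothetical stand-in for the voice instance's on()/off() pair,
// so the example runs without a network connection.
class TinyEmitter {
  private listeners: Record<string, Listener[]> = {};

  on(event: string, callback: Listener): void {
    const list = this.listeners[event] ?? [];
    list.push(callback);
    this.listeners[event] = list;
  }

  // Assumption: removal is by function identity.
  off(event: string, callback: Listener): void {
    this.listeners[event] = (this.listeners[event] ?? []).filter(
      (cb) => cb !== callback,
    );
  }

  emit(event: string, payload: unknown): void {
    // Copy the list so handlers may safely call off() while running.
    (this.listeners[event] ?? []).slice().forEach((cb) => cb(payload));
  }
}

const voiceLike = new TinyEmitter();
const seen: unknown[] = [];

// Keep a reference to the handler so the same function can be removed.
const onWriting: Listener = (payload) => {
  seen.push(payload);
};

voiceLike.on("writing", onWriting);
voiceLike.emit("writing", { text: "hello", role: "user" });

voiceLike.off("writing", onWriting);
voiceLike.emit("writing", { text: "ignored", role: "user" });

console.log(seen.length); // prints 1
```

Passing a fresh inline arrow function to off() would not remove anything in this sketch, because it is a different function object from the one that was registered.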
Events
The OpenAIRealtimeVoice class emits the following events:
speaking:
writing:
error:
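One possible way to consume "writing" events is to collect them into a readable transcript. The sketch below uses the `{ text, role }` payload shape from the usage example and assumes that text may arrive in several chunks per speaker turn; that chunking behavior is an assumption, and `TranscriptBuffer` is a hypothetical helper, not part of the package.

```typescript
// Payload shape taken from the usage example's "writing" handler.
interface WritingEvent {
  text: string;
  role: string;
}

// Groups consecutive chunks from the same role into one transcript line.
class TranscriptBuffer {
  private lines: { role: string; text: string }[] = [];

  push(event: WritingEvent): void {
    const last = this.lines[this.lines.length - 1];
    if (last && last.role === event.role) {
      last.text += event.text; // same speaker: extend the current line
    } else {
      this.lines.push({ role: event.role, text: event.text });
    }
  }

  toString(): string {
    return this.lines.map((l) => `${l.role}: ${l.text}`).join("\n");
  }
}

const transcript = new TranscriptBuffer();
transcript.push({ role: "user", text: "What time is it?" });
transcript.push({ role: "assistant", text: "It is " });
transcript.push({ role: "assistant", text: "noon." });

console.log(transcript.toString());
// prints:
// user: What time is it?
// assistant: It is noon.
```

With a real instance, `push` would be called from inside a "writing" listener.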
OpenAI Realtime Events
You can also listen to OpenAI Realtime utility events by prefixing the event name with 'openAIRealtime:':
openAIRealtime:conversation.created:
openAIRealtime:conversation.interrupted:
openAIRealtime:conversation.updated:
openAIRealtime:conversation.item.appended:
openAIRealtime:conversation.item.completed:
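To avoid typos in these prefixed names, the prefix can be applied in one place. The helper below is a hypothetical convenience, not part of the package; it covers only the five event names listed above and builds the string that would be passed to on() or off().

```typescript
const REALTIME_PREFIX = "openAIRealtime:";

// The five event names listed above, without the prefix.
type RealtimeEventName =
  | "conversation.created"
  | "conversation.interrupted"
  | "conversation.updated"
  | "conversation.item.appended"
  | "conversation.item.completed";

// Hypothetical helper: returns the prefixed event name as a typed string.
function realtimeEvent<E extends RealtimeEventName>(
  name: E,
): `openAIRealtime:${E}` {
  return `${REALTIME_PREFIX}${name}` as `openAIRealtime:${E}`;
}

console.log(realtimeEvent("conversation.updated"));
// prints openAIRealtime:conversation.updated
```

Because the parameter is a union of string literals, a misspelled event name is rejected at compile time rather than silently never firing.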
Available Voices
The following voice options are available:
- alloy: Neutral and balanced
- ash: Clear and precise
- ballad: Melodic and smooth
- coral: Warm and friendly
- echo: Resonant and deep
- sage: Calm and thoughtful
- shimmer: Bright and energetic
- verse: Versatile and expressive
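When the speaker name comes from user input or configuration, it may help to check it against this list before use. The sketch below hard-codes the eight names from the list above. The `resolveSpeaker` helper is hypothetical, and its fallback to "alloy" is an assumption based on the default shown in the usage example.

```typescript
// The eight voices listed above.
const VOICE_IDS = [
  "alloy",
  "ash",
  "ballad",
  "coral",
  "echo",
  "sage",
  "shimmer",
  "verse",
] as const;

type VoiceId = (typeof VOICE_IDS)[number];

// Type guard: narrows an arbitrary string to one of the known voices.
function isVoiceId(value: string): value is VoiceId {
  return VOICE_IDS.some((id) => id === value);
}

// Hypothetical helper: falls back to "alloy" for unknown or missing names.
function resolveSpeaker(requested: string | undefined): VoiceId {
  return requested !== undefined && isVoiceId(requested) ? requested : "alloy";
}

console.log(resolveSpeaker("echo")); // prints echo
console.log(resolveSpeaker("unknown-voice")); // prints alloy
```

For a list that stays in sync with the service, getSpeakers() is the documented source; the constant here is only a static snapshot of this page.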
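Notes
- API keys can be provided via constructor options or the OPENAI_API_KEY environment variable
- The OpenAI Realtime Voice API uses WebSockets for real-time communication
- Server-side Voice Activity Detection (VAD) provides better accuracy for speech detection
- All audio data is processed in Int16Array format
- The voice instance must be connected with connect() before using other methods
- Always call close() when done to properly clean up resources
- Memory management is handled by the OpenAI Realtime API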
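Since audio is handled as Int16Array, code that works with floating-point samples (for example, from a browser audio pipeline) needs a conversion step. The sketch below shows one common way to convert between Float32 samples in the range -1 to 1 and 16-bit PCM. Both function names are hypothetical helpers, and the scaling convention used here is one of several in use, not something this page specifies.

```typescript
// Converts Float32 samples in [-1, 1] to 16-bit signed PCM.
// Out-of-range input is clamped rather than allowed to wrap around.
function float32ToInt16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale by 32768, positive by 32767 (the Int16 limits).
    out[i] = Math.round(s < 0 ? s * 32768 : s * 32767);
  }
  return out;
}

// Inverse conversion: 16-bit PCM back to Float32 in [-1, 1].
function int16ToFloat32(samples: Int16Array): Float32Array {
  const out = new Float32Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    out[i] = samples[i] < 0 ? samples[i] / 32768 : samples[i] / 32767;
  }
  return out;
}

const pcm = float32ToInt16(new Float32Array([0, 1, -1, 2]));
console.log(Array.from(pcm)); // prints [ 0, 32767, -32768, 32767 ]
```

The last input value, 2, is outside the valid range and is clamped to full scale instead of overflowing.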