OpenAI Realtime Voice

The OpenAIRealtimeVoice class provides real-time voice interaction using OpenAI's WebSocket-based API. It supports real-time speech-to-speech conversion, voice activity detection, and event-based audio streaming.

Usage Example

import { OpenAIRealtimeVoice } from "@mastra/voice-openai-realtime";
import { playAudio, getMicrophoneStream } from "@mastra/node-audio";

// Initialize with default configuration using environment variables
const voice = new OpenAIRealtimeVoice();

// Or initialize with specific configuration
const voiceWithConfig = new OpenAIRealtimeVoice({
  apiKey: "your-openai-api-key",
  model: "gpt-5.1-realtime-preview-2024-12-17",
  speaker: "alloy", // Default voice
});

voiceWithConfig.updateConfig({
  turn_detection: {
    type: "server_vad",
    threshold: 0.6,
    silence_duration_ms: 1200,
  },
});

// Establish connection
await voice.connect();

// Set up event listeners
voice.on("speaker", ({ audio }) => {
  // Handle audio data (Int16Array, PCM format by default)
  playAudio(audio);
});

voice.on("writing", ({ text, role }) => {
  // Handle transcribed text
  console.log(`${role}: ${text}`);
});

// Convert text to speech
await voice.speak("Hello, how can I help you today?", {
  speaker: "echo", // Override default voice
});

// Process audio input
const microphoneStream = getMicrophoneStream();
await voice.send(microphoneStream);

// When done, disconnect and clean up resources
voice.close();

Configuration

Constructor Options

model?: string = 'gpt-5.1-realtime-preview-2024-12-17'
The model ID to use for real-time voice interactions.

apiKey?: string
OpenAI API key. Falls back to the OPENAI_API_KEY environment variable.

speaker?: string = 'alloy'
Default voice ID for speech synthesis.

Voice Activity Detection (VAD) Configuration

type?: string = 'server_vad'
Type of VAD to use. Server-side VAD provides better accuracy.

threshold?: number = 0.5
Speech detection sensitivity (0.0-1.0).

prefix_padding_ms?: number = 1000
Milliseconds of audio to include before speech is detected.

silence_duration_ms?: number = 1000
Milliseconds of silence before ending a turn.
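
As a rough illustration of how these options combine, the sketch below merges caller overrides onto the defaults listed above and rejects out-of-range values. The `TurnDetection` interface and `buildTurnDetection` helper are hypothetical names introduced here, not part of the library.

```typescript
// Hypothetical shape mirroring the VAD options documented above.
interface TurnDetection {
  type: string;
  threshold: number;
  prefix_padding_ms: number;
  silence_duration_ms: number;
}

// Defaults copied from the option list above.
const VAD_DEFAULTS: TurnDetection = {
  type: "server_vad",
  threshold: 0.5,
  prefix_padding_ms: 1000,
  silence_duration_ms: 1000,
};

// Hypothetical helper: merge overrides onto the defaults and validate ranges.
function buildTurnDetection(overrides: Partial<TurnDetection> = {}): TurnDetection {
  const config: TurnDetection = { ...VAD_DEFAULTS, ...overrides };
  // The negated form also rejects NaN.
  if (!(config.threshold >= 0 && config.threshold <= 1)) {
    throw new RangeError("threshold must be between 0.0 and 1.0");
  }
  if (!(config.prefix_padding_ms >= 0) || !(config.silence_duration_ms >= 0)) {
    throw new RangeError("durations must be non-negative");
  }
  return config;
}

// Slightly less sensitive detection with a longer pause before a turn ends.
const turnDetection = buildTurnDetection({ threshold: 0.6, silence_duration_ms: 1200 });
```

The resulting object could then be passed as the turn_detection field of the session configuration, as in the usage example.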

Methods

connect()

Establishes a connection to the OpenAI realtime service. Must be called before using speak(), listen(), or send().

returns: Promise<void>
Promise that resolves when the connection is established.
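
Because the connection has to be opened before other calls and released afterwards, one possible pattern is a small wrapper that pairs connect() with close(). The sketch below assumes only the two lifecycle methods documented on this page; `Connectable` and `withConnection` are hypothetical names, not library exports.

```typescript
// Structural type covering only the lifecycle methods documented on this page.
interface Connectable {
  connect(): Promise<void>;
  close(): void;
}

// Hypothetical helper: connect, run the callback, and always close,
// even if the callback throws.
async function withConnection<V extends Connectable, T>(
  voice: V,
  run: (voice: V) => Promise<T>,
): Promise<T> {
  await voice.connect();
  try {
    return await run(voice);
  } finally {
    voice.close();
  }
}
```

With a real instance this might be called as `withConnection(voice, (v) => v.speak("Hello"))`, so that cleanup cannot be forgotten on an error path.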

speak()

Emits a speaking event using the configured voice model. Accepts either a string or a readable stream as input.

input: string | NodeJS.ReadableStream
Text or text stream to convert to speech.

options.speaker?: string = Constructor's speaker value
Voice ID to use for this specific speech request.

Returns: Promise<void>

listen()

Processes audio input for speech recognition. Takes a readable stream of audio data and emits a 'listening' event with the transcribed text.

audioData: NodeJS.ReadableStream
Audio stream to transcribe.

Returns: Promise<void>

send()

Streams audio data in real time to the OpenAI service for continuous audio streaming scenarios such as live microphone input.

audioData: NodeJS.ReadableStream
Audio stream to send to the service.

Returns: Promise<void>

updateConfig()

Updates the session configuration for the voice instance. This can be used to modify voice settings, turn detection, and other parameters.

sessionConfig: Realtime.SessionConfig
New session configuration to apply.

Returns: void

addTools()

Adds a set of tools to the voice instance. Tools allow the model to perform additional actions during conversations. When OpenAIRealtimeVoice is added to an Agent, any tools configured for the Agent are automatically available to the voice interface.

tools?: ToolsInput
Tools configuration to equip.

Returns: void

close()

Disconnects from the OpenAI realtime session and cleans up resources. Call this when you are done with the voice instance.

Returns: void

getSpeakers()

Returns a list of available voice speakers.

Returns: Promise<Array<{ voiceId: string; [key: string]: any }>>

on()

Registers an event listener for voice events.

event: string
Name of the event to listen for.

callback: Function
Function to call when the event occurs.

Returns: void

off()

Removes a previously registered event listener.

event: string
Name of the event to stop listening to.

callback: Function
The specific callback function to remove.

Returns: void
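
Since off() needs the same callback reference that was passed to on(), a one-shot listener has to keep hold of its own wrapper. The sketch below shows one way to do that against any object exposing on() and off() as described above; `Emitterish` and `once` are hypothetical names, not library exports.

```typescript
type Listener = (payload: unknown) => void;

// Structural type covering only the on()/off() methods documented above.
interface Emitterish {
  on(event: string, callback: Listener): void;
  off(event: string, callback: Listener): void;
}

// Hypothetical helper: register a listener that removes itself after the first event.
function once(voice: Emitterish, event: string, callback: Listener): void {
  const wrapper: Listener = (payload) => {
    // Remove the wrapper first so a re-entrant emit cannot fire it twice.
    voice.off(event, wrapper);
    callback(payload);
  };
  voice.on(event, wrapper);
}
```

For example, `once(voice, "error", console.error)` would log only the first error and then stop listening.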

Events

The OpenAIRealtimeVoice class emits the following events:

speaking: event
Emitted when audio data is received from the model. Callback receives { audio: Int16Array }.

writing: event
Emitted when transcribed text is available. Callback receives { text: string, role: string }.

error: event
Emitted when an error occurs. Callback receives the error object.
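
Audio arrives as a series of Int16Array chunks, so code that needs a whole utterance (for example, to save it) has to accumulate them. A minimal sketch of that accumulation follows; `concatInt16` is a hypothetical helper, and the chunks are simulated here rather than received from a live session.

```typescript
// Hypothetical helper: merge Int16Array chunks into one contiguous buffer.
function concatInt16(chunks: Int16Array[]): Int16Array {
  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const merged = new Int16Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    merged.set(chunk, offset);
    offset += chunk.length;
  }
  return merged;
}

const received: Int16Array[] = [];
// In an application the chunks would come from the event, e.g.:
//   voice.on("speaking", ({ audio }) => received.push(audio));
// Here two chunks are simulated instead.
received.push(new Int16Array([1, 2]), new Int16Array([3]));
const fullAudio = concatInt16(received);
```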

OpenAI Realtime Events

You can also listen to OpenAI Realtime utility events by prefixing the event name with 'openAIRealtime:':

openAIRealtime:conversation.created: event
Emitted when a new conversation is created.

openAIRealtime:conversation.interrupted: event
Emitted when a conversation is interrupted.

openAIRealtime:conversation.updated: event
Emitted when a conversation is updated.

openAIRealtime:conversation.item.appended: event
Emitted when an item is appended to the conversation.

openAIRealtime:conversation.item.completed: event
Emitted when an item in the conversation is completed.
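
To avoid typos in these prefixed names, the event list can be captured as a typed constant. The sketch below does this using only the five names listed above; `REALTIME_EVENTS`, `RealtimeEvent`, and `prefixed` are hypothetical names introduced for illustration, not library exports.

```typescript
// The five utility event names listed above, without the prefix.
const REALTIME_EVENTS = [
  "conversation.created",
  "conversation.interrupted",
  "conversation.updated",
  "conversation.item.appended",
  "conversation.item.completed",
] as const;

type RealtimeEvent = (typeof REALTIME_EVENTS)[number];
type PrefixedRealtimeEvent = `openAIRealtime:${RealtimeEvent}`;

// Hypothetical helper: build the prefixed name expected by on()/off().
function prefixed(event: RealtimeEvent): PrefixedRealtimeEvent {
  return `openAIRealtime:${event}` as PrefixedRealtimeEvent;
}

const interruptedEvent = prefixed("conversation.interrupted");
```

A listener could then be registered with `voice.on(interruptedEvent, handler)`, and a misspelled event name would fail at compile time.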

Available Voices

The following voice options are available:

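alloy: Neutral and balanced
ash: Clear and precise
ballad: Melodic and smooth
coral: Warm and friendly
echo: Resonant and deep
sage: Calm and thoughtful
shimmer: Bright and energetic
verse: Versatile and expressive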
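
When a voice ID comes from user input or configuration, it can be checked against this list before being passed as a speaker option. The sketch below hard-codes the eight names from this page as an assumption; getSpeakers() remains the runtime source of truth, and `VOICES`, `isVoiceId`, and `resolveSpeaker` are hypothetical names.

```typescript
// Voice IDs copied from the list above (a static snapshot, not fetched at runtime).
const VOICES = ["alloy", "ash", "ballad", "coral", "echo", "sage", "shimmer", "verse"] as const;
type VoiceId = (typeof VOICES)[number];

// Type guard: narrows an arbitrary string to a known voice ID.
function isVoiceId(value: string): value is VoiceId {
  return (VOICES as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical helper: fall back to a default when the requested voice is unknown.
function resolveSpeaker(requested: string | undefined, fallback: VoiceId = "alloy"): VoiceId {
  return requested !== undefined && isVoiceId(requested) ? requested : fallback;
}

const speaker = resolveSpeaker("echo");
const fallbackSpeaker = resolveSpeaker("not-a-voice");
```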

Notes

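API keys can be provided via constructor options or the OPENAI_API_KEY environment variable
The OpenAI Realtime Voice API uses WebSockets for real-time communication
Server-side Voice Activity Detection (VAD) provides better accuracy for speech detection
All audio data is processed in Int16Array format
The voice instance must be connected with connect() before using other methods
Always call close() when done to properly clean up resources
Memory management is handled by the OpenAI Realtime API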
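
Since audio is handled as Int16Array, downstream code that expects floating-point samples needs a conversion step. The sketch below shows the standard 16-bit PCM scaling; both helpers are hypothetical, and the 24000 Hz default sample rate is an assumption used for illustration, so check the session's actual audio format before relying on it.

```typescript
// Convert signed 16-bit PCM samples to floats in the range [-1, 1).
function int16ToFloat32(samples: Int16Array): Float32Array {
  // 32768 is the magnitude of the most negative 16-bit value.
  return Float32Array.from(samples, (sample) => sample / 32768);
}

// Duration of a mono buffer in seconds.
// Assumption: sampleRate defaults to 24000 Hz for illustration only.
function durationSeconds(samples: Int16Array, sampleRate = 24000): number {
  return samples.length / sampleRate;
}

const pcm = new Int16Array([0, 16384, -32768]);
const floats = int16ToFloat32(pcm);
```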
