Speech-to-Speech Capabilities in Mastra
Introduction
Speech-to-Speech (STS) in Mastra provides a standardized interface for real-time interactions across multiple providers. STS enables continuous bidirectional audio communication by listening to events from realtime models. Unlike separate TTS and STT operations, STS maintains an open connection that processes speech continuously in both directions.
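To make the difference concrete, the sketch below contrasts the two models. It assumes Mastra's one-shot speak()/listen() voice methods alongside the realtime connect()/send() flow shown later on this page, and is illustrative rather than runnable end to end:

// One-shot TTS/STT: each call is an independent request/response.
const audioStream = await voice.speak("Hello");  // text in, audio stream out
const transcript = await voice.listen(micAudio); // audio in, text out

// STS: one persistent connection, audio flows both ways as events.
await voice.connect();
voice.on("speaker", ({ audio }) => playAudio(audio)); // agent speech arrives continuously
await voice.send(getMicrophoneStream());              // user speech streams continuously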
Configuration
- apiKey: Your OpenAI API key. Falls back to the OPENAI_API_KEY environment variable if not set.
- model: The model ID to use for real-time voice interactions (e.g., gpt-5.1-realtime).
- speaker: The default voice ID for speech synthesis. This lets you specify which voice to use for speech output.
import { OpenAIRealtimeVoice } from "@mastra/voice-openai-realtime";

const voice = new OpenAIRealtimeVoice({
  apiKey: "your-openai-api-key",
  model: "gpt-5.1-realtime",
  speaker: "alloy", // Default voice
});

// If using default settings the configuration can be simplified to:
const voice = new OpenAIRealtimeVoice();
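Because the constructor falls back to the OPENAI_API_KEY environment variable, a common pattern (sketched here under the assumption that the options object accepts partial configuration, as the flat shape above suggests) is to set only what you want to override:

// Assumes OPENAI_API_KEY is set in the environment; only the default
// voice is overridden, other options keep their defaults.
const voice = new OpenAIRealtimeVoice({ speaker: "alloy" });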
Using STS
import { Agent } from "@mastra/core/agent";
import { OpenAIRealtimeVoice } from "@mastra/voice-openai-realtime";
import { playAudio, getMicrophoneStream } from "@mastra/node-audio";

const agent = new Agent({
  id: "agent",
  name: "OpenAI Realtime Agent",
  instructions: `You are a helpful assistant with real-time voice capabilities.`,
  model: "openai/gpt-5.1",
  voice: new OpenAIRealtimeVoice(),
});

// Connect to the voice service
await agent.voice.connect();

// Listen for agent audio responses
agent.voice.on("speaker", ({ audio }) => {
  playAudio(audio);
});

// Initiate the conversation
await agent.voice.speak("How can I help you today?");

// Send continuous audio from the microphone
const micStream = getMicrophoneStream();
await agent.voice.send(micStream);
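When the conversation is over, tear the session down. A minimal sketch, assuming the realtime voice provider exposes a close() method (as in Mastra's OpenAIRealtimeVoice reference); verify against your Mastra version:

// End the session: stop the realtime connection when the conversation ends.
// `close()` is assumed from the OpenAIRealtimeVoice reference.
agent.voice.close();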
For integrating Speech-to-Speech capabilities with agents, refer to the Adding Voice to Agents documentation.
Google Gemini Live (Realtime)
import { Agent } from "@mastra/core/agent";
import { GeminiLiveVoice } from "@mastra/voice-google-gemini-live";
import { playAudio, getMicrophoneStream } from "@mastra/node-audio";

const agent = new Agent({
  id: "agent",
  name: "Gemini Live Agent",
  instructions: "You are a helpful assistant with real-time voice capabilities.",
  // Model used for text generation; the voice provider handles realtime audio
  model: "openai/gpt-5.1",
  voice: new GeminiLiveVoice({
    apiKey: process.env.GOOGLE_API_KEY,
    model: "gemini-2.0-flash-exp",
    speaker: "Puck",
    debug: true,
    // Vertex AI option:
    // vertexAI: true,
    // project: 'your-gcp-project',
    // location: 'us-central1',
    // serviceAccountKeyFile: '/path/to/service-account.json',
  }),
});

await agent.voice.connect();

agent.voice.on("speaker", ({ audio }) => {
  playAudio(audio);
});

agent.voice.on("writing", ({ role, text }) => {
  console.log(`${role}: ${text}`);
});

await agent.voice.speak("How can I help you today?");

const micStream = getMicrophoneStream();
await agent.voice.send(micStream);
Note:
- The Live API requires GOOGLE_API_KEY. Vertex AI requires a project/location and service account credentials.
- Events: speaker (audio stream), writing (text), turnComplete, usage, and error.
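As a hedged sketch, handlers for the remaining events might look like the following; the payload shapes for turnComplete, usage, and error are assumptions, since only the speaker and writing payloads appear in the example above:

// Handlers for the other events listed above. Payload shapes here are
// assumptions; only `speaker` and `writing` payloads are shown in the docs.
agent.voice.on("turnComplete", () => {
  console.log("Model turn complete");
});

agent.voice.on("usage", (usage) => {
  console.log("Usage:", usage); // e.g., token counts (assumed shape)
});

agent.voice.on("error", (error) => {
  console.error("Realtime session error:", error);
});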