Speech-to-Speech Capabilities in Mastra

Introduction

Speech-to-Speech (STS) in Mastra provides a standardized interface for real-time interactions across multiple providers. STS enables continuous bidirectional audio communication through listening to events from Realtime models. Unlike separate TTS and STT operations, STS maintains an open connection that processes speech continuously in both directions.

Configuration

  • apiKey: Your OpenAI API key. If not set, the OPENAI_API_KEY environment variable is used.
  • model: The model ID for realtime voice interactions (e.g., gpt-5.1-realtime).
  • speaker: The default voice ID for speech synthesis. This lets you specify which voice to use for the audio output.
import { OpenAIRealtimeVoice } from "@mastra/voice-openai-realtime";

const voice = new OpenAIRealtimeVoice({
  apiKey: "your-openai-api-key",
  model: "gpt-5.1-realtime",
  speaker: "alloy", // Default voice
});

// If using default settings, the configuration can be simplified to:
const voice = new OpenAIRealtimeVoice();

Using STS

import { Agent } from "@mastra/core/agent";
import { OpenAIRealtimeVoice } from "@mastra/voice-openai-realtime";
import { playAudio, getMicrophoneStream } from "@mastra/node-audio";

const agent = new Agent({
  id: "agent",
  name: "OpenAI Realtime Agent",
  instructions: `You are a helpful assistant with real-time voice capabilities.`,
  model: "openai/gpt-5.1",
  voice: new OpenAIRealtimeVoice(),
});

// Connect to the voice service
await agent.voice.connect();

// Listen for agent audio responses
agent.voice.on("speaker", ({ audio }) => {
  playAudio(audio);
});

// Initiate the conversation
await agent.voice.speak("How can I help you today?");

// Send continuous audio from the microphone
const micStream = getMicrophoneStream();
await agent.voice.send(micStream);
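
When a conversation ends, it is worth releasing the realtime connection rather than leaving the socket open. A minimal teardown sketch, assuming the close() method described in the OpenAIRealtimeVoice reference:

// Disconnect from the voice service once the session is over
// (close() is assumed to behave as in the realtime voice reference)
agent.voice.close();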

For integrating Speech-to-Speech capabilities with agents, refer to the Adding Voice to Agents documentation.

Google Gemini Live (Realtime)

import { Agent } from "@mastra/core/agent";
import { GeminiLiveVoice } from "@mastra/voice-google-gemini-live";
import { playAudio, getMicrophoneStream } from "@mastra/node-audio";

const agent = new Agent({
  id: "agent",
  name: "Gemini Live Agent",
  instructions:
    "You are a helpful assistant with real-time voice capabilities.",
  // Model used for text generation; voice provider handles realtime audio
  model: "openai/gpt-5.1",
  voice: new GeminiLiveVoice({
    apiKey: process.env.GOOGLE_API_KEY,
    model: "gemini-2.0-flash-exp",
    speaker: "Puck",
    debug: true,
    // Vertex AI option:
    // vertexAI: true,
    // project: 'your-gcp-project',
    // location: 'us-central1',
    // serviceAccountKeyFile: '/path/to/service-account.json',
  }),
});

await agent.voice.connect();

agent.voice.on("speaker", ({ audio }) => {
  playAudio(audio);
});

agent.voice.on("writing", ({ role, text }) => {
  console.log(`${role}: ${text}`);
});

await agent.voice.speak("How can I help you today?");

const micStream = getMicrophoneStream();
await agent.voice.send(micStream);
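
The commented-out options in the constructor above switch the provider from the Live API to Vertex AI. A hedged sketch of that variant, using only the option names shown in those comments:

// Vertex AI instead of the Live API: no apiKey; GCP project credentials instead
const vertexVoice = new GeminiLiveVoice({
  vertexAI: true,
  project: "your-gcp-project",
  location: "us-central1",
  serviceAccountKeyFile: "/path/to/service-account.json",
  model: "gemini-2.0-flash-exp",
  speaker: "Puck",
});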

Note:

  • The Live API requires GOOGLE_API_KEY. Vertex AI requires a project/location and service account credentials.
  • Events: speaker (streamed audio), writing (text), turnComplete, usage, and error (see the sketch below).
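
The remaining lifecycle events can be subscribed to the same way as speaker and writing. A short sketch; the payload shapes here are assumptions, since this page only names the events:

// Fires when the model finishes a conversational turn
agent.voice.on("turnComplete", () => {
  console.log("Turn complete");
});

// Reports session usage (payload shape assumed, not documented here)
agent.voice.on("usage", (usage) => {
  console.log("Usage:", usage);
});

// Surfaces connection or model errors
agent.voice.on("error", (error) => {
  console.error("Voice error:", error);
});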