Voice

Mastra agents can be enhanced with voice capabilities, allowing them to speak responses and listen to user input. You can configure an agent to use either a single voice provider or combine multiple providers for different operations.

Basic usage

The simplest way to add voice to an agent is to use a single provider for both speaking and listening:

import { createReadStream } from "fs";
import path from "path";
import { Agent } from "@mastra/core/agent";
import { playAudio } from "@mastra/node-audio";
import { OpenAIVoice } from "@mastra/voice-openai";

// Initialize the voice provider with default settings
const voice = new OpenAIVoice();

// Create an agent with voice capabilities
export const agent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions: `You are a helpful assistant with both STT and TTS capabilities.`,
  model: "openai/gpt-5.1",
  voice,
});

// The agent can now use voice for interaction
const audioStream = await agent.voice.speak("Hello, I'm your AI assistant!", {
  filetype: "m4a",
});

// Play the generated speech through the local speakers
playAudio(audioStream!);

try {
  const transcription = await agent.voice.listen(audioStream);
  console.log(transcription);
} catch (error) {
  console.error("Error transcribing audio:", error);
}

Working with Audio Streams

The speak() and listen() methods work with Node.js streams. Here's how to save and load audio files:

Saving Speech Output

The speak method returns a stream that you can pipe to a file or speaker.

import { createWriteStream } from "fs";
import path from "path";

// Generate speech and save to file
const audio = await agent.voice.speak("Hello, World!");
const filePath = path.join(process.cwd(), "agent.mp3");
const writer = createWriteStream(filePath);

audio.pipe(writer);

await new Promise<void>((resolve, reject) => {
  writer.on("finish", () => resolve());
  writer.on("error", reject);
});
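
The same stream can also be sent to the local speakers instead of a file. A minimal sketch using the playAudio helper from @mastra/node-audio (the same helper used in the basic example above):

import { playAudio } from "@mastra/node-audio";

// Generate speech and play it through the default audio output
const audio = await agent.voice.speak("Hello, World!");
playAudio(audio!);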

Transcribing Audio Input

The listen method expects a stream of audio data from a microphone or file.

import { createReadStream } from "fs";
import path from "path";

// Read audio file and transcribe
const audioFilePath = path.join(process.cwd(), "/agent.m4a");
const audioStream = createReadStream(audioFilePath);

try {
  console.log("Transcribing audio file...");
  const transcription = await agent.voice.listen(audioStream, {
    filetype: "m4a",
  });
  console.log("Transcription:", transcription);
} catch (error) {
  console.error("Error transcribing audio:", error);
}
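
If the audio is already in memory as a Buffer (for example, received from an HTTP upload), it can be wrapped in a Node.js Readable before being passed to listen(). A minimal sketch; how the buffer is obtained is assumed:

import { Readable } from "stream";

// audioBuffer is assumed to hold m4a-encoded audio obtained elsewhere (e.g. an upload)
const audioBuffer: Buffer = getUploadedAudio(); // hypothetical helper
const transcription = await agent.voice.listen(Readable.from(audioBuffer), {
  filetype: "m4a",
});
console.log("Transcription:", transcription);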

Speech-to-Speech Voice Interactions

For more dynamic and interactive voice experiences, you can use real-time voice providers that support speech-to-speech capabilities:

import { Agent } from "@mastra/core/agent";
import { getMicrophoneStream } from "@mastra/node-audio";
import { OpenAIRealtimeVoice } from "@mastra/voice-openai-realtime";
import { search, calculate } from "../tools";

// Initialize the realtime voice provider
const voice = new OpenAIRealtimeVoice({
  apiKey: process.env.OPENAI_API_KEY,
  model: "gpt-5.1-realtime",
  speaker: "alloy",
});

// Create an agent with speech-to-speech voice capabilities
export const agent = new Agent({
  id: "speech-to-speech-agent",
  name: "Speech-to-Speech Agent",
  instructions: `You are a helpful assistant with speech-to-speech capabilities.`,
  model: "openai/gpt-5.1",
  tools: {
    // Tools configured on Agent are passed to voice provider
    search,
    calculate,
  },
  voice,
});

// Establish a WebSocket connection
await agent.voice.connect();

// Start a conversation
agent.voice.speak("Hello, I'm your AI assistant!");

// Stream audio from a microphone
const microphoneStream = getMicrophoneStream();
agent.voice.send(microphoneStream);

// When done with the conversation
agent.voice.close();

Event System

The realtime voice provider emits several events you can listen for:

// Listen for speech audio data sent from voice provider
agent.voice.on("speaking", ({ audio }) => {
  // audio contains ReadableStream or Int16Array audio data
});

// Listen for transcribed text sent from both voice provider and user
agent.voice.on("writing", ({ text, role }) => {
  console.log(`${role} said: ${text}`);
});

// Listen for errors
agent.voice.on("error", (error) => {
  console.error("Voice error:", error);
});
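
As a rough sketch of consuming these events, the handler below appends the assistant's speech audio to a local file. It assumes the stream case is a Node.js readable stream; the Int16Array case carries raw PCM samples, so they are wrapped in a Buffer before writing:

import { createWriteStream } from "fs";

// Collect all speech audio emitted during the session into one raw file
const speechFile = createWriteStream("assistant-speech.raw");

agent.voice.on("speaking", ({ audio }) => {
  if (audio instanceof Int16Array) {
    // Raw PCM samples: write the underlying bytes
    speechFile.write(Buffer.from(audio.buffer, audio.byteOffset, audio.byteLength));
  } else {
    // Assumed to be a Node.js readable stream; keep the file open between chunks
    audio.pipe(speechFile, { end: false });
  }
});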

Examples

End-to-end voice interaction

This example demonstrates a voice interaction between two agents. The hybrid voice agent, which uses multiple providers, speaks a question, which is saved as an audio file. The unified voice agent listens to that file, processes the question, generates a response, and speaks it back. Both audio outputs are saved to the audio directory.

The following files are created:

  • hybrid-question.mp3 – the hybrid agent's spoken question.
  • unified-response.mp3 – the unified agent's spoken response.

src/test-voice-agents.ts
import "dotenv/config";

import path from "path";
import { createReadStream } from "fs";
import { Agent } from "@mastra/core/agent";
import { CompositeVoice } from "@mastra/core/voice";
import { OpenAIVoice } from "@mastra/voice-openai";
import { Mastra } from "@mastra/core";

// Saves an audio stream to a file in the audio directory, creating the directory if it doesn't exist.
export const saveAudioToFile = async (
audio: NodeJS.ReadableStream,
filename: string,
): Promise<void> => {
const audioDir = path.join(process.cwd(), "audio");
const filePath = path.join(audioDir, filename);

await fs.promises.mkdir(audioDir, { recursive: true });

const writer = createWriteStream(filePath);
audio.pipe(writer);
return new Promise((resolve, reject) => {
writer.on("finish", resolve);
writer.on("error", reject);
});
};

// Saves an audio stream to a file in the audio directory, creating the directory if it doesn't exist.
export const convertToText = async (
input: string | NodeJS.ReadableStream,
): Promise<string> => {
if (typeof input === "string") {
return input;
}

const chunks: Buffer[] = [];
return new Promise((resolve, reject) => {
inputData.on("data", (chunk) => chunks.push(Buffer.from(chunk)));
inputData.on("error", reject);
inputData.on("end", () => resolve(Buffer.concat(chunks).toString("utf-8")));
});
};

export const hybridVoiceAgent = new Agent({
id: "hybrid-voice-agent",
name: "Hybrid Voice Agent",
model: "openai/gpt-5.1",
instructions: "You can speak and listen using different providers.",
voice: new CompositeVoice({
input: new OpenAIVoice(),
output: new OpenAIVoice(),
}),
});

export const unifiedVoiceAgent = new Agent({
id: "unified-voice-agent",
name: "Unified Voice Agent",
instructions: "You are an agent with both STT and TTS capabilities.",
model: "openai/gpt-5.1",
voice: new OpenAIVoice(),
});

export const mastra = new Mastra({
agents: { hybridVoiceAgent, unifiedVoiceAgent },
});

const hybridVoiceAgent = mastra.getAgent("hybridVoiceAgent");
const unifiedVoiceAgent = mastra.getAgent("unifiedVoiceAgent");

const question = "What is the meaning of life in one sentence?";

const hybridSpoken = await hybridVoiceAgent.voice.speak(question);

await saveAudioToFile(hybridSpoken!, "hybrid-question.mp3");

const audioStream = createReadStream(
path.join(process.cwd(), "audio", "hybrid-question.mp3"),
);
const unifiedHeard = await unifiedVoiceAgent.voice.listen(audioStream);

const inputText = await convertToText(unifiedHeard!);

const unifiedResponse = await unifiedVoiceAgent.generate(inputText);
const unifiedSpoken = await unifiedVoiceAgent.voice.speak(unifiedResponse.text);

await saveAudioToFile(unifiedSpoken!, "unified-response.mp3");
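
To try the example locally, run the script with a TypeScript runner of your choice, for example (assuming tsx is available in your project):

npx tsx src/test-voice-agents.ts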

Using Multiple Providers

For more flexibility, you can use different providers for speaking and listening using the CompositeVoice class:

import { Agent } from "@mastra/core/agent";
import { CompositeVoice } from "@mastra/core/voice";
import { OpenAIVoice } from "@mastra/voice-openai";
import { PlayAIVoice } from "@mastra/voice-playai";

export const agent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions: `You are a helpful assistant with both STT and TTS capabilities.`,
  model: "openai/gpt-5.1",

  // Create a composite voice using OpenAI for listening and PlayAI for speaking
  voice: new CompositeVoice({
    input: new OpenAIVoice(),
    output: new PlayAIVoice(),
  }),
});
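
With this setup, speak() is handled by PlayAI and listen() by OpenAI, while the call sites stay the same as in the single-provider example. A brief sketch:

// TTS goes through PlayAI, STT goes through OpenAI
const audio = await agent.voice.speak("Hello, I'm your AI assistant!");
const transcription = await agent.voice.listen(audio);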

Using AI SDK

Mastra supports using AI SDK's transcription and speech models directly in CompositeVoice, giving you access to a wide range of providers through the AI SDK ecosystem:

import { Agent } from "@mastra/core/agent";
import { CompositeVoice } from "@mastra/core/voice";
import { openai } from "@ai-sdk/openai";
import { elevenlabs } from "@ai-sdk/elevenlabs";
import { groq } from "@ai-sdk/groq";

export const agent = new Agent({
  id: "aisdk-voice-agent",
  name: "AI SDK Voice Agent",
  instructions: `You are a helpful assistant with voice capabilities.`,
  model: "openai/gpt-5.1",

  // Pass AI SDK models directly to CompositeVoice
  voice: new CompositeVoice({
    input: openai.transcription("whisper-1"), // AI SDK transcription model
    output: elevenlabs.speech("eleven_turbo_v2"), // AI SDK speech model
  }),
});

// Use voice capabilities as usual
const audioStream = await agent.voice.speak("Hello!");
const transcribedText = await agent.voice.listen(audioStream);

Mix and Match Providers

You can mix AI SDK models with Mastra voice providers:

import { CompositeVoice } from "@mastra/core/voice";
import { PlayAIVoice } from "@mastra/voice-playai";
import { openai } from "@ai-sdk/openai";

// Use AI SDK for transcription and Mastra provider for speech
const voice = new CompositeVoice({
  input: openai.transcription("whisper-1"), // AI SDK
  output: new PlayAIVoice(), // Mastra provider
});

For the complete list of supported AI SDK providers and their capabilities, see the AI SDK documentation.

Supported Voice Providers

Mastra supports multiple voice providers for text-to-speech (TTS) and speech-to-text (STT) capabilities:

| Provider        | Package                       | Features                  | Reference     |
| --------------- | ----------------------------- | ------------------------- | ------------- |
| OpenAI          | @mastra/voice-openai          | TTS, STT                  | Documentation |
| OpenAI Realtime | @mastra/voice-openai-realtime | Realtime speech-to-speech | Documentation |
| ElevenLabs      | @mastra/voice-elevenlabs      | High-quality TTS          | Documentation |
| PlayAI          | @mastra/voice-playai          | TTS                       | Documentation |
| Google          | @mastra/voice-google          | TTS, STT                  | Documentation |
| Deepgram        | @mastra/voice-deepgram        | STT                       | Documentation |
| Murf            | @mastra/voice-murf            | TTS                       | Documentation |
| Speechify       | @mastra/voice-speechify       | TTS                       | Documentation |
| Sarvam          | @mastra/voice-sarvam          | TTS, STT                  | Documentation |
| Azure           | @mastra/voice-azure           | TTS, STT                  | Documentation |
| Cloudflare      | @mastra/voice-cloudflare      | TTS                       | Documentation |
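
Any of these providers can be dropped into the same voice or CompositeVoice slots. As an illustrative sketch (the class names below are assumed from the package names; check each provider's reference for the exact exports), this pairs Deepgram for STT with ElevenLabs for TTS:

import { CompositeVoice } from "@mastra/core/voice";
import { DeepgramVoice } from "@mastra/voice-deepgram"; // assumed export name
import { ElevenLabsVoice } from "@mastra/voice-elevenlabs"; // assumed export name

// Deepgram handles speech-to-text, ElevenLabs handles text-to-speech
const voice = new CompositeVoice({
  input: new DeepgramVoice(),
  output: new ElevenLabsVoice(),
});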

Next Steps