Voice in Mastra
Mastra's Voice system provides a unified interface for voice interactions, enabling text-to-speech (TTS), speech-to-text (STT), and real-time speech-to-speech (STS) capabilities in your applications.
Adding Voice to Agents
To learn how to integrate voice capabilities into your agents, check out the Adding Voice to Agents documentation. This section covers how to use both single and multiple voice providers, as well as real-time interactions.
import { Agent } from "@mastra/core/agent";
import { OpenAIVoice } from "@mastra/voice-openai";

// Initialize OpenAI voice for TTS
const voiceAgent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions:
    "You are a voice assistant that can help users with their tasks.",
  model: "openai/gpt-5.1",
  voice: new OpenAIVoice(),
});
You can then use the following voice capabilities:
Text to Speech (TTS)
Turn your agent's responses into natural-sounding speech using Mastra's TTS capabilities. Choose from multiple providers like OpenAI, ElevenLabs, and more.
For detailed configuration options and advanced features, check out our Text-to-Speech guide.
- OpenAI
- Azure
- ElevenLabs
- PlayAI
- Google
- Cloudflare
- Deepgram
- Speechify
- Sarvam
- Murf
import { Agent } from "@mastra/core/agent";
import { OpenAIVoice } from "@mastra/voice-openai";
import { playAudio } from "@mastra/node-audio";

const voiceAgent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions:
    "You are a voice assistant that can help users with their tasks.",
  model: "openai/gpt-5.1",
  voice: new OpenAIVoice(),
});

const { text } = await voiceAgent.generate("What color is the sky?");

// Convert text to speech and return an audio stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: "default", // Optional: specify a speaker
  responseFormat: "wav", // Optional: specify a response format
});

playAudio(audioStream);

Visit the OpenAI Voice Reference for more information on the OpenAI voice provider.
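If you want to save the generated audio instead of playing it, the stream returned by speak() can be piped to a file. A minimal sketch, assuming the return value is a standard Node.js Readable stream and reusing the voiceAgent and text from the example above (the output path is just a placeholder):

import { createWriteStream } from "fs";
import { pipeline } from "stream/promises";

// Request WAV output so the file extension matches (assumes the provider supports it)
const fileStream = await voiceAgent.voice.speak(text, {
  responseFormat: "wav",
});

// Pipe the audio stream to disk instead of the speakers
await pipeline(fileStream, createWriteStream("./response.wav"));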
import { Agent } from "@mastra/core/agent";
import { AzureVoice } from "@mastra/voice-azure";
import { playAudio } from "@mastra/node-audio";

const voiceAgent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions:
    "You are a voice assistant that can help users with their tasks.",
  model: "openai/gpt-5.1",
  voice: new AzureVoice(),
});

const { text } = await voiceAgent.generate("What color is the sky?");

// Convert text to speech and return an audio stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: "en-US-JennyNeural", // Optional: specify a speaker
});

playAudio(audioStream);

Visit the Azure Voice Reference for more information on the Azure voice provider.
import { Agent } from "@mastra/core/agent";
import { ElevenLabsVoice } from "@mastra/voice-elevenlabs";
import { playAudio } from "@mastra/node-audio";

const voiceAgent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions:
    "You are a voice assistant that can help users with their tasks.",
  model: "openai/gpt-5.1",
  voice: new ElevenLabsVoice(),
});

const { text } = await voiceAgent.generate("What color is the sky?");

// Convert text to speech and return an audio stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: "default", // Optional: specify a speaker
});

playAudio(audioStream);

Visit the ElevenLabs Voice Reference for more information on the ElevenLabs voice provider.
import { Agent } from "@mastra/core/agent";
import { PlayAIVoice } from "@mastra/voice-playai";
import { playAudio } from "@mastra/node-audio";

const voiceAgent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions:
    "You are a voice assistant that can help users with their tasks.",
  model: "openai/gpt-5.1",
  voice: new PlayAIVoice(),
});

const { text } = await voiceAgent.generate("What color is the sky?");

// Convert text to speech and return an audio stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: "default", // Optional: specify a speaker
});

playAudio(audioStream);

Visit the PlayAI Voice Reference for more information on the PlayAI voice provider.
import { Agent } from "@mastra/core/agent";
import { GoogleVoice } from "@mastra/voice-google";
import { playAudio } from "@mastra/node-audio";

const voiceAgent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions:
    "You are a voice assistant that can help users with their tasks.",
  model: "openai/gpt-5.1",
  voice: new GoogleVoice(),
});

const { text } = await voiceAgent.generate("What color is the sky?");

// Convert text to speech and return an audio stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: "en-US-Studio-O", // Optional: specify a speaker
});

playAudio(audioStream);

Visit the Google Voice Reference for more information on the Google voice provider.
import { Agent } from "@mastra/core/agent";
import { CloudflareVoice } from "@mastra/voice-cloudflare";
import { playAudio } from "@mastra/node-audio";

const voiceAgent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions:
    "You are a voice assistant that can help users with their tasks.",
  model: "openai/gpt-5.1",
  voice: new CloudflareVoice(),
});

const { text } = await voiceAgent.generate("What color is the sky?");

// Convert text to speech and return an audio stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: "default", // Optional: specify a speaker
});

playAudio(audioStream);

Visit the Cloudflare Voice Reference for more information on the Cloudflare voice provider.
import { Agent } from "@mastra/core/agent";
import { DeepgramVoice } from "@mastra/voice-deepgram";
import { playAudio } from "@mastra/node-audio";

const voiceAgent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions:
    "You are a voice assistant that can help users with their tasks.",
  model: "openai/gpt-5.1",
  voice: new DeepgramVoice(),
});

const { text } = await voiceAgent.generate("What color is the sky?");

// Convert text to speech and return an audio stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: "aura-english-us", // Optional: specify a speaker
});

playAudio(audioStream);

Visit the Deepgram Voice Reference for more information on the Deepgram voice provider.
import { Agent } from "@mastra/core/agent";
import { SpeechifyVoice } from "@mastra/voice-speechify";
import { playAudio } from "@mastra/node-audio";

const voiceAgent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions:
    "You are a voice assistant that can help users with their tasks.",
  model: "openai/gpt-5.1",
  voice: new SpeechifyVoice(),
});

const { text } = await voiceAgent.generate("What color is the sky?");

// Convert text to speech and return an audio stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: "matthew", // Optional: specify a speaker
});

playAudio(audioStream);

Visit the Speechify Voice Reference for more information on the Speechify voice provider.
import { Agent } from "@mastra/core/agent";
import { SarvamVoice } from "@mastra/voice-sarvam";
import { playAudio } from "@mastra/node-audio";

const voiceAgent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions:
    "You are a voice assistant that can help users with their tasks.",
  model: "openai/gpt-5.1",
  voice: new SarvamVoice(),
});

const { text } = await voiceAgent.generate("What color is the sky?");

// Convert text to speech and return an audio stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: "default", // Optional: specify a speaker
});

playAudio(audioStream);

Visit the Sarvam Voice Reference for more information on the Sarvam voice provider.
import { Agent } from "@mastra/core/agent";
import { MurfVoice } from "@mastra/voice-murf";
import { playAudio } from "@mastra/node-audio";

const voiceAgent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions:
    "You are a voice assistant that can help users with their tasks.",
  model: "openai/gpt-5.1",
  voice: new MurfVoice(),
});

const { text } = await voiceAgent.generate("What color is the sky?");

// Convert text to speech and return an audio stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: "default", // Optional: specify a speaker
});

playAudio(audioStream);

Visit the Murf Voice Reference for more information on the Murf voice provider.
Speech to Text (STT)
Transcribe spoken content using various providers like OpenAI, ElevenLabs, and more. For detailed configuration options and more, check out Speech to Text.
You can download a sample audio file from here.
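If you prefer to fetch the sample programmatically, here is a quick sketch; the URL below is a placeholder, so substitute the actual download link:

import { writeFile } from "fs/promises";

// Placeholder URL - replace with the sample audio link above
const res = await fetch("https://example.com/how_can_i_help_you.mp3");
await writeFile("./how_can_i_help_you.mp3", Buffer.from(await res.arrayBuffer()));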
- OpenAI
- Azure
- ElevenLabs
- Google
- Cloudflare
- Deepgram
- Sarvam
import { Agent } from "@mastra/core/agent";
import { OpenAIVoice } from "@mastra/voice-openai";
import { createReadStream } from "fs";

const voiceAgent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions:
    "You are a voice assistant that can help users with their tasks.",
  model: "openai/gpt-5.1",
  voice: new OpenAIVoice(),
});

// Read an audio file from disk
const audioStream = createReadStream("./how_can_i_help_you.mp3");

// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream);
console.log(`User said: ${transcript}`);

// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript);

Visit the OpenAI Voice Reference for more information on the OpenAI voice provider.
import { Agent } from "@mastra/core/agent";
import { AzureVoice } from "@mastra/voice-azure";
import { createReadStream } from "fs";

const voiceAgent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions:
    "You are a voice assistant that can help users with their tasks.",
  model: "openai/gpt-5.1",
  voice: new AzureVoice(),
});

// Read an audio file from disk
const audioStream = createReadStream("./how_can_i_help_you.mp3");

// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream);
console.log(`User said: ${transcript}`);

// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript);

Visit the Azure Voice Reference for more information on the Azure voice provider.
import { Agent } from "@mastra/core/agent";
import { ElevenLabsVoice } from "@mastra/voice-elevenlabs";
import { createReadStream } from "fs";

const voiceAgent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions:
    "You are a voice assistant that can help users with their tasks.",
  model: "openai/gpt-5.1",
  voice: new ElevenLabsVoice(),
});

// Read an audio file from disk
const audioStream = createReadStream("./how_can_i_help_you.mp3");

// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream);
console.log(`User said: ${transcript}`);

// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript);

Visit the ElevenLabs Voice Reference for more information on the ElevenLabs voice provider.
import { Agent } from "@mastra/core/agent";
import { GoogleVoice } from "@mastra/voice-google";
import { createReadStream } from "fs";

const voiceAgent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions:
    "You are a voice assistant that can help users with their tasks.",
  model: "openai/gpt-5.1",
  voice: new GoogleVoice(),
});

// Read an audio file from disk
const audioStream = createReadStream("./how_can_i_help_you.mp3");

// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream);
console.log(`User said: ${transcript}`);

// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript);

Visit the Google Voice Reference for more information on the Google voice provider.
import { Agent } from "@mastra/core/agent";
import { CloudflareVoice } from "@mastra/voice-cloudflare";
import { createReadStream } from "fs";

const voiceAgent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions:
    "You are a voice assistant that can help users with their tasks.",
  model: "openai/gpt-5.1",
  voice: new CloudflareVoice(),
});

// Read an audio file from disk
const audioStream = createReadStream("./how_can_i_help_you.mp3");

// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream);
console.log(`User said: ${transcript}`);

// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript);

Visit the Cloudflare Voice Reference for more information on the Cloudflare voice provider.
import { Agent } from "@mastra/core/agent";
import { DeepgramVoice } from "@mastra/voice-deepgram";
import { createReadStream } from "fs";

const voiceAgent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions:
    "You are a voice assistant that can help users with their tasks.",
  model: "openai/gpt-5.1",
  voice: new DeepgramVoice(),
});

// Read an audio file from disk
const audioStream = createReadStream("./how_can_i_help_you.mp3");

// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream);
console.log(`User said: ${transcript}`);

// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript);

Visit the Deepgram Voice Reference for more information on the Deepgram voice provider.
import { Agent } from "@mastra/core/agent";
import { SarvamVoice } from "@mastra/voice-sarvam";
import { createReadStream } from "fs";

const voiceAgent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions:
    "You are a voice assistant that can help users with their tasks.",
  model: "openai/gpt-5.1",
  voice: new SarvamVoice(),
});

// Read an audio file from disk
const audioStream = createReadStream("./how_can_i_help_you.mp3");

// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream);
console.log(`User said: ${transcript}`);

// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript);

Visit the Sarvam Voice Reference for more information on the Sarvam voice provider.
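To close the loop, the transcript produced by listen() can be fed back through generate() and speak() for a complete file-in, audio-out round trip. A minimal sketch combining the calls shown above, using any provider that supports both STT and TTS (voiceAgent as configured in the OpenAI tab):

import { createReadStream } from "fs";
import { playAudio } from "@mastra/node-audio";

// Transcribe the recorded question...
const transcript = await voiceAgent.voice.listen(
  createReadStream("./how_can_i_help_you.mp3"),
);

// ...answer it with the agent...
const { text } = await voiceAgent.generate(transcript);

// ...and speak the answer back
const responseAudio = await voiceAgent.voice.speak(text);
playAudio(responseAudio);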
Speech to Speech (STS)
Create conversational experiences with speech-to-speech capabilities. The unified API enables real-time voice interactions between users and AI agents. For detailed configuration options and advanced features, check out Speech to Speech.
- OpenAI
- Google Gemini Live
import { Agent } from "@mastra/core/agent";
import { playAudio, getMicrophoneStream } from "@mastra/node-audio";
import { OpenAIRealtimeVoice } from "@mastra/voice-openai-realtime";

const voiceAgent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions:
    "You are a voice assistant that can help users with their tasks.",
  model: "openai/gpt-5.1",
  voice: new OpenAIRealtimeVoice(),
});

// Connect before using speak/send
await voiceAgent.voice.connect();

// Listen for agent audio responses
voiceAgent.voice.on("speaker", ({ audio }) => {
  playAudio(audio);
});

// Initiate the conversation
await voiceAgent.voice.speak("How can I help you today?");

// Send continuous audio from the microphone
const micStream = getMicrophoneStream();
await voiceAgent.voice.send(micStream);

Visit the OpenAI Voice Reference for more information on the OpenAI voice provider.
import { Agent } from "@mastra/core/agent";
import { playAudio, getMicrophoneStream } from "@mastra/node-audio";
import { GeminiLiveVoice } from "@mastra/voice-google-gemini-live";

const voiceAgent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions:
    "You are a voice assistant that can help users with their tasks.",
  model: "openai/gpt-5.1",
  voice: new GeminiLiveVoice({
    // Live API mode
    apiKey: process.env.GOOGLE_API_KEY,
    model: "gemini-2.0-flash-exp",
    speaker: "Puck",
    debug: true,
    // Vertex AI alternative:
    // vertexAI: true,
    // project: 'your-gcp-project',
    // location: 'us-central1',
    // serviceAccountKeyFile: '/path/to/service-account.json',
  }),
});

// Connect before using speak/send
await voiceAgent.voice.connect();

// Listen for agent audio responses
voiceAgent.voice.on("speaker", ({ audio }) => {
  playAudio(audio);
});

// Listen for text responses and transcriptions
voiceAgent.voice.on("writing", ({ text, role }) => {
  console.log(`${role}: ${text}`);
});

// Initiate the conversation
await voiceAgent.voice.speak("How can I help you today?");

// Send continuous audio from the microphone
const micStream = getMicrophoneStream();
await voiceAgent.voice.send(micStream);

Visit the Google Gemini Live Reference for more information on the Google Gemini Live voice provider.
Voice Configuration
Each voice provider can be configured with different models and options. Below are the detailed configuration options for all supported providers:
- OpenAI
- Azure
- ElevenLabs
- PlayAI
- Google
- Cloudflare
- Deepgram
- Speechify
- Sarvam
- Murf
- OpenAI Realtime
- Google Gemini Live
- AI SDK
// OpenAI Voice Configuration
const voice = new OpenAIVoice({
  speechModel: {
    name: "tts-1", // Example TTS model name
    apiKey: process.env.OPENAI_API_KEY,
    language: "en-US", // Language code
    voiceType: "neural", // Type of voice model
  },
  listeningModel: {
    name: "whisper-1", // Example model name
    apiKey: process.env.OPENAI_API_KEY,
    language: "en-US", // Language code
    format: "wav", // Audio format
  },
  speaker: "alloy", // Example speaker name
});

Visit the OpenAI Voice Reference for more information on the OpenAI voice provider.
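A configured provider can also be called directly, without attaching it to an agent; the CompositeVoice examples later on this page use the same pattern. A brief sketch using the voice instance above:

import { createReadStream } from "fs";
import { playAudio } from "@mastra/node-audio";

// Speak text directly through the provider - no Agent required
const audio = await voice.speak("Hello from Mastra!", {
  speaker: "alloy", // Optional: override the default speaker
});
playAudio(audio);

// The same instance can transcribe audio
const transcript = await voice.listen(
  createReadStream("./how_can_i_help_you.mp3"),
);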
// Azure Voice Configuration
const voice = new AzureVoice({
  speechModel: {
    name: "en-US-JennyNeural", // Example model name
    apiKey: process.env.AZURE_SPEECH_KEY,
    region: process.env.AZURE_SPEECH_REGION,
    language: "en-US", // Language code
    style: "cheerful", // Voice style
    pitch: "+0Hz", // Pitch adjustment
    rate: "1.0", // Speech rate
  },
  listeningModel: {
    name: "en-US", // Example model name
    apiKey: process.env.AZURE_SPEECH_KEY,
    region: process.env.AZURE_SPEECH_REGION,
    format: "simple", // Output format
  },
});

Visit the Azure Voice Reference for more information on the Azure voice provider.
// ElevenLabs Voice Configuration
const voice = new ElevenLabsVoice({
  speechModel: {
    voiceId: "your-voice-id", // Example voice ID
    model: "eleven_multilingual_v2", // Example model name
    apiKey: process.env.ELEVENLABS_API_KEY,
    language: "en", // Language code
    emotion: "neutral", // Emotion setting
  },
  // ElevenLabs may not have a separate listening model
});

Visit the ElevenLabs Voice Reference for more information on the ElevenLabs voice provider.
// PlayAI Voice Configuration
const voice = new PlayAIVoice({
  speechModel: {
    name: "playai-voice", // Example model name
    speaker: "emma", // Example speaker name
    apiKey: process.env.PLAYAI_API_KEY,
    language: "en-US", // Language code
    speed: 1.0, // Speech speed
  },
  // PlayAI may not have a separate listening model
});

Visit the PlayAI Voice Reference for more information on the PlayAI voice provider.
// Google Voice Configuration
const voice = new GoogleVoice({
  speechModel: {
    name: "en-US-Studio-O", // Example model name
    apiKey: process.env.GOOGLE_API_KEY,
    languageCode: "en-US", // Language code
    gender: "FEMALE", // Voice gender
    speakingRate: 1.0, // Speaking rate
  },
  listeningModel: {
    name: "en-US", // Example model name
    sampleRateHertz: 16000, // Sample rate
  },
});

Visit the Google Voice Reference for more information on the Google voice provider.
// Cloudflare Voice Configuration
const voice = new CloudflareVoice({
  speechModel: {
    name: "cloudflare-voice", // Example model name
    accountId: process.env.CLOUDFLARE_ACCOUNT_ID,
    apiToken: process.env.CLOUDFLARE_API_TOKEN,
    language: "en-US", // Language code
    format: "mp3", // Audio format
  },
  // Cloudflare may not have a separate listening model
});

Visit the Cloudflare Voice Reference for more information on the Cloudflare voice provider.
// Deepgram Voice Configuration
const voice = new DeepgramVoice({
  speechModel: {
    name: "nova-2", // Example model name
    speaker: "aura-english-us", // Example speaker name
    apiKey: process.env.DEEPGRAM_API_KEY,
    language: "en-US", // Language code
    tone: "formal", // Tone setting
  },
  listeningModel: {
    name: "nova-2", // Example model name
    format: "flac", // Audio format
  },
});

Visit the Deepgram Voice Reference for more information on the Deepgram voice provider.
// Speechify Voice Configuration
const voice = new SpeechifyVoice({
  speechModel: {
    name: "speechify-voice", // Example model name
    speaker: "matthew", // Example speaker name
    apiKey: process.env.SPEECHIFY_API_KEY,
    language: "en-US", // Language code
    speed: 1.0, // Speech speed
  },
  // Speechify may not have a separate listening model
});

Visit the Speechify Voice Reference for more information on the Speechify voice provider.
// Sarvam Voice Configuration
const voice = new SarvamVoice({
  speechModel: {
    name: "sarvam-voice", // Example model name
    apiKey: process.env.SARVAM_API_KEY,
    language: "en-IN", // Language code
    style: "conversational", // Style setting
  },
  // Sarvam may not have a separate listening model
});

Visit the Sarvam Voice Reference for more information on the Sarvam voice provider.
// Murf Voice Configuration
const voice = new MurfVoice({
  speechModel: {
    name: "murf-voice", // Example model name
    apiKey: process.env.MURF_API_KEY,
    language: "en-US", // Language code
    emotion: "happy", // Emotion setting
  },
  // Murf may not have a separate listening model
});

Visit the Murf Voice Reference for more information on the Murf voice provider.
// OpenAI Realtime Voice Configuration
const voice = new OpenAIRealtimeVoice({
  speechModel: {
    name: "gpt-4o-realtime-preview", // Example realtime model name
    apiKey: process.env.OPENAI_API_KEY,
    language: "en-US", // Language code
  },
  listeningModel: {
    name: "whisper-1", // Example model name
    apiKey: process.env.OPENAI_API_KEY,
    format: "ogg", // Audio format
  },
  speaker: "alloy", // Example speaker name
});

For more information on the OpenAI Realtime voice provider, refer to the OpenAI Realtime Voice Reference.
// Google Gemini Live Voice Configuration
const voice = new GeminiLiveVoice({
  speechModel: {
    name: "gemini-2.0-flash-exp", // Example model name
    apiKey: process.env.GOOGLE_API_KEY,
  },
  speaker: "Puck", // Example speaker name
  // Google Gemini Live is a realtime bidirectional API without separate speech and listening models
});

Visit the Google Gemini Live Reference for more information on the Google Gemini Live voice provider.
// AI SDK Voice Configuration
import { CompositeVoice } from "@mastra/core/voice";
import { openai } from "@ai-sdk/openai";
import { elevenlabs } from "@ai-sdk/elevenlabs";

// Use AI SDK models directly - no need to install separate packages
const voice = new CompositeVoice({
  input: openai.transcription('whisper-1'), // AI SDK transcription
  output: elevenlabs.speech('eleven_turbo_v2'), // AI SDK speech
});

// Works seamlessly with your agent
const voiceAgent = new Agent({
  id: "aisdk-voice-agent",
  name: "AI SDK Voice Agent",
  instructions: "You are a helpful assistant with voice capabilities.",
  model: "openai/gpt-5.1",
  voice,
});
Using Multiple Voice Providers
This example demonstrates how to create and use two different voice providers in Mastra: OpenAI for speech-to-text (STT) and PlayAI for text-to-speech (TTS).
Start by creating instances of the voice providers with any necessary configuration.
import { OpenAIVoice } from "@mastra/voice-openai";
import { PlayAIVoice } from "@mastra/voice-playai";
import { CompositeVoice } from "@mastra/core/voice";
import { playAudio, getMicrophoneStream } from "@mastra/node-audio";

// Initialize OpenAI voice for STT
const input = new OpenAIVoice({
  listeningModel: {
    name: "whisper-1",
    apiKey: process.env.OPENAI_API_KEY,
  },
});

// Initialize PlayAI voice for TTS
const output = new PlayAIVoice({
  speechModel: {
    name: "playai-voice",
    apiKey: process.env.PLAYAI_API_KEY,
  },
});

// Combine the providers using CompositeVoice
const voice = new CompositeVoice({
  input,
  output,
});

// Implement voice interactions using the combined voice provider
const audioStream = getMicrophoneStream(); // Assume this function gets audio input
const transcript = await voice.listen(audioStream);

// Log the transcribed text
console.log("Transcribed text:", transcript);

// Convert text to speech
const responseAudio = await voice.speak(`You said: ${transcript}`, {
  speaker: "default", // Optional: specify a speaker
  responseFormat: "wav", // Optional: specify a response format
});

// Play the audio response
playAudio(responseAudio);
Using AI SDK Model Providers
You can also use AI SDK models directly with CompositeVoice:
import { CompositeVoice } from "@mastra/core/voice";
import { openai } from "@ai-sdk/openai";
import { elevenlabs } from "@ai-sdk/elevenlabs";
import { playAudio, getMicrophoneStream } from "@mastra/node-audio";

// Use AI SDK models directly - no provider setup needed
const voice = new CompositeVoice({
  input: openai.transcription('whisper-1'), // AI SDK transcription
  output: elevenlabs.speech('eleven_turbo_v2'), // AI SDK speech
});

// Works the same way as Mastra providers
const audioStream = getMicrophoneStream();
const transcript = await voice.listen(audioStream);
console.log("Transcribed text:", transcript);

// Convert text to speech
const responseAudio = await voice.speak(`You said: ${transcript}`, {
  speaker: "Rachel", // ElevenLabs voice
});

playAudio(responseAudio);
You can also mix AI SDK models with Mastra providers:
import { CompositeVoice } from "@mastra/core/voice";
import { PlayAIVoice } from "@mastra/voice-playai";
import { groq } from "@ai-sdk/groq";

const voice = new CompositeVoice({
  input: groq.transcription('whisper-large-v3'), // AI SDK for STT
  output: new PlayAIVoice(), // Mastra provider for TTS
});
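A mixed CompositeVoice plugs into an agent the same way as a single provider, as shown in the AI SDK configuration tab above. A brief sketch reusing the voice instance from the previous example (the agent id and model string are placeholders):

import { Agent } from "@mastra/core/agent";

// Hand the combined provider to an agent like any other voice
const voiceAgent = new Agent({
  id: "mixed-voice-agent",
  name: "Mixed Voice Agent",
  instructions: "You are a voice assistant that can help users with their tasks.",
  model: "openai/gpt-5.1",
  voice, // Groq STT in, PlayAI TTS out
});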
For more information on CompositeVoice, refer to the CompositeVoice Reference.
More Resources