voice.speak()
The speak() method is a core function available in all Mastra voice providers that converts text to speech. It takes text input and returns an audio stream that can be played or saved.
Parameters
input: string | NodeJS.ReadableStream - The text, or a stream of text, to convert to speech.
options?: object - Optional settings for the speech request.
options.speaker?: string - The voice ID to use for this specific request, overriding the provider's default voice.
Return Value
Returns a Promise<NodeJS.ReadableStream | void> where:
- NodeJS.ReadableStream: A stream of audio data that can be played or saved
- void: Returned when using a realtime voice provider that delivers audio through events rather than returning it directly
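For example, when a stream is returned, it can be piped straight to disk. A minimal sketch, assuming the OpenAI provider from the examples below and an MP3 output file (the filename and format are illustrative, since the actual format depends on the provider):
import { createWriteStream } from "fs";
import { pipeline } from "stream/promises";
import { OpenAIVoice } from "@mastra/voice-openai";
const voice = new OpenAIVoice();
const audio = await voice.speak("Save me to a file!");
// Realtime providers may return void, so guard before piping
if (audio) {
  await pipeline(audio, createWriteStream("output.mp3"));
}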
Provider-Specific Options
Each voice provider may support additional options specific to its implementation. Here are some examples:
OpenAI
options.speed?: number - Playback speed multiplier, where 1.0 is normal speed.
ElevenLabs
options.stability?: number - Controls voice stability; lower values allow more expressive variation.
options.similarity_boost?: number - Controls how closely the generated audio adheres to the original voice.
Google
options.languageCode?: string - The language code for the voice (e.g., "en-US").
options.audioConfig?: object - Audio configuration options from the Google Cloud Text-to-Speech API.
Murf
options.properties.rate?: number - The speech rate.
options.properties.pitch?: number - The voice pitch.
options.properties.format?: string - The output audio format (e.g., "MP3" or "WAV").
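Regardless of provider, these keys are passed in the same options object as speaker. A minimal sketch, assuming the ElevenLabsVoice class from @mastra/voice-elevenlabs (which is not shown elsewhere on this page):
import { ElevenLabsVoice } from "@mastra/voice-elevenlabs";
const voice = new ElevenLabsVoice();
// stability and similarity_boost are ElevenLabs-specific options
const audio = await voice.speak("Hello from ElevenLabs!", {
  stability: 0.5,
  similarity_boost: 0.75,
});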
Usage Example
import { OpenAIVoice } from "@mastra/voice-openai";
import { Readable } from "stream";
// Initialize a voice provider
const voice = new OpenAIVoice({
speaker: "alloy", // Default voice
});
// Basic usage with default settings
const audioStream = await voice.speak("Hello, world!");
// Using a different voice for this specific request
const audioStreamWithDifferentVoice = await voice.speak("Hello again!", {
speaker: "nova",
});
// Using provider-specific options
const audioStreamWithOptions = await voice.speak("Hello with options!", {
speaker: "echo",
speed: 1.2, // OpenAI-specific option
});
// Using a text stream as input
const textStream = Readable.from(["Hello", " from", " a", " stream!"]);
const audioStreamFromTextStream = await voice.speak(textStream);
Using with CompositeVoice
When using CompositeVoice, the speak() method delegates to the configured speaking provider:
import { CompositeVoice } from "@mastra/core/voice";
import { OpenAIVoice } from "@mastra/voice-openai";
import { PlayAIVoice } from "@mastra/voice-playai";
const voice = new CompositeVoice({
output: new PlayAIVoice(),
input: new OpenAIVoice(),
});
// This will use the PlayAIVoice provider
const audioStream = await voice.speak("Hello, world!");
Using AI SDK Model Providers
You can also use AI SDK speech models directly with CompositeVoice:
import { CompositeVoice } from "@mastra/core/voice";
import { openai } from "@ai-sdk/openai";
import { elevenlabs } from "@ai-sdk/elevenlabs";
// Use AI SDK speech models
const voice = new CompositeVoice({
output: elevenlabs.speech('eleven_turbo_v2'), // AI SDK model
input: openai.transcription('whisper-1'), // AI SDK model
});
// Works the same way
const audioStream = await voice.speak("Hello from AI SDK!");
// Provider-specific options can be passed through
const audioWithOptions = await voice.speak("Hello with options!", {
speaker: 'Rachel', // ElevenLabs voice
providerOptions: {
elevenlabs: {
stability: 0.5,
similarity_boost: 0.75,
}
}
});
See the CompositeVoice reference for more details on AI SDK integration.
Realtime Voice Providers
When using realtime voice providers like OpenAIRealtimeVoice, the speak() method behaves differently:
- Instead of returning an audio stream, it emits 'speaker' events containing the audio data
- You need to register an event listener to receive the audio chunks
import { OpenAIRealtimeVoice } from "@mastra/voice-openai-realtime";
import Speaker from "@mastra/node-speaker";
const speaker = new Speaker({
  sampleRate: 24000, // Audio sample rate in Hz - matches the 24 kHz PCM output of OpenAI realtime models
channels: 1, // Mono audio output (as opposed to stereo which would be 2)
bitDepth: 16, // Bit depth for audio quality - CD quality standard (16-bit resolution)
});
const voice = new OpenAIRealtimeVoice();
await voice.connect();
// Register event listener for audio chunks
voice.on("speaker", (stream) => {
// Handle audio chunk (e.g., play it or save it)
stream.pipe(speaker);
});
// This will emit 'speaker' events instead of returning a stream
await voice.speak("Hello, this is realtime speech!");
Notes
- The behavior of speak() may vary slightly between providers, but all implementations follow the same basic interface.
- When using a realtime voice provider, the method may not return an audio stream directly and may instead emit a 'speaker' event.
- If a text stream is provided as input, the provider will typically convert it to a string before processing.
- The audio format of the returned stream depends on the provider; common formats include MP3, WAV, and OGG.
- For best performance, consider closing or ending the audio stream when you are done with it.
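When you need the complete audio before playback, the returned stream can also be buffered into memory. A minimal sketch using standard Node.js stream consumption (not a Mastra-specific API):
import { OpenAIVoice } from "@mastra/voice-openai";
const voice = new OpenAIVoice();
const audio = await voice.speak("Buffer me in memory");
// Collect the audio chunks, then assemble the complete buffer
const chunks: Buffer[] = [];
if (audio) {
  // Audio chunks arrive as Buffers on a binary stream
  for await (const chunk of audio as AsyncIterable<Buffer>) {
    chunks.push(chunk);
  }
}
const audioBuffer = Buffer.concat(chunks);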