
voice.speak()

The speak() method is a core function available in all Mastra voice providers that converts text to speech. It takes text input and returns an audio stream that can be played or saved.

Parameters

  • input (string | NodeJS.ReadableStream): Text to convert to speech. Can be a string or a readable stream of text.
  • options? (object): Options for speech synthesis.
  • options.speaker? (string): Voice ID to use for this specific request. Overrides the default speaker set in the constructor.

Return Value

Returns a Promise<NodeJS.ReadableStream | void>, where:

  • NodeJS.ReadableStream: an audio data stream that can be played or saved
  • void: when using a realtime voice provider that emits audio through events instead of returning it directly
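When a stream is returned, it can be piped anywhere a Node.js readable stream is accepted. Below is a minimal sketch of saving it to a file with Node's stream/promises pipeline; the in-memory fakeAudio stream is a stand-in for the result of voice.speak():

```typescript
import { Readable } from "stream";
import { pipeline } from "stream/promises";
import { createWriteStream } from "fs";

// Persist an audio stream to disk; works with any Node.js readable stream,
// such as the one returned by voice.speak().
async function saveAudio(audioStream: NodeJS.ReadableStream, path: string): Promise<void> {
  await pipeline(audioStream, createWriteStream(path));
}

// Stand-in audio bytes for illustration (not real speech data)
const fakeAudio = Readable.from([Buffer.from([0x49, 0x44, 0x33])]);
saveAudio(fakeAudio, "speech.mp3").catch(console.error);
```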

Provider-Specific Options

Each voice provider may support additional options specific to their implementation. Here are some examples:

OpenAI

  • options.speed? (number, default 1.0): Speech speed multiplier. Values between 0.25 and 4.0 are supported.

ElevenLabs

  • options.stability? (number, default 0.5): Voice stability. Higher values result in more stable, less expressive speech.
  • options.similarity_boost? (number, default 0.75): Voice clarity and similarity to the original voice.

Google

  • options.languageCode? (string): Language code for the voice (e.g., 'en-US').
  • options.audioConfig? (object, default { audioEncoding: 'LINEAR16' }): Audio configuration options from the Google Cloud Text-to-Speech API.
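For illustration, a Google options object might be shaped like this. The values are illustrative, and which audioConfig fields apply depends on the Google Cloud Text-to-Speech API:

```typescript
// Illustrative Google-specific options for voice.speak() (assumed shape)
const googleOptions = {
  languageCode: "en-US", // BCP-47 language code for the voice
  audioConfig: {
    audioEncoding: "LINEAR16" as const, // uncompressed 16-bit PCM
  },
};
```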

Murf

  • options.properties.rate? (number): Speech rate multiplier.
  • options.properties.pitch? (number): Voice pitch adjustment.
  • options.properties.format? ('MP3' | 'WAV' | 'FLAC' | 'ALAW' | 'ULAW'): Output audio format.
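Note that Murf nests its settings under a properties object, unlike the flat options of the other providers above. A sketch of the assumed shape, with illustrative values:

```typescript
// Illustrative Murf-specific options; note the nested `properties` object
const murfOptions = {
  properties: {
    rate: 1.1, // speech rate multiplier (slightly faster than normal)
    pitch: 1.0, // voice pitch adjustment
    format: "MP3" as const, // output audio format
  },
};
```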

Usage Example

import { Readable } from "stream";
import { OpenAIVoice } from "@mastra/voice-openai";

// Initialize a voice provider
const voice = new OpenAIVoice({
  speaker: "alloy", // Default voice
});

// Basic usage with default settings
const audioStream = await voice.speak("Hello, world!");

// Using a different voice for this specific request
const audioStreamWithDifferentVoice = await voice.speak("Hello again!", {
  speaker: "nova",
});

// Using provider-specific options
const audioStreamWithOptions = await voice.speak("Hello with options!", {
  speaker: "echo",
  speed: 1.2, // OpenAI-specific option
});

// Using a text stream as input
const textStream = Readable.from(["Hello", " from", " a", " stream!"]);
const audioStreamFromTextStream = await voice.speak(textStream);

Using with CompositeVoice

When using CompositeVoice, the speak() method delegates to the configured speaking provider:

import { CompositeVoice } from "@mastra/core/voice";
import { OpenAIVoice } from "@mastra/voice-openai";
import { PlayAIVoice } from "@mastra/voice-playai";

const voice = new CompositeVoice({
  output: new PlayAIVoice(),
  input: new OpenAIVoice(),
});

// This will use the PlayAIVoice provider
const audioStream = await voice.speak("Hello, world!");

Using AI SDK Model Providers

You can also use AI SDK speech models directly with CompositeVoice:

import { CompositeVoice } from "@mastra/core/voice";
import { openai } from "@ai-sdk/openai";
import { elevenlabs } from "@ai-sdk/elevenlabs";

// Use AI SDK speech models
const voice = new CompositeVoice({
  output: elevenlabs.speech("eleven_turbo_v2"), // AI SDK model
  input: openai.transcription("whisper-1"), // AI SDK model
});

// Works the same way
const audioStream = await voice.speak("Hello from AI SDK!");

// Provider-specific options can be passed through
const audioWithOptions = await voice.speak("Hello with options!", {
  speaker: "Rachel", // ElevenLabs voice
  providerOptions: {
    elevenlabs: {
      stability: 0.5,
      similarity_boost: 0.75,
    },
  },
});

See the CompositeVoice reference for more details on AI SDK integration.

Realtime Voice Providers

When using realtime voice providers like OpenAIRealtimeVoice, the speak() method behaves differently:

  • Instead of returning an audio stream, it emits a 'speaker' event containing the audio data
  • You need to register an event listener to receive the audio chunks
import { OpenAIRealtimeVoice } from "@mastra/voice-openai-realtime";
import Speaker from "@mastra/node-speaker";

const speaker = new Speaker({
  sampleRate: 24000, // Audio sample rate in Hz - matches the 24 kHz PCM output of the OpenAI realtime audio models
  channels: 1, // Mono audio output (as opposed to stereo, which would be 2)
  bitDepth: 16, // Bit depth for audio quality - CD-quality standard (16-bit resolution)
});

const voice = new OpenAIRealtimeVoice();
await voice.connect();

// Register an event listener for audio chunks
voice.on("speaker", (stream) => {
  // Handle the audio chunk (e.g., play it or save it)
  stream.pipe(speaker);
});

// This will emit 'speaker' events instead of returning a stream
await voice.speak("Hello, this is realtime speech!");

Notes

  • The behavior of speak() can vary slightly between providers, but all implementations follow the same basic interface.
  • When using a realtime voice provider, the method may not return an audio stream directly; instead, it emits a 'speaker' event with the audio data.
  • When a text stream is provided as input, the provider will typically convert it to a string before processing.
  • The audio format of the returned stream depends on the provider. Common formats include MP3, WAV, and OGG.
  • For best performance, consider closing or ending the audio stream when you are done with it.
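The last note above can be sketched with standard Node.js stream cleanup; the demo Readable stands in for an audio stream returned by voice.speak():

```typescript
import { Readable } from "stream";

// Consume an audio stream, then release its underlying resources.
async function consumeAndClose(audioStream: Readable): Promise<number> {
  let bytes = 0;
  for await (const chunk of audioStream) {
    bytes += (chunk as Buffer).length; // e.g. pipe to a speaker or file instead
  }
  audioStream.destroy(); // standard Node.js stream cleanup once you are done
  return bytes;
}

// Demo with a stand-in stream (5 bytes of fake audio)
const demo = Readable.from([Buffer.from("audio")]);
consumeAndClose(demo).then((n) => console.log(`consumed ${n} bytes`));
```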