Text-to-Speech (TTS)

Text-to-Speech (TTS) in Mastra offers a unified API for synthesizing spoken audio from text using various providers. By incorporating TTS into your applications, you can enhance the user experience with natural voice interactions, improve accessibility for users with visual impairments, and create more engaging multimodal interfaces.

TTS is a core component of any voice application. Combined with STT (Speech-to-Text), it forms the foundation of voice interaction systems. Newer models support STS (Speech-to-Speech), which can be used for real-time interactions but comes at a higher cost.

Configuration

To use TTS in Mastra, you provide a speechModel when initializing the voice provider. It accepts parameters such as:

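  • name: The specific TTS model to use.
  • apiKey: Your API key for authentication.
  • Provider-specific options: Additional options that a given voice provider may require or support.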

The speaker option lets you select different voices for speech synthesis. Each provider offers a range of voices that differ in voice diversity, quality, voice personality, and multilingual support.

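Note: All of these parameters are optional. You can rely on the default settings of the voice provider, which depend on the specific provider you are using.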

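import { OpenAIVoice } from "@mastra/voice-openai";

// Explicit configuration: choose the model, API key, and speaker
const voice = new OpenAIVoice({
  speechModel: {
    name: "tts-1-hd",
    apiKey: process.env.OPENAI_API_KEY,
  },
  speaker: "alloy",
});

// If using default settings, the configuration can be simplified to:
const defaultVoice = new OpenAIVoice();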
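
Because every parameter is optional, one way to keep configuration out of the code is to build the options object from environment variables and leave out anything that is unset. The following is a minimal sketch rather than part of Mastra: the helper buildVoiceOptions, the two interfaces, and the variable names TTS_MODEL and TTS_SPEAKER are hypothetical, and the option shape simply mirrors the speechModel and speaker fields shown above.

```typescript
// Hypothetical option shapes that mirror the configuration shown above.
interface SpeechModelConfig {
  name?: string;
  apiKey?: string;
}

interface VoiceOptions {
  speechModel?: SpeechModelConfig;
  speaker?: string;
}

// Hypothetical helper: builds constructor options from environment variables,
// omitting every field that is not set so provider defaults still apply.
function buildVoiceOptions(
  env: Record<string, string | undefined>,
): VoiceOptions {
  const speechModel: SpeechModelConfig = {};
  if (env.TTS_MODEL) speechModel.name = env.TTS_MODEL; // assumed variable name
  if (env.OPENAI_API_KEY) speechModel.apiKey = env.OPENAI_API_KEY;

  const voiceOptions: VoiceOptions = {};
  if (Object.keys(speechModel).length > 0) voiceOptions.speechModel = speechModel;
  if (env.TTS_SPEAKER) voiceOptions.speaker = env.TTS_SPEAKER; // assumed variable name
  return voiceOptions;
}

// Stand-in values instead of process.env, for illustration only.
const exampleOptions = buildVoiceOptions({
  TTS_MODEL: "tts-1-hd",
  TTS_SPEAKER: "alloy",
});
console.log(exampleOptions);
```

The result could then be passed to the provider, for example new OpenAIVoice(buildVoiceOptions(process.env)). With an empty environment the helper returns an empty object, which corresponds to using the provider defaults.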

Available Providers

Mastra supports a wide range of Text-to-Speech providers, each with its own capabilities and voice options. You can choose the provider that best suits your application's needs:

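  • OpenAI - High-quality voices with natural intonation and expression
  • Azure - Microsoft's speech service with a wide range of voices and languages
  • ElevenLabs - Ultra-realistic voices with emotion and fine-grained control
  • PlayAI - Specialized in natural-sounding voices in a variety of styles
  • Google - Google's speech synthesis with multilingual support
  • Cloudflare - Edge-optimized speech synthesis for low-latency applications
  • Deepgram - AI-powered speech technology with high accuracy
  • Speechify - Text-to-speech optimized for readability and accessibility
  • Sarvam - Specialized in Indic languages and accents
  • Murf - Studio-quality voice-overs with customizable parameters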

Each provider is implemented as a separate package that you can install as needed:

pnpm add @mastra/voice-openai@latest  # Example for OpenAI
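
Since the providers sit behind a unified API, application code can be written against a small interface and the concrete provider swapped later. The sketch below illustrates that idea only: the SpeechProvider interface is an assumed minimal shape (the real Mastra types may differ), and EchoVoice is a stand-in that echoes text bytes instead of synthesizing audio.

```typescript
import { Readable } from "node:stream";

// Assumed minimal shape shared by voice providers; the real types may differ.
interface SpeechProvider {
  speak(text: string, options?: { speaker?: string }): Promise<Readable>;
}

// Stand-in provider for illustration only: it "synthesizes" by echoing
// the text as bytes, so the example runs without any API key.
class EchoVoice implements SpeechProvider {
  async speak(text: string): Promise<Readable> {
    return Readable.from([Buffer.from(text, "utf8")]);
  }
}

// Application code depends only on the interface, so replacing EchoVoice
// with a real provider would not require changes here.
async function announce(
  provider: SpeechProvider,
  message: string,
): Promise<Readable> {
  return provider.speak(message, { speaker: "default" });
}
```

Under these assumptions, calling announce(new EchoVoice(), "hello") returns a stream of the message bytes, and a real provider instance could be passed in the same position.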

Using the Speak Method

The primary method for TTS is speak(), which converts text to speech. It accepts options that let you specify the speaker and other provider-specific settings. Here's how to use it:

import { Agent } from "@mastra/core/agent";
import { OpenAIVoice } from "@mastra/voice-openai";

const voice = new OpenAIVoice();

const agent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions:
    "You are a voice assistant that can help users with their tasks.",
  model: "openai/gpt-5.1",
  voice,
});

const { text } = await agent.generate("What color is the sky?");

// Convert text to speech to an Audio Stream
const readableStream = await voice.speak(text, {
  speaker: "default", // Optional: specify a speaker
  properties: {
    speed: 1.0, // Optional: adjust speech speed
    pitch: "default", // Optional: specify pitch if supported
  },
});
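
The audio stream returned by speak() usually needs to go somewhere, such as a file or an HTTP response. As a minimal sketch, assuming the provider returns a Node.js readable stream, the helpers below collect the stream into a Buffer and write it to disk. The names streamToBuffer and saveSpeech are hypothetical and not part of Mastra; they use only Node.js built-ins.

```typescript
import { writeFile } from "node:fs/promises";
import { Readable } from "node:stream";

// Collect every chunk of a readable stream into a single Buffer.
async function streamToBuffer(stream: Readable): Promise<Buffer> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    // Chunks may arrive as strings, Buffers, or Uint8Arrays.
    chunks.push(Buffer.from(chunk));
  }
  return Buffer.concat(chunks);
}

// Write the collected audio to disk and return the number of bytes written.
// The file extension is the caller's choice and should match the
// audio format the provider actually produces.
async function saveSpeech(stream: Readable, filePath: string): Promise<number> {
  const audio = await streamToBuffer(stream);
  await writeFile(filePath, audio);
  return audio.length;
}
```

With the stream from the example above, this could be used as await saveSpeech(readableStream as Readable, "speech.mp3"), where the .mp3 extension is an assumption about the provider's output format. For long audio, piping the stream directly to a write stream avoids holding the whole file in memory.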

See the Adding Voice to Agents documentation to learn how to use TTS in an agent.