Speech-to-Text (STT)
Speech-to-Text (STT) in Mastra provides a standardized interface for converting audio input into text across multiple service providers. STT lets you build voice-enabled applications that respond to human speech, which enables hands-free interaction, improves accessibility for users with disabilities, and makes human-computer interfaces feel more natural.
Configuration
To use STT in Mastra, pass a listeningModel when initializing the voice provider. It accepts parameters such as:
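- name: The specific STT model to use.
- apiKey: Your API key for authentication.
- Provider-specific options: Additional options that the specific voice provider may require or support.
Note: All of these parameters are optional. If you omit them, the voice provider falls back to its own default settings, which depend on the provider you are using.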
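import { OpenAIVoice } from "@mastra/voice-openai";

// Explicit configuration: choose the STT model and pass the API key
const voice = new OpenAIVoice({
  listeningModel: {
    name: "whisper-1",
    apiKey: process.env.OPENAI_API_KEY,
  },
});

// If using default settings, the configuration can be simplified to:
const defaultVoice = new OpenAIVoice();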
Available Providers
Mastra supports several Speech-to-Text providers, each with its own capabilities and strengths:
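- OpenAI - High-accuracy transcription with Whisper models
- Azure - Microsoft's speech recognition with enterprise-grade reliability
- ElevenLabs - Advanced speech recognition with support for multiple languages
- Google - Google's speech recognition with support for multiple languages
- Cloudflare - Edge-optimized speech recognition for low-latency applications
- Deepgram - AI-powered speech recognition with high accuracy across various accents
- Sarvam - Specialized in Indic languages and accents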
Each provider is implemented as a separate package that you can install as needed:
pnpm add @mastra/voice-openai@latest # Example for OpenAI
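Because each provider sits behind the same interface, application code can be written against a small shape and the provider swapped later. The sketch below illustrates that idea only: the SpeechToText interface, the FakeVoice class, and the transcribe helper are hypothetical names invented for this example, not Mastra's actual type definitions. A stand-in provider like this can also be handy in tests, since it needs no API key.

```typescript
import { Readable } from "node:stream";

// Hypothetical minimal shape for illustration; not Mastra's real type.
interface SpeechToText {
  listen(audio: NodeJS.ReadableStream): Promise<string>;
}

// Hypothetical stand-in provider that returns a canned transcript.
class FakeVoice implements SpeechToText {
  constructor(private readonly canned: string) {}

  async listen(audio: NodeJS.ReadableStream): Promise<string> {
    // Drain the stream so the audio source is fully consumed.
    for await (const _chunk of audio) {
      // A real provider would send these bytes to its STT service.
    }
    return this.canned;
  }
}

// Application code depends only on the interface, not on a concrete provider.
async function transcribe(
  voice: SpeechToText,
  audio: NodeJS.ReadableStream,
): Promise<string> {
  const text = await voice.listen(audio);
  return text.trim();
}

// Usage: any object with a compatible listen() method can be passed in.
const fakeAudio = Readable.from([Buffer.from([0, 1, 2, 3])]);
transcribe(new FakeVoice("  book a table for two  "), fakeAudio).then((text) =>
  console.log(text),
);
```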
Using the Listen Method
The primary method for STT is the listen() method, which converts spoken audio into text. Here's how to use it:
import { Agent } from "@mastra/core/agent";
import { OpenAIVoice } from "@mastra/voice-openai";
import { getMicrophoneStream } from "@mastra/node-audio";

const voice = new OpenAIVoice();

const agent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions:
    "You are a voice assistant that provides recommendations based on user input.",
  model: "openai/gpt-5.1",
  voice,
});

const audioStream = getMicrophoneStream(); // Assume this function gets audio input

// Transcribe the captured audio to text
const transcript = await agent.voice.listen(audioStream, {
  filetype: "m4a", // Optional: specify the audio file type
});
console.log(`User said: ${transcript}`);

// Pass the transcript to the agent to generate a response
const { text } = await agent.generate(
  `Based on what the user said, provide them a recommendation: ${transcript}`,
);
console.log(`Recommendation: ${text}`);
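The example above interpolates the transcript directly into a string. Depending on the provider, the value returned by listen() may not always be a plain string. The helper below is a minimal sketch that normalizes the result before use; it assumes the result is either a string, a readable stream of text chunks, or undefined. Both the ListenResult type and the transcriptToText name are hypothetical and introduced here for illustration, so check your provider's return type before relying on it.

```typescript
import { Readable } from "node:stream";

// Assumption: the listen() result is one of these shapes (hypothetical type).
type ListenResult = string | NodeJS.ReadableStream | undefined;

// Hypothetical helper: collapse any of the assumed shapes into a plain string.
async function transcriptToText(result: ListenResult): Promise<string> {
  if (result === undefined) return "";
  if (typeof result === "string") return result;

  // Collect raw bytes first so multi-byte characters split across
  // chunk boundaries are decoded correctly.
  const chunks: Buffer[] = [];
  for await (const chunk of result) {
    chunks.push(typeof chunk === "string" ? Buffer.from(chunk, "utf8") : chunk);
  }
  return Buffer.concat(chunks).toString("utf8");
}

// Usage with a simulated streamed transcript:
transcriptToText(Readable.from(["turn on ", "the lights"])).then((text) =>
  console.log(text), // prints "turn on the lights"
);
```

With a helper like this, the transcript can be normalized once and then passed to agent.generate() as ordinary text.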
Check out the Adding Voice to Agents documentation to learn how to use STT in an agent.