Speech-to-Text (STT)

Speech-to-Text (STT) in Mastra provides a standardized interface for converting audio input into text across multiple service providers. STT lets you build voice-enabled applications that respond to human speech, which enables hands-free interaction, improves accessibility for users with disabilities, and makes human-computer interfaces more natural.

Configuration

To use STT in Mastra, provide a listeningModel when initializing the voice provider. It accepts parameters such as:

  • name: The specific STT model to use.
  • apiKey: Your API key for authentication.
  • Provider-specific options: Additional options that the specific voice provider may require or support.

Note: All of these parameters are optional. If you omit them, the voice provider's default settings are used, and those defaults depend on the provider you are using.

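import { OpenAIVoice } from "@mastra/voice-openai";

// Explicitly configure the model used for transcription
const voice = new OpenAIVoice({
  listeningModel: {
    name: "whisper-1",
    apiKey: process.env.OPENAI_API_KEY,
  },
});

// If using default settings, the configuration can be simplified to:
const defaultVoice = new OpenAIVoice();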
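
The note about optional parameters can be pictured as a simple fallback rule. The sketch below is only an illustration of that idea and is not Mastra's implementation: the ListeningModelConfig type, the resolveListeningModel helper, the "whisper-1" default, and the OPENAI_API_KEY fallback are all assumptions made for the example.

```typescript
// Hypothetical shape mirroring the listeningModel parameters described above.
interface ListeningModelConfig {
  name?: string;
  apiKey?: string;
  // Provider-specific options would travel alongside name and apiKey.
  [option: string]: unknown;
}

// Assumed default for illustration; a real provider chooses its own.
const DEFAULT_MODEL_NAME = "whisper-1";

// Fill in any value the caller left out, keeping everything they did supply.
function resolveListeningModel(
  config: ListeningModelConfig = {},
  env: Record<string, string | undefined> = {},
): ListeningModelConfig {
  return {
    ...config,
    name: config.name ?? DEFAULT_MODEL_NAME,
    apiKey: config.apiKey ?? env.OPENAI_API_KEY,
  };
}
```

Passing the environment in as an argument, rather than reading it globally, keeps the helper easy to test. Check the documentation of the provider you use for its actual defaults.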

Available Providers

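Mastra supports several Speech-to-Text providers, each with its own capabilities and strengths:

  • OpenAI - High-accuracy transcription using Whisper models
  • Azure - Microsoft's speech recognition with enterprise-grade reliability
  • ElevenLabs - Advanced speech recognition with support for multiple languages
  • Google - Google's speech recognition with support for multiple languages
  • Cloudflare - Edge-optimized speech recognition for low-latency applications
  • Deepgram - AI-powered speech recognition with high accuracy across a variety of accents
  • Sarvam - Specialized in Indic languages and accents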

Each provider is implemented as a separate package that you can install as needed:

pnpm add @mastra/voice-openai@latest  # Example for OpenAI
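
Because the providers share a standardized interface, application code can be written against a common shape so that swapping providers mostly means changing which object is constructed. The following is a minimal sketch of that idea using stand-in classes; the SpeechToText interface, FakeProviderA, FakeProviderB, and transcribe are hypothetical names invented for this example and are not part of any Mastra package.

```typescript
// Hypothetical minimal shape shared by the providers in this sketch.
interface SpeechToText {
  listen(audio: Uint8Array, options?: { filetype?: string }): Promise<string>;
}

// Stand-in providers that return canned text; real ones would call their APIs.
class FakeProviderA implements SpeechToText {
  async listen(): Promise<string> {
    return "  hello from provider A ";
  }
}

class FakeProviderB implements SpeechToText {
  async listen(): Promise<string> {
    return "hello from provider B";
  }
}

// Application code depends only on the shared shape, not on a concrete provider.
async function transcribe(
  provider: SpeechToText,
  audio: Uint8Array,
): Promise<string> {
  const text = await provider.listen(audio, { filetype: "m4a" });
  return text.trim();
}
```

With this structure, moving from one provider to another does not require touching transcribe or its callers.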

Using the Listen Method

The primary method for STT is the listen() method, which converts spoken audio into text. Here's how to use it:

import { Agent } from "@mastra/core/agent";
import { OpenAIVoice } from "@mastra/voice-openai";
import { getMicrophoneStream } from "@mastra/node-audio";

const voice = new OpenAIVoice();

const agent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions:
    "You are a voice assistant that provides recommendations based on user input.",
  model: "openai/gpt-5.1",
  voice,
});

const audioStream = getMicrophoneStream(); // Assume this function gets audio input

// Transcribe the audio into text
const transcript = await agent.voice.listen(audioStream, {
  filetype: "m4a", // Optional: specify the audio file type
});

console.log(`User said: ${transcript}`);

// Pass the transcript to the agent as part of the prompt
const { text } = await agent.generate(
  `Based on what the user said, provide them a recommendation: ${transcript}`,
);

console.log(`Recommendation: ${text}`);
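
The example above interpolates the transcript directly into a string. Depending on the provider, the value returned by listen() may be a stream of chunks rather than plain text, so it can be safer to normalize it first. The helper below is a sketch under that assumption; ListenResult and transcriptToString are hypothetical names, not part of Mastra, and you should confirm the actual return type in your provider's reference.

```typescript
// Result shapes this sketch handles: plain text, or an async-iterable stream
// of text/byte chunks (Node.js readable streams are async iterable).
type ListenResult = string | AsyncIterable<string | Uint8Array>;

// Collect a listen() result into a single string.
async function transcriptToString(result: ListenResult): Promise<string> {
  if (typeof result === "string") return result;

  const decoder = new TextDecoder();
  let text = "";
  for await (const chunk of result) {
    // Strings are appended as-is; bytes are decoded incrementally as UTF-8.
    text +=
      typeof chunk === "string"
        ? chunk
        : decoder.decode(chunk, { stream: true });
  }
  // Flush any bytes the decoder is still buffering.
  return text + decoder.decode();
}
```

In the agent example, this would wrap the call as transcriptToString(await agent.voice.listen(audioStream)) before the transcript is placed in the prompt.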

Check out the Adding Voice to Agents documentation to learn how to use STT in an agent.