Speech-to-Text (STT)

Speech-to-Text (STT) in Mastra provides a standardized interface for converting audio input into text across multiple service providers. With STT you can build voice-enabled applications that respond to human speech, which enables hands-free interaction, better accessibility for users with disabilities, and more natural human-computer interfaces.

Configuration

To use STT in Mastra, provide a listeningModel when initializing the voice provider. It accepts the following parameters:

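name: The specific STT model to use.
apiKey: Your API key for authentication.
Provider-specific options: Additional options that the specific voice provider may require or support.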

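Note: All of these parameters are optional. If you omit them, the voice provider falls back to its own default settings, which depend on the specific provider you are using.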

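import { OpenAIVoice } from "@mastra/voice-openai";

// Explicit configuration of the listening (STT) model
const voice = new OpenAIVoice({
  listeningModel: {
    name: "whisper-1",
    apiKey: process.env.OPENAI_API_KEY,
  },
});

// If you use the default settings, the configuration can be simplified to:
const defaultVoice = new OpenAIVoice();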
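
The note about optional parameters can be pictured as a simple merge of user-supplied values over provider defaults. The following is only a rough sketch of that idea and not Mastra's actual implementation: the `ListeningModelConfig` type, the `resolveListeningModel` helper, and the default values are all hypothetical names introduced for illustration.

```typescript
// Hypothetical shape of a listening model configuration (illustration only).
interface ListeningModelConfig {
  name?: string;
  apiKey?: string;
  [option: string]: unknown; // provider-specific options
}

// Hypothetical helper: overlays the values the user provided on top of the
// provider's defaults. Undefined values are dropped first so that they do not
// overwrite a default.
function resolveListeningModel(
  defaults: ListeningModelConfig,
  config: ListeningModelConfig = {},
): ListeningModelConfig {
  const provided = Object.fromEntries(
    Object.entries(config).filter(([, value]) => value !== undefined),
  );
  return { ...defaults, ...provided };
}

// Assumed defaults for an imaginary provider.
const exampleDefaults: ListeningModelConfig = {
  name: "whisper-1",
  apiKey: "key-from-environment",
};

// Only the API key is overridden; the model name comes from the defaults.
const exampleResolved = resolveListeningModel(exampleDefaults, {
  apiKey: "my-explicit-key",
});
console.log(exampleResolved);
```

Which defaults actually apply, and where a missing API key is looked up, depends on the provider package, so check its reference documentation before relying on them.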

Available Providers

Mastra supports several Speech-to-Text providers, each with its own capabilities and strengths:

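OpenAI - High-accuracy transcription using Whisper models
Azure - Microsoft's speech recognition with enterprise-grade reliability
ElevenLabs - Advanced speech recognition with support for multiple languages
Google - Google's speech recognition with support for multiple languages
Cloudflare - Edge-optimized speech recognition for low-latency applications
Deepgram - AI-powered speech recognition with high accuracy across a variety of accents
Sarvam - Specialized in Indic languages and accents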

Each provider is implemented as a separate package that you can install as needed:

pnpm add @mastra/voice-openai@latest  # Example for OpenAI

Using the Listen Method

The primary method for STT is the listen() method, which converts spoken audio into text. Here's how to use it:

import { Agent } from "@mastra/core/agent";
import { OpenAIVoice } from "@mastra/voice-openai";
import { getMicrophoneStream } from "@mastra/node-audio";

const voice = new OpenAIVoice();

const agent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions:
    "You are a voice assistant that provides recommendations based on user input.",
  model: "openai/gpt-5.1",
  voice,
});

const audioStream = getMicrophoneStream(); // Assume this function gets audio input

const transcript = await agent.voice.listen(audioStream, {
  filetype: "m4a", // Optional: specify the audio file type
});

console.log(`User said: ${transcript}`);

const { text } = await agent.generate(
  `Based on what the user said, provide them a recommendation: ${transcript}`,
);

console.log(`Recommendation: ${text}`);
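
The example above interpolates the result of listen() directly into a string. If a provider hands back a stream of text chunks instead of a plain string (an assumption here; check the return type of the provider you use), it may help to normalize the result first. The `transcriptToString` helper and the `ListenResult` type below are hypothetical and not part of Mastra; this is a minimal sketch of one way to do it.

```typescript
// Hypothetical result type: a plain string, a stream of text or byte chunks,
// or nothing at all. Node.js readable streams are async iterables, so they
// fit the second case.
type ListenResult = string | AsyncIterable<Uint8Array | string> | undefined;

// Hypothetical helper: collects a listen() result into a single string.
async function transcriptToString(result: ListenResult): Promise<string> {
  if (result === undefined) return "";
  if (typeof result === "string") return result;

  // Decode byte chunks incrementally so multi-byte characters that are split
  // across chunk boundaries are handled correctly.
  const decoder = new TextDecoder("utf-8");
  let text = "";
  for await (const chunk of result) {
    text +=
      typeof chunk === "string"
        ? chunk
        : decoder.decode(chunk, { stream: true });
  }
  // Flush any bytes still buffered in the decoder.
  return text + decoder.decode();
}
```

With this in place, the transcript from the example could be passed through `await transcriptToString(...)` before it is logged or sent to agent.generate().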

Check out the Adding Voice to Agents documentation to learn how to use STT in an agent.
