
voice.listen()

The listen() method is a core function available in all Mastra voice providers that converts speech to text. It takes an audio stream as input and returns the transcribed text.

Parameters

audioStream: NodeJS.ReadableStream
  Audio stream to transcribe. This can be a file stream or a microphone stream.

options?: object
  Provider-specific options for speech recognition.

Return Value

Returns one of the following:

  • Promise<string>: A promise that resolves to the transcribed text
  • Promise<NodeJS.ReadableStream>: A promise that resolves to a stream of transcribed text (for streaming transcription; see the sketch after this list)
  • Promise<void>: For realtime providers, which emit 'writing' events instead of returning text directly
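
As a rough sketch (not tied to any specific provider), handling both the string and stream return shapes might look like the helper below. The transcribe name and the duck-typed voice parameter are illustrative, not part of the Mastra API:

// Hypothetical helper that normalizes listen() results to a single string
async function transcribe(
  voice: { listen: (audio: NodeJS.ReadableStream, options?: object) => Promise<string | NodeJS.ReadableStream | void> },
  audio: NodeJS.ReadableStream,
): Promise<string> {
  const result = await voice.listen(audio);
  if (typeof result === "string") return result; // non-streaming providers
  if (result) {
    // Streaming transcription: collect text chunks as they arrive
    let text = "";
    for await (const chunk of result) text += chunk.toString();
    return text;
  }
  // Realtime providers resolve to void and emit 'writing' events instead
  return "";
}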

Provider-Specific Options

Each voice provider may support additional options specific to their implementation. Here are some examples:

OpenAI

options.filetype?: string = 'mp3'
  Audio file format (e.g., 'mp3', 'wav', 'm4a')

options.prompt?: string
  Text to guide the model's transcription

options.language?: string
  Language code (e.g., 'en', 'fr', 'de')

Google

options.stream?: boolean = false
  Whether to use streaming recognition

options.config?: object = { encoding: 'LINEAR16', languageCode: 'en-US' }
  Recognition configuration from the Google Cloud Speech-to-Text API
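
For example, a minimal sketch of passing a custom recognition config, assuming the @mastra/voice-google package and its GoogleVoice class:

import { GoogleVoice } from "@mastra/voice-google";

// Assumes Google Cloud credentials are configured in the environment
const googleVoice = new GoogleVoice();
const googleTranscript = await googleVoice.listen(audioStream, {
  config: {
    encoding: "LINEAR16",
    languageCode: "en-US",
  },
});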

Deepgram

options.model?: string = 'nova-2'
  Deepgram model to use for transcription

options.language?: string = 'en'
  Language code for transcription
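
Likewise, a minimal sketch of selecting a model and language, assuming the @mastra/voice-deepgram package and its DeepgramVoice class:

import { DeepgramVoice } from "@mastra/voice-deepgram";

// Assumes a Deepgram API key is configured in the environment
const deepgramVoice = new DeepgramVoice();
const deepgramTranscript = await deepgramVoice.listen(audioStream, {
  model: "nova-2",
  language: "en",
});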

Usage Example

import { OpenAIVoice } from "@mastra/voice-openai";
import { getMicrophoneStream } from "@mastra/node-audio";
import { createReadStream } from "fs";
import path from "path";

// Initialize a voice provider
const voice = new OpenAIVoice({
  listeningModel: {
    name: "whisper-1",
    apiKey: process.env.OPENAI_API_KEY,
  },
});

// Basic usage with a file stream
const audioFilePath = path.join(process.cwd(), "audio.mp3");
const audioStream = createReadStream(audioFilePath);
const transcript = await voice.listen(audioStream, {
  filetype: "mp3",
});
console.log("Transcribed text:", transcript);

// Using a microphone stream
const microphoneStream = getMicrophoneStream(); // Assume this function gets audio input
const transcription = await voice.listen(microphoneStream);

// With provider-specific options
const transcriptWithOptions = await voice.listen(audioStream, {
  language: "en",
  prompt: "This is a conversation about artificial intelligence.",
});

Using with CompositeVoice

When using CompositeVoice, the listen() method delegates to the configured listening provider:

import { CompositeVoice } from "@mastra/core/voice";
import { OpenAIVoice } from "@mastra/voice-openai";
import { PlayAIVoice } from "@mastra/voice-playai";

const voice = new CompositeVoice({
  input: new OpenAIVoice(),
  output: new PlayAIVoice(),
});

// This will use the OpenAIVoice provider
const transcript = await voice.listen(audioStream);

Using AI SDK Model Providers

You can also use AI SDK transcription models directly with CompositeVoice:

import { CompositeVoice } from "@mastra/core/voice";
import { openai } from "@ai-sdk/openai";
import { PlayAIVoice } from "@mastra/voice-playai";

// Use AI SDK transcription models
const voice = new CompositeVoice({
  input: openai.transcription('whisper-1'), // AI SDK model
  output: new PlayAIVoice(), // Mastra provider
});

// Works the same way
const transcript = await voice.listen(audioStream);

// Provider-specific options can be passed through
const transcriptWithOptions = await voice.listen(audioStream, {
  providerOptions: {
    openai: {
      language: 'en',
      prompt: 'This is about AI',
    },
  },
});

See the CompositeVoice reference for more details on AI SDK integration.

Realtime Voice Providers

When using realtime voice providers like OpenAIRealtimeVoice, the listen() method behaves differently:

  • Instead of returning transcribed text, it emits 'writing' events containing the transcribed text
  • You need to register an event listener to receive transcriptions

import { OpenAIRealtimeVoice } from "@mastra/voice-openai-realtime";
import { getMicrophoneStream } from "@mastra/node-audio";

const voice = new OpenAIRealtimeVoice();
await voice.connect();

// Register event listener for transcription
voice.on("writing", ({ text, role }) => {
  console.log(`${role}: ${text}`);
});

// This will emit 'writing' events instead of returning text
const microphoneStream = getMicrophoneStream();
await voice.listen(microphoneStream);
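
When the conversation ends, the realtime session should be torn down as well. A minimal sketch, assuming the provider's close() method:

// Stop listening and release the realtime connection
voice.close();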

Notes

  • Not all voice providers support speech-to-text functionality (e.g., PlayAI, Speechify)
  • The behavior of listen() may vary slightly between providers, but all implementations follow the same basic interface
  • When using a realtime voice provider, the method may not return text directly and instead emits a 'writing' event
  • Supported audio formats depend on the provider. Common formats include MP3, WAV, and M4A
  • Some providers support streaming transcription, returning text as it is transcribed
  • For best performance, consider closing or ending the audio stream when you are finished with it (see the sketch after this list)
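
On that last point, a minimal sketch of explicit cleanup for a Node.js file stream, reusing the OpenAIVoice provider from the usage example above (the try/finally pattern is illustrative, not required by the API):

import { createReadStream } from "fs";

const fileStream = createReadStream("audio.mp3");
try {
  const text = await voice.listen(fileStream, { filetype: "mp3" });
  console.log(text);
} finally {
  // Release the underlying file handle once transcription is done
  fileStream.destroy();
}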

Related Methods