
voice.listen()

The listen() method is a core function available in all Mastra voice providers that converts speech to text. It takes an audio stream as input and returns the transcribed text.

Parameters

audioStream: NodeJS.ReadableStream
  Audio stream to transcribe. This can be a file stream or a microphone stream.

options?: object
  Provider-specific options for speech recognition.

Return Value

Returns one of the following:

  • Promise<string>: A promise that resolves to the transcribed text
  • Promise<NodeJS.ReadableStream>: A promise that resolves to a stream of transcribed text (for streaming transcription; see the sketch after this list)
  • Promise<void>: For realtime providers, which emit 'writing' events instead of returning text directly
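
As a rough sketch (not tied to any specific provider), handling both the string and stream return shapes might look like the helper below. The transcribe name and the duck-typed voice parameter are illustrative, not part of the Mastra API:

// Hypothetical helper that normalizes listen() results to a single string
async function transcribe(
  voice: { listen: (audio: NodeJS.ReadableStream, options?: object) => Promise<string | NodeJS.ReadableStream | void> },
  audio: NodeJS.ReadableStream,
): Promise<string> {
  const result = await voice.listen(audio);
  if (typeof result === "string") return result; // non-streaming providers
  if (result) {
    // Streaming transcription: collect text chunks as they arrive
    let text = "";
    for await (const chunk of result) text += chunk.toString();
    return text;
  }
  // Realtime providers resolve to void and emit 'writing' events instead
  return "";
}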

Provider-Specific Options

Each voice provider may support additional options specific to their implementation. Here are some examples:

OpenAI

options.filetype?: string = 'mp3'
  Audio file format (e.g., 'mp3', 'wav', 'm4a')

options.prompt?: string
  Text to guide the model's transcription

options.language?: string
  Language code (e.g., 'en', 'fr', 'de')

Google

options.stream?: boolean = false
  Whether to use streaming recognition

options.config?: object = { encoding: 'LINEAR16', languageCode: 'en-US' }
  Recognition configuration from the Google Cloud Speech-to-Text API
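
For example, a minimal sketch of passing a custom recognition config, assuming the @mastra/voice-google package and its GoogleVoice class:

import { GoogleVoice } from "@mastra/voice-google";

// Assumes Google Cloud credentials are configured in the environment
const googleVoice = new GoogleVoice();
const googleTranscript = await googleVoice.listen(audioStream, {
  config: {
    encoding: "LINEAR16",
    languageCode: "en-US",
  },
});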

Deepgram

options.model?: string = 'nova-2'
  Deepgram model to use for transcription

options.language?: string = 'en'
  Language code for transcription
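
Likewise, a minimal sketch of selecting a model and language, assuming the @mastra/voice-deepgram package and its DeepgramVoice class:

import { DeepgramVoice } from "@mastra/voice-deepgram";

// Assumes a Deepgram API key is configured in the environment
const deepgramVoice = new DeepgramVoice();
const deepgramTranscript = await deepgramVoice.listen(audioStream, {
  model: "nova-2",
  language: "en",
});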

Usage Example

import { OpenAIVoice } from "@mastra/voice-openai";
import { getMicrophoneStream } from "@mastra/node-audio";
import { createReadStream } from "fs";
import path from "path";

// Initialize a voice provider
const voice = new OpenAIVoice({
  listeningModel: {
    name: "whisper-1",
    apiKey: process.env.OPENAI_API_KEY,
  },
});

// Basic usage with a file stream
const audioFilePath = path.join(process.cwd(), "audio.mp3");
const audioStream = createReadStream(audioFilePath);
const transcript = await voice.listen(audioStream, {
  filetype: "mp3",
});
console.log("Transcribed text:", transcript);

// Using a microphone stream
const microphoneStream = getMicrophoneStream(); // Assume this function gets audio input
const transcription = await voice.listen(microphoneStream);

// With provider-specific options
const transcriptWithOptions = await voice.listen(audioStream, {
  language: "en",
  prompt: "This is a conversation about artificial intelligence.",
});

Using with CompositeVoice

When using CompositeVoice, the listen() method delegates to the configured listening provider:

import { CompositeVoice } from "@mastra/core/voice";
import { OpenAIVoice } from "@mastra/voice-openai";
import { PlayAIVoice } from "@mastra/voice-playai";

const voice = new CompositeVoice({
  input: new OpenAIVoice(),
  output: new PlayAIVoice(),
});

// This will use the OpenAIVoice provider
const transcript = await voice.listen(audioStream);

Using AI SDK Model Providers

You can also use AI SDK transcription models directly with CompositeVoice:

import { CompositeVoice } from "@mastra/core/voice";
import { openai } from "@ai-sdk/openai";
import { PlayAIVoice } from "@mastra/voice-playai";

// Use AI SDK transcription models
const voice = new CompositeVoice({
  input: openai.transcription('whisper-1'), // AI SDK model
  output: new PlayAIVoice(), // Mastra provider
});

// Works the same way
const transcript = await voice.listen(audioStream);

// Provider-specific options can be passed through
const transcriptWithOptions = await voice.listen(audioStream, {
  providerOptions: {
    openai: {
      language: 'en',
      prompt: 'This is about AI',
    },
  },
});

See the CompositeVoice reference for more details on AI SDK integration.

Realtime Voice Providers

When using realtime voice providers like OpenAIRealtimeVoice, the listen() method behaves differently:

  • Instead of returning transcribed text, it emits 'writing' events containing the transcribed text
  • You need to register an event listener to receive transcriptions

import { OpenAIRealtimeVoice } from "@mastra/voice-openai-realtime";
import { getMicrophoneStream } from "@mastra/node-audio";

const voice = new OpenAIRealtimeVoice();
await voice.connect();

// Register event listener for transcription
voice.on("writing", ({ text, role }) => {
  console.log(`${role}: ${text}`);
});

// This will emit 'writing' events instead of returning text
const microphoneStream = getMicrophoneStream();
await voice.listen(microphoneStream);
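
When the conversation ends, the realtime session should be torn down as well. A minimal sketch, assuming the provider's close() method:

// Stop listening and release the realtime connection
voice.close();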

Notes

  • Not all voice providers support speech-to-text functionality (e.g., PlayAI, Speechify)
  • The behavior of listen() may vary slightly between providers, but all implementations follow the same basic interface
  • When using a realtime voice provider, the method may not return text directly and instead emits a 'writing' event
  • Supported audio formats depend on the provider. Common formats include MP3, WAV, and M4A
  • Some providers support streaming transcription, returning text as it is transcribed
  • For best performance, consider closing or ending the audio stream when you are finished with it (see the sketch after this list)
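
On that last point, a minimal sketch of explicit cleanup for a Node.js file stream, reusing the OpenAIVoice provider from the usage example above (the try/finally pattern is illustrative, not required by the API):

import { createReadStream } from "fs";

const fileStream = createReadStream("audio.mp3");
try {
  const text = await voice.listen(fileStream, { filetype: "mp3" });
  console.log(text);
} finally {
  // Release the underlying file handle once transcription is done
  fileStream.destroy();
}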

Related Methods