voice.listen()
The listen() method is a core function available in all Mastra voice providers that converts speech to text. It takes an audio stream as input and returns the transcribed text.
Parameters
- audioStream: the audio stream to transcribe (e.g. a NodeJS.ReadableStream)
- options?: optional provider-specific transcription options
Return Value
Returns one of the following:
- Promise<string>: a promise that resolves to the transcribed text
- Promise<NodeJS.ReadableStream>: a promise that returns a stream of transcribed text (for streaming transcription)
- Promise<void>: for realtime providers, which emit 'writing' events instead of returning text directly
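Because listen() can resolve to any of these shapes, a caller that just wants a plain string may need to normalize the result. A minimal sketch, assuming only the return types listed above (the resolveTranscript helper is hypothetical, not part of Mastra):

```typescript
// The three shapes listen() can resolve to, per the list above.
type ListenResult = string | NodeJS.ReadableStream | void;

// Hypothetical helper (not part of Mastra): normalize a listen()
// result into a single transcript string.
async function resolveTranscript(result: ListenResult): Promise<string> {
  if (typeof result === "string") {
    return result; // Promise<string>: already the full transcript
  }
  if (result) {
    // Promise<NodeJS.ReadableStream>: concatenate the streamed text
    let text = "";
    for await (const chunk of result as unknown as AsyncIterable<string | Buffer>) {
      text += chunk.toString();
    }
    return text;
  }
  // Promise<void>: realtime providers deliver text via 'writing' events instead
  return "";
}
```

Usage would look like `await resolveTranscript(await voice.listen(audioStream))`.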
Provider-Specific Options
Each voice provider may support additional options specific to their implementation. Here are some examples:
OpenAI
- options.filetype?: the audio file format (e.g. "mp3")
- options.prompt?: text to guide the transcription
- options.language?: the language code (e.g. "en")
Google
- options.stream?:
- options.config?:
Deepgram
- options.model?:
- options.language?:
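As a quick illustration, the option names listed above map onto plain option objects like these (the values are example assumptions, not defaults; the Deepgram model name in particular is assumed):

```typescript
// Illustrative option objects for the provider-specific fields above.
// Values are example assumptions, not required defaults.
const openaiListenOptions = {
  filetype: "mp3", // input audio format
  prompt: "This is a conversation about artificial intelligence.", // guides transcription
  language: "en", // ISO language code
};

const deepgramListenOptions = {
  model: "nova-2", // assumed Deepgram model name
  language: "en",
};
```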
Usage Example
import { OpenAIVoice } from "@mastra/voice-openai";
import { getMicrophoneStream } from "@mastra/node-audio";
import { createReadStream } from "fs";
import path from "path";
// Initialize a voice provider
const voice = new OpenAIVoice({
listeningModel: {
name: "whisper-1",
apiKey: process.env.OPENAI_API_KEY,
},
});
// Basic usage with a file stream
const audioFilePath = path.join(process.cwd(), "audio.mp3");
const audioStream = createReadStream(audioFilePath);
const transcript = await voice.listen(audioStream, {
filetype: "mp3",
});
console.log("Transcribed text:", transcript);
// Using a microphone stream
const microphoneStream = getMicrophoneStream(); // Assume this function gets audio input
const transcription = await voice.listen(microphoneStream);
// With provider-specific options
const transcriptWithOptions = await voice.listen(audioStream, {
language: "en",
prompt: "This is a conversation about artificial intelligence.",
});
Using with CompositeVoice
When using CompositeVoice, the listen() method delegates to the configured listening provider:
import { CompositeVoice } from "@mastra/core/voice";
import { OpenAIVoice } from "@mastra/voice-openai";
import { PlayAIVoice } from "@mastra/voice-playai";
const voice = new CompositeVoice({
input: new OpenAIVoice(),
output: new PlayAIVoice(),
});
// This will use the OpenAIVoice provider
const transcript = await voice.listen(audioStream);
Using AI SDK Model Providers
You can also use AI SDK transcription models directly with CompositeVoice:
import { CompositeVoice } from "@mastra/core/voice";
import { openai } from "@ai-sdk/openai";
import { PlayAIVoice } from "@mastra/voice-playai";
// Use AI SDK transcription models
const voice = new CompositeVoice({
input: openai.transcription('whisper-1'), // AI SDK model
output: new PlayAIVoice(), // Mastra provider
});
// Works the same way
const transcript = await voice.listen(audioStream);
// Provider-specific options can be passed through
const transcriptWithOptions = await voice.listen(audioStream, {
providerOptions: {
openai: {
language: 'en',
prompt: 'This is about AI',
}
}
});
See the CompositeVoice reference for more details on AI SDK integration.
Realtime Voice Providers
When using realtime voice providers like OpenAIRealtimeVoice, the listen() method behaves differently:
- Instead of returning transcribed text, it emits 'writing' events that contain the transcribed text
- You need to register an event listener to receive the transcriptions
import { OpenAIRealtimeVoice } from "@mastra/voice-openai-realtime";
import { getMicrophoneStream } from "@mastra/node-audio";
const voice = new OpenAIRealtimeVoice();
await voice.connect();
// Register event listener for transcription
voice.on("writing", ({ text, role }) => {
console.log(`${role}: ${text}`);
});
// This will emit 'writing' events instead of returning text
const microphoneStream = getMicrophoneStream();
await voice.listen(microphoneStream);
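The event flow above can be sketched with a plain Node.js EventEmitter standing in for the realtime provider (fakeRealtimeVoice and the accumulation logic are illustrative, not part of Mastra):

```typescript
import { EventEmitter } from "events";

// Stand-in for a realtime provider: fires 'writing' events with the
// same { text, role } payload shape shown in the example above.
const fakeRealtimeVoice = new EventEmitter();

// Accumulate the user's transcript from 'writing' events.
let transcript = "";
fakeRealtimeVoice.on("writing", ({ text, role }: { text: string; role: string }) => {
  if (role === "user") transcript += text;
});

fakeRealtimeVoice.emit("writing", { text: "Hello ", role: "user" });
fakeRealtimeVoice.emit("writing", { text: "world", role: "user" });
// transcript is now "Hello world"
```

The design point this illustrates: with realtime providers, transcription is push-based, so your code reacts to events rather than awaiting a return value.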
Notes
- Not all voice providers support speech-to-text (e.g., PlayAI, Speechify)
- The behavior of listen() may vary slightly between providers, but all implementations follow the same basic interface
- When using a realtime voice provider, the method may not return text directly but instead emit a 'writing' event
- Supported audio formats depend on the provider. Common formats include MP3, WAV, and M4A
- Some providers support streaming transcription, where text is returned as it is transcribed
- For best performance, consider closing or ending the audio stream when you are done with it
Related Methods
- voice.speak() - Converts text to speech
- voice.send() - Streams audio data to voice providers in real time
- voice.on() - Registers an event listener for voice events