Google

The Google Voice implementation in Mastra provides text-to-speech (TTS) and speech-to-text (STT) through Google Cloud services. It supports multiple voices and languages, advanced audio configuration options, and two authentication modes: a standard API key, or Vertex AI mode for enterprise deployments.

Usage Example

import { GoogleVoice } from "@mastra/voice-google";

// Initialize with default configuration (uses GOOGLE_API_KEY environment variable)
const voice = new GoogleVoice();

// Text-to-Speech
const audioStream = await voice.speak("Hello, world!", {
  languageCode: "en-US",
  audioConfig: {
    audioEncoding: "LINEAR16",
  },
});

// Speech-to-Text
const transcript = await voice.listen(audioStream, {
  config: {
    encoding: "LINEAR16",
    languageCode: "en-US",
  },
});

// Get available voices for a specific language
const voices = await voice.getSpeakers({ languageCode: "en-US" });
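
Since speak() resolves to a Node.js readable stream, it can be handy to collect the audio into a single Buffer before saving or forwarding it. The sketch below is one way to do that; `streamToBuffer` is a hypothetical helper written for this illustration and is not part of @mastra/voice-google.

```typescript
// Hypothetical helper (not provided by @mastra/voice-google):
// drains a readable stream and returns its contents as one Buffer.
export async function streamToBuffer(
  stream: NodeJS.ReadableStream,
): Promise<Buffer> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    // Streams may yield strings or Buffers depending on their encoding
    chunks.push(typeof chunk === "string" ? Buffer.from(chunk) : chunk);
  }
  return Buffer.concat(chunks);
}
```

With the usage example above, this would be called as `const audio = await streamToBuffer(audioStream);`. Note that a stream can only be consumed once, so a stream drained this way cannot also be passed to listen().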

Constructor Parameters

speechModel?:

GoogleModelConfig
= { apiKey: process.env.GOOGLE_API_KEY }
Configuration for text-to-speech functionality

listeningModel?:

GoogleModelConfig
= { apiKey: process.env.GOOGLE_API_KEY }
Configuration for speech-to-text functionality

speaker?:

string
= 'en-US-Casual-K'
Default voice ID to use for text-to-speech

vertexAI?:

boolean
= false
Enable Vertex AI mode for enterprise deployments. Uses project-based authentication instead of API keys. Requires 'project' to be set.

project?:

string
Google Cloud project ID (required when vertexAI is true). Falls back to GOOGLE_CLOUD_PROJECT environment variable.

location?:

string
= 'us-central1'
Google Cloud region for Vertex AI. Falls back to GOOGLE_CLOUD_LOCATION environment variable.
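
The fallback rules described for vertexAI, project, and location can be summarized in a few lines. The following is only a sketch of those documented rules, assuming explicit options take precedence over environment variables; `resolveVertexOptions` and its types are hypothetical names and do not reflect the library's internal implementation.

```typescript
// Hypothetical types and helper for illustration only.
interface VertexOptions {
  vertexAI?: boolean;
  project?: string;
  location?: string;
}

interface ResolvedVertexOptions {
  vertexAI: boolean;
  project: string | undefined;
  location: string;
}

export function resolveVertexOptions(
  options: VertexOptions = {},
  env: Record<string, string | undefined> = process.env,
): ResolvedVertexOptions {
  const vertexAI = options.vertexAI ?? false;
  // Assumption: an explicit option wins over the environment variable
  const project = options.project ?? env.GOOGLE_CLOUD_PROJECT;
  const location =
    options.location ?? env.GOOGLE_CLOUD_LOCATION ?? "us-central1";

  // The docs state that 'project' is required when vertexAI is true
  if (vertexAI && !project) {
    throw new Error("A project ID is required when vertexAI is true");
  }
  return { vertexAI, project, location };
}
```

Passing the environment as a parameter keeps the helper easy to test without mutating process.env.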

GoogleModelConfig

apiKey?:

string
Google Cloud API key. Falls back to GOOGLE_API_KEY environment variable. Not used when vertexAI is true.

keyFilename?:

string
Path to service account JSON key file. Falls back to GOOGLE_APPLICATION_CREDENTIALS environment variable.

credentials?:

object
In-memory service account credentials object with client_email and private_key properties.

Methods

speak()

Converts text to speech using the Google Cloud Text-to-Speech service.

input:

string | NodeJS.ReadableStream
Text to convert to speech. If a stream is provided, it will be converted to text first.

options?:

object
Speech synthesis options

options.speaker?:

string
Voice ID to use for this request

options.languageCode?:

string
Language code for the voice (e.g., 'en-US'). Defaults to the language code from the speaker ID or 'en-US'

options.audioConfig?:

ISynthesizeSpeechRequest['audioConfig']
= { audioEncoding: 'LINEAR16' }
Audio configuration options from Google Cloud Text-to-Speech API

Returns: Promise<NodeJS.ReadableStream>
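
The languageCode option is described as defaulting to the language code from the speaker ID, or 'en-US'. How the library derives that code is not documented here; the sketch below shows one plausible approach, assuming voice IDs begin with a language-region prefix as in 'en-US-Casual-K'. `languageCodeFromVoiceId` is a hypothetical helper.

```typescript
// Hypothetical helper: extracts a "<language>-<REGION>" prefix from a voice ID.
// Assumes IDs shaped like "en-US-Casual-K"; falls back when no prefix matches.
export function languageCodeFromVoiceId(
  voiceId: string,
  fallback = "en-US",
): string {
  const match = /^([a-z]{2,3}-[A-Z]{2})-/.exec(voiceId);
  return match?.[1] ?? fallback;
}
```

For example, `languageCodeFromVoiceId("en-US-Casual-K")` yields "en-US", while an ID without a recognizable prefix yields the fallback.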

listen()

Converts speech to text using the Google Cloud Speech-to-Text service.

audioStream:

NodeJS.ReadableStream
Audio stream to transcribe

options?:

object
Recognition options

options.stream?:

boolean
Whether to use streaming recognition

options.config?:

IRecognitionConfig
= { encoding: 'LINEAR16', languageCode: 'en-US' }
Recognition configuration from Google Cloud Speech-to-Text API

Returns: Promise<string>

getSpeakers()

Returns an array of available voice options. Each entry contains:

voiceId:

string
Unique identifier for the voice

languageCodes:

string[]
List of language codes supported by this voice
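
Given an array in the shape described above, voices can also be narrowed on the client side, for instance when a list has been fetched once and cached. This is a minimal sketch; `VoiceOption` and `voicesForLanguage` are hypothetical names that simply mirror the two documented fields.

```typescript
// Hypothetical type mirroring the documented fields of each voice entry.
interface VoiceOption {
  voiceId: string;
  languageCodes: string[];
}

// Returns only the voices that list the requested language code.
// The comparison is case-insensitive as a convenience.
export function voicesForLanguage(
  voices: VoiceOption[],
  languageCode: string,
): VoiceOption[] {
  const wanted = languageCode.toLowerCase();
  return voices.filter((voice) =>
    voice.languageCodes.some((code) => code.toLowerCase() === wanted),
  );
}
```

When only one language is needed, passing languageCode to getSpeakers() directly, as in the usage example, avoids the extra step.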

isUsingVertexAI()

Checks whether Vertex AI mode is enabled.

Returns: boolean - true if using Vertex AI, false otherwise

getProject()

Gets the configured Google Cloud project ID.

Returns: string | undefined - The project ID, or undefined if not set

getLocation()

Gets the configured Google Cloud location/region.

Returns: string - The location (default: 'us-central1')

Authentication

The Google Voice provider supports two authentication methods:

Standard Mode (API Key)

Uses a Google Cloud API key for authentication. Suitable for development and simple use cases.

// Using environment variable (GOOGLE_API_KEY)
const voice = new GoogleVoice();

// Using explicit API key (a separate variable name avoids redeclaring `voice`)
const voiceWithKey = new GoogleVoice({
  speechModel: { apiKey: "your-api-key" },
  listeningModel: { apiKey: "your-api-key" },
  speaker: "en-US-Casual-K",
});

Vertex AI Mode (Service Account)

Uses Google Cloud project-based authentication with service accounts. Recommended for production and enterprise deployments.

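Benefits:

  • Better security (no API keys in code)
  • Access control based on Identity and Access Management (IAM)
  • Project-level billing and quotas
  • Audit logging
  • Enterprise features

Configuration options: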

// Using Application Default Credentials (ADC)
// Set GOOGLE_APPLICATION_CREDENTIALS and GOOGLE_CLOUD_PROJECT env vars
const adcVoice = new GoogleVoice({
  vertexAI: true,
  project: "your-gcp-project",
  location: "us-central1", // Optional, defaults to 'us-central1'
});

// Using service account key file
const keyFileVoice = new GoogleVoice({
  vertexAI: true,
  project: "your-gcp-project",
  speechModel: {
    keyFilename: "/path/to/service-account.json",
  },
  listeningModel: {
    keyFilename: "/path/to/service-account.json",
  },
});

// Using in-memory credentials
const inMemoryVoice = new GoogleVoice({
  vertexAI: true,
  project: "your-gcp-project",
  speechModel: {
    credentials: {
      client_email: "service-account@project.iam.gserviceaccount.com",
      private_key: "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----",
    },
  },
});
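
In-memory credentials are often loaded from a secret store or an environment variable holding the service account JSON rather than written inline. The sketch below shows one way to turn such a JSON string into the documented credentials shape; `parseServiceAccount` is a hypothetical helper, and the newline handling is an assumption about how some secret stores escape the private key.

```typescript
// Hypothetical type matching the documented credentials properties.
interface ServiceAccountCredentials {
  client_email: string;
  private_key: string;
}

// Parses a service account JSON string and keeps only the two documented fields.
export function parseServiceAccount(json: string): ServiceAccountCredentials {
  const parsed: unknown = JSON.parse(json);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Service account JSON must be an object");
  }
  const { client_email, private_key } = parsed as Record<string, unknown>;
  if (typeof client_email !== "string" || typeof private_key !== "string") {
    throw new Error("Service account JSON needs client_email and private_key");
  }
  // Assumption: some secret stores double-escape newlines in the key;
  // restore them so the PEM block is well-formed
  return { client_email, private_key: private_key.replace(/\\n/g, "\n") };
}
```

The result could then be passed as `speechModel: { credentials: parseServiceAccount(rawJson) }`, where `rawJson` comes from wherever the deployment stores its secrets.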

Required permissions:

IAM roles:

For Text-to-Speech:

  • roles/texttospeech.admin - Text-to-Speech Admin (full access)
  • roles/texttospeech.editor - Text-to-Speech Editor (create and manage)
  • roles/texttospeech.viewer - Text-to-Speech Viewer (read-only)

For Speech-to-Text:

  • roles/speech.client - Speech-to-Text Client

OAuth scopes:

For synchronous Text-to-Speech synthesis:

  • https://www.googleapis.com/auth/cloud-platform - Full access to Google Cloud Platform services

For long-audio Text-to-Speech operations:

  • locations.longAudioSynthesize - Create long-audio synthesis operations
  • operations.get - Get operation status
  • operations.list - List operations

Important Notes

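  1. Authentication: requires either a Google Cloud API key (standard mode) or service account credentials (Vertex AI mode).
  2. Environment variables:
    • GOOGLE_API_KEY - API key for standard mode
    • GOOGLE_CLOUD_PROJECT - Project ID for Vertex AI mode
    • GOOGLE_CLOUD_LOCATION - Location for Vertex AI mode (defaults to 'us-central1')
    • GOOGLE_APPLICATION_CREDENTIALS - Path to the service account key file
  3. The default voice is 'en-US-Casual-K'.
  4. Both the text-to-speech and speech-to-text services use LINEAR16 as the default audio encoding.
  5. The speak() method supports advanced audio configuration through the Google Cloud Text-to-Speech API.
  6. The listen() method supports a range of recognition configurations through the Google Cloud Speech-to-Text API.
  7. Available voices can be filtered by language code using the getSpeakers() method.
  8. Vertex AI mode provides enterprise features, including IAM controls, audit logging, and project-level billing.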