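Voice
Real-time voice interaction, with support for text-to-speech, speech-to-text, and live duplex streaming.
Text-to-Speech (TTS)
Convert text into speech audio.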
const audio = await client.voice.tts("agent-id", {
  text: "Hello! How can I help you today?",
  voiceName: "aria",
  language: "en",
  outputFormat: "mp3",
});
// audio.data contains the audio bytes
Speech-to-Text (STT)
Transcribe audio to text.
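The call that follows passes the audio as a base64 string, but it does not show where that string comes from. A minimal sketch of producing it, assuming a Node.js runtime; `toBase64` and `loadAudioAsBase64` are hypothetical helper names and not part of the SDK:

```typescript
import { readFile } from "node:fs/promises";

// Encode raw audio bytes as a base64 string.
// Assumption: Node.js, where Buffer is available as a global.
function toBase64(bytes: Uint8Array): string {
  return Buffer.from(bytes).toString("base64");
}

// Hypothetical helper: read an audio file from disk and return its
// contents base64-encoded, ready to pass as the `audio` field.
async function loadAudioAsBase64(path: string): Promise<string> {
  const bytes = await readFile(path);
  return toBase64(bytes);
}
```

With these helpers, `base64AudioData` could be obtained with `await loadAudioAsBase64("recording.wav")`, where the file name is only a placeholder.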
const result = await client.voice.stt("agent-id", {
  audio: base64AudioData,
  audioFormat: "wav",
  language: "en",
});
console.log(result.text);
Live Voice Streaming
Live duplex voice conversation. Get a token first, then open a bidirectional stream.
// 1. Get a streaming token
const token = await client.voice.getToken("agent-id", {
  voiceName: "aria",
  userId: "user-123",
});
// 2. Connect to live stream
const stream = await client.voice.stream(token);
// Send audio chunks
stream.sendAudio(audioChunk);
// Or send text for the agent to speak
stream.sendText("Tell me about your day");
// Receive events
for await (const event of stream) {
  if (event.type === "audio") {
    playAudio(event.data);
  } else if (event.type === "transcript") {
    console.log(event.text);
  }
}
// End session
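stream.endSession();
WebSocket Transport
Live streaming runs over WebSocket and supports real-time duplex audio. The client sends microphone audio chunks upstream while receiving synthesized speech and transcripts at the same time, which gives the conversation a natural flow.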
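Sending audio upstream in chunks can be sketched as follows. The source does not specify a chunk size or audio encoding, so the 3200-byte default here (100 ms of 16 kHz, 16-bit mono PCM) is purely an assumption, as is the `Uint8Array` chunk type. `chunkAudio`, `sendInChunks`, and the `AudioSink` interface are hypothetical names; only `sendAudio` comes from the streaming example.

```typescript
// Split a buffer of raw audio bytes into fixed-size chunks.
// The final chunk may be shorter than chunkSize.
function chunkAudio(bytes: Uint8Array, chunkSize: number): Uint8Array[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < bytes.length; offset += chunkSize) {
    // subarray returns a view over the same memory, so nothing is copied.
    chunks.push(bytes.subarray(offset, offset + chunkSize));
  }
  return chunks;
}

// Minimal shape of the stream object as used here (assumed, not the SDK type).
interface AudioSink {
  sendAudio(chunk: Uint8Array): void;
}

// Send a buffer chunk by chunk and return how many chunks were sent.
// Assumed default: 3200 bytes = 100 ms of 16 kHz, 16-bit mono PCM.
function sendInChunks(sink: AudioSink, bytes: Uint8Array, chunkSize = 3200): number {
  const chunks = chunkAudio(bytes, chunkSize);
  for (const chunk of chunks) {
    sink.sendAudio(chunk);
  }
  return chunks.length;
}
```

In a real client the chunks would usually arrive one at a time from the microphone instead of being sliced from a complete buffer. The slicing version is used here only because it is easy to run in isolation.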
Browse the Voice Catalog
List the available voices.
const voices = await client.voices.list({
  language: "en",
  gender: "female",
});
for (const voice of voices.items) {
  console.log(voice.name, voice.language, voice.gender);
}
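One possible use of the catalog is choosing a `voiceName` for the TTS and streaming calls. The sketch below picks the first preferred voice that is actually available and falls back to the first listed voice. The `Voice` interface mirrors only the three fields printed above, and `pickVoice` is a hypothetical helper, not an SDK function.

```typescript
// Assumed shape of a catalog entry, limited to the fields shown in the listing.
interface Voice {
  name: string;
  language: string;
  gender: string;
}

// Return the first voice whose name appears in preferredNames (in preference
// order), or the first available voice if none match, or undefined if the
// list is empty.
function pickVoice(voices: Voice[], preferredNames: string[]): Voice | undefined {
  for (const name of preferredNames) {
    const match = voices.find((v) => v.name === name);
    if (match) return match;
  }
  return voices[0];
}
```

For example, `pickVoice(voices.items, ["aria"])?.name` could then be passed as `voiceName`, assuming `voices.items` is an array of such entries.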