How to Implement Text-to-Speech in JavaScript

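JavaScript can convert text to speech in three main ways: the browser's built-in Web Speech API, third-party text-to-speech (TTS) libraries, and server-side TTS services. This article focuses on the Web Speech API, then briefly covers a third-party library (ResponsiveVoice.js) and a cloud service (Google Cloud Text-to-Speech).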

I. Introduction to the Web Speech API

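The Web Speech API is a set of browser APIs that let web pages use speech recognition and speech synthesis, so that web applications can accept spoken input and produce spoken output. It has two main parts: SpeechSynthesis (speech synthesis, i.e. text-to-speech) and SpeechRecognition (speech recognition). This article concentrates on using SpeechSynthesis to turn text into speech.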

1. Basic usage of SpeechSynthesis

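SpeechSynthesis is the Web Speech API interface that converts text into speech and plays it. Using it is simple: create a SpeechSynthesisUtterance instance containing the text, then pass it to window.speechSynthesis.speak(). The utterance is added to a queue and is spoken once any utterances queued before it have finished.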

const msg = new SpeechSynthesisUtterance('Hello World');
window.speechSynthesis.speak(msg);
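Because speechSynthesis is not available in every environment (older browsers, or Node.js), it can help to check for it before speaking. The sketch below is a minimal pair of hypothetical helpers (isSpeechSynthesisSupported and speakIfSupported are illustrative names, not part of the API); they take the global scope as a parameter so they can be exercised without a browser. Recent versions of Chrome may also ignore speak() calls that are not triggered by a user gesture, so it is safest to call this from something like a click handler.

```javascript
// Hypothetical helpers (not part of the Web Speech API).
// `scope` defaults to the global object; pass a stand-in object to test without a browser.
function isSpeechSynthesisSupported(scope = globalThis) {
  return typeof scope.speechSynthesis === 'object' &&
    scope.speechSynthesis !== null &&
    typeof scope.SpeechSynthesisUtterance === 'function';
}

function speakIfSupported(text, scope = globalThis) {
  if (!isSpeechSynthesisSupported(scope)) {
    return false; // nothing to speak with, e.g. in Node.js or an old browser
  }
  const msg = new scope.SpeechSynthesisUtterance(text);
  scope.speechSynthesis.speak(msg); // queued; plays after earlier utterances
  return true;
}
```

In a page, speakIfSupported('Hello World') behaves like the two-line example above but returns false instead of throwing when the API is missing.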

2. Setting speech properties

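Beyond basic playback, SpeechSynthesis lets you adjust the pitch, rate (speed), and volume of the speech by setting the corresponding properties on the SpeechSynthesisUtterance instance. According to the specification, pitch ranges from 0 to 2, rate from 0.1 to 10, and volume from 0 to 1; all three default to 1.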

const msg = new SpeechSynthesisUtterance('Hello World');
msg.pitch = 1.5;  // pitch: 0 to 2, default 1
msg.rate = 1;     // speaking rate: 0.1 to 10, default 1
msg.volume = 1;   // volume: 0 to 1, default 1
window.speechSynthesis.speak(msg);
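Values read from form controls arrive as strings, and values outside these ranges are not meaningful. A minimal sketch of a hypothetical applyVoiceSettings helper that converts each setting to a number and clamps it to the range given in the specification before assigning it:

```javascript
// Ranges from the Web Speech API specification: [min, max] for each property.
const SPEECH_RANGES = { pitch: [0, 2], rate: [0.1, 10], volume: [0, 1] };

// Hypothetical helper: copies pitch/rate/volume from `settings` onto an
// utterance, converting strings to numbers and clamping to the ranges above.
// Missing, empty, or non-numeric settings are skipped, so the utterance keeps its default.
function applyVoiceSettings(utterance, settings) {
  for (const [key, [min, max]] of Object.entries(SPEECH_RANGES)) {
    const raw = settings[key];
    if (raw === undefined || raw === '') continue;
    const value = Number(raw);
    if (Number.isNaN(value)) continue;
    utterance[key] = Math.min(max, Math.max(min, value));
  }
  return utterance;
}
```

For example, applyVoiceSettings(new SpeechSynthesisUtterance('Hello World'), { pitch: 1.5, rate: 1, volume: 1 }) yields the same utterance as the example above.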

3. Choosing a voice

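The Web Speech API also lets you choose a specific voice. window.speechSynthesis.getVoices() returns the voices available on the current device, and you assign one of them to the utterance's voice property. The list depends on the browser and operating system (a voice named 'Google US English', for example, is typically available only in Chrome), and in some browsers, notably Chrome, getVoices() returns an empty array until the voices have finished loading, which is signalled by the voiceschanged event.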

const msg = new SpeechSynthesisUtterance('Hello World');
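const voices = window.speechSynthesis.getVoices(); // may be [] before voices load
// Use the named voice if it exists; otherwise null keeps the browser's default voice
msg.voice = voices.find(voice => voice.name === 'Google US English') || null;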
window.speechSynthesis.speak(msg);
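To cope with the voice list loading asynchronously, one option is to wrap getVoices() in a Promise that waits for the voiceschanged event when the list is still empty. The sketch below assumes that approach; loadVoices and pickVoice are hypothetical helper names, the 2-second timeout is an arbitrary choice, and the fallback order in pickVoice (name, then language, then the platform default) is just one reasonable policy.

```javascript
// Hypothetical helper: resolves with the voice list, waiting for the
// 'voiceschanged' event if getVoices() is still empty. Resolves after
// `timeoutMs` regardless (possibly with []), in case the event never fires.
function loadVoices(synth, timeoutMs = 2000) {
  return new Promise((resolve) => {
    const initial = synth.getVoices();
    if (initial.length > 0) {
      resolve(initial);
      return;
    }
    let timer = null;
    const finish = () => {
      clearTimeout(timer);
      synth.removeEventListener('voiceschanged', finish);
      resolve(synth.getVoices());
    };
    synth.addEventListener('voiceschanged', finish);
    timer = setTimeout(finish, timeoutMs);
  });
}

// Hypothetical helper: exact name match, then exact language match,
// then the platform default voice; null lets the browser choose.
function pickVoice(voices, preferredName, lang) {
  return voices.find((v) => v.name === preferredName) ||
    voices.find((v) => v.lang === lang) ||
    voices.find((v) => v.default) ||
    null;
}
```

In a page, loadVoices(window.speechSynthesis) can be awaited inside a click handler, and pickVoice(voices, 'Google US English', 'en-US') then assigned to msg.voice before calling speak().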

II. Practical examples

1. A simple text-to-speech web page

Below we build a simple web page with a text box and a button: the user types some text and clicks the button to hear it spoken.

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Text to Speech</title>
</head>
<body>
  <h1>Text to Speech</h1>
  <textarea id="text" rows="10" cols="50" placeholder="Enter text here..."></textarea>
  <br>
  <button onclick="speak()">Speak</button>
  <script>
    function speak() {
      const text = document.getElementById('text').value;
      const msg = new SpeechSynthesisUtterance(text);
      window.speechSynthesis.speak(msg);
    }
  </script>
</body>
</html>
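As written, each click queues another copy of the text behind whatever is still playing, and empty input is queued too. A minimal sketch of a more defensive handler, assuming you want a new click to replace the current speech; speakFresh is a hypothetical name, and the speech object and utterance constructor are passed in so the logic can be checked outside a browser.

```javascript
// Hypothetical helper: skips blank input and cancels anything already
// speaking or queued, so repeated clicks do not pile up copies of the text.
// Returns the utterance that was queued, or null if the input was blank.
function speakFresh(text, synth, Utterance) {
  const trimmed = String(text).trim();
  if (!trimmed) {
    return null;
  }
  synth.cancel(); // empties the queue and stops the current utterance
  const msg = new Utterance(trimmed);
  synth.speak(msg);
  return msg;
}
```

In the page above, the body of speak() could become speakFresh(document.getElementById('text').value, window.speechSynthesis, SpeechSynthesisUtterance).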

2. Adding speech-property controls and voice selection

To make the page more useful, we can add controls that let the user adjust the speech properties and choose a voice.

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Text to Speech</title>
</head>
<body>
  <h1>Text to Speech</h1>
  <textarea id="text" rows="10" cols="50" placeholder="Enter text here..."></textarea>
  <br>
  <label for="pitch">Pitch: </label>
  <input type="range" id="pitch" min="0" max="2" step="0.1" value="1">
  <br>
  <label for="rate">Rate: </label>
  <input type="range" id="rate" min="0.1" max="2" step="0.1" value="1">
  <br>
  <label for="volume">Volume: </label>
  <input type="range" id="volume" min="0" max="1" step="0.1" value="1">
  <br>
  <label for="voice">Voice: </label>
  <select id="voice"></select>
  <br>
  <button onclick="speak()">Speak</button>
  <script>
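    const voiceSelect = document.getElementById('voice');
    let voices = [];
    // getVoices() can return [] at page load (e.g. in Chrome), so rebuild the
    // list again when the browser fires the voiceschanged event
    function populateVoices() {
      voices = window.speechSynthesis.getVoices();
      voiceSelect.innerHTML = '';
      voices.forEach(voice => {
        const option = document.createElement('option');
        option.value = voice.name;
        option.text = `${voice.name} (${voice.lang})`;
        voiceSelect.add(option);
      });
    }
    populateVoices();
    window.speechSynthesis.onvoiceschanged = populateVoices;
    function speak() {
      const text = document.getElementById('text').value;
      const msg = new SpeechSynthesisUtterance(text);
      // Range inputs return strings; convert them to numbers explicitly
      msg.pitch = Number(document.getElementById('pitch').value);
      msg.rate = Number(document.getElementById('rate').value);
      msg.volume = Number(document.getElementById('volume').value);
      // Fall back to the default voice (null) if nothing is selected
      msg.voice = voices.find(voice => voice.name === voiceSelect.value) || null;
      window.speechSynthesis.speak(msg);
    }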
  </script>
</body>
</html>
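Text pasted into the textarea can be long, and some browser and voice combinations (Chrome with its online Google voices is a commonly reported case) have been reported to stop partway through a single very long utterance. One workaround is to split the text at sentence boundaries and queue several shorter utterances. The sketch below is a hypothetical splitIntoChunks helper; the 200-character default is an arbitrary assumption, and a single sentence longer than maxLen is kept whole rather than cut mid-sentence.

```javascript
// Hypothetical helper: splits text after sentence-ending punctuation
// (ASCII . ! ? and the CJK full-width 。！？), then packs whole sentences
// into chunks of at most maxLen characters where possible.
function splitIntoChunks(text, maxLen = 200) {
  const sentences = text.match(/[^.!?。！？]+[.!?。！？]*/g) || [];
  const chunks = [];
  let current = '';
  for (const raw of sentences) {
    const sentence = raw.trim();
    if (!sentence) continue;
    if (current && (current + ' ' + sentence).length > maxLen) {
      chunks.push(current); // current chunk is full; start a new one
      current = sentence;
    } else {
      current = current ? current + ' ' + sentence : sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Because speak() queues utterances, splitIntoChunks(text).forEach(chunk => window.speechSynthesis.speak(new SpeechSynthesisUtterance(chunk))) plays the chunks in order.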

3. Handling speech events

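To improve the user experience, you can also handle the events a SpeechSynthesisUtterance fires during playback, such as start, end, error, pause, resume, and boundary. For example, you can show a message when the speech has finished.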

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Text to Speech</title>
</head>
<body>
  <h1>Text to Speech</h1>
  <textarea id="text" rows="10" cols="50" placeholder="Enter text here..."></textarea>
  <br>
  <label for="pitch">Pitch: </label>
  <input type="range" id="pitch" min="0" max="2" step="0.1" value="1">
  <br>
  <label for="rate">Rate: </label>
  <input type="range" id="rate" min="0.1" max="2" step="0.1" value="1">
  <br>
  <label for="volume">Volume: </label>
  <input type="range" id="volume" min="0" max="1" step="0.1" value="1">
  <br>
  <label for="voice">Voice: </label>
  <select id="voice"></select>
  <br>
  <button onclick="speak()">Speak</button>
  <p id="status"></p>
  <script>
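    const voiceSelect = document.getElementById('voice');
    const statusEl = document.getElementById('status');
    let voices = [];
    // getVoices() can return [] at page load (e.g. in Chrome), so rebuild the
    // list again when the browser fires the voiceschanged event
    function populateVoices() {
      voices = window.speechSynthesis.getVoices();
      voiceSelect.innerHTML = '';
      voices.forEach(voice => {
        const option = document.createElement('option');
        option.value = voice.name;
        option.text = `${voice.name} (${voice.lang})`;
        voiceSelect.add(option);
      });
    }
    populateVoices();
    window.speechSynthesis.onvoiceschanged = populateVoices;
    function speak() {
      const text = document.getElementById('text').value;
      const msg = new SpeechSynthesisUtterance(text);
      // Range inputs return strings; convert them to numbers explicitly
      msg.pitch = Number(document.getElementById('pitch').value);
      msg.rate = Number(document.getElementById('rate').value);
      msg.volume = Number(document.getElementById('volume').value);
      msg.voice = voices.find(voice => voice.name === voiceSelect.value) || null;
      msg.onstart = () => {
        statusEl.textContent = 'Speaking...';
      };
      msg.onend = () => {
        statusEl.textContent = 'Speech finished';
      };
      // event.error holds a code such as 'canceled' or 'interrupted'
      msg.onerror = (event) => {
        statusEl.textContent = `Speech error: ${event.error}`;
      };
      window.speechSynthesis.speak(msg);
    }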
  </script>
</body>
</html>
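The end and error events can also be wrapped in a Promise so that speech can be awaited, for example to read several passages strictly one after another. A minimal sketch, assuming property handlers are acceptable (speakAsync is a hypothetical helper; it rejects on the utterance's error event, whose error property is a code such as 'canceled' or 'interrupted').

```javascript
// Hypothetical helper: speaks an utterance and returns a Promise that resolves
// on the 'end' event and rejects on the 'error' event. Note that it replaces
// any onend/onerror handlers already set on the utterance.
function speakAsync(utterance, synth = globalThis.speechSynthesis) {
  return new Promise((resolve, reject) => {
    utterance.onend = () => resolve();
    // SpeechSynthesisErrorEvent.error holds a code such as 'canceled' or 'interrupted'
    utterance.onerror = (event) => reject(new Error(`Speech failed: ${event.error}`));
    synth.speak(utterance);
  });
}
```

Inside an async function, looping over a list of texts and awaiting speakAsync(new SpeechSynthesisUtterance(text)) for each one plays them in order and stops at the first error.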

III. Using third-party text-to-speech libraries

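Besides the Web Speech API, you can also use third-party text-to-speech libraries and services. They may offer more voices and more consistent results across browsers, but they typically require extra setup, such as an account, an API key or credentials, and an additional dependency.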

1. Using ResponsiveVoice.js

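ResponsiveVoice.js is a widely used text-to-speech library that supports many languages and voices. To use it, include its script (replacing YOUR_KEY with the key from your ResponsiveVoice account) and call responsiveVoice.speak() with the text and the name of a voice.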

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Text to Speech</title>
  <script src="https://code.responsivevoice.org/responsivevoice.js?key=YOUR_KEY"></script>
</head>
<body>
  <h1>Text to Speech</h1>
  <textarea id="text" rows="10" cols="50" placeholder="Enter text here..."></textarea>
  <br>
  <button onclick="speak()">Speak</button>
  <script>
    function speak() {
      const text = document.getElementById('text').value;
      responsiveVoice.speak(text, "UK English Male");
    }
  </script>
</body>
</html>
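If the ResponsiveVoice script fails to load (for example because of a network error), the global responsiveVoice is never defined and the speak() function above throws a ReferenceError. A minimal sketch of a fallback, assuming the Web Speech API is an acceptable backup; speakWithFallback is a hypothetical helper, and the only ResponsiveVoice call it uses is the responsiveVoice.speak(text, voiceName) form shown above.

```javascript
// Hypothetical helper: prefer ResponsiveVoice, fall back to the Web Speech API.
// Dependencies default to browser globals but can be passed in for testing.
function speakWithFallback(text, {
  rv = globalThis.responsiveVoice,
  synth = globalThis.speechSynthesis,
  Utterance = globalThis.SpeechSynthesisUtterance,
} = {}) {
  if (rv && typeof rv.speak === 'function') {
    rv.speak(text, 'UK English Male');
    return 'responsivevoice';
  }
  if (synth && typeof Utterance === 'function') {
    synth.speak(new Utterance(text)); // built-in browser speech
    return 'webspeech';
  }
  return 'unsupported'; // neither engine is available
}
```

The return value tells the caller which engine was used, which can be shown to the user or logged.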

2. Using Google Cloud Text-to-Speech

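Google Cloud Text-to-Speech is a cloud text-to-speech service from Google that supports many languages and voices. The example below runs on the server in Node.js, not in the browser. Before running it, create a Google Cloud project, enable the Text-to-Speech API, install the client library with npm install @google-cloud/text-to-speech, and set up authentication. The Node.js client library normally authenticates with Application Default Credentials (for example, a service-account key file referenced by the GOOGLE_APPLICATION_CREDENTIALS environment variable) rather than with an API key.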

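// Runs in Node.js on the server (npm install @google-cloud/text-to-speech)
const textToSpeech = require('@google-cloud/text-to-speech');
const fs = require('fs');
async function main() {
  // The client authenticates with Application Default Credentials
  const client = new textToSpeech.TextToSpeechClient();
  const text = 'Hello, world!';
  const request = {
    input: {text: text},
    voice: {languageCode: 'en-US', ssmlGender: 'NEUTRAL'},
    audioConfig: {audioEncoding: 'MP3'},
  };
  // synthesizeSpeech resolves to an array; the first element is the response
  const [response] = await client.synthesizeSpeech(request);
  // audioContent contains the encoded MP3 bytes
  await fs.promises.writeFile('output.mp3', response.audioContent, 'binary');
  console.log('Audio content written to file: output.mp3');
}
// Report failures (e.g. missing credentials) instead of leaving the rejection unhandled
main().catch(err => {
  console.error(err);
  process.exitCode = 1;
});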
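In practice the text and voice settings usually come from the caller. The sketch below is a hypothetical buildSynthesisRequest helper that assembles the request object passed to client.synthesizeSpeech(): it sends SSML through the input.ssml field when the text starts with a speak element, and clamps audioConfig.speakingRate to the 0.25 to 4.0 range documented for the API. The helper name, its defaults, and the empty-text check are assumptions; verify field names against the current API reference.

```javascript
// Hypothetical helper: builds the request object for client.synthesizeSpeech().
// Defaults mirror the example above; speakingRate is clamped to 0.25-4.0.
function buildSynthesisRequest(text, {
  languageCode = 'en-US',
  ssmlGender = 'NEUTRAL',
  audioEncoding = 'MP3',
  speakingRate = 1.0,
} = {}) {
  const trimmed = String(text).trim();
  if (!trimmed) {
    throw new Error('Text must not be empty');
  }
  // SSML documents go in input.ssml; plain text goes in input.text
  const input = /^<speak[\s>]/.test(trimmed) ? { ssml: trimmed } : { text: trimmed };
  const rate = Number(speakingRate);
  return {
    input,
    voice: { languageCode, ssmlGender },
    audioConfig: {
      audioEncoding,
      // Fall back to normal speed (1.0) if the rate is not a finite number
      speakingRate: Number.isFinite(rate) ? Math.min(4.0, Math.max(0.25, rate)) : 1.0,
    },
  };
}
```

With the client from the example above, const [response] = await client.synthesizeSpeech(buildSynthesisRequest('Hello, world!', { speakingRate: 1.2 })); would replace the hand-written request object.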

IV. Summary

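JavaScript offers several ways to convert text to speech: the browser's built-in Web Speech API, third-party libraries such as ResponsiveVoice.js, and server-side services such as Google Cloud Text-to-Speech. The Web Speech API is the simplest and most direct option and needs no extra dependencies, but the available voices vary by browser and operating system. Third-party libraries and cloud services can provide more voices and more consistent output, at the cost of extra setup, keys or credentials, and possible usage fees. Choosing the approach that fits your requirements can make your web application more interactive and user-friendly.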