How to Convert Speech to Text with Python

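There are several ways to convert speech to text in Python, including the Google Speech Recognition API, the IBM Watson Speech to Text API, the Microsoft Azure Speech API, and the open-source CMU Sphinx. Of these, the Google Speech Recognition API is one of the most widely used and simplest options. This article explains how to use each of these tools for speech-to-text and provides code examples.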

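I. GOOGLE SPEECH RECOGNITION API

The Google Speech Recognition API is a powerful, easy-to-use tool. It supports many languages and can cope with background noise. In Python it is accessed through the SpeechRecognition library, which is imported as speech_recognition.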
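Because the API supports many languages, the language can be chosen per call. The sketch below assumes the `language` keyword of `recognize_google` takes a language tag such as "en-US" or "zh-CN"; `transcribe_in_language` is a hypothetical helper name, and the `recognizer` and `audio` objects are the ones created in the microphone example in step 3.

```python
def transcribe_in_language(recognizer, audio, language="en-US"):
    """Transcribe `audio` with the Google Web Speech API in a given language.

    `recognizer` is expected to be a speech_recognition.Recognizer and
    `audio` an AudioData object returned by listen() or record().
    `language` is a tag such as "en-US" or "zh-CN".
    """
    # Errors (sr.RequestError, sr.UnknownValueError) propagate to the caller.
    return recognizer.recognize_google(audio, language=language)
```

Keeping the language as a parameter means the same helper can serve users who speak different languages without duplicating the recognition call.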

1. Install the SpeechRecognition library

First, install the SpeechRecognition library. You can run the following command in a terminal or command prompt:

pip install SpeechRecognition

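2. Record audio

You can record audio from a microphone with the pyaudio library, which the Microphone class in SpeechRecognition depends on. First, install pyaudio: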

pip install pyaudio
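With pyaudio installed, a recording can also be saved to disk so that it can be reused by file-based recognizers. The following is a minimal sketch, assuming the recognizer's `listen` method accepts a `phrase_time_limit` argument and that the returned audio object provides `get_wav_data()`; `record_to_wav` is a hypothetical helper name.

```python
def record_to_wav(recognizer, source, path, phrase_time_limit=10):
    """Listen on an already opened `source` and save the audio as a WAV file.

    `recognizer` is expected to be a speech_recognition.Recognizer and
    `source` an opened speech_recognition.Microphone.
    """
    # Stop listening after `phrase_time_limit` seconds of speech at most.
    audio = recognizer.listen(source, phrase_time_limit=phrase_time_limit)
    with open(path, "wb") as wav_file:
        wav_file.write(audio.get_wav_data())
    return audio


# Usage (requires SpeechRecognition, PyAudio, and a microphone):
#   import speech_recognition as sr
#   with sr.Microphone() as source:
#       record_to_wav(sr.Recognizer(), source, "audio-file.wav")
```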

3. Speech-to-text code example

The following example shows how to use the speech_recognition library to convert speech to text:

import speech_recognition as sr


def recognize_speech_from_mic(recognizer, microphone):
    """Transcribe speech recorded from `microphone`."""
    # check that recognizer and microphone arguments are appropriate type
    if not isinstance(recognizer, sr.Recognizer):
        raise TypeError("`recognizer` must be `Recognizer` instance")
    if not isinstance(microphone, sr.Microphone):
        raise TypeError("`microphone` must be `Microphone` instance")
    # adjust the recognizer sensitivity to ambient noise and record audio from the microphone
    with microphone as source:
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)
    # set up the response object
    response = {
        "success": True,
        "error": None,
        "transcription": None
    }
    # try recognizing the speech in the recording
    try:
        response["transcription"] = recognizer.recognize_google(audio)
    except sr.RequestError:
        # API was unreachable or unresponsive
        response["success"] = False
        response["error"] = "API unavailable"
    except sr.UnknownValueError:
        # speech was unintelligible
        response["error"] = "Unable to recognize speech"
    return response


if __name__ == "__main__":
    recognizer = sr.Recognizer()
    microphone = sr.Microphone()
    print("Please speak something...")
    response = recognize_speech_from_mic(recognizer, microphone)
    # "success" only reports whether the API was reachable, so check the
    # transcription itself before printing it.
    if response["transcription"]:
        print("You said: {}".format(response["transcription"]))
    else:
        print("I didn't catch that. What did you say?\nError: {}".format(response["error"]))
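When speech is unintelligible, it is often enough to ask the user to try again. The sketch below wraps any function that returns the response dictionary used above ("success", "error", "transcription") in a retry loop; `transcribe_with_retries` and its `max_attempts` parameter are hypothetical names introduced for illustration.

```python
def transcribe_with_retries(get_response, max_attempts=3):
    """Call `get_response` until it yields a transcription or attempts run out.

    `get_response` is any zero-argument callable returning a dict with the
    "success", "error" and "transcription" keys.
    """
    response = {"success": False, "error": "No attempts made", "transcription": None}
    for _ in range(max_attempts):
        response = get_response()
        if response["transcription"]:
            break  # usable text was recognized
        if not response["success"]:
            break  # API unreachable: an immediate retry is unlikely to help
    return response
```

With the example above, it could be called as `transcribe_with_retries(lambda: recognize_speech_from_mic(recognizer, microphone))`.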

II. IBM WATSON SPEECH TO TEXT API

The IBM Watson Speech to Text API is another capable speech recognition tool. It supports multiple languages and dialects and offers advanced customization options.

1. Get an API key

First, register for IBM Cloud. Once registered, create a Speech to Text service instance from the IBM Cloud dashboard, then copy its API key and service URL.
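Rather than pasting the key and URL into source code, one common option is to read them from environment variables. This is a minimal sketch; the variable names `WATSON_API_KEY` and `WATSON_SERVICE_URL` and the helper `load_credentials` are assumptions made for illustration, not names required by IBM.

```python
import os


def load_credentials(key_var="WATSON_API_KEY", url_var="WATSON_SERVICE_URL"):
    """Read the API key and service URL from environment variables.

    The default variable names are hypothetical, not mandated by IBM.
    """
    api_key = os.environ.get(key_var)
    service_url = os.environ.get(url_var)
    missing = [
        name
        for name, value in ((key_var, api_key), (url_var, service_url))
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return api_key, service_url
```

The returned pair can then be used wherever the key and URL placeholders appear in the code example.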

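2. Install the IBM Watson SDK

Install the ibm-watson package, which provides the ibm_watson module (pip treats the hyphen and underscore spellings of the name as the same package):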

pip install ibm_watson

3. Speech-to-text code example

The following example shows how to use the IBM Watson Speech to Text API to convert speech to text:

import json
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
# Replace 'YOUR_API_KEY' and 'YOUR_SERVICE_URL' with your actual API key and service URL
api_key = 'YOUR_API_KEY'
service_url = 'YOUR_SERVICE_URL'
authenticator = IAMAuthenticator(api_key)
speech_to_text = SpeechToTextV1(authenticator=authenticator)
speech_to_text.set_service_url(service_url)
with open('audio-file.wav', 'rb') as audio_file:
    result = speech_to_text.recognize(
        audio=audio_file,
        content_type='audio/wav',
    ).get_result()
print(json.dumps(result, indent=2))
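The example above prints the raw JSON. To get plain text, the transcript has to be pulled out of the nested result. The sketch below assumes the response contains a "results" list whose items each hold an "alternatives" list with a "transcript" field; `extract_transcript` is a hypothetical helper, and the sample dictionary is hand-written for illustration rather than real service output.

```python
def extract_transcript(result):
    """Join the top alternative of each result segment into one string."""
    pieces = []
    for segment in result.get("results", []):
        alternatives = segment.get("alternatives", [])
        if alternatives:
            # The first alternative is taken as the preferred hypothesis.
            pieces.append(alternatives[0].get("transcript", "").strip())
    return " ".join(piece for piece in pieces if piece)


# Hand-written sample in the assumed response shape.
sample = {
    "results": [
        {"alternatives": [{"transcript": "hello world "}]},
        {"alternatives": [{"transcript": "how are you "}]},
    ]
}
print(extract_transcript(sample))
```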

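III. MICROSOFT AZURE SPEECH API

The Microsoft Azure Speech API is a powerful speech recognition tool that supports multiple languages and offers advanced features such as speech synthesis and customized speech recognition.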

1. Get an API key

First, sign up for Microsoft Azure. Once registered, create a Speech service instance in the Azure Portal, then copy its API key and service region.

2. Install the Azure SDK

Install the azure-cognitiveservices-speech library:

pip install azure-cognitiveservices-speech

3. Speech-to-text code example

The following example shows how to use the Microsoft Azure Speech API to convert speech to text:

import azure.cognitiveservices.speech as speechsdk
# Replace 'YOUR_SUBSCRIPTION_KEY' and 'YOUR_SERVICE_REGION' with your actual subscription key and service region
subscription_key = 'YOUR_SUBSCRIPTION_KEY'
service_region = 'YOUR_SERVICE_REGION'
speech_config = speechsdk.SpeechConfig(subscription=subscription_key, region=service_region)
audio_input = speechsdk.AudioConfig(filename="audio-file.wav")
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_input)
result = speech_recognizer.recognize_once()
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized: {}".format(result.text))
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized")
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print("Speech Recognition canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))
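Note that `recognize_once()` returns after a single utterance, so longer recordings are typically handled with continuous recognition instead. The following is a sketch under the assumption that the recognizer exposes `recognized`, `session_stopped` and `canceled` event signals with a `connect` method, plus `start_continuous_recognition()` and `stop_continuous_recognition()`; `transcribe_continuously` is a hypothetical helper that takes an already configured `speech_recognizer` such as the one built above.

```python
import threading


def transcribe_continuously(speech_recognizer, timeout=None):
    """Collect the text of every recognized utterance until the session ends."""
    texts = []
    finished = threading.Event()

    def on_recognized(evt):
        # Skip events that carry no text (for example, no-match results).
        if evt.result.text:
            texts.append(evt.result.text)

    def on_stop(evt):
        finished.set()

    speech_recognizer.recognized.connect(on_recognized)
    speech_recognizer.session_stopped.connect(on_stop)
    speech_recognizer.canceled.connect(on_stop)

    speech_recognizer.start_continuous_recognition()
    finished.wait(timeout)  # block until the session stops or the timeout expires
    speech_recognizer.stop_continuous_recognition()
    return " ".join(texts)
```

Waiting on a `threading.Event` keeps the calling thread idle while the callbacks collect text, and the optional `timeout` guards against a session that never reports completion.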

IV. CMU SPHINX

CMU Sphinx is an open-source speech recognition system, suited to applications that should not depend on a third-party API.

1. Install the pocketsphinx library

Install the pocketsphinx library:

pip install pocketsphinx

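2. Speech-to-text code example

The following example shows how to use CMU Sphinx, through the recognize_sphinx method of the speech_recognition library, to convert speech to text: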

import speech_recognition as sr


def recognize_speech_from_audio(audio_file):
    """Transcribe an audio file with CMU Sphinx and print the result."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_file) as source:
        audio = recognizer.record(source)
    try:
        text = recognizer.recognize_sphinx(audio)
        print("You said: " + text)
    except sr.UnknownValueError:
        print("Sphinx could not understand audio")
    except sr.RequestError as e:
        print("Sphinx error; {0}".format(e))


if __name__ == "__main__":
    recognize_speech_from_audio('audio-file.wav')
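The file-based examples in this article all read audio-file.wav. As a quick sanity check before recognition, the standard-library `wave` module can report a file's basic properties; it raises `wave.Error` for files it cannot parse, such as compressed WAV variants. `describe_wav` is a hypothetical helper, not part of speech_recognition or pocketsphinx.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "frame_rate_hz": rate,
            "duration_seconds": frames / float(rate),
        }


# Usage: print(describe_wav("audio-file.wav"))
```

Checking the channel count, sample rate, and duration up front makes it easier to tell a problem with the input file from a problem with the recognizer.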