How to Record Audio Input with Python

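Several Python libraries can capture audio input, including pyaudio and sounddevice. The standard-library wave module does not record anything by itself, but it reads and writes the WAV files that hold the captured data. pyaudio is a popular choice because it gives real-time access to the audio stream, which makes it straightforward to capture audio from a microphone. The sections below walk through recording with pyaudio in detail.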

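I. Install the pyaudio library

Before starting, install the pyaudio library. pyaudio is a binding to the PortAudio library, so on some systems (Linux in particular) the PortAudio development files may need to be installed first. The following command installs pyaudio: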

pip install pyaudio

II. Capture audio with pyaudio

1. Import the necessary libraries

First, import pyaudio along with the other libraries used in this article:

import pyaudio
import wave
import numpy as np

2. Set the audio stream parameters

Next, set the recording parameters: sample format, channel count, sample rate, and the number of samples read per buffer:

FORMAT = pyaudio.paInt16  # 16-bit resolution
CHANNELS = 1  # 1 channel
RATE = 44100  # 44.1kHz sampling rate
CHUNK = 1024  # 1024 samples per buffer
RECORD_SECONDS = 5  # Duration of recording
WAVE_OUTPUT_FILENAME = "output.wav"  # Output filename

3. Initialize the pyaudio object

Then create the PyAudio object and open an input stream:

audio = pyaudio.PyAudio()
# Start Recording
stream = audio.open(format=FORMAT, channels=CHANNELS,
                    rate=RATE, input=True,
                    frames_per_buffer=CHUNK)
print("Recording...")
frames = []

4. Read the audio data

Next, read the data from the stream one chunk at a time and append each chunk to a list:

for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)
print("Finished recording.")
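
The loop bound above truncates to a whole number of chunks, so the recording usually comes out slightly shorter than RECORD_SECONDS. The following sketch works through that arithmetic with the same parameter values. The helper name `chunks_for` is hypothetical and exists only for illustration.

```python
def chunks_for(rate, chunk, seconds):
    """Return (number of reads, samples captured, actual duration in seconds)."""
    n_reads = int(rate / chunk * seconds)   # same expression as the recording loop
    n_samples = n_reads * chunk             # each read returns `chunk` samples per channel
    return n_reads, n_samples, n_samples / rate


n_reads, n_samples, duration = chunks_for(44100, 1024, 5)
print(n_reads, n_samples, round(duration, 3))  # 215 220160 4.992
```

If the exact duration matters, one option is to round the loop bound up and trim the extra samples afterwards.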

5. Stop and close the audio stream

Once recording is finished, stop and close the stream, then terminate the PyAudio object:

# Stop Recording
stream.stop_stream()
stream.close()
audio.terminate()

6. Save the audio data to a file

Finally, write the captured audio data to a WAV file:

waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
waveFile.setnchannels(CHANNELS)
waveFile.setsampwidth(audio.get_sample_size(FORMAT))
waveFile.setframerate(RATE)
waveFile.writeframes(b''.join(frames))
waveFile.close()
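
To check the WAV-writing step without a microphone, the same wave calls can be fed synthetic data. The sketch below is only an illustration and assumes mono 16-bit output. The function name `write_tone` and the file name `tone.wav` are hypothetical.

```python
import math
import struct
import wave


def write_tone(path, freq=440.0, seconds=1.0, rate=44100, amplitude=0.5):
    """Write a mono 16-bit sine tone to `path` and return the number of frames."""
    n_frames = int(rate * seconds)
    samples = (
        int(amplitude * 32767 * math.sin(2 * math.pi * freq * i / rate))
        for i in range(n_frames)
    )
    # '<h' = little-endian signed 16-bit, the sample layout of 16-bit PCM WAV
    data = b"".join(struct.pack("<h", s) for s in samples)
    with wave.open(path, "wb") as wf:   # the context manager closes the file
        wf.setnchannels(1)
        wf.setsampwidth(2)              # 2 bytes per sample, same width as paInt16
        wf.setframerate(rate)
        wf.writeframes(data)
    return n_frames


if __name__ == "__main__":
    write_tone("tone.wav")  # writes a one-second 440 Hz test tone
```

Opening the file with a `with` block is a design choice here: the file is closed even if writing fails partway.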

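III. Capture audio with the sounddevice library

Besides pyaudio, the sounddevice library is another convenient option. It records directly into a NumPy array, and the code is more concise. An example follows.

1. Install the sounddevice library

The following command installs sounddevice: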

pip install sounddevice

2. Capture audio data

The code for capturing audio with sounddevice is as follows (saving the file also requires scipy):

import sounddevice as sd
from scipy.io.wavfile import write  # write(filename, rate, data)
fs = 44100  # Sample rate
seconds = 5  # Duration of recording
print("Recording...")
myrecording = sd.rec(int(seconds * fs), samplerate=fs, channels=1, dtype='int16')
sd.wait()  # Wait until recording is finished
print("Finished recording.")
# Save as WAV file
write('output.wav', fs, myrecording)

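The code above uses sounddevice's rec function to capture the audio and the wait function to block until recording is complete. Finally, it saves the captured data to a WAV file with scipy.io.wavfile.write.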

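IV. Work with audio files using the wave module

wave is a module in the Python standard library that reads and writes WAV files. We can use it to process the captured audio data.

1. Read a WAV file

We can use wave to open a WAV file and get its parameters and data: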

import wave
import numpy as np
# Open the WAV file
waveFile = wave.open('output.wav', 'rb')
# Get the parameters of the WAV file
n_channels = waveFile.getnchannels()
sample_width = waveFile.getsampwidth()
frame_rate = waveFile.getframerate()
n_frames = waveFile.getnframes()
# Read the data from the WAV file
frames = waveFile.readframes(n_frames)
waveFile.close()
# Convert the data to a numpy array (int16 assumes sample_width == 2)
audio_data = np.frombuffer(frames, dtype=np.int16)
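
The conversion above treats the bytes as one flat run of 16-bit samples, which is what a mono file contains. In a multi-channel WAV file the samples are interleaved frame by frame, so they have to be split per channel before processing. Here is a minimal standard-library sketch of that split. The helper name `split_channels` is hypothetical, and 16-bit little-endian samples are assumed.

```python
import struct


def split_channels(frames, n_channels):
    """Split interleaved 16-bit little-endian PCM bytes into one list per channel."""
    n_samples = len(frames) // 2                      # 2 bytes per 16-bit sample
    samples = struct.unpack("<%dh" % n_samples, frames[: n_samples * 2])
    # Sample i belongs to channel i % n_channels.
    return [list(samples[ch::n_channels]) for ch in range(n_channels)]


# Two stereo frames: (L=1, R=-1), (L=2, R=-2)
stereo = struct.pack("<4h", 1, -1, 2, -2)
print(split_channels(stereo, 2))  # [[1, 2], [-1, -2]]
```

With NumPy, reshaping the flat array to (-1, n_channels) gives the same grouping as an array with one column per channel.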

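2. Write a WAV file

We can also use wave to write NumPy array data to a WAV file. The example reuses the parameters and audio_data obtained when reading the file above: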

import wave
import numpy as np
# Create a new WAV file
waveFile = wave.open('output_modified.wav', 'wb')
# Set the parameters of the WAV file
waveFile.setnchannels(n_channels)
waveFile.setsampwidth(sample_width)
waveFile.setframerate(frame_rate)
# Convert the numpy array to bytes
frames = audio_data.tobytes()
# Write the data to the WAV file
waveFile.writeframes(frames)
waveFile.close()

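V. Processing audio data

After capturing audio, we may need to process it, for example by filtering, pitch shifting, or feature extraction. Some common processing methods are described below.

1. Filtering

Filtering is one of the most common operations in audio signal processing. We can filter audio data with the signal processing module of the scipy library: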

import scipy.signal as signal
# Design a low-pass filter
fs = 44100  # Sample rate
cutoff = 1000  # Cutoff frequency
nyquist = 0.5 * fs
normal_cutoff = cutoff / nyquist
b, a = signal.butter(5, normal_cutoff, btype='low', analog=False)
# Apply the filter to the audio data (the result is a float array, not int16)
filtered_audio = signal.filtfilt(b, a, audio_data)
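
filtfilt returns floating-point values, so the result has to be rounded and clipped to the 16-bit range before it can be written back as 16-bit PCM. Below is a minimal standard-library sketch of that conversion. The helper name `to_int16_bytes` is hypothetical.

```python
import struct


def to_int16_bytes(samples):
    """Round float samples, clip them to the int16 range, and pack as little-endian PCM."""
    clipped = [max(-32768, min(32767, int(round(s)))) for s in samples]
    return struct.pack("<%dh" % len(clipped), *clipped)


pcm = to_int16_bytes([0.4, -1.6, 40000.0, -40000.0])
print(struct.unpack("<4h", pcm))  # (0, -2, 32767, -32768)
```

Clipping matters because a filter can overshoot the original peak values, and an out-of-range value would otherwise wrap around or raise an error when packed.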

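2. Pitch shifting

Pitch shifting raises or lowers the pitch of an audio signal without changing its duration. We can shift the pitch of audio data with the librosa library, and save the result with the soundfile library: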

import librosa
import soundfile as sf  # librosa.output.write_wav was removed from librosa
# Load the audio data
y, sr = librosa.load('output.wav', sr=44100)
# Change the pitch by 4 semitones (sr must be passed by keyword in recent librosa)
y_shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=4)
# Save the modified audio data
sf.write('output_shifted.wav', y_shifted, sr)

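3. Feature extraction

Feature extraction is an important step in audio signal processing. We can use librosa to extract features from an audio signal, such as MFCCs (Mel-frequency cepstral coefficients):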

import librosa
# Load the audio data
y, sr = librosa.load('output.wav', sr=44100)
# Extract MFCC features
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
# Print the shape of the MFCC features: (n_mfcc, number of frames)
print(mfccs.shape)
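
The printed shape is (n_mfcc, n_frames). The sketch below estimates the frame count under an assumption the text above does not state: that librosa is using a hop length of 512 samples with centered frames, in which case the count is 1 + len(y) // hop_length. The helper name `expected_mfcc_shape` is hypothetical.

```python
def expected_mfcc_shape(n_samples, n_mfcc=13, hop_length=512):
    """Expected (n_mfcc, n_frames) for centered framing; hop_length=512 is an assumption."""
    return n_mfcc, 1 + n_samples // hop_length


# 220160 samples is what 215 reads of 1024 samples produce in the pyaudio example
print(expected_mfcc_shape(220160))  # (13, 431)
```

If the shape printed by librosa differs, the hop length or centering in use is probably different from what is assumed here.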

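VI. Visualizing audio data

Visualization helps us understand an audio signal better. We can plot audio data with the matplotlib library, together with the plotting helpers in librosa.display.

1. Plot the waveform

The waveform is the time-domain representation of the signal. It shows how the amplitude changes over time: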

import matplotlib.pyplot as plt
import librosa
import librosa.display
# Load the audio data
y, sr = librosa.load('output.wav', sr=44100)
# Plot the waveform
plt.figure(figsize=(14, 5))
librosa.display.waveshow(y, sr=sr)
plt.title('Waveform')
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.show()

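2. Plot the spectrogram

The spectrogram is a time-frequency representation of the signal. It shows which frequency components are present and how they change over time. The example reuses y and sr loaded for the waveform plot: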

import matplotlib.pyplot as plt
import numpy as np
import librosa
import librosa.display
# Compute the Short-Time Fourier Transform (STFT)
D = np.abs(librosa.stft(y))
# Convert the amplitude to decibels
DB = librosa.amplitude_to_db(D, ref=np.max)
# Plot the spectrogram
plt.figure(figsize=(14, 5))
librosa.display.specshow(DB, sr=sr, x_axis='time', y_axis='log')
plt.colorbar(format='%+2.0f dB')
plt.title('Spectrogram')
plt.xlabel('Time (s)')
plt.ylabel('Frequency (Hz)')
plt.show()
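
Each row of a spectrogram corresponds to one frequency bin of a discrete Fourier transform applied to a short window of the signal. To make the idea of a bin concrete, the sketch below runs a deliberately slow, naive DFT on a single window and reports the strongest bin. This is not how librosa computes the STFT (it uses an FFT), and the helper name `dominant_frequency` is hypothetical.

```python
import cmath
import math


def dominant_frequency(samples, rate):
    """Return the frequency (Hz) of the strongest DFT bin up to the Nyquist limit."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(n // 2 + 1):                       # bins from 0 Hz up to rate / 2
        coeff = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(samples))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * rate / n                        # bin k sits at k * rate / n Hz


rate = 8000
tone = [math.sin(2 * math.pi * 1000 * i / rate) for i in range(256)]
print(dominant_frequency(tone, rate))  # 1000.0
```

The bin spacing is rate / n, so a longer window gives finer frequency resolution at the cost of coarser time resolution.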

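VII. Real-time audio processing

Real-time audio processing means processing the audio data while it is being captured. We can implement it with the pyaudio library.

1. Capture audio in real time

First, open an input stream with pyaudio to capture audio data in real time: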

import pyaudio
import numpy as np
FORMAT = pyaudio.paInt16  # 16-bit resolution
CHANNELS = 1  # 1 channel
RATE = 44100  # 44.1kHz sampling rate
CHUNK = 1024  # 1024 samples per frame
audio = pyaudio.PyAudio()
# Start Recording
stream = audio.open(format=FORMAT, channels=CHANNELS,
                    rate=RATE, input=True,
                    frames_per_buffer=CHUNK)
print("Recording...")

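2. Process audio in real time

Next, process the audio data as it is captured. The simple example below computes a rough volume level for each chunk and runs until it is interrupted with Ctrl+C: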

try:
    while True:
        data = stream.read(CHUNK)
        audio_data = np.frombuffer(data, dtype=np.int16)
        volume = np.linalg.norm(audio_data) / CHUNK
        print("Volume:", volume)
except KeyboardInterrupt:
    print("Finished recording.")
    stream.stop_stream()
    stream.close()
    audio.terminate()
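
The volume value above is the L2 norm divided by CHUNK, so its scale depends on the chunk size. A more common measure is the root mean square (RMS) level, often expressed in decibels relative to full scale (dBFS). The sketch below computes both for one chunk of 16-bit samples using only the standard library. The helper name `rms_dbfs` is hypothetical.

```python
import math
import struct


def rms_dbfs(chunk):
    """Return (rms, dBFS) for a chunk of 16-bit little-endian PCM bytes."""
    n = len(chunk) // 2
    if n == 0:
        return 0.0, float("-inf")
    samples = struct.unpack("<%dh" % n, chunk[: n * 2])
    rms = math.sqrt(sum(s * s for s in samples) / n)
    # 0 dBFS corresponds to a full-scale amplitude of 32768
    dbfs = 20 * math.log10(rms / 32768) if rms > 0 else float("-inf")
    return rms, dbfs


half_scale = struct.pack("<4h", 16384, -16384, 16384, -16384)
rms, dbfs = rms_dbfs(half_scale)
print(rms, round(dbfs, 2))  # 16384.0 -6.02
```

The bytes returned by stream.read(CHUNK) can be passed to such a function directly when the stream format is paInt16 and the stream is mono.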

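VIII. Summary

This article showed how to capture audio input with Python using pyaudio and sounddevice, and how to read and write WAV files with the wave module. It also covered processing the captured data (filtering, pitch shifting, and feature extraction), visualizing it as a waveform and a spectrogram, and processing it in real time. Together these steps cover the basic workflow for capturing and processing audio in Python.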