How to batch download images with Python

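To batch download images in Python, you can use the requests library for HTTP requests, the os module to create folders, concurrent.futures for concurrent downloads, and the Pillow (PIL) library to check that each downloaded file is a valid image. The sections below walk through each step with code examples.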

1. Import the required libraries

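Before downloading in bulk, make sure the required libraries are available. The code uses requests, os, concurrent.futures, and Pillow. os and concurrent.futures ship with Python's standard library, so only requests and Pillow need to be installed: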

pip install requests pillow

2. Create a function to download a single image

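To download an image, we write a function that sends an HTTP request with requests and writes the image data to a local file. The function also handles exceptions so that one failed download does not stop the rest.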

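import os
import requests

def download_image(url, folder_path):
    try:
        # stream=True avoids loading the whole body into memory;
        # timeout keeps an unresponsive server from blocking forever
        response = requests.get(url, stream=True, timeout=10)
        if response.status_code == 200:
            # Use the last segment of the URL as the local file name
            file_name = os.path.join(folder_path, url.split("/")[-1])
            with open(file_name, 'wb') as file:
                # Write the response body to disk in 8 KB chunks
                for chunk in response.iter_content(chunk_size=8192):
                    file.write(chunk)
            print(f'Successfully downloaded {file_name}')
        else:
            print(f'Failed to download {url} (status {response.status_code})')
    except Exception as e:
        print(f'An error occurred while downloading {url}: {e}')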
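The function above takes the file name from url.split("/")[-1], which keeps any query string (for example "image1.jpg?size=large") and yields an empty name when the URL ends in a slash. One possible way around this, sketched below, is to parse the URL with the standard library first. The helper name filename_from_url and the default value "image" are assumptions made for illustration and are not part of the original code.

```python
import os
from urllib.parse import urlparse, unquote

def filename_from_url(url, default="image"):
    """Derive a local file name from a URL (hypothetical helper)."""
    # urlparse(...).path drops the query string and the fragment
    path = urlparse(url).path
    # unquote turns percent-escapes such as %20 back into characters;
    # basename keeps only the last path segment
    name = os.path.basename(unquote(path))
    # Fall back to a default when the path ends in "/" and has no file name
    return name or default
```

Inside download_image, the result could be passed to os.path.join(folder_path, ...) in place of url.split("/")[-1]. Two URLs with the same last segment would still overwrite each other, so this is only a partial safeguard.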

3. Create a folder to store the downloaded images

Before downloading, we need a local folder to hold the images. If the folder does not exist yet, we create it.

def create_folder(folder_path):
    if not os.path.exists(folder_path):
        os.makedirs(folder_path)
        print(f'Created directory: {folder_path}')
    else:
        print(f'Directory already exists: {folder_path}')
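The check-then-create pattern above can, in principle, fail if another process creates the folder between the os.path.exists call and the os.makedirs call. A shorter variant, sketched here under the assumption that the status messages are not needed, relies on the exist_ok parameter of os.makedirs. The name ensure_folder is hypothetical.

```python
import os

def ensure_folder(folder_path):
    """Create folder_path (and any missing parents) if needed (hypothetical helper)."""
    # exist_ok=True makes this a no-op when the directory already exists
    os.makedirs(folder_path, exist_ok=True)
    return folder_path
```

Calling ensure_folder('downloaded_images') twice in a row is safe; the second call simply does nothing.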

4. Download images in bulk

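With concurrent.futures, several images can be downloaded concurrently. Downloads are I/O-bound and spend most of their time waiting on the network, so a thread pool can shorten the total download time considerably, especially for large batches.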

from concurrent.futures import ThreadPoolExecutor
def download_images(url_list, folder_path):
    create_folder(folder_path)
    with ThreadPoolExecutor(max_workers=5) as executor:
        futures = [executor.submit(download_image, url, folder_path) for url in url_list]
        for future in futures:
            future.result()
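Because download_image catches its own exceptions and only prints messages, the loop over futures above cannot tell which URLs failed. If a per-item report is wanted, one option is to let the worker raise and collect outcomes with as_completed. The sketch below assumes such a raising worker; the names run_all, successes, and failures are illustrative and do not appear in the original code.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_all(worker, items, max_workers=5):
    """Run worker(item) for every item in a thread pool (hypothetical helper).

    Returns (successes, failures): successes holds (item, result) pairs,
    failures holds (item, exception) pairs.
    """
    successes, failures = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Remember which item each future belongs to
        future_to_item = {executor.submit(worker, item): item for item in items}
        # as_completed yields futures in the order they finish, not submit order
        for future in as_completed(future_to_item):
            item = future_to_item[future]
            try:
                successes.append((item, future.result()))
            except Exception as exc:
                # future.result() re-raises whatever the worker raised
                failures.append((item, exc))
    return successes, failures
```

With a download function that raises on failure, run_all(fetch, url_list) would return the failed URLs alongside their exceptions, which makes a retry pass straightforward. Here fetch stands for any such function and is not defined in this article.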

5. Validate the downloaded images

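Sometimes a downloaded file is not a valid image, for example when the server returns an HTML error page instead of the picture. The Image module in Pillow can check this. Note that verify() looks for obviously broken files without decoding the full image data.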

from PIL import Image
def validate_image(file_path):
    try:
        with Image.open(file_path) as img:
            img.verify()
        print(f'Image is valid: {file_path}')
    except (IOError, SyntaxError) as e:
        print(f'Invalid image: {file_path}, error: {e}')
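As a lighter complement to Pillow's check, a file's leading bytes can be compared against well-known format signatures. This is a different technique from Image.verify(): it only inspects the header and cannot detect truncated or corrupted image data. The sketch below covers JPEG, PNG, and GIF only, and the names SIGNATURES and sniff_image_type are assumptions made for illustration.

```python
# Leading "magic" bytes for a few common image formats
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}

def sniff_image_type(file_path):
    """Return 'jpeg', 'png', 'gif', or None based on the file header (hypothetical helper)."""
    with open(file_path, "rb") as f:
        header = f.read(8)  # the longest signature above is 8 bytes
    for magic, kind in SIGNATURES.items():
        if header.startswith(magic):
            return kind
    return None
```

A return value of None suggests the file is not one of these three formats, for instance an HTML page saved under a .jpg name.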

6. Complete example

Below is a complete example that combines all of the steps:

import requests
import os
from concurrent.futures import ThreadPoolExecutor
from PIL import Image
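def download_image(url, folder_path):
    try:
        # stream=True avoids loading the whole body into memory;
        # timeout keeps an unresponsive server from blocking forever
        response = requests.get(url, stream=True, timeout=10)
        if response.status_code == 200:
            # Use the last segment of the URL as the local file name
            file_name = os.path.join(folder_path, url.split("/")[-1])
            with open(file_name, 'wb') as file:
                # Write the response body to disk in 8 KB chunks
                for chunk in response.iter_content(chunk_size=8192):
                    file.write(chunk)
            print(f'Successfully downloaded {file_name}')
            # Check that the saved file is actually an image
            validate_image(file_name)
        else:
            print(f'Failed to download {url} (status {response.status_code})')
    except Exception as e:
        print(f'An error occurred while downloading {url}: {e}')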
def create_folder(folder_path):
    if not os.path.exists(folder_path):
        os.makedirs(folder_path)
        print(f'Created directory: {folder_path}')
    else:
        print(f'Directory already exists: {folder_path}')
def download_images(url_list, folder_path):
    create_folder(folder_path)
    with ThreadPoolExecutor(max_workers=5) as executor:
        futures = [executor.submit(download_image, url, folder_path) for url in url_list]
        for future in futures:
            future.result()
def validate_image(file_path):
    try:
        with Image.open(file_path) as img:
            img.verify()
        print(f'Image is valid: {file_path}')
    except (IOError, SyntaxError) as e:
        print(f'Invalid image: {file_path}, error: {e}')
# Example usage
url_list = [
    'https://example.com/image1.jpg',
    'https://example.com/image2.jpg',
    'https://example.com/image3.jpg'
]
folder_path = 'downloaded_images'
download_images(url_list, folder_path)