How to Batch-Download Baidu Images with Python

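There are several ways to batch-download images from Baidu Images with Python. You can use third-party libraries such as requests and BeautifulSoup, or browser-automation tools such as Selenium. Either way, you first obtain the image URLs and then download each file with an HTTP request. This article uses requests: it queries the JSON interface behind the Baidu Images search page to get the image URLs, then saves the images locally. BeautifulSoup is only needed when image links have to be extracted from HTML. The interface used here returns JSON, so the code below does not use it.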

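To batch-download Baidu images with Python, you need basic web-scraping skills and an understanding of how Baidu Images returns its search results. The detailed steps and example code follow.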

1. Preparation

Before you begin, install the required Python libraries with the following command:

pip install requests beautifulsoup4

requests sends HTTP requests. BeautifulSoup parses HTML content.
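The code in this article never actually calls BeautifulSoup, because the search interface returns JSON rather than HTML. For pages that do embed image links in their HTML, a minimal sketch of pulling them out with BeautifulSoup might look like the following. The sample HTML and the helper name `extract_img_urls` are made up for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_img_urls(html, base_url):
    """Return absolute URLs for every <img> tag that has a src attribute."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    # src=True keeps only <img> tags that actually carry a src attribute
    for img in soup.find_all("img", src=True):
        # urljoin resolves relative paths such as "/static/cat1.jpg" against the page URL
        urls.append(urljoin(base_url, img["src"]))
    return urls


sample_html = """
<html><body>
  <img src="/static/cat1.jpg">
  <img src="https://example.com/cat2.png">
  <img alt="no src">
</body></html>
"""
print(extract_img_urls(sample_html, "https://example.com/page"))
```

The `<img>` without a `src` is skipped, and the relative path is resolved against the page URL.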

2. Send the search request and parse the results

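First, send a search request to Baidu Images. Instead of downloading the HTML search page, the code calls the acjson interface, which returns the search results as JSON. The image URLs can therefore be read directly from the thumbURL field of each entry in data, with no HTML parsing. This interface is not officially documented, so its parameters and response format may change.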
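The request URL carries the keyword plus two paging parameters: `pn`, the offset of the first result, and `rn`, the number of results. The sketch below builds one URL per page with `urllib.parse.urlencode`, which also percent-encodes the Chinese keyword. The helper name `build_search_urls` and the page size of 30 are assumptions about this undocumented interface, not values confirmed by Baidu.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://image.baidu.com/search/acjson"


def build_search_urls(query, num_images, page_size=30):
    """Build one search URL per page of results.

    page_size=30 is an assumed per-request limit, not a documented value.
    """
    urls = []
    for offset in range(0, num_images, page_size):
        params = {
            "tn": "resultjson_com",
            "ipn": "rj",
            "word": query,  # urlencode percent-encodes the keyword's UTF-8 bytes
            "pn": offset,  # offset of the first result on this page
            "rn": min(page_size, num_images - offset),  # results on this page
        }
        urls.append(f"{SEARCH_ENDPOINT}?{urlencode(params)}")
    return urls


for url in build_search_urls("猫", 70):
    print(url)
```

Each URL can then be fetched with `requests.get` and the thumbURL lists from all pages concatenated.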

import requests
def fetch_image_urls(query, num_images):
    """Request one page of Baidu Images search results and return the parsed JSON."""
    # A browser-like User-Agent; requests without one may be rejected
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
    }
    # Passing the query via params lets requests percent-encode the Chinese keyword
    params = {
        'tn': 'resultjson_com',
        'ipn': 'rj',
        'word': query,
        'pn': 0,           # offset of the first result
        'rn': num_images,  # number of results to return
    }
    response = requests.get("https://image.baidu.com/search/acjson",
                            params=params, headers=headers, timeout=10)
    response.raise_for_status()
    return response.json()
def parse_image_urls(json_data):
    """Collect the thumbnail URL (thumbURL) of every result that has one."""
    image_urls = []
    for item in json_data.get('data', []):
        if 'thumbURL' in item:
            image_urls.append(item['thumbURL'])
    return image_urls
query = "猫"  # "cat"
num_images = 10
json_data = fetch_image_urls(query, num_images)
image_urls = parse_image_urls(json_data)
print(image_urls)
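`response.json()` assumes the body is strict JSON. Responses from this undocumented interface may contain the sequence `\'` (a backslash before an apostrophe), which is not a legal JSON escape, and then `response.json()` raises an error. If that is the failure you hit, a defensive sketch parses `response.text` itself and retries after replacing the sequence. The helper name `load_lenient_json` and the sample payload are invented for illustration.

```python
import json


def load_lenient_json(text):
    """Parse JSON text that may contain the invalid escape \\' (backslash + apostrophe)."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # \' is not a legal JSON escape; swap it for a plain apostrophe and retry
        return json.loads(text.replace("\\'", "'"))


def parse_image_urls(json_data):
    """Same logic as the article's function: keep entries that have a thumbURL."""
    return [item["thumbURL"] for item in json_data.get("data", []) if "thumbURL" in item]


# Invented payload: the title contains \' and the list ends with an empty object
raw = '{"data": [{"thumbURL": "https://img.example.com/1.jpg", "fromPageTitle": "cat\\\'s photo"}, {}]}'
print(parse_image_urls(load_lenient_json(raw)))
```

In `fetch_image_urls` this would replace `return response.json()` with `return load_lenient_json(response.text)`. The replacement is blunt: it would also rewrite an escaped backslash that happens to precede an apostrophe, which is tolerable here because only the URL fields are used.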

3. Download the images

Once you have the image URLs, use requests to download each image and save it to the local file system.
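Naming each file after the last segment of its URL is fragile. Image URLs often end in a query string, so the name can contain characters such as `?` that Windows does not allow in file names, and it may have no extension at all. One alternative, sketched below, is to number the files and take the extension from the response's Content-Type header. The helper `make_file_name` and its MIME-to-extension table are hypothetical.

```python
import os

# Hypothetical mapping of common image MIME types to file extensions
CONTENT_TYPE_EXTENSIONS = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/gif": ".gif",
    "image/webp": ".webp",
}


def make_file_name(save_dir, index, content_type):
    """Build a name such as images/0003.png from a running index and a Content-Type."""
    # Drop parameters like "; charset=..." and normalise case before the lookup
    mime = (content_type or "").split(";")[0].strip().lower()
    # Unknown or missing types fall back to .jpg (an assumption)
    ext = CONTENT_TYPE_EXTENSIONS.get(mime, ".jpg")
    return os.path.join(save_dir, f"{index:04d}{ext}")


print(make_file_name("images", 3, "image/png"))
print(make_file_name("images", 12, "image/JPEG; charset=binary"))
```

Inside a download function it would be called as `make_file_name(save_dir, index, response.headers.get("Content-Type"))`. Zero-padding keeps the files in download order when sorted by name.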

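import os
import requests
def download_image(url, save_dir, index):
    """Download one image and save it as <index>.jpg in save_dir."""
    # A browser-like User-Agent; some image hosts refuse bare script requests
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    # Thumbnail URLs typically carry query parameters (e.g. "?w=500&h=500"), so
    # url.split('/')[-1] is not a safe file name; number the files instead.
    # The .jpg extension assumes the thumbnails are JPEG.
    file_name = os.path.join(save_dir, f"{index}.jpg")
    with open(file_name, 'wb') as file:
        file.write(response.content)
save_dir = "images"
os.makedirs(save_dir, exist_ok=True)
for index, url in enumerate(image_urls, start=1):
    try:
        download_image(url, save_dir, index)
        print(f"Downloaded {url}")
    except (requests.RequestException, OSError) as e:
        print(f"Failed to download {url}: {e}")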
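The loop above gives up on an image after a single failed attempt. If transient errors such as timeouts, HTTP 429 or 5xx responses are common, one option is a `requests.Session` whose adapter retries automatically using urllib3's `Retry`. The retry count, backoff factor and status list below are illustrative values, not tuned for Baidu, and `make_session` is a hypothetical helper name.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(total_retries=3, backoff=0.5):
    """Return a requests.Session that retries transient failures automatically."""
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff,  # the wait between attempts grows exponentially
        status_forcelist=[429, 500, 502, 503, 504],  # throttling and server errors
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Apply the retrying adapter to both schemes
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


session = make_session()
# In download_image: response = session.get(url, headers=headers, timeout=10)
```

A shared session also reuses TCP connections across downloads, which helps when many images come from the same host. When the retries are exhausted, requests raises an exception derived from `requests.RequestException`, so the existing `except` clause still catches it.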

4. Complete code

Here is the complete Python script for batch-downloading Baidu images:

import os
import time
import requests
# A browser-like User-Agent; requests without one may be rejected
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}
def fetch_image_urls(query, num_images):
    """Request one page of Baidu Images search results and return the parsed JSON."""
    # params lets requests percent-encode the Chinese keyword; pn = offset, rn = count
    params = {'tn': 'resultjson_com', 'ipn': 'rj', 'word': query, 'pn': 0, 'rn': num_images}
    response = requests.get("https://image.baidu.com/search/acjson",
                            params=params, headers=HEADERS, timeout=10)
    response.raise_for_status()
    return response.json()
def parse_image_urls(json_data):
    """Collect the thumbnail URL (thumbURL) of every result that has one."""
    return [item['thumbURL'] for item in json_data.get('data', []) if 'thumbURL' in item]
def download_image(url, file_name):
    """Download url and write the raw bytes to file_name."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()
    with open(file_name, 'wb') as file:
        file.write(response.content)
def main(query, num_images, save_dir, delay=1.0):
    os.makedirs(save_dir, exist_ok=True)
    json_data = fetch_image_urls(query, num_images)
    image_urls = parse_image_urls(json_data)
    for index, url in enumerate(image_urls, start=1):
        # Numbered names avoid the query-string characters in thumbnail URLs;
        # the .jpg extension assumes the thumbnails are JPEG
        file_name = os.path.join(save_dir, f"{index}.jpg")
        try:
            download_image(url, file_name)
            print(f"Downloaded {url}")
        except (requests.RequestException, OSError) as e:
            print(f"Failed to download {url}: {e}")
        time.sleep(delay)  # pause between requests to reduce load on the server
if __name__ == "__main__":
    query = "猫"  # "cat"
    num_images = 10
    save_dir = "images"
    main(query, num_images, save_dir)
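The `__main__` block above hard-codes the keyword, image count and output directory. A small sketch of reading them from the command line with `argparse` instead is shown below. The helper name `parse_args` and the option names `-n/--num-images` and `-o/--save-dir` are hypothetical choices.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options for the downloader's main() function."""
    parser = argparse.ArgumentParser(description="Batch-download Baidu image search results")
    parser.add_argument("query", help="search keyword, e.g. 猫")
    parser.add_argument("-n", "--num-images", type=int, default=10,
                        help="number of images to fetch (default: 10)")
    parser.add_argument("-o", "--save-dir", default="images",
                        help="output directory (default: images)")
    # argv=None makes argparse read sys.argv[1:]
    return parser.parse_args(argv)


args = parse_args(["猫", "-n", "20"])
print(args.query, args.num_images, args.save_dir)
# In the full script: main(args.query, args.num_images, args.save_dir)
```

The script could then be run as, for example, `python baidu_images.py 猫 -n 20 -o cats`, where the file name `baidu_images.py` is only an example.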

5. Notes

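Follow the site's robots.txt rules and terms of use: before crawling, check what the site permits, and keep your request volume modest so the crawler does not put unnecessary load on the server.
Add reasonable delays: pause between requests so that rapid, repeated requests do not strain the server.
Handle exceptions: a download can fail for many reasons, such as network errors or invalid image URLs, so catch and log these errors instead of letting one failure stop the whole batch.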
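The first two notes can be expressed with the standard library. The sketch below assumes you have already downloaded the site's robots.txt as text. `urllib.robotparser.RobotFileParser` checks whether a URL may be fetched, and a small hypothetical `Throttle` class enforces a minimum interval, plus random jitter, between requests. The helper `is_allowed` and the sample robots.txt are invented for illustration and are not Baidu's actual rules.

```python
import random
import time
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Check a URL against robots.txt content that was already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class Throttle:
    """Keep at least min_interval seconds (plus random jitter) between calls to wait()."""

    def __init__(self, min_interval=1.0, jitter=0.5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.jitter = jitter
        self.clock = clock  # injectable so the class can be tested without real waiting
        self.sleep = sleep
        self.last = None  # time of the previous request, None before the first one

    def wait(self):
        if self.last is not None:
            target = self.min_interval + random.uniform(0, self.jitter)
            elapsed = self.clock() - self.last
            if elapsed < target:
                self.sleep(target - elapsed)
        self.last = self.clock()


robots = "User-agent: *\nDisallow: /private/\n"
print(is_allowed(robots, "https://example.com/private/a.jpg"))  # False
print(is_allowed(robots, "https://example.com/public/a.jpg"))  # True
```

In the download loop, call `throttle.wait()` before each `requests.get`. Measuring from the previous request, rather than sleeping a fixed amount, avoids waiting twice when a download itself was already slow.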

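With the steps above, you can batch-download Baidu images with Python. The process involves sending HTTP requests, parsing the JSON response, extracting the image URLs, and downloading the image files. I hope this article helps you understand and apply Python to web scraping and image downloading.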