How to Download Images from the Web with Python

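Python can fetch, process, and save images from the web using the requests, BeautifulSoup, and Pillow libraries. requests sends HTTP requests and retrieves page content; BeautifulSoup parses the HTML document and extracts the parts we need; Pillow processes and saves the images. This article walks through using these libraries to download images from a web page.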

1. Install the Required Libraries

Before starting, install the required Python libraries. If they are not installed yet, run the following command:

pip install requests beautifulsoup4 pillow

2. Send an HTTP Request and Fetch the Page

First, use the requests library to send an HTTP request and fetch the content of the target page.

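import requests
url = 'https://example.com'  # replace with the URL of the target page
response = requests.get(url, timeout=10)  # a timeout keeps the request from hanging indefinitely
if response.status_code == 200:
    html_content = response.text
else:
    html_content = ''  # keep the name defined so later steps do not raise NameError
    print(f"Failed to retrieve content. Status code: {response.status_code}")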

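This snippet sends a GET request to the target page and checks the status code of the response. A status code of 200 means the request succeeded, so the page's HTML can be read from response.text.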

3. Parse the HTML and Extract Image URLs

Next, use BeautifulSoup to parse the HTML document and extract the URL of every image.

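from urllib.parse import urljoin
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_content, 'html.parser')
img_tags = soup.find_all('img')
img_urls = []
for img in img_tags:
    img_url = img.get('src')
    if img_url:
        # src is often relative; resolve it against the page URL so it can be requested
        img_urls.append(urljoin(url, img_url))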

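This snippet parses the HTML document with BeautifulSoup and finds every <img> tag. It then loops over the tags and reads the src attribute of each one, which holds the image URL. A src value may be relative to the page, so it has to be resolved against the page URL before it can be downloaded.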
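
Raw src values are not always usable as they are: they can be relative paths, protocol-relative URLs, inline data: URIs, or duplicates. The following is a minimal sketch of a cleanup step, assuming only the standard library; the helper name normalize_image_urls is hypothetical and not part of any library.

```python
from urllib.parse import urljoin, urlparse

def normalize_image_urls(page_url, srcs):
    """Resolve raw src values against page_url; keep unique http(s) URLs in order."""
    seen = set()
    result = []
    for src in srcs:
        src = (src or '').strip()
        if not src:
            continue  # skip missing or empty src attributes
        absolute = urljoin(page_url, src)  # handles relative and protocol-relative URLs
        if urlparse(absolute).scheme not in ('http', 'https'):
            continue  # skip data: URIs and other schemes that cannot be downloaded
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```

With the variables from the snippet above, it could be called as normalize_image_urls(url, [img.get('src') for img in img_tags]).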

4. Download and Save the Images

Now that we have the image URLs, use requests to download each image and Pillow to save it.

from PIL import Image
from io import BytesIO
for i, img_url in enumerate(img_urls):
    response = requests.get(img_url, timeout=10)
    if response.status_code == 200:
        img_data = response.content
        img = Image.open(BytesIO(img_data))
        # JPEG has no alpha channel or palette, so convert to RGB before saving
        img.convert('RGB').save(f'image_{i}.jpg')
    else:
        print(f"Failed to download image. Status code: {response.status_code}")

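This snippet loops over the image URLs and sends a GET request for each one. If the request succeeds, Pillow decodes the downloaded bytes and writes the image to a file. Note that saving through Pillow re-encodes the image as JPEG; to keep the original bytes unchanged, response.content can instead be written directly to a file opened in binary mode.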
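
Naming every file image_{i}.jpg hides the real format of the download. As a rough sketch, the extension could instead be chosen from the Content-Type response header, falling back to the URL path. The helper filename_for and the two lookup tables are illustrative assumptions, and the table of formats is deliberately short.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Illustrative mappings; extend them for other formats as needed.
CONTENT_TYPE_EXT = {
    'image/jpeg': '.jpg',
    'image/png': '.png',
    'image/gif': '.gif',
    'image/webp': '.webp',
}
KNOWN_EXT = {'.jpg', '.jpeg', '.png', '.gif', '.webp', '.bmp'}

def filename_for(img_url, index, content_type=None):
    """Pick a file name for a download, preferring Content-Type over the URL."""
    if content_type:
        mime = content_type.split(';')[0].strip().lower()  # drop parameters such as charset
        if mime in CONTENT_TYPE_EXT:
            return f'image_{index}{CONTENT_TYPE_EXT[mime]}'
    ext = PurePosixPath(urlparse(img_url).path).suffix.lower()  # ignores the query string
    if ext in KNOWN_EXT:
        return f'image_{index}{ext}'
    return f'image_{index}.bin'  # unknown format; keep the bytes for later inspection
```

In the loop above, the header would come from response.headers.get('Content-Type'), and the raw response.content would be written to the returned name in binary mode.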

5. Process the Images

Besides downloading and saving images, Pillow can also process them: for example, resizing, cropping, or adding a watermark.

Resize an image

img = Image.open('image_0.jpg')
resized_img = img.resize((200, 200))  # resize to exactly 200x200 pixels; the aspect ratio is not preserved
resized_img.save('resized_image_0.jpg')
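
Resizing to a fixed 200x200 distorts any image that is not square. A minimal sketch of computing a target size that keeps the aspect ratio is shown below; fit_within is a hypothetical helper name, and Pillow's own Image.thumbnail method applies the same idea in place.

```python
def fit_within(width, height, max_width, max_height):
    """Return the largest (width, height) that fits the box and keeps the aspect ratio."""
    # Never scale up: the factor is capped at 1.0.
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

The result can be passed straight to resize, for example img.resize(fit_within(img.width, img.height, 200, 200)).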

Crop an image

img = Image.open('image_0.jpg')
cropped_img = img.crop((100, 100, 400, 400))  # box is (left, upper, right, lower): top-left (100, 100), bottom-right (400, 400)
cropped_img.save('cropped_image_0.jpg')
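
Hard-coded crop coordinates only suit images of a known size. As an illustration, the box for a centered crop can be derived from the image dimensions; center_crop_box is a hypothetical helper, and it clamps the requested size so the box never extends past the image.

```python
def center_crop_box(width, height, crop_width, crop_height):
    """Return a (left, upper, right, lower) box centered in a width x height image."""
    crop_width = min(crop_width, width)    # clamp to the image bounds
    crop_height = min(crop_height, height)
    left = (width - crop_width) // 2
    upper = (height - crop_height) // 2
    return left, upper, left + crop_width, upper + crop_height
```

The returned tuple has the same layout that crop expects, so it can be used as img.crop(center_crop_box(img.width, img.height, 300, 300)).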

Add a watermark

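from PIL import Image, ImageDraw, ImageFont
img = Image.open('image_0.jpg').convert('RGBA')
# Draw on a transparent overlay so the alpha value in fill takes effect
overlay = Image.new('RGBA', img.size, (255, 255, 255, 0))
draw = ImageDraw.Draw(overlay)
try:
    font = ImageFont.truetype("arial.ttf", 36)
except OSError:
    font = ImageFont.load_default()  # arial.ttf is not available on every system
draw.text((10, 10), "Watermark", font=font, fill=(255, 255, 255, 128))
# JPEG cannot store alpha, so convert back to RGB after compositing
Image.alpha_composite(img, overlay).convert('RGB').save('watermarked_image_0.jpg')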

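6. Handle Errors

In practice, things can go wrong: the network connection may fail, an image download may fail, or the image format may not be supported. These cases need to be handled to keep the program robust.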

for i, img_url in enumerate(img_urls):
    try:
        response = requests.get(img_url, timeout=10)
        response.raise_for_status()  # raises HTTPError for 4xx/5xx responses
        img_data = response.content
        img = Image.open(BytesIO(img_data))
        img.convert('RGB').save(f'image_{i}.jpg')  # JPEG cannot store alpha or palette modes
    except requests.exceptions.RequestException as e:
        print(f"Failed to download image. URL: {img_url}, Error: {e}")
    except IOError as e:
        # covers both undecodable image data and file write errors
        print(f"Failed to open or save image. URL: {img_url}, Error: {e}")

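This snippet uses try/except to catch and handle the errors that may occur: requests.exceptions.RequestException covers network and HTTP failures, and IOError covers image data that Pillow cannot decode or a file that cannot be written.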
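
Some failures, such as a dropped connection, are temporary and may succeed on a second attempt. The sketch below retries a download with exponential backoff. The function fetch_with_retries and its parameters are assumptions made for illustration; fetch stands for any callable that downloads one URL, and the sketch relies on requests.exceptions.RequestException being a subclass of IOError (OSError in Python 3).

```python
import time

def fetch_with_retries(fetch, url, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fetch(url), retrying on OSError with exponentially growing delays."""
    if attempts < 1:
        raise ValueError('attempts must be at least 1')
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except OSError as exc:  # includes requests' RequestException
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ... by default
    raise last_error
```

A possible call is fetch_with_retries(lambda u: requests.get(u, timeout=10), img_url). Retrying every OSError is a simplification; a stricter version might retry only timeouts and connection errors.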

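7. Further Optimization

When handling a large number of images, consider multithreading or asynchronous programming to speed up the downloads.

Using multithreading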

import threading
def download_image(img_url, i):
    try:
        response = requests.get(img_url, timeout=10)
        response.raise_for_status()
        img_data = response.content
        img = Image.open(BytesIO(img_data))
        img.convert('RGB').save(f'image_{i}.jpg')  # JPEG cannot store alpha or palette modes
    except requests.exceptions.RequestException as e:
        print(f"Failed to download image. URL: {img_url}, Error: {e}")
    except IOError as e:
        print(f"Failed to save image. URL: {img_url}, Error: {e}")
threads = []
for i, img_url in enumerate(img_urls):
    thread = threading.Thread(target=download_image, args=(img_url, i))
    threads.append(thread)
    thread.start()
for thread in threads:
    thread.join()
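
Starting one thread per image can create hundreds of threads for a large page. As an alternative sketch, the standard library's ThreadPoolExecutor caps the number of worker threads. The function download_all and its download_one parameter are hypothetical names; download_one is assumed to take (img_url, i) like download_image above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def download_all(img_urls, download_one, max_workers=8):
    """Run download_one(url, index) for every URL using at most max_workers threads."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(download_one, img_url, i): img_url
            for i, img_url in enumerate(img_urls)
        }
        for future in as_completed(futures):
            img_url = futures[future]
            try:
                results[img_url] = future.result()
            except Exception as exc:
                results[img_url] = exc  # record the failure instead of stopping the batch
    return results
```

With the function defined above, this could be called as download_all(img_urls, download_image). The returned dictionary maps each URL to the function's return value or to the exception it raised.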

Using asynchronous programming

import aiohttp
import asyncio
async def download_image(session, img_url, i):
    try:
        async with session.get(img_url) as response:
            if response.status == 200:
                img_data = await response.read()
                img = Image.open(BytesIO(img_data))
                img.convert('RGB').save(f'image_{i}.jpg')  # JPEG cannot store alpha or palette modes
            else:
                print(f"Failed to download image. Status code: {response.status}")
    except aiohttp.ClientError as e:
        print(f"Failed to download image. URL: {img_url}, Error: {e}")
    except IOError as e:
        print(f"Failed to save image. URL: {img_url}, Error: {e}")
async def main():
    async with aiohttp.ClientSession() as session:
        tasks = [download_image(session, img_url, i) for i, img_url in enumerate(img_urls)]
        await asyncio.gather(*tasks)
asyncio.run(main())
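
Launching every download at once with asyncio.gather can overload the target server. One common way to limit concurrency is an asyncio.Semaphore, sketched below with the standard library only. The helper gather_limited is a hypothetical name, and coro_fn stands for any coroutine function that takes (item, index).

```python
import asyncio

async def gather_limited(coro_fn, items, limit=5):
    """Await coro_fn(item, index) for every item, with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(index, item):
        async with semaphore:  # waits here while `limit` calls are already in flight
            return await coro_fn(item, index)

    # return_exceptions=True keeps one failure from cancelling the rest;
    # results come back in the same order as items.
    return await asyncio.gather(
        *(run_one(i, item) for i, item in enumerate(items)),
        return_exceptions=True,
    )
```

Inside main above, it could replace the plain gather call, for example await gather_limited(lambda u, i: download_image(session, u, i), img_urls, limit=5).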

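8. Summary

This article showed how to download images from the web with Python. We installed the required libraries, then sent an HTTP request to fetch a page, parsed the HTML to extract image URLs, downloaded and saved the images, and processed them with Pillow. We also covered error handling and two ways to speed up downloads: multithreading and asynchronous programming. Together, these steps cover downloading images from a web page and then processing and saving them.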