There are several ways to download images in Python, including the requests library, the urllib library, and the Pillow library.
The requests library is one of the most commonly used tools for downloading images. It is simple to use and handles most HTTP requests. The sections below explain in detail how to download images with requests, followed by several alternative approaches.
1. Downloading Images with the requests Library
The steps for downloading an image with requests are as follows:
- Install the requests library: first make sure requests is installed. If not, install it with the following command:
pip install requests
- Write the download code:
import requests

def download_image(url, file_path):
    try:
        response = requests.get(url)
        response.raise_for_status()  # Raise an exception if the request failed
        with open(file_path, 'wb') as f:
            f.write(response.content)
        print(f"Image downloaded successfully: {file_path}")
    except requests.exceptions.RequestException as e:
        print(f"Error downloading image: {e}")

# Example usage
image_url = 'https://example.com/image.jpg'
save_path = 'downloaded_image.jpg'
download_image(image_url, save_path)
In the code above, requests.get() sends an HTTP GET request to fetch the image content, which is then written to a local file in binary mode.
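For large images, it is often better to stream the response instead of loading it into memory in one piece. Below is a minimal sketch using requests' streaming mode; the 8 KB chunk size and 10-second timeout are illustrative assumptions, not requirements:

import requests

def download_image_streamed(url, file_path):
    # stream=True defers downloading the body; timeout avoids hanging forever
    with requests.get(url, stream=True, timeout=10) as response:
        response.raise_for_status()
        with open(file_path, 'wb') as f:
            # Write the image in 8 KB chunks instead of buffering it all at once
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)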
二、使用urllib库下载图片
urllib库是Python标准库的一部分,不需要额外安装。它也可以用于下载图片。以下是使用urllib库下载图片的方法:
-
导入urllib库:
import urllib.request

def download_image(url, file_path):
    try:
        urllib.request.urlretrieve(url, file_path)
        print(f"Image downloaded successfully: {file_path}")
    except Exception as e:
        print(f"Error downloading image: {e}")

# Example usage
image_url = 'https://example.com/image.jpg'
save_path = 'downloaded_image.jpg'
download_image(image_url, save_path)
urllib.request.urlretrieve() downloads the file at the given URL directly and saves it to the local path.
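Note that some servers reject urllib's default User-Agent. If that happens, one workaround is to send the request through urllib.request.Request with a custom header; a small sketch (the browser-like User-Agent string is just an example):

import urllib.request

def download_image_with_headers(url, file_path):
    # Some servers block urllib's default User-Agent, so send a browser-like one
    request = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    with urllib.request.urlopen(request) as response:
        with open(file_path, 'wb') as f:
            f.write(response.read())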
3. Processing Images with the Pillow Library
Pillow does not download images by itself, but it pairs naturally with requests for processing downloaded image data. Pillow is a fork of PIL, the classic Python imaging library, and provides powerful image-processing features.
- Install the Pillow library:
pip install pillow
- Write the download-and-process code:
from PIL import Image
import requests
from io import BytesIO

def download_and_process_image(url, file_path):
    try:
        response = requests.get(url)
        response.raise_for_status()
        image = Image.open(BytesIO(response.content))
        image.save(file_path)
        print(f"Image downloaded and processed successfully: {file_path}")
    except requests.exceptions.RequestException as e:
        print(f"Error downloading image: {e}")
    except Exception as e:
        print(f"Error processing image: {e}")

# Example usage
image_url = 'https://example.com/image.jpg'
save_path = 'processed_image.jpg'
download_and_process_image(image_url, save_path)
In this example, requests fetches the image content and Pillow opens it from an in-memory buffer, ready to be processed and saved.
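To put Pillow's processing features to actual use, you can transform the image before saving it. A short sketch that resizes a downloaded image and re-encodes it as PNG (the 128x128 target size is an arbitrary example):

from io import BytesIO

import requests
from PIL import Image

def download_resize_convert(url, file_path):
    response = requests.get(url)
    response.raise_for_status()
    image = Image.open(BytesIO(response.content))
    image = image.convert('RGB')         # Normalize the color mode
    image = image.resize((128, 128))     # Scale to a fixed size
    image.save(file_path, format='PNG')  # Save in a different format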
四、使用aiohttp库实现异步下载
对于需要同时下载大量图片的场景,使用异步下载可以显著提高效率。aiohttp库是Python中常用的异步HTTP客户端库。
-
安装aiohttp库:
pip install aiohttp
-
编写异步下载代码:
import aiohttp
import asyncio
import os

async def download_image(session, url, file_path):
    try:
        async with session.get(url) as response:
            response.raise_for_status()
            with open(file_path, 'wb') as f:
                f.write(await response.read())
        print(f"Image downloaded successfully: {file_path}")
    except aiohttp.ClientError as e:
        print(f"Error downloading image: {e}")

async def main(urls, save_dir):
    async with aiohttp.ClientSession() as session:
        tasks = []
        for i, url in enumerate(urls):
            file_path = os.path.join(save_dir, f'image_{i}.jpg')
            tasks.append(download_image(session, url, file_path))
        await asyncio.gather(*tasks)

# Example usage
image_urls = ['https://example.com/image1.jpg', 'https://example.com/image2.jpg']
save_directory = './downloaded_images'
os.makedirs(save_directory, exist_ok=True)
asyncio.run(main(image_urls, save_directory))
In this example, aiohttp.ClientSession() creates a single shared session, and asyncio.gather() runs all download tasks concurrently.
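Starting an unbounded number of simultaneous requests can overwhelm the server or exhaust local resources. A common refinement is to cap concurrency with asyncio.Semaphore; here is a variant of main that reuses the imports and the download_image function above (the limit of 5 is an arbitrary assumption):

async def main_limited(urls, save_dir):
    semaphore = asyncio.Semaphore(5)  # At most 5 downloads in flight at once

    async def limited_download(session, url, file_path):
        async with semaphore:
            await download_image(session, url, file_path)

    async with aiohttp.ClientSession() as session:
        tasks = [
            limited_download(session, url, os.path.join(save_dir, f'image_{i}.jpg'))
            for i, url in enumerate(urls)
        ]
        await asyncio.gather(*tasks)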
5. Downloading Images with the Scrapy Framework
Scrapy is a powerful crawling framework for scraping data from websites, including images.
- Install Scrapy:
pip install scrapy
- Write the Scrapy spider, as shown in the steps below.
Create a new Scrapy project:
scrapy startproject image_downloader
Generate a new spider:
cd image_downloader
scrapy genspider imagespider example.com
Edit the image_downloader/spiders/imagespider.py file:
import scrapy
from scrapy.pipelines.images import ImagesPipeline
from scrapy.exceptions import DropItem

class ImageSpider(scrapy.Spider):
    name = 'imagespider'
    start_urls = ['https://example.com']

    def parse(self, response):
        image_urls = response.css('img::attr(src)').getall()
        for url in image_urls:
            # Resolve relative src values against the page URL
            yield {'image_urls': [response.urljoin(url)]}

class MyImagesPipeline(ImagesPipeline):
    def get_media_requests(self, item, info):
        for image_url in item['image_urls']:
            yield scrapy.Request(image_url)

    def file_path(self, request, response=None, info=None, *, item=None):
        # Name each file after the last path segment of its URL
        image_guid = request.url.split('/')[-1]
        return f'full/{image_guid}'

    def item_completed(self, results, item, info):
        image_paths = [x['path'] for ok, x in results if ok]
        if not image_paths:
            raise DropItem("Item contains no images")
        item['image_paths'] = image_paths
        return item
Modify the settings.py file to enable the ImagesPipeline (note that ImagesPipeline requires Pillow, so make sure it is installed). Add to settings.py:
ITEM_PIPELINES = {'image_downloader.spiders.imagespider.MyImagesPipeline': 1}
IMAGES_STORE = 'downloaded_images'
Run the spider:
scrapy crawl imagespider
In this example, a Scrapy spider crawls the target site, extracts its images, and saves them locally through the images pipeline.
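The ImagesPipeline also exposes optional settings worth knowing about, such as generating thumbnails and skipping images below a minimum size. A sketch of such additions to settings.py (the sizes shown are arbitrary examples, not requirements):

# Optional ImagesPipeline tuning in settings.py
IMAGES_THUMBS = {
    'small': (50, 50),   # Also store a 50x50 thumbnail of each image
    'big': (270, 270),
}
IMAGES_MIN_WIDTH = 110   # Skip images narrower than 110 px
IMAGES_MIN_HEIGHT = 110  # Skip images shorter than 110 px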
6. Parsing Pages with BeautifulSoup to Download Images
BeautifulSoup is a powerful HTML-parsing library, usually combined with requests to extract image links from a page and download them.
- Install BeautifulSoup and requests:
pip install beautifulsoup4 requests
- Write the parsing and download code:
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import os

def download_image(url, file_path):
    try:
        response = requests.get(url)
        response.raise_for_status()
        with open(file_path, 'wb') as f:
            f.write(response.content)
        print(f"Image downloaded successfully: {file_path}")
    except requests.exceptions.RequestException as e:
        print(f"Error downloading image: {e}")

def download_images_from_page(url, save_dir):
    try:
        response = requests.get(url)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, 'html.parser')
        img_tags = soup.find_all('img')
        os.makedirs(save_dir, exist_ok=True)
        for i, img in enumerate(img_tags):
            img_url = img.get('src')
            if img_url:
                # Resolve relative URLs against the page URL
                img_url = urljoin(url, img_url)
                file_path = os.path.join(save_dir, f'image_{i}.jpg')
                download_image(img_url, file_path)
    except requests.exceptions.RequestException as e:
        print(f"Error fetching page: {e}")

# Example usage
page_url = 'https://example.com'
save_directory = './downloaded_images'
download_images_from_page(page_url, save_directory)
In this example, BeautifulSoup parses the page content and extracts every image link, and requests then downloads each image.
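One limitation of the example above is that every file is saved with a .jpg extension even when the source is a PNG or GIF. A small helper can derive the extension from the URL instead; a sketch (extension_from_url is a hypothetical helper name, falling back to .jpg when the URL carries no extension):

import os
from urllib.parse import urlparse

def extension_from_url(img_url, default='.jpg'):
    # Reuse the extension of the URL's path component, if it has one
    path = urlparse(img_url).path
    ext = os.path.splitext(path)[1]
    return ext if ext else default

# e.g. file_path = os.path.join(save_dir, f'image_{i}{extension_from_url(img_url)}')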
7. Automated Downloads with Selenium
Selenium is a browser-automation tool that can drive a browser through all kinds of actions, including collecting and downloading images.
- Install Selenium and a browser driver:
pip install selenium
Download the driver that matches your browser (e.g., chromedriver).
- Write the automation code:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import requests
import os

def download_image(url, file_path):
    try:
        response = requests.get(url)
        response.raise_for_status()
        with open(file_path, 'wb') as f:
            f.write(response.content)
        print(f"Image downloaded successfully: {file_path}")
    except requests.exceptions.RequestException as e:
        print(f"Error downloading image: {e}")

def download_images_with_selenium(url, save_dir):
    # Selenium 4 passes the driver path through a Service object
    driver = webdriver.Chrome(service=Service('path/to/chromedriver'))
    driver.get(url)
    img_elements = driver.find_elements(By.TAG_NAME, 'img')
    os.makedirs(save_dir, exist_ok=True)
    for i, img in enumerate(img_elements):
        img_url = img.get_attribute('src')
        if img_url:
            file_path = os.path.join(save_dir, f'image_{i}.jpg')
            download_image(img_url, file_path)
    driver.quit()

# Example usage
page_url = 'https://example.com'
save_directory = './downloaded_images'
download_images_with_selenium(page_url, save_directory)
In this example, Selenium opens the given page in a browser, collects the links of all image elements, and downloads them.
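Keep in mind that many pages lazy-load images only as they scroll into view, so the snippet above may miss some of them. A common workaround is to scroll to the bottom of the page before collecting the img elements; a sketch (the 2-second pause is an arbitrary allowance for loading):

import time

def scroll_to_bottom(driver, pause=2):
    # Scroll down so that lazily loaded images are actually requested by the page
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(pause)  # Give the page time to fetch the newly visible images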
8. Extracting Image Links with Regular Expressions
Sometimes a page's structure is messy, and extracting image links directly with a regular expression can be quicker.
- Write the extraction and download code:
import re
import requests
from urllib.parse import urljoin
import os

def download_image(url, file_path):
    try:
        response = requests.get(url)
        response.raise_for_status()
        with open(file_path, 'wb') as f:
            f.write(response.content)
        print(f"Image downloaded successfully: {file_path}")
    except requests.exceptions.RequestException as e:
        print(f"Error downloading image: {e}")

def download_images_with_regex(url, save_dir):
    try:
        response = requests.get(url)
        response.raise_for_status()
        img_urls = re.findall(r'<img[^>]+src="([^">]+)"', response.text)
        os.makedirs(save_dir, exist_ok=True)
        for i, img_url in enumerate(img_urls):
            # Resolve relative URLs against the page URL
            img_url = urljoin(url, img_url)
            file_path = os.path.join(save_dir, f'image_{i}.jpg')
            download_image(img_url, file_path)
    except requests.exceptions.RequestException as e:
        print(f"Error fetching page: {e}")

# Example usage
page_url = 'https://example.com'
save_directory = './downloaded_images'
download_images_with_regex(page_url, save_directory)
In this example, a regular expression extracts every image link from the page source, and requests downloads the images.
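Note that the pattern above only matches src attributes wrapped in double quotes, while real-world HTML often uses single quotes too. A slightly more tolerant variant (still a heuristic; regular expressions cannot fully parse HTML, so prefer a parser like BeautifulSoup for messy pages):

import re

# Accept src values wrapped in either double or single quotes
IMG_SRC_PATTERN = re.compile(r"""<img[^>]+src=["']([^"'>]+)["']""")

img_urls = IMG_SRC_PATTERN.findall(html_text)  # html_text: the page HTML string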
9. Downloading Images via a Third-Party API
Some websites provide an API from which image links can be fetched and then downloaded.
- Write the download code:
import requests
import os

def download_image(url, file_path):
    try:
        response = requests.get(url)
        response.raise_for_status()
        with open(file_path, 'wb') as f:
            f.write(response.content)
        print(f"Image downloaded successfully: {file_path}")
    except requests.exceptions.RequestException as e:
        print(f"Error downloading image: {e}")

def download_images_from_api(api_url, save_dir):
    try:
        response = requests.get(api_url)
        response.raise_for_status()
        img_urls = response.json().get('image_urls', [])
        os.makedirs(save_dir, exist_ok=True)
        for i, img_url in enumerate(img_urls):
            file_path = os.path.join(save_dir, f'image_{i}.jpg')
            download_image(img_url, file_path)
    except requests.exceptions.RequestException as e:
        print(f"Error fetching API: {e}")

# Example usage
api_url = 'https://api.example.com/get_images'
save_directory = './downloaded_images'
download_images_from_api(api_url, save_directory)
In this example, an API call returns a list of image links, and requests downloads each one.
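Many real APIs also require authentication, typically an API key sent as a request header. A sketch of what that might look like with requests (the endpoint and the Bearer-token scheme here are hypothetical assumptions; check the actual API's documentation):

import requests

API_URL = 'https://api.example.com/get_images'  # Hypothetical endpoint
API_KEY = 'your-api-key'                        # Placeholder, not a real key

response = requests.get(
    API_URL,
    headers={'Authorization': f'Bearer {API_KEY}'},  # Scheme varies by API
    timeout=10,
)
response.raise_for_status()
img_urls = response.json().get('image_urls', [])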
10. Summary
In summary, the requests library, the urllib library, the Pillow library, asynchronous downloads with aiohttp, the Scrapy framework, BeautifulSoup page parsing, Selenium automation, regular-expression extraction, and third-party API calls can all be used to download images in Python. Each method suits different scenarios, so choose the one that fits your needs.
Whichever method you choose, handle network-request errors and make sure the image files are saved correctly. Hopefully this article helps you better understand and use the various ways of downloading images with Python.
Related FAQs:
How do I download images in a specific format with Python?
Downloading images in a specific format with Python usually involves libraries such as requests and PIL (Pillow). You can fetch the image's binary data with requests, then save it in the desired format by writing it with open in binary ('wb') mode. For JPEG, for example:
import requests

url = "image URL"
response = requests.get(url)
with open("save_path/image_name.jpg", "wb") as file:
    file.write(response.content)

Make sure the requests library is installed; it can be installed with pip install requests.
How do I handle HTTP errors when downloading images?
While downloading images you may run into HTTP errors such as 404 (Not Found) or 500 (Server Error). You can handle them by checking response.status_code. For example:
response = requests.get(url)
if response.status_code == 200:
    with open("save_path/image_name.jpg", "wb") as file:
        file.write(response.content)
else:
    print(f"Download failed, status code: {response.status_code}")
This lets you spot download problems promptly and react accordingly.
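Equivalently, you can let requests raise an exception on error status codes with raise_for_status(), as in the earlier examples in this article:

import requests

try:
    response = requests.get(url)  # url: the image URL, as above
    response.raise_for_status()   # Raises HTTPError for 4xx/5xx responses
    with open("save_path/image_name.jpg", "wb") as file:
        file.write(response.content)
except requests.exceptions.HTTPError as e:
    print(f"Download failed: {e}")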
How do I download multiple images in batch?
To download several images in batch, store their URLs in a list and download them one by one in a loop. For example:

urls = ["image URL 1", "image URL 2", "image URL 3"]
for index, url in enumerate(urls):
    response = requests.get(url)
    if response.status_code == 200:
        with open(f"save_path/image_name{index}.jpg", "wb") as file:
            file.write(response.content)
    else:
        print(f"Download failed, status code: {response.status_code}, URL: {url}")

This approach saves time and helps ensure that all of the specified images are downloaded.
