How to write a software downloader in Python

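There are several ways to download software with Python: sending HTTP requests with the standard library's urllib or the third-party requests library, using a dedicated third-party package such as wget, and using multithreading or asynchronous programming to speed up the download. Of these, requests is the most commonly used because of its concise, easy-to-use API. The sections below start with requests and then cover the other approaches.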

First, make sure the requests library is installed. If it is not, install it with the following command:

pip install requests

Downloading a file with the requests library

The basic steps are:

Import the requests library
Send the HTTP request
Save the file

Here is a simple example:

import requests
url = 'http://example.com/somefile.zip'
local_filename = 'somefile.zip'
with requests.get(url, stream=True) as r:
    r.raise_for_status()
    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)

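This code downloads the file at the given URL and saves it locally. Passing stream=True makes requests read the response body chunk by chunk, and each chunk is written to the local file as it arrives. The whole file is never held in memory at once, which makes this approach suitable for large files.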
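The same streaming idea also works with the standard library's urllib, which the introduction mentions as an alternative. The following is a minimal sketch rather than a complete downloader; the function name download_with_urllib and its chunk_size parameter are our own choices for illustration.

```python
import shutil
import urllib.request


def download_with_urllib(url, local_filename, chunk_size=8192):
    """Stream the resource at url into local_filename using only the stdlib."""
    with urllib.request.urlopen(url) as response, open(local_filename, 'wb') as f:
        # copyfileobj reads and writes chunk_size bytes at a time,
        # so the whole file is never held in memory
        shutil.copyfileobj(response, f, chunk_size)
    return local_filename
```

A call such as download_with_urllib('http://example.com/somefile.zip', 'somefile.zip') behaves like the requests version above. urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, so no separate status check is needed.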

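I. Downloading files with the requests library

1. Basic usage

As shown above, requests offers a concise way to send HTTP requests and handle responses. To download a file, call requests.get() with stream=True to stream the response body.

Here is a more detailed example: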

import requests
def download_file(url, local_filename):
    try:
        with requests.get(url, stream=True) as r:
            r.raise_for_status()
            with open(local_filename, 'wb') as f:
                for chunk in r.iter_content(chunk_size=8192):
                    if chunk:  # filter out keep-alive new chunks
                        f.write(chunk)
        print(f"File downloaded successfully: {local_filename}")
    except requests.exceptions.HTTPError as http_err:
        print(f"HTTP error occurred: {http_err}")
    except Exception as err:
        print(f"Other error occurred: {err}")
url = 'http://example.com/somefile.zip'
local_filename = 'somefile.zip'
download_file(url, local_filename)

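In this example we define a download_file function that handles HTTP errors and other exceptions that may occur. This makes the code more robust: errors are caught and reported instead of terminating the program with a traceback. Note that a failure partway through can still leave a truncated file on disk.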
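One way to avoid leaving a truncated file behind is to write to a temporary name and rename it only after every chunk has been written. The sketch below is one possible approach and not part of the requests API; the helper name save_chunks_atomically and the '.part' suffix are assumptions made for illustration.

```python
import os


def save_chunks_atomically(chunks, local_filename):
    """Write an iterable of byte chunks to a temporary file, then rename it.

    chunks can be any iterable of bytes, for example
    r.iter_content(chunk_size=8192) from a streamed requests response.
    """
    part_filename = local_filename + '.part'  # hypothetical naming convention
    try:
        with open(part_filename, 'wb') as f:
            for chunk in chunks:
                f.write(chunk)
    except BaseException:
        # Remove the incomplete file so a failed download leaves nothing behind
        if os.path.exists(part_filename):
            os.remove(part_filename)
        raise
    # os.replace overwrites an existing target on both POSIX and Windows
    os.replace(part_filename, local_filename)
    return local_filename
```

This suits one-shot downloads. A resumable download, covered next, has the opposite goal and keeps the partial file on purpose.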

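2. Supporting resumable downloads

Resuming is an important feature for large downloads. If the network fails partway through, the download can continue from where it stopped instead of starting over. This is implemented by setting the Range field in the HTTP request headers.

Here is an example that supports resuming: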

import os
import requests
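def download_file_with_resume(url, local_filename):
    headers = {}
    existing_file_size = 0
    if os.path.exists(local_filename):
        existing_file_size = os.path.getsize(local_filename)
    if existing_file_size > 0:
        # Ask only for the bytes that are not on disk yet
        headers['Range'] = f'bytes={existing_file_size}-'
    try:
        with requests.get(url, headers=headers, stream=True) as r:
            if existing_file_size > 0 and r.status_code == 416:
                # Range starts at or past the end: the file is most likely complete
                print(f"Nothing left to download: {local_filename}")
                return
            r.raise_for_status()
            # 206 means the server honored Range; a plain 200 carries the whole
            # file again, so overwrite instead of appending
            mode = 'ab' if r.status_code == 206 else 'wb'
            with open(local_filename, mode) as f:
                for chunk in r.iter_content(chunk_size=8192):
                    if chunk:
                        f.write(chunk)
        print(f"File downloaded successfully: {local_filename}")
    except requests.exceptions.HTTPError as http_err:
        print(f"HTTP error occurred: {http_err}")
    except Exception as err:
        print(f"Other error occurred: {err}")
url = 'http://example.com/somefile.zip'
local_filename = 'somefile.zip'
download_file_with_resume(url, local_filename)

In this example we first check whether the local file exists and how large it is. If it already contains data, we set the Range header so that the server sends only the bytes from that offset onward. A 206 Partial Content response means the server honored the range, so the file is opened in 'ab' mode and the new data is appended. A plain 200 response means the server ignored the range and is sending the whole file, so the file is rewritten from the start. A 416 response means the requested offset is at or beyond the end of the file, which usually indicates that the download was already complete.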
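When resuming, it can help to confirm that the server really started at the offset we asked for. A 206 response carries a Content-Range header of the form "bytes start-end/total". The sketch below parses that header; the function names parse_content_range and resumed_from are hypothetical helpers, not part of requests.

```python
import re


def parse_content_range(value):
    """Parse 'bytes 100-199/1000' into (start, end, total).

    total is None when the server reports '*' (unknown length).
    Returns None if the header does not describe a satisfied byte range.
    """
    match = re.fullmatch(r'bytes (\d+)-(\d+)/(\d+|\*)', value.strip())
    if match is None:
        return None
    start, end = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == '*' else int(match.group(3))
    return start, end, total


def resumed_from(value, expected_offset):
    """Return True if the Content-Range value starts at expected_offset."""
    parsed = parse_content_range(value)
    return parsed is not None and parsed[0] == expected_offset
```

With requests, the header value would come from r.headers.get('Content-Range', ''), and expected_offset would be the size of the existing local file.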

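II. Speeding up downloads with multithreading

A single-threaded download of a large file can be slow. Multithreading may speed it up, provided the server supports range requests. The basic idea is to split the file into several parts, download the parts in parallel, and combine them into one complete file.

1. Basic principle

Python's threading module can be used to implement a multithreaded download. The steps are:

Get the file size
Calculate which part of the file each thread downloads
Create several threads that download in parallel
Combine all parts into one complete file

Here is an example: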

import threading
import requests
def download_chunk(url, start, end, local_filename):
    # Request only bytes start..end (inclusive)
    headers = {'Range': f'bytes={start}-{end}'}
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    with open(local_filename, 'r+b') as f:
        f.seek(start)
        f.write(response.content)
def download_file_multithreaded(url, local_filename, num_threads=4):
    # requests.head() does not follow redirects unless asked to
    response = requests.head(url, allow_redirects=True)
    file_size = int(response.headers['Content-Length'])
    chunk_size = file_size // num_threads
    with open(local_filename, 'wb') as f:
        f.truncate(file_size)
    threads = []
    for i in range(num_threads):
        start = i * chunk_size
        end = file_size - 1 if i == num_threads - 1 else (i + 1) * chunk_size - 1
        thread = threading.Thread(target=download_chunk, args=(url, start, end, local_filename))
        threads.append(thread)
        thread.start()
    for thread in threads:
        thread.join()
    print(f"File downloaded successfully: {local_filename}")
url = 'http://example.com/somefile.zip'
local_filename = 'somefile.zip'
download_file_multithreaded(url, local_filename)

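In this example we first send an HTTP HEAD request to get the file size and then work out which byte range each thread downloads. The threads download their ranges in parallel, and each one uses seek to write its data at the right offset in the preallocated file. Finally we wait for all threads to finish and print a success message. The approach assumes that the server reports Content-Length and honors Range requests.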
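The range calculation can be pulled out into a small pure function, which makes it easy to check on its own. The sketch below follows the same arithmetic as the example and additionally guards one edge case: a file with fewer bytes than threads would otherwise produce a chunk size of zero. The name split_ranges is our own.

```python
def split_ranges(file_size, num_parts):
    """Return inclusive (start, end) byte ranges that cover file_size bytes."""
    if file_size <= 0:
        return []
    # Never create more parts than there are bytes
    num_parts = max(1, min(num_parts, file_size))
    chunk_size = file_size // num_parts
    ranges = []
    for i in range(num_parts):
        start = i * chunk_size
        # The last part absorbs the remainder of the integer division
        end = file_size - 1 if i == num_parts - 1 else start + chunk_size - 1
        ranges.append((start, end))
    return ranges


print(split_ranges(10, 4))  # [(0, 1), (2, 3), (4, 5), (6, 9)]
```

Each tuple maps directly onto a header of the form Range: bytes=start-end, because HTTP byte ranges are inclusive at both ends.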

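2. Handling exceptions

A multithreaded download can run into various problems, such as network interruptions and server errors. To make the code more robust, these exceptions need to be handled during the download.

Here is an example with exception handling: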

import threading
import requests
def download_chunk(url, start, end, local_filename, errors):
    headers = {'Range': f'bytes={start}-{end}'}
    try:
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        if response.status_code != 206:
            # Without 206 the body is not the requested byte range
            raise RuntimeError(f"Server ignored Range (status {response.status_code})")
        with open(local_filename, 'r+b') as f:
            f.seek(start)
            f.write(response.content)
    except requests.exceptions.HTTPError as http_err:
        print(f"HTTP error occurred: {http_err}")
        errors.append(http_err)
    except Exception as err:
        print(f"Other error occurred: {err}")
        errors.append(err)
def download_file_multithreaded(url, local_filename, num_threads=4):
    # requests.head() does not follow redirects unless asked to
    response = requests.head(url, allow_redirects=True)
    response.raise_for_status()
    file_size = int(response.headers['Content-Length'])
    chunk_size = file_size // num_threads
    with open(local_filename, 'wb') as f:
        f.truncate(file_size)
    errors = []  # filled by the worker threads; list.append is thread-safe in CPython
    threads = []
    for i in range(num_threads):
        start = i * chunk_size
        end = file_size - 1 if i == num_threads - 1 else (i + 1) * chunk_size - 1
        thread = threading.Thread(target=download_chunk, args=(url, start, end, local_filename, errors))
        threads.append(thread)
        thread.start()
    for thread in threads:
        thread.join()
    if errors:
        print(f"Download incomplete, {len(errors)} chunk(s) failed: {local_filename}")
    else:
        print(f"File downloaded successfully: {local_filename}")
url = 'http://example.com/somefile.zip'
local_filename = 'somefile.zip'
download_file_multithreaded(url, local_filename)

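In this example download_chunk catches HTTP errors and other exceptions, so a failing thread reports the problem instead of dying with an unhandled traceback. Keep in mind that a chunk that fails leaves a hole in the output file, so the download should only be treated as complete when every chunk succeeded.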
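An alternative to catching errors inside each thread is to let the standard library's concurrent.futures collect them. The sketch below shows only the orchestration and takes the download function as a parameter; run_chunk_jobs and worker are hypothetical names, and worker(start, end) stands for whatever function fetches one byte range.

```python
from concurrent.futures import ThreadPoolExecutor


def run_chunk_jobs(worker, ranges, max_workers=4):
    """Run worker(start, end) for each range; return a list of (range, error)."""
    failures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the byte range it is responsible for
        futures = {pool.submit(worker, start, end): (start, end)
                   for start, end in ranges}
        for future, byte_range in futures.items():
            error = future.exception()  # blocks until that job has finished
            if error is not None:
                failures.append((byte_range, error))
    return failures
```

An empty result means every range was fetched. A non-empty result lists exactly which ranges need to be retried, which is harder to reconstruct from printed messages alone.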

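III. Speeding up downloads with asynchronous programming

Besides multithreading, asynchronous programming can also speed up downloads. Python's asyncio library runs many I/O operations concurrently in a single thread, so several requests can be in flight at the same time.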
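To see what concurrency in a single thread means in practice, the following sketch replaces the network with asyncio.sleep. It is a simulation only; fake_download and run_demo are names made up for this illustration. While one coroutine waits, the event loop runs the other, so the shorter wait finishes first even though it was started second.

```python
import asyncio


async def fake_download(name, delay, log):
    log.append(f'start {name}')
    await asyncio.sleep(delay)  # stands in for waiting on the network
    log.append(f'end {name}')


async def run_demo():
    log = []
    # Both coroutines are started before either one finishes
    await asyncio.gather(
        fake_download('a', 0.2, log),
        fake_download('b', 0.05, log),
    )
    return log


print(asyncio.run(run_demo()))  # ['start a', 'start b', 'end b', 'end a']
```

The total running time is close to the longest single wait, not the sum of the waits, which is where the speedup for network-bound downloads comes from.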

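1. Basic principle

Asynchronous programming uses coroutines to perform asynchronous operations. The aiohttp library sends the asynchronous HTTP requests, and asyncio schedules the coroutines. aiohttp is a third-party package and can be installed with pip install aiohttp.

Here is an example: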

import aiohttp
import asyncio
async def download_chunk(session, url, start, end, local_filename):
    headers = {'Range': f'bytes={start}-{end}'}
    async with session.get(url, headers=headers) as response:
        response.raise_for_status()
        with open(local_filename, 'r+b') as f:
            f.seek(start)
            while True:
                chunk = await response.content.read(8192)
                if not chunk:
                    break
                f.write(chunk)
async def download_file_async(url, local_filename, num_tasks=4):
    async with aiohttp.ClientSession() as session:
        # aiohttp does not follow redirects for HEAD unless asked to
        async with session.head(url, allow_redirects=True) as response:
            response.raise_for_status()
            file_size = int(response.headers['Content-Length'])
        chunk_size = file_size // num_tasks
        with open(local_filename, 'wb') as f:
            f.truncate(file_size)
        tasks = []
        for i in range(num_tasks):
            start = i * chunk_size
            end = file_size - 1 if i == num_tasks - 1 else (i + 1) * chunk_size - 1
            task = download_chunk(session, url, start, end, local_filename)
            tasks.append(asyncio.create_task(task))
        await asyncio.gather(*tasks)
    print(f"File downloaded successfully: {local_filename}")
url = 'http://example.com/somefile.zip'
local_filename = 'somefile.zip'
asyncio.run(download_file_async(url, local_filename))

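In this example aiohttp sends the asynchronous HTTP requests and asyncio schedules the coroutines. We first send an HTTP HEAD request to get the file size and then work out which byte range each coroutine downloads. The coroutines run concurrently, and each one uses seek to write its data at the right offset in the file. Finally we wait for all coroutines to finish and print a success message. Note that the file writes themselves are ordinary blocking calls, so the event loop pauses briefly during each write.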
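If the blocking file writes become a bottleneck, one option is to hand them to a worker thread with asyncio.to_thread, which is available in Python 3.9 and later. This is a sketch of that idea under the assumption that the target file has already been created at its full size; write_at and write_at_async are hypothetical helper names.

```python
import asyncio


def write_at(path, offset, data):
    """Blocking helper: write data into an existing file at the given offset."""
    with open(path, 'r+b') as f:
        f.seek(offset)
        f.write(data)


async def write_at_async(path, offset, data):
    # Run the blocking write in a worker thread so the event loop stays free
    await asyncio.to_thread(write_at, path, offset, data)
```

Inside a download coroutine, each received chunk would be passed to write_at_async together with a running offset. Whether this pays off depends on disk speed and chunk size; for small chunks the thread hand-off can cost more than the write itself.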

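2. Handling exceptions

An asynchronous download faces the same problems, such as network interruptions and server errors. To make the code more robust, these exceptions need to be handled during the download.

Here is an example with exception handling: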

import aiohttp
import asyncio
async def download_chunk(session, url, start, end, local_filename):
    """Download bytes start..end; return True on success, False on failure."""
    headers = {'Range': f'bytes={start}-{end}'}
    try:
        async with session.get(url, headers=headers) as response:
            response.raise_for_status()
            if response.status != 206:
                # Without 206 the body is not the requested byte range
                raise RuntimeError(f"Server ignored Range (status {response.status})")
            with open(local_filename, 'r+b') as f:
                f.seek(start)
                while True:
                    chunk = await response.content.read(8192)
                    if not chunk:
                        break
                    f.write(chunk)
        return True
    except aiohttp.ClientResponseError as http_err:
        print(f"HTTP error occurred: {http_err}")
    except Exception as err:
        print(f"Other error occurred: {err}")
    return False
async def download_file_async(url, local_filename, num_tasks=4):
    async with aiohttp.ClientSession() as session:
        # aiohttp does not follow redirects for HEAD unless asked to
        async with session.head(url, allow_redirects=True) as response:
            response.raise_for_status()
            file_size = int(response.headers['Content-Length'])
        chunk_size = file_size // num_tasks
        with open(local_filename, 'wb') as f:
            f.truncate(file_size)
        tasks = []
        for i in range(num_tasks):
            start = i * chunk_size
            end = file_size - 1 if i == num_tasks - 1 else (i + 1) * chunk_size - 1
            task = download_chunk(session, url, start, end, local_filename)
            tasks.append(asyncio.create_task(task))
        results = await asyncio.gather(*tasks)
    if all(results):
        print(f"File downloaded successfully: {local_filename}")
    else:
        print(f"Download incomplete, {results.count(False)} chunk(s) failed: {local_filename}")
url = 'http://example.com/somefile.zip'
local_filename = 'somefile.zip'
asyncio.run(download_file_async(url, local_filename))

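In this example the download_chunk coroutine catches HTTP errors and other exceptions, so a failing request is reported instead of propagating out of asyncio.gather(). As with the threaded version, a failed chunk leaves a gap in the output file, so the download is only complete if every chunk succeeded.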
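Another option is to let the coroutines raise and have asyncio.gather() hand the exceptions back as values by passing return_exceptions=True. The sketch below separates successes from failures; gather_with_report is a hypothetical helper name, and the coroutines passed in can be any awaitables.

```python
import asyncio


async def gather_with_report(coroutines):
    """Await all coroutines; return (results, errors) instead of raising."""
    # With return_exceptions=True, a raised exception becomes a list entry
    outcomes = await asyncio.gather(*coroutines, return_exceptions=True)
    results = [o for o in outcomes if not isinstance(o, BaseException)]
    errors = [o for o in outcomes if isinstance(o, BaseException)]
    return results, errors
```

With this pattern every coroutine runs to completion even if one of them fails, and the caller decides afterwards whether to retry the failed parts or discard the file.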

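IV. Downloading files with a third-party library

Besides requests and aiohttp, there are third-party libraries made specifically for downloading files, such as the wget package. It offers an even simpler interface that suits basic download scenarios.

1. Installing wget

First, install the wget package. If it is not installed yet, use the following command:

pip install wget

2. Downloading a file with wget

Downloading a file with wget is very simple: just call wget.download(). Here is an example: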

import wget
url = 'http://example.com/somefile.zip'
local_filename = 'somefile.zip'
# wget.download() returns the path it actually wrote to
saved_path = wget.download(url, local_filename)
print(f"File downloaded successfully: {saved_path}")

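In this example, calling wget.download() with the URL and a local file name is all that is needed. The library sends the HTTP request, shows a progress bar, and writes the file, which keeps the code short. As far as we know, the PyPI wget package does not resume interrupted downloads, and if the target name already exists it saves the file under a numbered variant of that name, so it is safer to use the path that wget.download() returns than to assume the name that was passed in.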
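The progress display can be customized. To our understanding, wget.download() accepts a bar argument, a function called with the bytes received so far, the total size, and a display width, whose return value is printed as the progress line; treat that signature as an assumption and check it against the installed version. The formatter below is our own and does not depend on wget.

```python
def percent_bar(current, total, width=80):
    """Return a short progress string such as ' 42% (420 / 1000 bytes)'."""
    if total <= 0:
        # Size unknown (no usable Content-Length): show only the byte count
        return f'{current} bytes'
    percent = min(100, current * 100 // total)
    return f'{percent:3d}% ({current} / {total} bytes)'


# Assumed usage with the wget package (not executed here):
# wget.download(url, out=local_filename, bar=percent_bar)
```

The width parameter is accepted so that the function matches the expected call, even though this simple formatter ignores it.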

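3. Handling exceptions

Although wget works in most cases, we still need to handle problems such as network interruptions and server errors. Here is an example with exception handling: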

import wget
def download_file(url, local_filename):
    try:
        # wget.download() returns the path it actually wrote to
        saved_path = wget.download(url, local_filename)
        print(f"File downloaded successfully: {saved_path}")
    except Exception as err:
        print(f"Error occurred: {err}")
url = 'http://example.com/somefile.zip'
local_filename = 'somefile.zip'
download_file(url, local_filename)