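How to Download Files with Python

Python offers several ways to download files: the requests library, the built-in urllib library, and the wget library. The most common and simplest is requests, which is well documented and has strong community support. It provides a simple API for sending HTTP requests and handling responses. The following shows how to download a file with requests.

Downloading files with requests is one of the most common approaches in Python. First, make sure the requests library is installed. You can install it with the following command: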

pip install requests

Once it is installed, you can download a file with the following code:

import requests
url = 'http://example.com/file.txt'  # replace with the URL of your file
response = requests.get(url)
with open('file.txt', 'wb') as file:
    file.write(response.content)

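The advantage of this approach is its simplicity, and it handles a variety of HTTP requests and response formats with little effort. requests also provides more advanced features, such as custom request headers, cookie handling, and authentication, which make it a very capable HTTP client library.

The rest of this article looks in detail at downloading files with Python, covering several methods and the points to watch out for.

I. Downloading Files with the `requests` Library

requests is one of the most popular HTTP libraries for Python, and its straightforward API makes downloading a file easy.

1. Installation and basic usage

First, make sure the requests library is installed: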

pip install requests

Then call requests.get() to request the file and write its content to a local file:

import requests
url = 'http://example.com/file.txt'  # replace with the URL of your file
response = requests.get(url)
with open('file.txt', 'wb') as file:
    file.write(response.content)
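
The example above hard-codes the local file name. As a small sketch of one alternative, the name can be taken from the last path segment of the URL. The helper name `filename_from_url` and the fallback name `download` are assumptions made for this illustration and are not part of requests.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url: str, default: str = "download") -> str:
    """Return the last path segment of url, or default if there is none."""
    # urlsplit drops the query string and fragment; unquote decodes %xx escapes.
    path = unquote(urlsplit(url).path)
    # Taking the basename after decoding also strips any encoded '/'.
    name = posixpath.basename(path)
    # Reject empty names and the special directory entries.
    if name in ("", ".", ".."):
        return default
    return name


print(filename_from_url("http://example.com/file.txt"))  # prints: file.txt
```

The result can be passed to open() in place of the literal 'file.txt'.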

2. Handling large files

For large files, use a streaming download so that the whole file is not held in memory at once:

import requests
url = 'http://example.com/largefile.zip'
with requests.get(url, stream=True) as response:
    with open('largefile.zip', 'wb') as file:
        for chunk in response.iter_content(chunk_size=8192):
            file.write(chunk)
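
One thing the loop above does not address is a connection that drops halfway, which leaves a truncated largefile.zip on disk. A minimal sketch of one way around that, assuming the destination directory is writable: write the chunks to a temporary file and rename it only after the last chunk has arrived. `save_stream` is a hypothetical helper name; it accepts any iterable of bytes, such as the result of response.iter_content().

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable


def save_stream(chunks: Iterable[bytes], dest: str) -> int:
    """Write chunks to a temp file next to dest, then rename it into place.

    Returns the number of bytes written.
    """
    dest_path = Path(dest)
    # Create the temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=str(dest_path.parent), suffix=".part")
    written = 0
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                if chunk:  # skip empty chunks
                    tmp.write(chunk)
                    written += len(chunk)
        os.replace(tmp_name, dest_path)  # dest only appears once it is complete
    except BaseException:
        os.unlink(tmp_name)  # remove the partial file, then re-raise
        raise
    return written


# Usage with the streaming request from the article (requires requests):
# with requests.get(url, stream=True, timeout=10) as response:
#     response.raise_for_status()
#     save_stream(response.iter_content(chunk_size=8192), "largefile.zip")
```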

3. Setting HTTP request headers

Sometimes you need to set HTTP request headers, for example to mimic a browser request:

import requests
url = 'http://example.com/file.txt'
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers)
with open('file.txt', 'wb') as file:
    file.write(response.content)

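II. Downloading Files with the `urllib` Library

urllib is part of the Python standard library, so it needs no extra installation and can be used for basic HTTP requests.

1. Basic usage

Use the urllib.request module to download a file: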

import urllib.request
url = 'http://example.com/file.txt'
urllib.request.urlretrieve(url, 'file.txt')
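
The Python documentation lists urlretrieve() among the legacy interfaces of urllib.request. As a hedged alternative sketch, the same download can be written with urlopen() and shutil.copyfileobj(), which copies the body in blocks rather than reading it all at once. The function name `fetch` and the 10-second default timeout are choices made for this example.

```python
import shutil
import urllib.request


def fetch(url: str, dest: str, timeout: float = 10.0) -> None:
    """Stream the response body at url into the local file dest."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(dest, "wb") as out:
        shutil.copyfileobj(response, out)


# Usage with the URL from the article:
# fetch('http://example.com/file.txt', 'file.txt')
```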

2. Handling exceptions

Network errors can occur during a download; use a try-except block to handle them:

import urllib.request
from urllib.error import URLError, HTTPError
url = 'http://example.com/file.txt'
try:
    urllib.request.urlretrieve(url, 'file.txt')
except HTTPError as e:
    print('HTTP error:', e.code)
except URLError as e:
    print('URL error:', e.reason)
except Exception as e:
    print('Unexpected error:', str(e))

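III. Downloading Files with the `wget` Library

wget is a lightweight third-party Python package dedicated to downloading files.

1. Installation and basic usage

First, install the wget package: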

pip install wget

Call wget.download() to download the file:

import wget
url = 'http://example.com/file.txt'
filename = wget.download(url)
print(f'Downloaded file: {filename}')

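IV. Downloading Files with the `http.client` Module

http.client is a module in the Python standard library for low-level HTTP requests.

1. Basic usage

Use the http.client module to download a file: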

import http.client
conn = http.client.HTTPSConnection("example.com")
conn.request("GET", "/file.txt")
response = conn.getresponse()
with open('file.txt', 'wb') as file:
    file.write(response.read())
conn.close()
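
Because http.client works at a low level, it neither raises on error statuses nor follows redirects by itself, so the code above would save the body of a 404 page as file.txt. The following sketch checks the status before writing. `download_via` is a hypothetical helper name; it takes an already-created connection so that it works with both HTTPConnection and HTTPSConnection.

```python
import http.client
import shutil


def download_via(conn: http.client.HTTPConnection, path: str, dest: str) -> None:
    """GET path over conn and save the body to dest, failing on non-200 statuses."""
    conn.request("GET", path)
    response = conn.getresponse()
    if response.status != 200:
        response.read()  # drain the body before giving up on this response
        raise RuntimeError(f"GET {path} failed: {response.status} {response.reason}")
    with open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # copy in blocks instead of one read()


# Usage (example.com is the placeholder host used in the article):
# conn = http.client.HTTPSConnection("example.com", timeout=10)
# try:
#     download_via(conn, "/file.txt", "file.txt")
# finally:
#     conn.close()
```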

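V. Handling Common Problems During Downloads

1. Handling network errors

Network problems can cause a download to fail. You can handle this by catching exceptions: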

import requests
from requests.exceptions import HTTPError, ConnectionError, Timeout
url = 'http://example.com/file.txt'
try:
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    with open('file.txt', 'wb') as file:
        file.write(response.content)
except HTTPError as http_err:
    print(f'HTTP error occurred: {http_err}')
# ConnectTimeout inherits from both Timeout and ConnectionError, so check Timeout first
except Timeout as timeout_err:
    print(f'Timeout error occurred: {timeout_err}')
except ConnectionError as conn_err:
    print(f'Connection error occurred: {conn_err}')
except Exception as err:
    print(f'Other error occurred: {err}')
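
Catching the exception tells you that the download failed, but transient failures such as timeouts often succeed on a second try. The following is a minimal sketch of a retry wrapper with exponential backoff. The name `with_retries`, the default of three attempts, and the delays are assumptions for illustration. Exceptions raised by requests derive from OSError, which is why that is the default here; in practice you would probably pass a narrower tuple such as (ConnectionError, Timeout) so that a 404 is not retried.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    action: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
) -> T:
    """Call action(), retrying with exponential backoff when it raises retry_on."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the last error propagate
            # Wait base_delay, then 2x, then 4x, ... before the next attempt.
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Usage with the request from the article (requires requests):
# def download() -> bytes:
#     response = requests.get(url, timeout=10)
#     response.raise_for_status()
#     return response.content
#
# data = with_retries(download, retry_on=(ConnectionError, Timeout))
```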

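2. Resuming interrupted downloads

Large downloads may need to resume after an interruption. Provided the server supports range requests, this can be done by setting the Range field in the HTTP request headers: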

import requests
url = 'http://example.com/largefile.zip'
headers = {'Range': 'bytes=1000-'}  # start downloading at byte offset 1000
response = requests.get(url, headers=headers, stream=True)
with open('largefile.zip', 'ab') as file:  # 'ab' mode appends to the existing file
    for chunk in response.iter_content(chunk_size=8192):
        file.write(chunk)
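
The offset 1000 above is hard-coded. The sketch below shows how the offset might instead be derived from the size of the partial file already on disk, and how the write mode can follow the server's answer: status 206 means the Range header was honoured, while 200 means the server ignored it and is sending the whole file again. The helper names `range_header` and `write_mode` are hypothetical.

```python
import os
from typing import Dict


def range_header(path: str) -> Dict[str, str]:
    """Build a Range header asking for the bytes that path does not have yet."""
    offset = os.path.getsize(path) if os.path.exists(path) else 0
    # With nothing on disk there is nothing to resume, so send no Range header.
    return {"Range": f"bytes={offset}-"} if offset > 0 else {}


def write_mode(status_code: int) -> str:
    """Pick the open() mode that matches the server's response status."""
    if status_code == 206:  # Partial Content: append to what we already have
        return "ab"
    if status_code == 200:  # full body sent: overwrite the partial file
        return "wb"
    # For example 416 (Range Not Satisfiable) when the file is already complete.
    raise ValueError(f"cannot resume, server answered {status_code}")


# Usage with the request from the article (requires requests):
# response = requests.get(url, headers=range_header('largefile.zip'), stream=True)
# with open('largefile.zip', write_mode(response.status_code)) as file:
#     for chunk in response.iter_content(chunk_size=8192):
#         file.write(chunk)
```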

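3. Downloading through a proxy

In some network environments you may have to download through a proxy server. You can do this by passing the proxies parameter: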

import requests
url = 'http://example.com/file.txt'
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
response = requests.get(url, proxies=proxies)
with open('file.txt', 'wb') as file:
    file.write(response.content)

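VI. Summary

There are many ways to download files with Python, from the simple requests library to the built-in urllib and http.client modules and the dedicated wget package, each with its own use cases, strengths, and weaknesses. In practice, choose the method that fits your needs. For example, requests is a good general-purpose choice for most HTTP scenarios, while http.client or another third-party library is worth considering when you need special protocol handling or lower-level control.

During a download, take care to handle network errors, support resuming, and configure a proxy where needed, so that the download is stable and reliable. Choosing and combining these methods sensibly lets you download files efficiently.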