How to Download a File into a Newly Created Folder with Python

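To download a file into a folder that your Python script creates, use the os module to manage the folder and the requests library to fetch the file: first make sure the folder exists, then download the file with requests and write it into that folder.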

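Downloading a file and saving it to a specific folder is a common task in Python, especially in data collection and automated workflows. First make sure the target folder exists, creating it if it does not. Then use the requests library to download the file and save it into that folder.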

1. Import the required libraries

Before you start, make sure the requests library is installed. If it is not, install it with the following command:

pip install requests

Then import the required libraries in your Python script:

import requests
import os

2. Create the target folder

Before downloading, check whether the target folder exists and create it if it does not:

def create_folder(folder_path):
    if not os.path.exists(folder_path):
        os.makedirs(folder_path)
        print(f"Folder '{folder_path}' created successfully.")
    else:
        print(f"Folder '{folder_path}' already exists.")
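The check-then-create pattern above can be shortened. The following is a minimal sketch, assuming Python 3.2 or later, where os.makedirs accepts exist_ok; the helper name ensure_folder is hypothetical and not part of the original example.

```python
import os


def ensure_folder(folder_path):
    """Create folder_path (and any missing parent folders); return its absolute path.

    ensure_folder is a hypothetical helper name used only for this sketch.
    """
    # exist_ok=True turns the call into a no-op when the directory already
    # exists, so a separate os.path.exists check is not needed.
    os.makedirs(folder_path, exist_ok=True)
    return os.path.abspath(folder_path)
```

Calling ensure_folder("downloads") twice in a row is safe: the second call simply finds the folder already in place.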

3. Download the file and save it to the folder

Use the requests library to download the file and write it into the target folder:

def download_file(url, folder_path, file_name):
    # timeout prevents the call from hanging forever on an unresponsive server
    response = requests.get(url, timeout=10)
    if response.status_code == 200:
        file_path = os.path.join(folder_path, file_name)
        # Write the response body as raw bytes
        with open(file_path, 'wb') as file:
            file.write(response.content)
        print(f"File '{file_name}' downloaded successfully and saved to '{folder_path}'.")
    else:
        print(f"Failed to download file from {url}. Status code: {response.status_code}")

4. Complete example

Putting the steps above together gives a complete example:

import os

import requests


def create_folder(folder_path):
    # Create the folder (and any missing parents) only if it is not there yet
    if not os.path.exists(folder_path):
        os.makedirs(folder_path)
        print(f"Folder '{folder_path}' created successfully.")
    else:
        print(f"Folder '{folder_path}' already exists.")


def download_file(url, folder_path, file_name):
    # timeout prevents the call from hanging forever on an unresponsive server
    response = requests.get(url, timeout=10)
    if response.status_code == 200:
        file_path = os.path.join(folder_path, file_name)
        # Write the response body as raw bytes
        with open(file_path, 'wb') as file:
            file.write(response.content)
        print(f"File '{file_name}' downloaded successfully and saved to '{folder_path}'.")
    else:
        print(f"Failed to download file from {url}. Status code: {response.status_code}")


if __name__ == "__main__":
    folder_path = "downloads"
    create_folder(folder_path)
    url = "https://example.com/sample.txt"
    file_name = "sample.txt"
    download_file(url, folder_path, file_name)
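The example above hard-codes file_name. When the name should come from the URL instead, it can be derived from the URL path. The sketch below is one way to do that with the standard library; the function name file_name_from_url and the fallback name "download.bin" are assumptions made for illustration, and the function is not a full file-name sanitizer.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def file_name_from_url(url, default="download.bin"):
    """Return the last path segment of url, or default if there is none.

    Hypothetical helper: both the name and the default are illustrative.
    """
    # urlsplit(...).path excludes the query string and the fragment
    path = unquote(urlsplit(url).path)
    name = posixpath.basename(path)
    # Reject empty names and the special directory entries "." and ".."
    if name in ("", ".", ".."):
        return default
    return name
```

For "https://example.com/sample.txt?version=2" this returns "sample.txt", so the result can be passed directly as the file_name argument of download_file.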

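5. Downloading large files

By default, requests.get reads the entire response body into memory, which can exhaust memory when the file is large. Streaming the download avoids this: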

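def download_large_file(url, folder_path, file_name):
    file_path = os.path.join(folder_path, file_name)
    # stream=True defers reading the body until iter_content is consumed;
    # the with block releases the connection when the download ends
    with requests.get(url, stream=True, timeout=10) as response:
        response.raise_for_status()  # do not save an error page as the file
        with open(file_path, 'wb') as file:
            for chunk in response.iter_content(chunk_size=8192):
                if chunk:  # skip empty chunks
                    file.write(chunk)
    print(f"Large file '{file_name}' downloaded successfully and saved to '{folder_path}'.")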

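With this approach only one chunk (8192 bytes here) is held in memory at a time, so memory use stays small regardless of the file size.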
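A streamed download that is interrupted halfway leaves a truncated file at the final path. One possible refinement, sketched here under the assumption that a temporary ".part" suffix is acceptable, is to write the chunks to a temporary file and rename it only once everything has arrived. The helper name save_chunks is hypothetical; it accepts any iterable of bytes, such as response.iter_content(chunk_size=8192).

```python
import os


def save_chunks(chunks, file_path):
    """Write an iterable of bytes chunks to file_path via a temporary '.part' file.

    Hypothetical helper: the name and the '.part' suffix are illustrative choices.
    """
    temp_path = file_path + ".part"
    try:
        with open(temp_path, "wb") as file:
            for chunk in chunks:
                if chunk:  # skip empty chunks
                    file.write(chunk)
        # Rename only after every chunk was written; os.replace overwrites
        # an existing file at file_path.
        os.replace(temp_path, file_path)
    except BaseException:
        # Remove the incomplete temporary file, then re-raise the error
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
    return file_path
```

With requests, the chunks argument would be response.iter_content(chunk_size=8192) from a request made with stream=True. If the connection drops mid-download, the exception propagates and no file appears at file_path.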

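6. Handling exceptions and errors

In real applications it is important to handle network failures and other errors. A common approach looks like this: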

def download_file_with_error_handling(url, folder_path, file_name):
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        file_path = os.path.join(folder_path, file_name)
        with open(file_path, 'wb') as file:
            file.write(response.content)
            print(f"File '{file_name}' downloaded successfully and saved to '{folder_path}'.")
    except requests.exceptions.RequestException as e:
        print(f"Error downloading file: {e}")

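This catches the exceptions that requests raises for connection failures, timeouts, and HTTP error statuses (the last via raise_for_status), so a failed download is reported instead of crashing the program.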
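Transient network errors often succeed on a second attempt. As a hedged extension of the error handling above, the sketch below retries a callable with exponential backoff. The helper name call_with_retries and its defaults are assumptions for illustration, not part of requests. It only helps with functions that let the exception propagate rather than catching it themselves.

```python
import time


def call_with_retries(func, attempts=3, delay=1.0, exceptions=(Exception,)):
    """Call func() until it succeeds or attempts run out; return its result.

    Hypothetical helper: name, defaults, and backoff policy are illustrative.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions as error:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the last error
            print(f"Attempt {attempt} failed ({error}); retrying in {delay} s.")
            time.sleep(delay)
            delay *= 2  # double the wait before the next attempt
```

With requests, one would pass exceptions=(requests.exceptions.RequestException,) and a func that performs the request and calls raise_for_status, so that only network and HTTP errors trigger a retry.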

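7. Multithreaded downloads

When several files need to be downloaded, running the downloads in separate threads lets them overlap while each one waits on the network, which shortens the total time: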

import os
import threading

import requests


def download_file_thread(url, folder_path, file_name):
    # timeout keeps one stalled server from blocking its thread forever
    response = requests.get(url, timeout=10)
    if response.status_code == 200:
        file_path = os.path.join(folder_path, file_name)
        with open(file_path, 'wb') as file:
            file.write(response.content)
        print(f"File '{file_name}' downloaded successfully and saved to '{folder_path}'.")
    else:
        print(f"Failed to download file from {url}. Status code: {response.status_code}")


def download_files_in_parallel(urls, folder_path):
    threads = []
    for url in urls:
        # Use the last segment of the URL as the file name
        file_name = url.split("/")[-1]
        thread = threading.Thread(target=download_file_thread, args=(url, folder_path, file_name))
        threads.append(thread)
        thread.start()
    # Wait for every download thread to finish
    for thread in threads:
        thread.join()
    print("All downloads finished.")


if __name__ == "__main__":
    folder_path = "downloads"
    create_folder(folder_path)  # create_folder is the function defined in section 2
    urls = [
        "https://example.com/file1.txt",
        "https://example.com/file2.txt",
        "https://example.com/file3.txt"
    ]
    download_files_in_parallel(urls, folder_path)
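The example above starts one thread per URL, which can open a large number of connections when the list is long. A bounded pool is a common alternative. The sketch below uses concurrent.futures from the standard library; the function name download_all, the max_workers default, and the download parameter (any callable with the signature download(url, folder_path, file_name)) are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(urls, folder_path, download, max_workers=4):
    """Run download(url, folder_path, file_name) for each URL in a bounded pool.

    Hypothetical helper. Returns a dict that maps each URL to None on success
    or to the exception that the download callable raised.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {}
        for url in urls:
            file_name = url.split("/")[-1]  # same naming rule as the example above
            futures[pool.submit(download, url, folder_path, file_name)] = url
        for future in as_completed(futures):
            # future.exception() returns None when the call did not raise
            results[futures[future]] = future.exception()
    return results
```

A call such as download_all(urls, "downloads", download_file_thread) would reuse the function from the example. Because that function prints failures instead of raising, the returned dict only records errors for callables that raise exceptions.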