python如何增快读写文件的速度

使用多线程或多进程、优化文件读取方式、减少磁盘 I/O 操作、增加缓存或使用内存映射文件。在这些方法中，使用内存映射文件是一种极为有效的方式，可以显著提高文件读写速度。内存映射文件（Memory-Mapped Files）允许我们将文件的内容映射到内存中，从而大大加快对文件的读写操作。通过使用 mmap 模块，我们可以直接在内存中访问文件的内容，从而避免频繁的磁盘 I/O 操作，提高性能。

一、使用多线程或多进程

多线程

使用多线程可以在一定程度上提高文件读写的速度，特别是当文件操作不是单一的顺序操作时。Python 的 threading 模块提供了方便的多线程支持。虽然 Python 的全局解释器锁（GIL）在某些情况下限制了多线程的性能，但对于 I/O 密集型操作，多线程仍然是有帮助的。

import threading
def read_file(file_path):
    with open(file_path, 'r') as file:
        return file.read()
def write_file(file_path, data):
    with open(file_path, 'w') as file:
        file.write(data)
thread1 = threading.Thread(target=read_file, args=('file1.txt',))
thread2 = threading.Thread(target=write_file, args=('file2.txt', 'some data'))
thread1.start()
thread2.start()
thread1.join()
thread2.join()

多进程

相比多线程，多进程可以有效地避免 GIL 的限制，从而更好地利用多核 CPU 的优势。Python 提供了 multiprocessing 模块来支持多进程编程。

import multiprocessing
def read_file(file_path):
    with open(file_path, 'r') as file:
        return file.read()
def write_file(file_path, data):
    with open(file_path, 'w') as file:
        file.write(data)
process1 = multiprocessing.Process(target=read_file, args=('file1.txt',))
process2 = multiprocessing.Process(target=write_file, args=('file2.txt', 'some data'))
process1.start()
process2.start()
process1.join()
process2.join()

二、优化文件读取方式

使用合适的缓冲区大小

在读取大文件时，使用合适的缓冲区大小可以显著提高读取速度。通过调整缓冲区大小，可以减少系统调用的次数，从而提高效率。

buffer_size = 1024 * 1024  # 1MB
with open('large_file.txt', 'r', buffering=buffer_size) as file:
    while True:
        chunk = file.read(buffer_size)
        if not chunk:
            break
        # Process the chunk

使用 `with` 语句

使用 with 语句打开文件可以确保文件在使用后自动关闭，从而避免资源泄漏，提升性能。

with open('file.txt', 'r') as file:
    data = file.read()

三、减少磁盘 I/O 操作

批量读写

尽量减少对磁盘的频繁读写操作，通过批量处理数据可以显著提升性能。例如，可以将多个写操作合并为一次写操作。

data_to_write = []
for i in range(1000):
    data_to_write.append(f"Line {i}\n")
with open('output.txt', 'w') as file:
    file.writelines(data_to_write)

内存缓存

将数据缓存到内存中，避免频繁的磁盘 I/O 操作。可以使用 Python 的 io 模块来实现内存缓存。

import io
buffer = io.StringIO()
for i in range(1000):
    buffer.write(f"Line {i}\n")
with open('output.txt', 'w') as file:
    file.write(buffer.getvalue())

四、增加缓存或使用内存映射文件

内存映射文件

内存映射文件允许将文件的内容映射到内存中，从而大大加快文件的读写速度。Python 提供了 mmap 模块来支持内存映射文件。

import mmap
with open('large_file.txt', 'r+b') as f:
    mmapped_file = mmap.mmap(f.fileno(), 0)
    print(mmapped_file.read(100))  # Read first 100 bytes
    mmapped_file.close()

使用 `lru_cache`

对于重复读取的文件，可以使用 functools.lru_cache 来缓存读取结果，从而提高性能。

from functools import lru_cache
@lru_cache(maxsize=128)
def read_file(file_path):
    with open(file_path, 'r') as file:
        return file.read()
data = read_file('file.txt')