How to Parse Excel Files in Java

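Parsing an Excel file in Java comes down to five steps: add the Apache POI library, create a Workbook object, get a Sheet object, iterate over rows and cells, and handle the different cell data types. The sections below walk through each step in detail.

Use the Apache POI library: Apache POI is one of the most established Java libraries for Excel files and supports both reading and writing. First, make sure the POI dependency has been added to your project.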

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>5.0.0</version>
</dependency>

Create a Workbook object: Workbook is the top-level object and represents the entire Excel file. Choose the implementation class according to the file extension (.xls or .xlsx).

FileInputStream file = new FileInputStream(new File("path/to/excel/file.xlsx"));
Workbook workbook = new XSSFWorkbook(file);

Get a Sheet object: a Sheet represents one worksheet in the Excel file and can be retrieved by index or by name.

Sheet sheet = workbook.getSheetAt(0);

Iterate over rows and cells: Sheet and Row are both iterable, so you can loop over every row in a sheet and then over every cell in each row.

for (Row row : sheet) {
    for (Cell cell : row) {
        // Process the cell data
    }
}

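Handle different data types: an Excel cell can hold different kinds of data (strings, numbers, booleans, and so on), so it must be processed according to its cell type.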

switch (cell.getCellType()) {
    case STRING:
        System.out.println(cell.getStringCellValue());
        break;
    case NUMERIC:
        System.out.println(cell.getNumericCellValue());
        break;
    case BOOLEAN:
        System.out.println(cell.getBooleanCellValue());
        break;
    default:
        System.out.println("Unsupported cell type");
        break;
}

1. Add the Apache POI Library

Maven dependency

Apache POI is one of the best options for working with Excel files. Add the following dependency to your project's pom.xml:

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>5.0.0</version>
</dependency>

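Manual download

If you are not using Maven, download the binary distribution from the official Apache POI website and add its jars to your project's classpath. Make sure you include the complete set, dependencies included: poi, poi-ooxml, the OOXML schema jar (poi-ooxml-schemas up to 4.x, renamed poi-ooxml-lite in 5.0.0), and the third-party libraries shipped with the distribution, such as XMLBeans and Commons Compress.

2. Read the Excel File

Create a Workbook object

A Workbook object represents the entire Excel file. Apache POI provides two Workbook implementations, one for each file format:

HSSFWorkbook: for the Excel 97-2003 format (.xls).
XSSFWorkbook: for the Excel 2007 and later format (.xlsx).

Choose the Workbook class that matches the file extension: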

FileInputStream file = new FileInputStream(new File("path/to/excel/file.xlsx"));
Workbook workbook = new XSSFWorkbook(file);
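As an alternative to choosing the class by extension, POI's WorkbookFactory can inspect the file and return the matching implementation. The following is a minimal sketch, assuming poi-ooxml is on the classpath; the class name OpenWorkbookSketch and the helpers countSheets and writeSample are illustrative names, not part of POI.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class OpenWorkbookSketch {

    /** Opens an .xls or .xlsx file read-only and returns its number of sheets. */
    static int countSheets(File excelFile) {
        // create(file, password, readOnly): a null password means "not encrypted".
        // try-with-resources closes the workbook even if an exception is thrown.
        try (Workbook workbook = WorkbookFactory.create(excelFile, null, true)) {
            return workbook.getNumberOfSheets();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hypothetical helper: writes a one-sheet .xlsx to a temp file so the demo needs no input. */
    static File writeSample() {
        try {
            File file = File.createTempFile("sample", ".xlsx");
            file.deleteOnExit();
            try (Workbook workbook = new XSSFWorkbook();
                 OutputStream out = new FileOutputStream(file)) {
                workbook.createSheet("Sheet1");
                workbook.write(out);
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countSheets(writeSample()));
    }
}
```

Opening from a File rather than an InputStream generally uses less memory, and the read-only flag ensures that closing the workbook never writes back to the file.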

Close the Workbook object

When you have finished with the file, always close the Workbook to release its resources:

workbook.close();
file.close();

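3. Get a Sheet Object

A Sheet object represents one worksheet in the Excel file. You can retrieve it by index or by name:

Sheet sheet = workbook.getSheetAt(0); // first worksheet (the index is 0-based)
// or
Sheet sheetByName = workbook.getSheet("Sheet1"); // look the worksheet up by name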
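The two lookups fail differently: getSheet(name) returns null when no sheet has that name, while getSheetAt(index) throws an IllegalArgumentException for an index that is out of range. The sketch below lists the available sheet names and guards the lookup by name; SheetLookupSketch, sheetNames and sheetOrFirst are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class SheetLookupSketch {

    /** Returns the names of all sheets in workbook order. */
    static List<String> sheetNames(Workbook workbook) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < workbook.getNumberOfSheets(); i++) {
            names.add(workbook.getSheetName(i));
        }
        return names;
    }

    /** Looks a sheet up by name, falling back to the first sheet when the name is absent. */
    static Sheet sheetOrFirst(Workbook workbook, String name) {
        Sheet sheet = workbook.getSheet(name); // null when no sheet has this name
        return sheet != null ? sheet : workbook.getSheetAt(0);
    }

    public static void main(String[] args) throws Exception {
        // An in-memory workbook stands in for a file read from disk
        try (Workbook workbook = new XSSFWorkbook()) {
            workbook.createSheet("Data");
            workbook.createSheet("Notes");
            System.out.println(sheetNames(workbook));
            System.out.println(sheetOrFirst(workbook, "Missing").getSheetName());
        }
    }
}
```

Whether falling back to the first sheet is appropriate depends on the application; failing fast with a clear error message is an equally reasonable choice.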

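4. Iterate Over Rows and Cells

Iterate over rows

Use the Sheet's iterator, or a for-each loop, to visit every row:

for (Row row : sheet) {
    // Visit each row
}

Iterate over cells

Use the Row's iterator, or a for-each loop, to visit every cell in a row:

for (Row row : sheet) {
    for (Cell cell : row) {
        // Process the cell data
    }
}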
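The for-each loops above visit only the rows and cells that are actually defined in the file, so empty rows and empty cells are skipped and column positions can shift. When the position matters, index-based access is one option. The sketch below assumes gaps should be kept as empty strings; DenseGridSketch, readGrid and sample are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class DenseGridSketch {

    /** Reads a sheet row by row, keeping undefined rows and cells as empty entries. */
    static List<List<String>> readGrid(Sheet sheet) {
        DataFormatter formatter = new DataFormatter();
        List<List<String>> grid = new ArrayList<>();
        for (int r = 0; r <= sheet.getLastRowNum(); r++) {
            List<String> values = new ArrayList<>();
            Row row = sheet.getRow(r); // null when the row was never created
            if (row != null) {
                // getLastCellNum() is the index of the last cell plus one (-1 for a row without cells)
                for (int c = 0; c < row.getLastCellNum(); c++) {
                    Cell cell = row.getCell(c, Row.MissingCellPolicy.RETURN_BLANK_AS_NULL);
                    values.add(formatter.formatCellValue(cell)); // "" for a null cell
                }
            }
            grid.add(values);
        }
        return grid;
    }

    /** Hypothetical sample with a gap in column B and no row at index 1. */
    static Workbook sample() {
        Workbook workbook = new XSSFWorkbook();
        Sheet sheet = workbook.createSheet("Data");
        Row first = sheet.createRow(0);
        first.createCell(0).setCellValue("a");
        first.createCell(2).setCellValue("c");
        sheet.createRow(2).createCell(0).setCellValue("x");
        return workbook;
    }

    public static void main(String[] args) throws Exception {
        try (Workbook workbook = sample()) {
            System.out.println(readGrid(workbook.getSheetAt(0)));
        }
    }
}
```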

5. Handle Different Data Types

An Excel cell can hold different kinds of data, such as strings, numbers, and booleans. Process each cell according to its type:

for (Row row : sheet) {
    for (Cell cell : row) {
        switch (cell.getCellType()) {
            case STRING:
                System.out.println(cell.getStringCellValue());
                break;
            case NUMERIC:
                System.out.println(cell.getNumericCellValue());
                break;
            case BOOLEAN:
                System.out.println(cell.getBooleanCellValue());
                break;
            default:
                System.out.println("Unsupported cell type");
                break;
        }
    }
}
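The switch above sends formula cells and blank cells to the default branch. One possible extension, sketched below, reads the cached result of a formula and treats blank cells as empty strings. CellTextSketch, asText and demo are hypothetical names; the sketch relies on the value cached in the workbook rather than re-evaluating formulas on every read.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellType;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class CellTextSketch {

    /** Converts a cell to text, resolving formula cells to their cached result. */
    static String asText(Cell cell) {
        CellType type = cell.getCellType();
        if (type == CellType.FORMULA) {
            // Use the result stored with the formula instead of re-evaluating it
            type = cell.getCachedFormulaResultType();
        }
        switch (type) {
            case STRING:  return cell.getStringCellValue();
            case NUMERIC: return String.valueOf(cell.getNumericCellValue());
            case BOOLEAN: return String.valueOf(cell.getBooleanCellValue());
            case BLANK:   return "";
            default:      return "Unsupported cell type: " + type;
        }
    }

    /** Builds one row with each kind of cell in memory and converts it to text. */
    static List<String> demo() {
        Workbook workbook = new XSSFWorkbook();
        Row row = workbook.createSheet("Data").createRow(0);
        row.createCell(0).setCellValue("abc");
        row.createCell(1).setCellValue(1.5);
        row.createCell(2).setCellValue(true);
        row.createCell(3); // created without a value, so its type is BLANK
        Cell formula = row.createCell(4);
        formula.setCellFormula("1+2");
        // Evaluate once so that the cell carries a cached result, as a saved file would
        workbook.getCreationHelper().createFormulaEvaluator().evaluateFormulaCell(formula);

        List<String> texts = new ArrayList<>();
        for (Cell cell : row) {
            texts.add(asText(cell));
        }
        return texts;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

If the goal is to see each value the way Excel displays it, POI's DataFormatter.formatCellValue is worth considering instead of a hand-written switch.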

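Date handling

Excel stores dates as numeric cells, so a date cell arrives in the NUMERIC branch. Inside that branch, use the DateUtil class to tell dates from plain numbers: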

if (DateUtil.isCellDateFormatted(cell)) {
    System.out.println(cell.getDateCellValue());
} else {
    System.out.println(cell.getNumericCellValue());
}
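The check is needed because the file stores a date only as a serial number plus a display format. The plain-Java sketch below illustrates that serial representation; it assumes the workbook uses the 1900 date system, and ExcelSerialDateSketch and fromSerial are illustrative names. In real code, DateUtil and getDateCellValue should do this conversion.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

public class ExcelSerialDateSketch {

    /** Converts an Excel serial number (1900 date system) to a LocalDateTime. */
    static LocalDateTime fromSerial(double serial) {
        long wholeDays = (long) Math.floor(serial);                        // integer part: days
        long millisOfDay = Math.round((serial - wholeDays) * 86_400_000L); // fraction: time of day
        // Excel counts a non-existent 29 Feb 1900 as day 60, so serials from 61 on are shifted
        // by one day. Serial 60 itself has no real date and maps to 1 March here.
        LocalDate base = wholeDays < 61 ? LocalDate.of(1899, 12, 31) : LocalDate.of(1899, 12, 30);
        return base.plusDays(wholeDays).atStartOfDay().plus(millisOfDay, ChronoUnit.MILLIS);
    }

    public static void main(String[] args) {
        System.out.println(fromSerial(44197.0)); // 2021-01-01T00:00
        System.out.println(fromSerial(44197.5)); // 2021-01-01T12:00
    }
}
```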

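6. Handle Exceptions

Parsing an Excel file can fail in several ways, for example when the file is not found or an I/O error occurs. Handle these exceptions explicitly, and use try-with-resources so that the stream and the workbook are closed even when processing fails:

try (FileInputStream file = new FileInputStream("path/to/excel/file.xlsx");
     Workbook workbook = new XSSFWorkbook(file)) {
    // Process the Excel file; both resources are closed automatically,
    // in reverse order, when this block exits
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}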

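7. Optimize Performance

Use SAX parsing

For large Excel files, an event-based SAX parser avoids loading the whole workbook into memory, which greatly reduces memory use and usually speeds up reading. Apache POI provides the XSSFReader class for SAX-based parsing of .xlsx files:

// Open read-only so that closing the package never writes back to the file
OPCPackage pkg = OPCPackage.open(new File("path/to/excel/file.xlsx"), PackageAccess.READ);
XSSFReader reader = new XSSFReader(pkg);
SharedStringsTable sst = reader.getSharedStringsTable();
// XMLHelper is POI's own XML factory; XMLReaderFactory is deprecated since Java 9
XMLReader parser = XMLHelper.newXMLReader();
ContentHandler handler = new SheetHandler(sst);
parser.setContentHandler(handler);
// "rId1" is the relationship ID of a single sheet and may differ between files;
// reader.getSheetsData() iterates over all sheets instead
InputStream sheet = reader.getSheet("rId1");
parser.parse(new InputSource(sheet));
sheet.close();
pkg.close();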

SheetHandler class

A custom SheetHandler class is needed to handle the SAX events:

class SheetHandler extends DefaultHandler {
    private final SharedStringsTable sst;
    private String lastContents;
    private boolean nextIsString;

    SheetHandler(SharedStringsTable sst) {
        this.sst = sst;
    }

    @Override
    public void startElement(String uri, String localName, String name, Attributes attributes) {
        // c => cell; t="s" means the value is an index into the shared strings table
        if (name.equals("c")) {
            String cellType = attributes.getValue("t");
            nextIsString = cellType != null && cellType.equals("s");
        }
        // Reset the text buffer for the new element
        lastContents = "";
    }

    @Override
    public void endElement(String uri, String localName, String name) {
        // Resolve the index here, not in characters(), which may be called several times per element
        if (nextIsString) {
            int idx = Integer.parseInt(lastContents);
            lastContents = sst.getItemAt(idx).getString();
            nextIsString = false;
        }
        // v => the cell's value
        if (name.equals("v")) {
            System.out.println(lastContents);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        lastContents += new String(ch, start, length);
    }
}
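To make the events concrete, the following standalone sketch runs the same logic over a hand-written fragment of sheet XML using only the JDK's SAX parser. A plain list stands in for the shared strings table, and the XML is a simplified assumption of what a worksheet part contains (no namespaces or styles). SheetXmlSketch, CollectingHandler and parse are illustrative names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SheetXmlSketch {

    /** Collects cell values instead of printing them. */
    static class CollectingHandler extends DefaultHandler {
        private final List<String> sharedStrings;
        private final List<String> values = new ArrayList<>();
        private final StringBuilder contents = new StringBuilder();
        private boolean nextIsString;

        CollectingHandler(List<String> sharedStrings) {
            this.sharedStrings = sharedStrings;
        }

        @Override
        public void startElement(String uri, String localName, String name, Attributes attributes) {
            if (name.equals("c")) {
                // t="s" marks a cell whose <v> holds an index into the shared strings
                nextIsString = "s".equals(attributes.getValue("t"));
            }
            contents.setLength(0);
        }

        @Override
        public void endElement(String uri, String localName, String name) {
            if (name.equals("v")) {
                String text = contents.toString();
                values.add(nextIsString ? sharedStrings.get(Integer.parseInt(text)) : text);
                nextIsString = false;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            contents.append(ch, start, length);
        }
    }

    /** Parses the given sheet XML and returns the cell values in document order. */
    static List<String> parse(String sheetXml, List<String> sharedStrings) {
        try {
            CollectingHandler handler = new CollectingHandler(sharedStrings);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(sheetXml)), handler);
            return handler.values;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sheet XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<worksheet><sheetData><row r=\"1\">"
                + "<c r=\"A1\" t=\"s\"><v>1</v></c>"
                + "<c r=\"B1\"><v>42</v></c>"
                + "</row></sheetData></worksheet>";
        System.out.println(parse(xml, Arrays.asList("id", "name")));
    }
}
```

Here cell A1 carries the index 1, which resolves to the second shared string, while B1 holds its number directly. Collecting values into a list is only for demonstration; for a genuinely large file the handler would process each row as it arrives.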

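8. Summary

The steps above cover how to parse an Excel file in Java: add the Apache POI library, create a Workbook object, get a Sheet object, iterate over rows and cells, handle the different data types, and optimize performance. With these steps you can reliably parse and process the data in an Excel file. Keep performance in mind when using Apache POI; for large files in particular, SAX-based parsing is recommended.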