Which library should you use to handle CSV (Comma-Separated Values) in Java?

Similar formats include tab-separated values (TSV), separated by tabs, and space-separated values (SSV), separated by word spaces (i.e. ordinary half-width spaces); collectively they are often called character-separated values (CSV) or delimiter-separated values (DSV).

Comma-Separated Values - Wikipedia

⇧ And there it is.

CSV (comma-separated values)
CSV (character-separated values)

Now nobody can tell which "CSV" anyone is talking about!
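For simple data, the members of this family really do differ only in the delimiter, so a naive split works for any of them. The stdlib-only sketch below (the class name NaiveSplit is made up for illustration) also shows where that naive approach breaks: a quoted field that contains the delimiter gets cut in the wrong place, which is one reason to reach for a proper CSV library.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Naive delimiter splitting: fine only while no field contains the delimiter,
// a double quote or a line break.
final class NaiveSplit {

    static String[] split(String line, char delimiter) {
        // Pattern.quote treats the delimiter literally (e.g. '|' or '.');
        // limit -1 keeps trailing empty fields, so "a,b," gives 3 fields.
        return line.split(Pattern.quote(String.valueOf(delimiter)), -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("a,b,c", ',')));   // [a, b, c]
        System.out.println(Arrays.toString(split("a\tb\tc", '\t'))); // [a, b, c]
        System.out.println(Arrays.toString(split("a b c", ' ')));   // [a, b, c]
        // One quoted field containing a comma is cut in two: 3 pieces instead of 2 fields.
        System.out.println(split("\"x,y\",z", ',').length);         // 3
    }
}
```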

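In October 2005 the specification was codified as RFC 4180, an Informational RFC (providing useful information decided outside the IESG), in a way that endorsed the CSV implementations software already had. In practice, however, many software implementations do not conform to the RFC.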

Comma-Separated Values - Wikipedia

⇧ So the spec is apparently defined in RFC 4180.
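As a rough illustration of the RFC 4180 quoting rules (a field containing a comma, a double quote or a line break is enclosed in double quotes, an embedded double quote is escaped by doubling it, and records end with CRLF), here is a minimal sketch for writing a single record. The class and method names are hypothetical; the XLSX2CSV code later in this post applies the same quote-doubling rule.

```java
import java.util.List;
import java.util.stream.Collectors;

final class Rfc4180Writer {

    // Quotes a field only if it contains a comma, a double quote, CR or LF,
    // doubling any embedded double quotes ("" inside the quoted field).
    static String escape(String field) {
        boolean needsQuotes = field.indexOf(',') >= 0 || field.indexOf('"') >= 0
                || field.indexOf('\r') >= 0 || field.indexOf('\n') >= 0;
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Joins the escaped fields with commas and ends the record with CRLF.
    static String record(List<String> fields) {
        return fields.stream()
                .map(Rfc4180Writer::escape)
                .collect(Collectors.joining(",")) + "\r\n";
    }

    public static void main(String[] args) {
        // Prints: plain,"with,comma","say ""hi"""
        System.out.print(record(List.of("plain", "with,comma", "say \"hi\"")));
    }
}
```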

So which libraries handle CSV (Comma-Separated Values) in Java?

In Java there are probably lots of libraries for handling CSV (Comma-Separated Values), but which one is actually recommended?

Before getting to that: when people say "CSV library", which "CSV" do they mean?

CSV (comma-separated values)
CSV (character-separated values)

Which of the two is it, I wonder?

For now, let's proceed on the assumption that it means CSV (Comma-Separated Values).

honeplus.blog50.fc2.com

qiita.com

nainaistar.hatenablog.com

⇧ The sites above alone introduce so many libraries that it's hard to decide which one to pick. What a pain.

Judging by how recently they have been updated, OrangeSignal CSV and Super CSV look like questionable choices.

CsvMapper apparently depends on Jackson, so it becomes a less attractive option for projects that don't already use Jackson.

So the realistic candidates are

Opencsv
Apache Commons CSV

I figured either of these two would do, and since many people seem to find Opencsv easier to use, I'll try Opencsv.

Incidentally,

qiita.com

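⇧ According to the site above, converting a huge Excel file to CSV (Comma-Separated Values) can apparently be handled with the Apache POI API. (For me it only produced an empty CSV file, though... and since the Excel file used as input isn't described, I can't tell whether it really works...)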

blog1.mammb.com

⇧ The site above also seems to write its output to the console...

Also,

stackoverflow.com

codezine.jp

stackoverflow.com

⇧ Apparently, as of JDK 11, JAXB (Java Architecture for XML Binding), the API that binds XML to Java objects, was removed from the JDK's standard APIs (it had already been deprecated in Java 9), so on JDK 11 or later you have to add JAXB yourself if you want to use anything that depends on it... Who knew.
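If you're not sure whether JAXB is actually available at runtime (for example after moving a project to JDK 11), one quick check is to try loading one of its classes reflectively. A small sketch; the class name ClasspathCheck is made up, and javax.xml.bind.JAXBContext is the JAXB 2.x API class (the newer Jakarta XML Binding releases use the jakarta.xml.bind package instead).

```java
// Checks whether a class can be loaded, e.g. to see if a JAXB API jar is present.
final class ClasspathCheck {

    static boolean isPresent(String className) {
        try {
            // initialize = false: only check that the class can be found,
            // without running its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // javax.xml.bind is the JAXB 2.x package; on JDK 11+ this prints false
        // unless a JAXB API dependency has been added to the classpath.
        System.out.println("JAXB API present: " + isPresent("javax.xml.bind.JAXBContext"));
    }
}
```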

Furthermore, org.apache.poi.ooxml.util.SAXHelper in Apache POI has apparently been deprecated...

I mean, deprecating it is fine, but it would be nice if they also showed an example of the code to use instead...
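In the XLSX2CSV code further down, the XMLReader comes from org.apache.poi.util.XMLHelper.newXMLReader(), which appears to be what POI 4.1.x expects in place of SAXHelper. For comparison, here is a JDK-only sketch of creating a namespace-aware SAX XMLReader with secure processing enabled. This is not POI's implementation (POI's helper configures additional parser features), and SaxReaderSketch is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class SaxReaderSketch {

    // Namespace-aware XMLReader from the JDK's own SAX implementation,
    // with secure processing turned on (limits on entity expansion etc.).
    static XMLReader newXmlReader() throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        return factory.newSAXParser().getXMLReader();
    }

    // Example use: count the start tags with a given local name.
    static int countElements(String xml, String localName) {
        int[] count = {0};
        try {
            XMLReader reader = newXmlReader();
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    if (local.equals(localName)) {
                        count[0]++;
                    }
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(countElements("<sheet><row/><row/></sheet>", "row")); // 2
    }
}
```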

And then,

svn.apache.org

(Screenshot: the XLSX2CSV.java sample in the Apache POI repository)

⇧ I checked the official sample via the XLSX2CSV.java link above, and it did run, sort of, but...

it's an example that prints the data in CSV (Comma-Separated Values) format to the console...

I mean, writing it to the console is pretty useless...
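The sample isn't really tied to the console, though: XLSX2CSV writes to whatever PrintStream is passed to its constructor (the official sample's main() passes System.out, while the Main class later in this post passes a stream opened on a file). Below is a stdlib-only sketch of the same idea: writing lines to a file through a PrintStream with an explicit charset, and closing it so the data is actually flushed. CsvToFile and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

final class CsvToFile {

    // Writes each line through a PrintStream backed by a file (instead of System.out), in UTF-8.
    static void writeLines(Path file, List<String> lines) {
        // try-with-resources closes the stream, which also flushes it.
        try (PrintStream out = new PrintStream(Files.newOutputStream(file), false, StandardCharsets.UTF_8.name())) {
            for (String line : lines) {
                out.print(line);
                out.print('\n');
            }
            // PrintStream never throws IOException; errors are only reported via checkError().
            if (out.checkError()) {
                throw new UncheckedIOException(new IOException("Failed to write " + file));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the lines to a temporary file and returns its content (demo helper).
    static String roundTrip(List<String> lines) {
        try {
            Path tmp = Files.createTempFile("csv-sketch", ".csv");
            try {
                writeLines(tmp, lines);
                return new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(roundTrip(List.of("name,total", "A,100")));
    }
}
```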

Trying it out

Anyway, let's actually try it.

github.com

⇧ The page above says to use a library called OrangeSignal CSV, but its last update was on 2014-09-20, so I decided to go with something else and chose Opencsv this time.

This time the data is read from Excel, run through Opencsv, and finally written out as a CSV file, so

(Figure: processing flow, Excel file → CSV file → Opencsv beans → CSV file)
⇧ the processing flow should look roughly like this.

Since the CSV (Comma-Separated Values) data here has multiple rows, Opencsv is supposed to take care of turning the data it reads into a list of beans.
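Conceptually, binding by header name (what @CsvBindByName in PopulationJapanDto asks Opencsv to do) means: read the header row, look up each column's index by name, then turn every following row into one object. Here is a stdlib-only sketch of that idea. It is not how Opencsv is implemented; Row is a hypothetical stand-in for PopulationJapanDto, and the input is assumed to be simple CSV with no quoted fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of header-name binding, roughly the idea behind @CsvBindByName + CsvToBean.
// Assumes simple CSV: no quoted fields, no commas inside values.
final class HeaderBinding {

    // Hypothetical stand-in for PopulationJapanDto.
    static final class Row {
        final String prefecture;
        final double totalPopulation;

        Row(String prefecture, double totalPopulation) {
            this.prefecture = prefecture;
            this.totalPopulation = totalPopulation;
        }
    }

    // The first line is the header; columns are looked up by name, so their order does not matter.
    static List<Row> bind(List<String> lines, String prefectureHeader, String populationHeader) {
        List<String> header = Arrays.asList(lines.get(0).split(",", -1));
        int prefIdx = header.indexOf(prefectureHeader);
        int popIdx = header.indexOf(populationHeader);
        if (prefIdx < 0 || popIdx < 0) {
            throw new IllegalArgumentException("Header not found in " + header);
        }
        List<Row> rows = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] fields = line.split(",", -1);
            // A blank line yields a single empty field, so this throws
            // (index out of range or NumberFormatException).
            rows.add(new Row(fields[prefIdx], Double.parseDouble(fields[popIdx])));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Row> rows = bind(List.of("name,total", "A,100", "B,200.5"), "name", "total");
        rows.forEach(r -> System.out.println(r.prefecture + "\t" + r.totalPopulation));
    }
}
```

Note how a single blank line is enough to make this naive version fail, which is in the same spirit as the problem described next.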

Or so I thought: if the CSV file contains blank lines, or lines consisting of nothing but a line break, an error occurs during conversion to beans and it fails...
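If the blank lines can't be avoided at the source, one workaround is to filter them out before the Reader ever reaches Opencsv. A minimal stdlib sketch (class and method names are hypothetical), assuming the file fits in memory and contains no quoted fields that legitimately span blank lines:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import java.util.stream.Collectors;

final class BlankLineFilter {

    // Drops empty and whitespace-only lines; remaining lines are joined with '\n'.
    // Caveat: this also removes blank lines inside quoted multi-line fields.
    static String dropBlankLines(String text) {
        List<String> kept = text.lines()          // Java 11+: splits on \n, \r and \r\n
                .filter(line -> !line.isBlank())  // Java 11+
                .collect(Collectors.toList());
        return kept.isEmpty() ? "" : String.join("\n", kept) + "\n";
    }

    // Reads the whole input into memory and returns a Reader over the filtered text.
    static Reader withoutBlankLines(Reader in) throws IOException {
        StringWriter buffer = new StringWriter();
        in.transferTo(buffer);                    // Java 10+
        return new StringReader(dropBlankLines(buffer.toString()));
    }

    public static void main(String[] args) throws IOException {
        Reader filtered = withoutBlankLines(new StringReader("name,total\n\nA,100\n   \nB,200\n"));
        StringWriter out = new StringWriter();
        filtered.transferTo(out);
        System.out.print(out); // three lines: name,total / A,100 / B,200
    }
}
```

In this post's Main class, that would amount to something like populationJapanDao.read(BlankLineFilter.withoutBlankLines(reader)).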

On top of that,

qiita.com

stackoverflow.com

⇧ Apparently there are also problems where the header isn't recognized properly...
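One common cause of a header not being recognized (not necessarily the cause in the posts above) is a UTF-8 BOM at the start of the file, for example from Excel's "CSV UTF-8" format. Java's UTF-8 decoder does not strip the BOM, so the first header name is read as U+FEFF followed by 都道府県 and no longer matches @CsvBindByName(column = "都道府県"). A minimal sketch of skipping it at the Reader level; BomSkipper is a hypothetical name, and Apache Commons IO's BOMInputStream is an existing alternative at the byte level.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;

final class BomSkipper {

    // U+FEFF: what a UTF-8 BOM (bytes EF BB BF) decodes to. Java's UTF-8 decoder keeps it.
    private static final char BOM = '\uFEFF';

    // Returns a Reader that skips one leading BOM character, if there is one.
    static Reader skipBom(Reader in) {
        try {
            PushbackReader pushback = new PushbackReader(in, 1);
            int first = pushback.read();
            if (first != -1 && first != BOM) {
                pushback.unread(first); // not a BOM: put it back
            }
            return pushback;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the remaining characters into a String (helper for the example).
    static String readAll(Reader in) {
        try (Reader r = in) {
            StringWriter out = new StringWriter();
            r.transferTo(out); // Java 10+
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String withBom = BOM + "name,total\nA,100\n";
        System.out.println(readAll(skipBom(new StringReader(withBom))).startsWith("name")); // true
    }
}
```

In the Main class this would mean wrapping the reader, e.g. populationJapanDao.read(BomSkipper.skipBom(reader)).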

And

nobeans.hatenablog.com

⇧ Encoding problems tend to come up too...
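A frequent source of such problems with Japanese CSV files is that a file saved as plain "CSV" by Excel on Japanese Windows is typically encoded in Windows-31J (MS932, Microsoft's variant of Shift_JIS), while the code reads it as UTF-8. The sketch below (class name hypothetical) shows that decoding with the wrong charset garbles the text and that converting with the correct charset explicitly fixes it. Note that Files.newBufferedReader(path, StandardCharsets.UTF_8), as used in Main, does not garble silently; it throws MalformedInputException on such input.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class CharsetConversion {

    // Windows-31J (alias MS932): Microsoft's variant of Shift_JIS.
    static final Charset WINDOWS_31J = Charset.forName("windows-31j");

    // Decodes the bytes with the charset they were actually written in, then re-encodes them.
    static byte[] convert(byte[] src, Charset from, Charset to) {
        return new String(src, from).getBytes(to);
    }

    public static void main(String[] args) {
        String header = "\u90fd\u9053\u5e9c\u770c"; // the prefecture header used in PopulationJapanDto
        byte[] sjis = header.getBytes(WINDOWS_31J);

        // Wrong charset: the bytes are not valid UTF-8, so the text comes out garbled.
        System.out.println(new String(sjis, StandardCharsets.UTF_8).equals(header)); // false

        // Right charset, converted explicitly to UTF-8.
        byte[] utf8 = convert(sjis, WINDOWS_31J, StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(utf8, header.getBytes(StandardCharsets.UTF_8))); // true
    }
}
```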

After Apache POI, Opencsv turns out to be ridiculously hard to work with too... it makes me hesitant to use it in real projects...

Anyway, here are the files I ended up needing.

(Screenshot: the project's file layout)

/*
 * This file was generated by the Gradle 'init' task.
 *
 * This generated file contains a sample Java Library project to get you started.
 * For more details take a look at the Java Libraries chapter in the Gradle
 * User Manual available at https://docs.gradle.org/6.3/userguide/java_library_plugin.html
 */

plugins {
    // Apply the java-library plugin to add support for Java Library
    id 'java-library'
}

repositories {
    // Use Maven Central for resolving dependencies.
    // (The Gradle 'init' task used to generate jcenter(), but JCenter became read-only in 2021.)
    // You can declare any Maven/Ivy/file repository here.
    mavenCentral()
}

dependencies {
    // The next two dependencies come from the Gradle 'init' sample; the code in this post does not use them.
    // This dependency is exported to consumers, that is to say found on their compile classpath.
    api 'org.apache.commons:commons-math3:3.6.1'

    // This dependency is used internally, and not exposed to consumers on their own compile classpath.
    implementation 'com.google.guava:guava:28.2-jre'

    // https://mvnrepository.com/artifact/org.apache.poi/poi
    implementation group: 'org.apache.poi', name: 'poi', version: '4.1.2'

    // https://mvnrepository.com/artifact/org.apache.poi/poi-ooxml
    implementation group: 'org.apache.poi', name: 'poi-ooxml', version: '4.1.2'

    // https://mvnrepository.com/artifact/org.apache.poi/poi-ooxml-schemas
    implementation group: 'org.apache.poi', name: 'poi-ooxml-schemas', version: '4.1.2'

    // https://mvnrepository.com/artifact/com.opencsv/opencsv
    implementation group: 'com.opencsv', name: 'opencsv', version: '5.4'

    // https://mvnrepository.com/artifact/xml-apis/xml-apis
    implementation group: 'xml-apis', name: 'xml-apis', version: '2.0.2'

    // Use JUnit test framework
    testImplementation 'junit:junit:4.12'
}

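⇧ These five should be enough: poi, poi-ooxml, poi-ooxml-schemas, opencsv and xml-apis. If you don't use SAX (Simple API for XML), or XML at all, xml-apis isn't needed. That said, the SAX and JAXP classes used in XLSX2CSV (org.xml.sax.*, javax.xml.parsers.*) ship with the JDK itself in the java.xml module, so xml-apis may well be unnecessary even then.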

■/Java_one_hundred_apache_poi/src/main/java/Java_one_hundred_apache_poi/ninety/dao/PopulationJapanDao.java

package Java_one_hundred_apache_poi.ninety.dao;

import java.io.Reader;
import java.io.Writer;
import java.util.List;

import com.opencsv.bean.CsvToBean;
import com.opencsv.bean.CsvToBeanBuilder;
import com.opencsv.bean.StatefulBeanToCsv;
import com.opencsv.bean.StatefulBeanToCsvBuilder;
import com.opencsv.exceptions.CsvDataTypeMismatchException;
import com.opencsv.exceptions.CsvRequiredFieldEmptyException;

import Java_one_hundred_apache_poi.ninety.dto.PopulationJapanDto;

public class PopulationJapanDao {

  // Writes the beans to the given Writer as CSV.
  public void write(Writer writer, List<PopulationJapanDto> beans)
      throws CsvDataTypeMismatchException, CsvRequiredFieldEmptyException {
    StatefulBeanToCsv<PopulationJapanDto> beanToCsv = new StatefulBeanToCsvBuilder<PopulationJapanDto>(writer)
        .build();
    beanToCsv.write(beans);
  }

  // Reads the CSV and maps each data row to a PopulationJapanDto,
  // matching columns by the header names given in @CsvBindByName.
  public List<PopulationJapanDto> read(Reader reader) {
    CsvToBean<PopulationJapanDto> csvToBean = new CsvToBeanBuilder<PopulationJapanDto>(reader)
        .withType(PopulationJapanDto.class)
        .build();
    return csvToBean.parse();
  }

}

■/Java_one_hundred_apache_poi/src/main/java/Java_one_hundred_apache_poi/ninety/dto/PopulationJapanDto.java

package Java_one_hundred_apache_poi.ninety.dto;

import com.opencsv.bean.CsvBindByName;

public class PopulationJapanDto {

  @CsvBindByName(column= "都道府県", required = true)
  //@CsvBindByPosition(position=0)
  private String prefecture;

  @CsvBindByName(column = "総人口", required = true)
  //@CsvBindByPosition(position=1)
  private double totalPopulation;

  public String getPrefecture() {
    return prefecture;
  }

  public void setPrefecture(String prefecture) {
    this.prefecture = prefecture;
  }

  public double getTotalPopulation() {
    return totalPopulation;
  }

  public void setTotalPopulation(double totalPopulation) {
    this.totalPopulation = totalPopulation;
  }

}

■/Java_one_hundred_apache_poi/src/main/java/Java_one_hundred_apache_poi/ninety/util/XLSX2CSV.java

package Java_one_hundred_apache_poi.ninety.util;

import java.io.IOException;
import java.io.InputStream;
import java.io.PrintStream;

import javax.xml.parsers.ParserConfigurationException;

import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.util.CellAddress;
import org.apache.poi.ss.util.CellReference;
import org.apache.poi.util.XMLHelper;
import org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler;
import org.apache.poi.xssf.model.SharedStrings;
import org.apache.poi.xssf.model.Styles;
import org.apache.poi.xssf.model.StylesTable;
import org.apache.poi.xssf.usermodel.XSSFComment;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

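/**
 * Converts an XLSX workbook to CSV using POI's streaming (SAX-based) API.
 * Adapted from the Apache POI XLSX2CSV example; the CSV is written to the
 * PrintStream given to the constructor.
 */
public class XLSX2CSV {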
  /**
   * Uses the XSSF Event SAX helpers to do most of the work
   *  of parsing the Sheet XML, and outputs the contents
   *  as a (basic) CSV.
   */
  private class SheetToCSV implements SheetContentsHandler {
    private boolean firstCellOfRow;
    private int currentRow = -1;
    private int currentCol = -1;

    private void outputMissingRows(int number) {
      for (int i=0; i<number; i++) {
        for (int j=0; j<minColumns; j++) {
          output.append(',');
        }
        output.append('\n');
      }
    }

    @Override
    public void startRow(int rowNum) {

      // If there were gaps, output the missing rows
      outputMissingRows(rowNum-currentRow-1);
      // Prepare for this row
      firstCellOfRow = true;
      currentRow = rowNum;
      currentCol = -1;
    }

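    @Override
    public void endRow(int rowNum) {
      // Modified from the POI sample, which pads missing columns with ',' and
      // always ends the row with '\n'. Here no padding is written, and the
      // newline is written only when countColumn == minColumns - 1, i.e. (with
      // minColumns = 2 as used in Main) when the row's last cell was in column B.
      // Rows without any cells therefore produce no output, so no blank lines
      // reach Opencsv; note that rows ending in any other column are NOT terminated.
      int countColumn = Math.max(0, minColumns - currentCol);
      if (minColumns == (countColumn + 1)) {
        output.append('\n');
      }
    }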

    @Override
    public void cell(String cellReference, String formattedValue,
            XSSFComment comment) {
      if (firstCellOfRow) {
        firstCellOfRow = false;
      } else {
        output.append(',');
      }

      // gracefully handle missing CellRef here in a similar way as XSSFCell does
      if(cellReference == null) {
        cellReference = new CellAddress(currentRow, currentCol).formatAsString();
      }

      // no need to append anything if we do not have a value
      if (formattedValue == null) {
        return;
      }

      // Did we miss any cells? The POI sample writes one ',' per skipped
      // column here; that padding is disabled in this version.
      int thisCol = (new CellReference(cellReference)).getCol();

      currentCol = thisCol;

      // Number or string?
      try {
        //noinspection ResultOfMethodCallIgnored
        Double.parseDouble(formattedValue);
        output.append(formattedValue);
      } catch (NumberFormatException e) {
        // let's remove quotes if they are already there
        if (formattedValue.startsWith("\"") && formattedValue.endsWith("\"")) {
            formattedValue = formattedValue.substring(1, formattedValue.length()-1);
        }

        output.append('"');
        // encode double-quote with two double-quotes to produce a valid CSV format
        output.append(formattedValue.replace("\"", "\"\""));
        output.append('"');
      }
    }
  }


  ///////////////////////////////////////

  private final OPCPackage xlsxPackage;

  /**
   * Number of columns to read starting with leftmost
   */
  private final int minColumns;

  /**
   * Destination for data
   */
  private final PrintStream output;

  /**
   * Creates a new XLSX -> CSV converter.
   *
   * @param pkg        The XLSX package to process
   * @param output     The PrintStream to output the CSV to
   * @param minColumns The minimum number of columns to output, or -1 for no minimum
   */
  public XLSX2CSV(OPCPackage pkg, PrintStream output, int minColumns) {
    this.xlsxPackage = pkg;
    this.output = output;
    this.minColumns = minColumns;
  }

  /**
   * Parses and shows the content of one sheet
   * using the specified styles and shared-strings tables.
   *
   * @param styles The table of styles that may be referenced by cells in the sheet
   * @param strings The table of strings that may be referenced by cells in the sheet
   * @param sheetHandler The handler that receives the rows and cells of the sheet
   * @param sheetInputStream The stream to read the sheet-data from.
   * @exception java.io.IOException An IO exception from the parser,
   *            possibly from a byte stream or character stream
   *            supplied by the application.
   * @throws SAXException if parsing the XML data fails.
   */
  public void processSheet(
      Styles styles,
      SharedStrings strings,
      SheetContentsHandler sheetHandler,
      InputStream sheetInputStream) throws IOException, SAXException {
      
    DataFormatter formatter = new DataFormatter();
    InputSource sheetSource = new InputSource(sheetInputStream);

    try {
      XMLReader sheetParser = XMLHelper.newXMLReader();
      ContentHandler handler = new XSSFSheetXMLHandler(
            styles, null, strings, sheetHandler, formatter, false);
      sheetParser.setContentHandler(handler);
      sheetParser.parse(sheetSource);

    } catch(ParserConfigurationException e) {
      throw new RuntimeException("SAX parser appears to be broken - " + e.getMessage(), e);
    }
  }

  /**
   * Initiates the processing of the XLSX workbook file to CSV.
   * All sheets are written, one after another, to the same output.
   *
   * @throws IOException If reading the data from the package fails.
   * @throws OpenXML4JException If the package cannot be read as an XLSX workbook.
   * @throws SAXException if parsing the XML data fails.
   */
  public void process() throws IOException, OpenXML4JException, SAXException {
    ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(this.xlsxPackage);
    XSSFReader xssfReader = new XSSFReader(this.xlsxPackage);
    StylesTable styles = xssfReader.getStylesTable();
    XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
    while (iter.hasNext()) {
      try (InputStream stream = iter.next()) {
        // The POI sample prints the sheet name and index before each sheet;
        // that is omitted here so the output contains only CSV rows.
        processSheet(styles, strings, new SheetToCSV(), stream);
      }
    }
  }
}

■/Java_one_hundred_apache_poi/src/main/java/Java_one_hundred_apache_poi/ninety/Main.java

package Java_one_hundred_apache_poi.ninety;

import java.io.IOException;
import java.io.PrintStream;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

import org.apache.poi.openxml4j.exceptions.InvalidOperationException;
import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.xml.sax.SAXException;

import com.opencsv.exceptions.CsvDataTypeMismatchException;
import com.opencsv.exceptions.CsvRequiredFieldEmptyException;

import Java_one_hundred_apache_poi.ninety.dao.PopulationJapanDao;
import Java_one_hundred_apache_poi.ninety.dto.PopulationJapanDto;
import Java_one_hundred_apache_poi.ninety.util.XLSX2CSV;

public class Main {

  private static final String RESOURCE_ROOT_DIR = "src/main/resources/";
  private static final String INPUT_FILE_PATH = RESOURCE_ROOT_DIR + "output/n210200200_output.xlsx";
  private static final String CONVERT_FILE_PATH = RESOURCE_ROOT_DIR + "output/n210200200_convert.csv";
  private static final String OUTPUT_FILE_PATH = RESOURCE_ROOT_DIR + "output/n210200200_output.csv";

  public static void main(String[] args) {
    // Step 1: convert the Excel file to an intermediate CSV file.
    // The PrintStream is opened with an explicit UTF-8 charset because the file is read back
    // as UTF-8 in step 2 (without it, the platform default is used, e.g. MS932 on Japanese
    // Windows before JDK 18). Both resources are closed, so the CSV is fully written.
    try (OPCPackage p = OPCPackage.open(Paths.get(INPUT_FILE_PATH).toString(), PackageAccess.READ);
        PrintStream convertOut = new PrintStream(CONVERT_FILE_PATH, StandardCharsets.UTF_8.name())) {
      XLSX2CSV xlsx2csv = new XLSX2CSV(p, convertOut, 2);
      xlsx2csv.process();
    } catch (InvalidOperationException | IOException | OpenXML4JException | SAXException e) {
      // TODO: real error handling
      e.printStackTrace();
    }

    // Step 2: read the CSV file into beans, then write the beans out to another CSV file.
    PopulationJapanDao populationJapanDao = new PopulationJapanDao();

    try (Reader reader = Files.newBufferedReader(Paths.get(CONVERT_FILE_PATH), StandardCharsets.UTF_8);
        Writer writer = Files.newBufferedWriter(Paths.get(OUTPUT_FILE_PATH), StandardCharsets.UTF_8)) {

      List<PopulationJapanDto> populationJapanDtoList = populationJapanDao.read(reader);
      populationJapanDtoList.forEach(populationJapanDto -> System.out.println(
          populationJapanDto.getPrefecture() + "\t" + populationJapanDto.getTotalPopulation()));
      populationJapanDao.write(writer, populationJapanDtoList);

    } catch (IOException | CsvDataTypeMismatchException | CsvRequiredFieldEmptyException e) {
      // TODO: real error handling
      e.printStackTrace();
    }
  }
}

■/Java_one_hundred_apache_poi/src/main/resources/output/n210200200_output.xlsx

(Screenshot: contents of n210200200_output.xlsx)