How to parse a CSV file that might have one of two delimiters?

3

14

In my case, valid CSV files are delimited by either a comma or a semicolon. I am open to other libraries, but it needs to be Java. Reading through the Apache CSVParser API, the only thing I can think of is the following, which seems inefficient and ugly.

BufferedReader reader = new BufferedReader(new InputStreamReader(file));
// declared outside the try block so the fallback branch can reuse them
CSVFormat csvFormat;
CSVParser parser;
try
{
   csvFormat = CSVFormat.EXCEL.withHeader().withDelimiter(';');
   parser = csvFormat.parse( reader );
   // now read the records
}
catch (IOException eee)
{
   try
   {
      // try the other valid delimiter
      csvFormat = CSVFormat.EXCEL.withHeader().withDelimiter(',');
      parser = csvFormat.parse( reader );
      // now read the records
   }
   catch (IOException eee2)
   {
      // then it's really not a valid CSV file
   }
}

Is there a way to check the delimiter first, or perhaps allow two delimiters? Does anyone have a better idea than just catching an exception?

Elegance answered 12/8, 2015 at 0:21 Comment(2)
I think your code is the best option. There is no method for detecting the delimiter in a plain CSV file. The only way to detect it is to retry with several delimiters. - Fluke
Just a thought: if you have well-formed CSV, could you do a pattern match for one of your options? If every field is wrapped in quotes and separated by commas, you might find several instances of the pattern ",". - Lifework
D
8

We built support for this in uniVocity-parsers:

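import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;
import java.util.List;

public static void main(String... args) {
    CsvParserSettings settings = new CsvParserSettings();
    // let the parser work out the delimiter from the input itself
    settings.setDelimiterDetectionEnabled(true);

    CsvParser parser = new CsvParser(settings);

    // `file` is the input from the question
    List<String[]> rows = parser.parseAll(file);
}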

The parser has many more features that I'm sure you will find useful. Give it a try.

Disclaimer: I'm the author of this library. It's open source and free (Apache 2.0 license).

Dettmer answered 12/8, 2015 at 1:40 Comment(6)
Awesome parser, saved me a lot of headache. Thanks for sharing! - Tamarind
Glad to help! Please consider upvoting the question and answer if you found this useful. Cheers! - Dettmer
I've tested the parser on all kinds of weird CSV and everything went OK. Then I tried a plain, simple, "\r\n"-separated file and it glues together even the first 2-3 lines after the header. :( Autodetecting or supplying the line separator makes no difference. - Tamarind
@Tamarind do you mind providing the file you used to test? - Dettmer
I made a separate question here: #44208637 - Tamarind
Bravo! You saved me a lot of time! - Privett
E
0

I had the same problem and solved it this way:

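    BufferedReader in = Files.newBufferedReader(Paths.get(fileName));
    in.mark(1024); // the first line must fit within this read-ahead limit
    String line = in.readLine();
    CSVFormat fileFormat;

    // a null line means the file is empty; fall back to the comma default
    if (line != null && line.indexOf(';') != -1)
        fileFormat = CSVFormat.EXCEL.withDelimiter(';');
    else
        fileFormat = CSVFormat.EXCEL;

    in.reset(); // rewind so the parser sees the first line again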

After that you can parse it with CSVParser.
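
For anyone who wants to try the peek-and-rewind idea without pulling in a CSV library first, here is a minimal sketch using only the JDK. The class name `DelimiterPeek`, the method `peekDelimiter`, and the 8192-character limit are my own assumptions for illustration and are not part of any library. It also assumes the first line fits within that limit and that a semicolon anywhere in the first line means the file is semicolon-delimited.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;

final class DelimiterPeek {
    // Hypothetical cap on how long the first line may be; reset() fails beyond it.
    private static final int READ_AHEAD_LIMIT = 8192;

    private DelimiterPeek() {
    }

    // Reads the first line, rewinds the reader, and reports ';' or ','.
    // An empty input falls back to ','.
    static char peekDelimiter(BufferedReader in) {
        try {
            in.mark(READ_AHEAD_LIMIT);
            String line = in.readLine();
            in.reset(); // the caller can now parse from the very first character
            return (line != null && line.indexOf(';') != -1) ? ';' : ',';
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned character can then be handed to whatever parser you use, for example through `withDelimiter` as in the snippet above. Because the reader is rewound, the parser still sees the header line.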

Equidistant answered 7/2, 2021 at 0:4 Comment(0)
R
0

Below is my solution to this problem:

    private static final Character[] DELIMITERS = {';', ','};
    private static final char NO_DELIMITER = '\0'; // sentinel: no known delimiter found

    private char detectDelimiter() throws IOException {
        try (
            final var reader = new BufferedReader(
                new InputStreamReader(resource.getInputStream(), StandardCharsets.UTF_8));
        ) {
            final String line = reader.readLine();
            if (line == null) {
                return NO_DELIMITER; // empty input
            }

            // the first candidate found in the first line wins, so ';' takes priority over ','
            return Arrays.stream(DELIMITERS)
                .filter(s -> line.contains(s.toString()))
                .findFirst()
                .orElse(NO_DELIMITER);
        }
    }

Example usage:

    private CSVParser openCsv() throws IOException {

        final var csvFormat = CSVFormat.DEFAULT
            .withFirstRecordAsHeader()
            .withDelimiter(detectDelimiter())
            .withTrim();

        return new CSVParser(new InputStreamReader(resource.getInputStream(), StandardCharsets.UTF_8), csvFormat);
    }
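
One caveat with a plain `contains` check is that a delimiter character inside a quoted field (for example `"Smith; John",42`) can be mistaken for the real delimiter. A possible refinement, sketched here with JDK classes only, is to count each candidate outside double quotes and pick the most frequent one. The names `DelimiterCounter`, `detect`, and `countOutsideQuotes` are hypothetical, and the sketch assumes fields are quoted with `"` and that embedded quotes are escaped by doubling them.

```java
final class DelimiterCounter {
    // Candidates in priority order; on a tie the earlier one wins.
    private static final char[] CANDIDATES = {';', ','};
    static final char NO_DELIMITER = '\0';

    private DelimiterCounter() {
    }

    // Returns the candidate that occurs most often outside quoted fields,
    // or NO_DELIMITER if none of the candidates occurs at all.
    static char detect(String firstLine) {
        char best = NO_DELIMITER;
        int bestCount = 0;
        for (char candidate : CANDIDATES) {
            int count = countOutsideQuotes(firstLine, candidate);
            if (count > bestCount) {
                best = candidate;
                bestCount = count;
            }
        }
        return best;
    }

    // Counts occurrences of target that are not inside a double-quoted field.
    static int countOutsideQuotes(String line, char target) {
        boolean inQuotes = false;
        int count = 0;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // a doubled quote ("") toggles twice, so the state is unchanged
                inQuotes = !inQuotes;
            } else if (c == target && !inQuotes) {
                count++;
            }
        }
        return count;
    }
}
```

The result of `DelimiterCounter.detect(line)` could replace the stream expression in `detectDelimiter()` above. It still inspects only the first line, so a file whose first line has a single column remains undetectable.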
Rheology answered 22/4, 2021 at 13:49 Comment(0)
