CSV Detector in Apache Tika

I'm using the Java library Tika by Apache (tika-core ver. 1.10).

Is there an org.apache.tika.detect.Detector for CSV files? The MIME type should be text/csv, but I cannot find anything like that.

I would like to use the convenient detect method.

Manhunt answered 21/8, 2015 at 9:34 Comment(1)
The main MimeTypes detector should cover you for that. What happens if you just try with DefaultDetector or TikaConfig.getDefaultConfig().getDetector()? - Marrs

Currently (v1.10) tika-mimetypes.xml defines text/csv like this:

<mime-type type="text/csv">
  <glob pattern="*.csv"/>
  <sub-class-of type="text/plain"/>
</mime-type>

The definition contains a glob pattern but no magic-byte rule, so Apache Tika detects text/csv only from the filename. If you use Tika#detect(File), Tika adds the filename (under the Metadata.RESOURCE_NAME_KEY key) to the Metadata object passed to the detector. URLs are handled similarly.

If you only have an InputStream and want to supply the filename yourself, you can use something like:

new Tika().detect(is, fileName)

If you want content-based heuristics, feel free to check Tika's JIRA for an existing ticket and file one if there is none.
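In the meantime, a content-based check could look roughly like the sketch below. This is not part of Tika's API: the class CsvHeuristic and its methods are hypothetical names, and the rule (at least two non-empty lines, each with the same non-zero number of separators outside double quotes) is only an assumption about what "looks like CSV" means. It does not handle quoted fields that span several lines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Tika class: a rough content-based CSV guess.
public class CsvHeuristic {

    // Counts separators on one line, ignoring those inside double-quoted fields.
    public static int countSeparators(String line, char separator) {
        int count = 0;
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // An escaped quote ("") toggles twice, so the state is unchanged.
                inQuotes = !inQuotes;
            } else if (c == separator && !inQuotes) {
                count++;
            }
        }
        return count;
    }

    // Returns true when the sample has at least two non-empty lines and
    // every one of them has the same non-zero number of separators.
    public static boolean looksLikeCsv(String sample, char separator) {
        List<String> lines = new ArrayList<>();
        for (String line : sample.split("\r?\n")) {
            if (!line.trim().isEmpty()) {
                lines.add(line);
            }
        }
        if (lines.size() < 2) {
            return false;
        }
        int expected = countSeparators(lines.get(0), separator);
        if (expected == 0) {
            return false;
        }
        for (String line : lines) {
            if (countSeparators(line, separator) != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String csv = "id,name,city\n1,\"Doe, Jane\",Berlin\n2,John,Paris\n";
        String text = "just some plain text\nwith two lines\n";
        System.out.println(looksLikeCsv(csv, ','));
        System.out.println(looksLikeCsv(text, ','));
    }
}
```

You could run such a check on the first few kilobytes of a stream when Tika reports text/plain and no filename is available. It will misjudge some inputs (for example prose in which every line happens to contain the same number of commas), so treat the result as a hint only.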

Weiweibel answered 9/9, 2015 at 0:58 Comment(0)
