I'm using Apache Tika, the Java library (tika-core, version 1.10).
Is there an org.apache.tika.detect.Detector for CSV files?
The MIME type should be text/csv, but I cannot find anything like that.
I would like to use the nice detect method.
Currently (v1.10), tika-mimetypes.xml defines text/csv like this:
<mime-type type="text/csv">
<glob pattern="*.csv"/>
<sub-class-of type="text/plain"/>
</mime-type>
This means that Apache Tika detects CSV only by filename. If you use Tika#detect(File), Tika adds the filename (under the Metadata.RESOURCE_NAME_KEY key) to the Metadata object passed to the detector. URLs are handled in a similar way.
If you want to supply the filename yourself, you can use something like:
new Tika().detect(is, fileName)
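As a fuller sketch of the call above, the following shows two ways to pass the filename hint: through the Tika facade, and by setting Metadata.RESOURCE_NAME_KEY yourself before calling the default detector. It assumes tika-core 1.x on the classpath; the class name CsvNameDetection, its method names and the sample file name are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.Tika;
import org.apache.tika.config.TikaConfig;
import org.apache.tika.detect.Detector;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;

// Hypothetical helper class; only the Tika calls themselves come from the library.
final class CsvNameDetection {

    // Facade variant: Tika stores fileName in the metadata for us.
    static String detectWithFacade(InputStream is, String fileName) throws IOException {
        return new Tika().detect(is, fileName);
    }

    // Explicit variant: set the resource name ourselves and call the default detector.
    // Detector#detect expects a stream that supports mark/reset.
    static String detectWithDetector(InputStream is, String fileName) throws IOException {
        Detector detector = TikaConfig.getDefaultConfig().getDetector();
        Metadata metadata = new Metadata();
        metadata.set(Metadata.RESOURCE_NAME_KEY, fileName);
        MediaType type = detector.detect(is, metadata);
        return type.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "id,name\n1,Ada\n".getBytes(StandardCharsets.UTF_8);
        // ByteArrayInputStream supports mark/reset, so it can be passed directly.
        System.out.println(detectWithFacade(new ByteArrayInputStream(sample), "people.csv"));
        System.out.println(detectWithDetector(new ByteArrayInputStream(sample), "people.csv"));
    }
}
```

Because the text/csv definition only has a glob pattern, the .csv suffix of the supplied name is what drives the result here. Without that name, the same bytes would most likely be reported as plain text.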
If you want content-based heuristics, feel free to check Tika's JIRA and file a ticket.
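Until something like that exists in Tika, a content check can be done in your own code. The sketch below is not Tika's algorithm and not part of its API: it is a hypothetical helper (CsvHeuristic, looksLikeCsv) that treats a text sample as CSV when every non-empty line has the same non-zero number of delimiters outside double quotes. It assumes the sample has already been decoded to a String and that quoted fields do not contain line breaks.

```java
// Hypothetical, stdlib-only heuristic; not part of Apache Tika.
final class CsvHeuristic {

    // Returns true if the sample has at least two non-empty lines and each of them
    // contains the same non-zero number of unquoted delimiters.
    static boolean looksLikeCsv(String sample, char delimiter) {
        String[] lines = sample.split("\\r?\\n");
        int expected = -1; // delimiter count of the first non-empty line
        int rows = 0;
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // ignore blank lines
            }
            int count = countUnquoted(line, delimiter);
            if (count == 0) {
                return false; // a line without any delimiter is not tabular
            }
            if (expected == -1) {
                expected = count;
            } else if (count != expected) {
                return false; // inconsistent column count
            }
            rows++;
        }
        return rows >= 2;
    }

    // Counts delimiters that are not inside a double-quoted field.
    // A doubled quote ("") toggles the flag twice, so it leaves the state unchanged.
    static int countUnquoted(String line, char delimiter) {
        boolean inQuotes = false;
        int count = 0;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == delimiter && !inQuotes) {
                count++;
            }
        }
        return count;
    }
}
```

A check like this can be combined with the filename hint: trust the .csv extension when it is present, and fall back to the heuristic on the first few kilobytes when it is not. It will misjudge single-column files and other delimited formats, so treat the result as a hint rather than a verdict.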
MimeTypes detector should cover you for that. What happens if you just try with DefaultDetector or TikaConfig.getDefaultConfig().getDetector()? – Marrs