Apache Tika Server - Request Header Parameters?

The Apache Tika Server provides a REST API to extract text from a document. It is also possible to set specific request header parameters such as X-Tika-PDFOcrStrategy, for example:

$ curl -T test/Dokument01.pdf http://localhost:9998/tika --header "X-Tika-PDFOcrStrategy: ocr_only"

Across various pieces of Tika documentation I found these additional header parameters:

X-Tika-OCRLanguage: eng
X-Tika-PDFextractInlineImages: true | false
X-Tika-PDFOcrStrategy: ocr_only  |  ocr_and_text_extraction
X-Tika-OCRoutputType: hocr
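
Several of these headers can be sent in one request by repeating curl's --header option. The following is only a minimal sketch, assuming a POSIX shell and a server at http://localhost:9998/tika as in the example above. The names tika_extract, TIKA_URL and DRY_RUN are made up for this illustration and are not part of Tika; the header names are the ones listed above.

```shell
# Assumed server address, taken from the curl example above.
TIKA_URL="${TIKA_URL:-http://localhost:9998/tika}"

# tika_extract FILE HEADER...  (hypothetical helper, not part of Tika)
# Uploads FILE and passes every HEADER as its own --header option.
tika_extract() {
  file="$1"; shift
  n=$#
  # Rewrite the positional parameters so that each header
  # is preceded by --header.
  while [ "$n" -gt 0 ]; do
    set -- "$@" --header "$1"
    shift
    n=$((n - 1))
  done
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command for inspection only (quoting is not preserved).
    printf '%s\n' "curl -T $file $TIKA_URL $*"
  else
    curl -T "$file" "$TIKA_URL" "$@"
  fi
}
```

Usage: tika_extract test/Dokument01.pdf "X-Tika-OCRLanguage: eng" "X-Tika-PDFOcrStrategy: ocr_only". Setting DRY_RUN=1 beforehand prints the resulting curl command instead of contacting the server.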

However, there seems to be no documentation on how to use the X-Tika-... header parameters, or on which parameters are supported and which are not.

For example, I wonder whether it is possible to override the image type or the DPI with something like:

X-Tika-PDFocrImageType: rgb
X-Tika-PDFocrDPI: 100

My question is: which header parameters are supported, and which naming convention do these parameters follow?
