tika-server Questions

0

I'm trying to set up Tika for text extraction using Python. I've installed the Java runtime (JRE 1.8.0), installed tika with pip install tika==1.23, downloaded the tika server jar file from this link, and...
Pessa asked 2/3, 2021 at 18:36

0

I am attempting to parse tens of millions of office documents with Tika: PDFs, docs, Excel files, XMLs, etc., a wide assortment of types. Throughput is very important. I need to be able to parse these files in a...
Meganmeganthropus asked 22/11, 2020 at 5:27

1

Solved

The Apache Tika Server provides a REST API to extract text from a document. It is also possible to set specific request header parameters like X-Tika-PDFOcrStrategy, e.g.: $ curl -T test/Dokument01...
Blowhard asked 25/5, 2020 at 21:26

1

Solved

I'm currently using Tika to extract the text from PDF files. I found a very fast method within the tika module, called unpack. This is my code: from tika import unpack text = unpa...
Oakman asked 2/11, 2018 at 16:7
