tika-server Questions
0
I'm trying to set up Tika for text extraction using Python. I've installed Java runtime JRE 1.8.0, installed tika with pip install tika==1.23, downloaded the tika server jar file from this link, and...
Pessa asked 2/3, 2021 at 18:36
0
I am attempting to parse tens of millions of office documents with Tika: PDFs, DOCs, Excel files, XMLs, etc., a wide assortment of types.
Throughput is very important. I need to be able to parse these files in a...
Meganmeganthropus asked 22/11, 2020 at 5:27
1
Solved
The Apache Tika Server provides a REST API to extract text from a document. It is also possible to set specific request header parameters like X-Tika-PDFOcrStrategy, e.g.:
$ curl -T test/Dokument01...
Blowhard asked 25/5, 2020 at 21:26
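For readers who want to try the header-based configuration mentioned in the excerpt above from Python rather than curl, here is a minimal sketch using only the standard library. It assumes a tika-server instance on the default port 9998, the `/tika` text-extraction endpoint, and `no_ocr` as the strategy value; the function name `build_tika_request` is made up for illustration and is not part of any Tika client.

```python
import urllib.request

# Assumption: tika-server is running locally on its default port.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(document: bytes,
                       ocr_strategy: str = "no_ocr",
                       url: str = TIKA_URL) -> urllib.request.Request:
    """Build (but do not send) a PUT request asking Tika for plain text.

    The X-Tika-PDFOcrStrategy header is the one named in the question;
    "no_ocr" is an assumed example value.
    """
    headers = {
        "Accept": "text/plain",             # ask for plain-text output
        "Content-Type": "application/pdf",  # the upload is assumed to be a PDF
        "X-Tika-PDFOcrStrategy": ocr_strategy,
    }
    return urllib.request.Request(url, data=document, headers=headers, method="PUT")


# Build a request for some in-memory bytes (placeholder content, not a real PDF).
request = build_tika_request(b"%PDF-1.4 placeholder")

# With a server actually running, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     text = response.read().decode("utf-8")
```

Building the request separately from sending it makes it easy to inspect the headers before anything reaches the server.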
1
Solved
I'm currently using tika to extract the text from PDF files. I found a very fast method within the tika module, called unpack.
This is my code:
from tika import unpack
text = unpa...
Oakman asked 2/11, 2018 at 16:7