I've installed Tika with Solr, and it's working well for arabic pdf. Is there any tutorial to make this happen? I've seen a similar question, and the solution there was to include ICU4J.jar, but I don't know what that means.
How to parse Arabic PDF with Tika
What's the question? You say "it's working well for arabic pdf" so I'm not sure what isn't working and what you need help with? –
Brisbane
It works for other document formats such as doc, odt, etc., but for PDF it doesn't extract Arabic well. I think they found a solution here: https://mcmap.net/q/1536472/-solr-for-arabic but I'm a newbie with Java. –
Scathing
ICU4J can be downloaded here: http://site.icu-project.org/download
Thanks for your response, but how do I install it? –
Scathing
WEB-INF/lib is the standard place for additional libraries (jar files) in a web application (like Solr). If you are running the Solr war file, then look for a shared library directory for your servlet container (probably Tomcat or Jetty). –
Archaeopteryx
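One way to confirm that a jar really ended up on the classpath is to try loading one of its classes. The sketch below is only an illustration: the class name `ClasspathCheck` and the helper `isAvailable` are made up for this example, and `com.ibm.icu.text.Bidi` is used simply as a class that ships in ICU4J. Whether the PDF parser behind Tika actually relies on that class in your version is an assumption, and a standalone check like this does not prove that Solr's own class loader sees the jar.

```java
public class ClasspathCheck {

    // Hypothetical helper: returns true if the named class can be found
    // by the class loader that loaded this program, false otherwise.
    static boolean isAvailable(String className) {
        try {
            // 'false' means: locate the class but do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A class that is part of ICU4J; used here only as a probe.
        String icuClass = "com.ibm.icu.text.Bidi";
        System.out.println(icuClass + " available: " + isAvailable(icuClass));
    }
}
```

Run it with the downloaded ICU4J jar added to the classpath (the jar file name depends on the version you downloaded). If it reports the class as unavailable, the jar is not being picked up from the location you used.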
Unfortunately I'm a PHP programmer and I'm not using Tomcat; instead I'm using apache2, with apachesolr as a server. Any detailed how-to would be much appreciated. Thanks for your reply –
Scathing
Apache Solr is a web application written in Java. It is installed in a servlet container, usually Jetty or Tomcat. If you are going to use Solr, you will need to learn the basics of configuring Java webapps. –
Archaeopteryx
Hi sel_space, have you been able to get this working? Did you understand how you can include ICU4J? I am struggling with this as well.... –
Pasta