How to parse Arabic PDFs with Tika

I've installed Tika with Solr, and it's working well for arabic pdf. Is there any tutorial to make this happen? I've seen a similar question to this, and the solution was to include ICU4J.jar, but I don't know what that means.

Scathing answered 9/4, 2012 at 17:14 Comment(2)
What's the question? You say "it's working well for arabic pdf" so I'm not sure what isn't working and what you need help with?Brisbane
It's working for other document formats such as doc, odt, etc., but for PDF it doesn't extract Arabic well. I think they have found a solution here https://mcmap.net/q/1536472/-solr-for-arabic , but I'm a newbie with Java.Scathing

ICU4J can be downloaded here: http://site.icu-project.org/download

Archaeopteryx answered 20/4, 2012 at 17:42 Comment(5)
Thanks for your response, but how do I install it?Scathing
WEB-INF/lib is the standard place for additional libraries (jar files) in a web application (like Solr). If you are running the Solr war file, then look for a shared library directory for your servlet container (probably Tomcat or Jetty).Archaeopteryx
Unfortunately I'm a PHP programmer and I'm not using Tomcat; instead I'm using apache2, with apachesolr as a server. Any detailed how-to would be much appreciated. Thanks for your reply.Scathing
Apache Solr is a web application written in Java. It is installed in a servlet container, usually Jetty or Tomcat. If you are going to use Solr, you will need to learn the basics of configuring Java webapps.Archaeopteryx
Hi sel_space, have you been able to get this working? Did you understand how you can include ICU4J? I am struggling with this as well....Pasta
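
For anyone stuck on the same step: the comments above say that extra jars belong in the webapp's WEB-INF/lib directory. Below is a minimal sketch of that copy step in Java. The class name `InstallIcu4j`, the method `installJar`, and both command-line paths are hypothetical and only for illustration; the real location of the expanded Solr webapp depends on your servlet container and is not stated in this thread. Copying the file by hand does the same thing, and the container will most likely need a restart before Solr sees the new jar.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: copies a downloaded jar (e.g. the ICU4J jar) into
// <webappDir>/WEB-INF/lib, the standard place for extra libraries in a
// Java web application.
final class InstallIcu4j {

    // Returns the path of the copied jar inside WEB-INF/lib.
    static Path installJar(Path jar, Path webappDir) throws IOException {
        Path libDir = webappDir.resolve("WEB-INF").resolve("lib");
        // Create WEB-INF/lib if it is missing (assumption: webappDir is the
        // expanded Solr webapp, not the packed .war file).
        Files.createDirectories(libDir);
        Path target = libDir.resolve(jar.getFileName());
        // Overwrite an older copy of the same jar if one is already there.
        return Files.copy(jar, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java InstallIcu4j.java <path-to-jar> <webapp-dir>");
            System.exit(1);
        }
        Path copied = installJar(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Copied to " + copied);
    }
}
```

Run it with the downloaded jar as the first argument and the expanded Solr webapp directory as the second, then restart the servlet container.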
