How to parse Arabic PDFs with Tika

Asked 9/4, 2012 at 17:14 Answered 20/4, 2012 at 17:42

I've installed Tika with Solr, and it's working well for arabic pdf. Is there a tutorial for making this happen? I've seen a similar question where the solution was to include ICU4J.jar, but I don't know what that means.

Scathing answered 9/4, 2012 at 17:14 Comment(2)

What's the question? You say "it's working well for arabic pdf" so I'm not sure what isn't working and what you need help with? – Brisbane 18/4, 2012 at 15:55

It's working for other document formats such as doc, odt, etc., but for PDF it doesn't extract Arabic well. I think they found a solution here https://mcmap.net/q/1536472/-solr-for-arabic , but I'm a newbie with Java. – Scathing 18/4, 2012 at 16:10

ICU4J can be downloaded here: http://site.icu-project.org/download

Archaeopteryx answered 20/4, 2012 at 17:42 Comment(5)

Thanks for your response, but how do I install it? – Scathing 21/4, 2012 at 18:38

WEB-INF/lib is the standard place for additional libraries (jar files) in a web application (like Solr). If you are running the Solr war file, then look for a shared library directory for your servlet container (probably Tomcat or Jetty). – Archaeopteryx 23/4, 2012 at 15:40

Unfortunately I'm a PHP programmer and I'm not using Tomcat; instead I'm using apache2, with apachesolr as a server. Any detailed how-to would be much appreciated. Thanks for your reply. – Scathing 26/4, 2012 at 11:44

Apache Solr is a web application written in Java. It is installed in a servlet container, usually Jetty or Tomcat. If you are going to use Solr, you will need to learn the basics of configuring Java webapps. – Archaeopteryx 3/5, 2012 at 18:37

Hi sel_space, have you been able to get this working? Did you understand how you can include ICU4J? I am struggling with this as well.... – Pasta 27/11, 2012 at 21:35
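
For anyone unsure whether the ICU4J jar actually ended up somewhere Java can see it (for example after copying it into WEB-INF/lib as suggested above), a minimal sketch like the following may help. It only tries to load one class by name and reports whether that worked. The class name `Icu4jCheck` and the helper `isOnClasspath` are made up for illustration; `com.ibm.icu.text.Bidi` is a class shipped in ICU4J. Note that this checks the classpath of the JVM you run it in, which is not necessarily the classpath your servlet container gives to Solr.

```java
public class Icu4jCheck {

    // Hypothetical helper: returns true if the named class can be found
    // by the class loader that loaded this class, false otherwise.
    public static boolean isOnClasspath(String className) {
        try {
            // 'false' means: locate the class but do not run its static initializers.
            Class.forName(className, false, Icu4jCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Bidi is ICU4J's bidirectional-text class; if it loads, the jar is visible.
        String icuClass = "com.ibm.icu.text.Bidi";
        if (isOnClasspath(icuClass)) {
            System.out.println("ICU4J found on the classpath");
        } else {
            System.out.println("ICU4J NOT found on the classpath");
        }
    }
}
```

To try it, compile the file and run it with the downloaded jar on the classpath, e.g. `java -cp icu4j.jar:. Icu4jCheck` (the jar file name here is an assumption; the real file name includes a version number, and on Windows the separator is `;` instead of `:`).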
