Solr for Arabic PDFs

I am trying to search Arabic PDFs in Apache Solr. The problem appears to be that Tika extracts the PDF text in reverse order (left-to-right) instead of the correct right-to-left order, so the indexed text does not match what users type.
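To illustrate the symptom as I understand it, here is a minimal sketch. The sample word and the `matches` helper are made up for illustration; they are not taken from my PDFs or from Solr, and a plain substring check only stands in for a real search.

```python
# Illustrative only: a made-up sample word, not text from my PDFs.
logical = "مرحبا"          # characters in reading order, as a user would type a query
extracted = logical[::-1]   # same characters reversed, which is what the index seems to hold


def matches(query: str, indexed_text: str) -> bool:
    """Naive substring match, standing in for a real search."""
    return query in indexed_text


# A query typed in reading order against the reversed text.
print(matches(logical, extracted))
# The same query reversed, against the reversed text.
print(matches(logical[::-1], extracted))
```

If my reading of the problem is right, this is why normal Arabic queries return nothing while the documents are clearly indexed.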

I have found references to this problem here:

Solr for Arabic
How to parse arabic pdf with Tika
http://www.linnovate.net/blog/apache-solr-search-hebrew-and-probably-arabic-documents-drupal-pdf-problem-solution

However, I don't know how to include the latest version of PDFBox or ICU4J in my Apache Solr installation. My Solr contrib/extraction/lib folder contains pdfbox-1.6.0.jar and icu4j-4.8.1.1.jar. Would removing these files and replacing them with the latest libraries from their project pages be enough to force Tika to use them?
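The swap I have in mind can be sketched as follows. This is only a sketch of the file operation: `swap_jar`, the backup folder, and the paths in the usage note are hypothetical names of mine, and whether Tika actually picks up the new jar afterwards is exactly what I am unsure about.

```python
import shutil
from pathlib import Path


def swap_jar(lib_dir: Path, old_name: str, new_jar: Path, backup_dir: Path) -> Path:
    """Move the old jar out of lib_dir into backup_dir, then copy the new jar in.

    Returns the path of the newly installed jar. Hypothetical helper, not part of Solr.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)

    old_jar = lib_dir / old_name
    if old_jar.exists():
        # Keep the old jar so the change can be undone if extraction breaks.
        shutil.move(str(old_jar), str(backup_dir / old_name))

    target = lib_dir / new_jar.name
    shutil.copy2(new_jar, target)
    return target
```

For example, I would call something like `swap_jar(Path("contrib/extraction/lib"), "pdfbox-1.6.0.jar", Path("downloads/pdfbox-NEW.jar"), Path("jar-backup"))`, where the download path and the new jar name are placeholders, and then presumably restart Solr so the libraries are reloaded.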

Please explain in detail, as I have no previous experience with Java servlets. Thanks!
