I have thousands of PDF documents that are 11-15 MB each. My program fails because a document contains more than 100,000 characters.
Error output:
Exception in thread "main" org.apache.tika.sax.WriteOutContentHandler$WriteLimitReachedException: Your document contained more than 100000 characters, and so your requested limit has been reached. To receive the full text of the document, increase your limit.
How can I increase the limit to 10-15 MB?
I found a solution that uses the Tika facade class, but I could not find a way to integrate it with my code:
Tika tika = new Tika();
tika.setMaxStringLength(10*1024*1024);
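For context, the facade is normally used on its own, through its parseToString methods, rather than combined with a PDFParser and a content handler. A minimal sketch, assuming the PDF path is passed as the first command-line argument (the class name FacadeExtract and the constant MAX_CHARS are made up for illustration). Note that the limit counts characters of extracted text, not bytes of the file, so it does not map directly to the 11-15 MB file size.

```java
import java.io.File;
import java.io.IOException;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

public class FacadeExtract {
    // Limit in characters of extracted text (not bytes of the PDF).
    // Passing -1 to setMaxStringLength disables the limit instead.
    static final int MAX_CHARS = 10 * 1024 * 1024;

    public static void main(String[] args) throws IOException, TikaException {
        // Assumption: the PDF path is given as the first argument.
        File pdf = new File(args[0]);

        Tika tika = new Tika();
        tika.setMaxStringLength(MAX_CHARS);

        // parseToString detects the file type and returns the extracted text,
        // truncated to MAX_CHARS characters if the document is longer.
        String text = tika.parseToString(pdf);
        System.out.println(text.length() + " characters extracted");
    }
}
```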
Here is my code:
BodyContentHandler handler = new BodyContentHandler();
Metadata metadata = new Metadata();
String location = "C:\\Users\\Laptop\\Dropbox\\MainTextbookTrappe2ndEd.pdf";
FileInputStream inputstream = new FileInputStream(location);
ParseContext pcontext = new ParseContext();
PDFParser pdfparser = new PDFParser();
pdfparser.parse(inputstream, handler, metadata, pcontext);
Output:
System.out.println("Content of the PDF :" + handler.toString());
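One possible way to keep the PDFParser approach is to give BodyContentHandler an explicit write limit, since its no-argument constructor caps output at 100,000 characters. A minimal sketch, assuming the PDF path is passed as the first command-line argument (the class name HandlerLimitExtract and the constant WRITE_LIMIT are made up for illustration):

```java
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.sax.BodyContentHandler;

public class HandlerLimitExtract {
    // -1 disables the write limit; a positive value such as 10 * 1024 * 1024
    // caps the number of extracted characters instead.
    static final int WRITE_LIMIT = -1;

    public static void main(String[] args) throws Exception {
        BodyContentHandler handler = new BodyContentHandler(WRITE_LIMIT);
        Metadata metadata = new Metadata();
        ParseContext context = new ParseContext();

        // Assumption: the PDF path is given as the first argument.
        // try-with-resources closes the stream even if parsing fails.
        try (InputStream in = new FileInputStream(args[0])) {
            new PDFParser().parse(in, handler, metadata, context);
        }

        // The extracted text lives in the handler, not in the ParseContext.
        System.out.println("Content of the PDF: " + handler.toString());
    }
}
```

With the limit disabled, the whole text of each document is held in memory, so for thousands of large PDFs a finite limit may be the safer choice.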