I'm parsing a PDF file using IText7 in C# that contains Japanese characters like so:
public static string ExtractTextFromPDF(string filePath)
{
var pdfReader = new PdfReader(filePath);
var pdfDoc = new PdfDocument(pdfReader);
var sb = new StringBuilder();
for (int page = 1; page <= pdfDoc.GetNumberOfPages(); page++)
{
var strategy = new SimpleTextExtractionStrategy();
sb.Append(PdfTextExtractor.GetTextFromPage(pdfDoc.GetPage(page), strategy));
}
pdfDoc.Close();
pdfReader.Close();
return sb.ToString();
}
But I run into the exception:
iText.IO.IOException: 'The CMap iText.IO.Font.Cmap.UniJIS-UTF16-H was not found.'
I've searched around for a solution on how to add this but I haven't come up with anything that works for the Japanese characters. If there is any other library more suited that would also be ok. Any help?
Thanks