PDF-Forms with Unicode chars [closed]

I am currently struggling with filling a PDF form created from a LibreOffice document.

I created it as suggested in the book "iText in Action" and am now trying to pre-fill the embedded form with a few values that can contain Unicode characters.

This includes characters that consist of a base character plus an additional combining character (e.g. M̂).
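
For illustration, here is a minimal sketch of why such a character is awkward: M̂ has no precomposed code point in Unicode, so even after NFC normalization it stays a two-code-point sequence (U+004D followed by U+0302), and the font and the viewer both have to cope with the combining mark. The class and method names are made up for this example and are not part of the code in question.

```java
import java.text.Normalizer;

final class CombiningSequenceDemo {

    /** Lists the code points of a string as U+XXXX tokens separated by spaces. */
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    /** Returns true if NFC normalization leaves the string unchanged. */
    static boolean unchangedByNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC).equals(s);
    }

    public static void main(String[] args) {
        String text = "aM\u0302a";
        // The combining circumflex remains a separate code point.
        System.out.println(codePoints(text));
        // No precomposed form exists for M + U+0302, so NFC cannot merge them.
        System.out.println(unchangedByNfc(text));
        // By contrast, e + U+0302 composes to U+00EA under NFC.
        System.out.println(unchangedByNfc("e\u0302"));
    }
}
```

Normalizing the input before filling the form therefore does not sidestep the problem for this particular character.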

I have tried several hints I found on Stack Overflow and in the book, but I never got a PDF document with a form that works in all readers on all platforms: Linux (Okular, Evince), Acrobat DC, macOS Preview, etc.

I'm aware that I need a font that covers the characters and that the font has to be embedded fully. Below is the code I used to fill the PDF document, along with the PDF file.

My questions are:

  • Is the differing behavior of the PDF readers due to a weakness in the PDF specification that I have to live with?
  • The Linux PDF readers and Acrobat in particular behave badly. Are there known bugs?
  • I'm not very familiar with internals of PDF, so any suggestions? Are the contents of my PDF files ok?
  • Any suggestions on how to improve the code to get better results?

Code to fill the form:

BaseFont uniFont = BaseFont.createFont("./src/main/resources/UnicodeDoc.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED, false, null, null, false);
uniFont.setSubset(false);

// Debugging code...
for (String codepage : uniFont.getCodePagesSupported()) {
    System.out.println("Codepage = " + codepage);
}

FileInputStream fis = new FileInputStream(src);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PdfReader reader = new PdfReader(fis);
PdfStamper stamper = new PdfStamper(reader, baos);

// Fill all fields in PDF form
String text = "aM\u0302a"; // Same as "aM̂a"
com.itextpdf.text.pdf.AcroFields form = stamper.getAcroFields();
for (String fname : form.getFields().keySet()) {
    System.out.println("form." + fname);
    form.setField(fname, text);
    form.setFieldProperty(fname, "textfont", uniFont, null);
}
form.setGenerateAppearances(true);
form.addSubstitutionFont(uniFont);
stamper.setFormFlattening(false);
stamper.close();
reader.close();

Thanks in advance, Mik86

Unitarianism answered 27/1, 2018 at 14:50 Comment(4)
This is a rare, well-researched question. Sorry, I have no answer for you, but I just wanted to say that. – Cream
Well, I wouldn't say well researched, because I found no solution for that issue. – Unitarianism
No, but you wrote what you already tried, you added your code, and you clearly described the problem. That is very rare on Stack Overflow. – Cream
I can't believe that there is no solution to this not very exotic problem :-( – Unitarianism

I'm not very familiar with internals of PDF, so any suggestions? Are the contents of my PDF files ok?

I'll have to dig into the PDF specification to see if there is something definitively incorrect going on, but to me there does appear to be some confusion.

Firstly, your input template gives me an error when I attempt to open it in Acrobat, and LiveCycle complains that "UnicodeDoc" must be swapped out for a different font. "UnicodeDoc" is the font used within the original input file.

Note that the font "UnicodeDoc" is not embedded in your input file. When filling in the form you create and embed a font, but it looks like you don't overwrite the original (again, this is not to say that this is correct or incorrect).

Without going too much into the inner workings of PDFs, the form field that is getting filled out still links to the original font, which isn't embedded.
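
As a rough illustration of what "links to the original font" means: a text field normally names its font in a default appearance (/DA) string of the form `/Name size Tf`, where the name refers to a font resource in the form's resource dictionary. The sketch below only shows how that name can be pulled out of such a string once you have it. The sample strings are hypothetical (I am not quoting the actual values in the asker's file), and the class and method names are made up for this example.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DefaultAppearanceFont {

    // Matches the "/Name size Tf" font selection inside a /DA string.
    private static final Pattern TF =
            Pattern.compile("/(\\S+)\\s+[-+]?\\d*\\.?\\d+\\s+Tf");

    /** Returns the font resource name selected in a /DA string, if any. */
    static Optional<String> fontResourceName(String da) {
        Matcher m = TF.matcher(da);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical /DA value, chosen only to illustrate the format.
        String da = "/UnicodeDoc 12 Tf 0 g";
        System.out.println(fontResourceName(da).orElse("(no font selected)"));
    }
}
```

If the name found this way points at a font resource that is not embedded, a viewer that regenerates the field appearance has to substitute a font of its own, which would be one plausible source of differences between viewers.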

This doesn't necessarily address the issue directly, but if I "fix" your document by removing the font from the original template (input.pdf) and run it through your code, it produces output.pdf, which has the correct output in Acrobat and Reader.

Again, this isn't to say your PDF is wrong or iText is wrong in this case, as I haven't looked through the entire specification to see what (if any) interaction is expected here. As it stands, though, the font that you are embedding is not the font that ends up being used in the form field.

Situated answered 5/2, 2018 at 15:17 Comment(1)
Another problem with the original document: not only is UnicodeDoc not embedded, its encoding is also set to WinAnsiEncoding, which allows for hardly any interesting Unicode characters. So LibreOffice has created a really deficient template document, for whatever reason. – Quartas
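
To see why WinAnsiEncoding is a problem for the test string, here is a minimal sketch that uses Java's windows-1252 charset as a stand-in. This is an approximation: WinAnsiEncoding closely follows Windows code page 1252 but is not guaranteed to be identical in every position. The class and method names are made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class WinAnsiCoverage {

    /** Returns true if every character of s has a mapping in windows-1252. */
    static boolean encodableInCp1252(String s) {
        CharsetEncoder encoder = Charset.forName("windows-1252").newEncoder();
        return encoder.canEncode(s);
    }

    public static void main(String[] args) {
        // Plain Latin letters are covered.
        System.out.println(encodableInCp1252("aMa"));
        // The combining circumflex U+0302 has no slot in the code page.
        System.out.println(encodableInCp1252("aM\u0302a"));
    }
}
```

A field font declared with such a single-byte encoding has no code for U+0302 at all, regardless of which glyphs the font file itself contains.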
