PDFTK and removing the XFA format

Can removing the XFA data from a PDF form cause any problems? I'm using PDFTK to fill forms, and I found that when a form contains XFA, PDFTK doesn't work unless I first run the drop_xfa command to create a new template form. One thing I noticed: if I skip drop_xfa, the pre-filled fields show up in Acrobat Reader but not in Acrobat Pro. Other viewers, such as Ubuntu Document Viewer, display them fine. I don't mind running drop_xfa, but I want to check whether doing so could cause issues with the forms that I'm not aware of.

Example: the filled form is later read by another system that extracts the fields and values for processing.

Thank you in advance.

Procarp answered 14/4, 2015 at 17:34 Comment(0)

There are three types of forms in PDF:

  • Forms using AcroForm technology. In this case, each field corresponds with one or more widgets with fixed positions on specific pages. The form is described using nothing but PDF syntax.
  • Dynamic forms using the XML Forms Architecture (XFA). In this case, the PDF file is nothing but a container for an XML file that describes the whole form. We refer to this as dynamic XFA, because the form can expand or shrink based on the data that is added: a 1-page form can turn into a 100-page form by adding more data.
  • Hybrid forms that combine AcroForm and XFA technology. In this case, the form is described twice: once using PDF objects; once using XML. Obviously, such a form is not dynamic: the AcroForm part still defines widget annotations that are defined at absolute positions on specific pages. The form can't adapt to its data.
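As a rough illustration of how the three types can be told apart, here is a minimal Python sketch. It assumes you have already used a PDF library of your choice to learn two things about the document's /AcroForm dictionary: whether it has an /XFA entry, and how many entries its /Fields array holds. The function name and the returned labels are made up for this example, and the "empty /Fields means dynamic XFA" rule is a heuristic, not something the PDF specification guarantees.

```python
def classify_form(has_xfa: bool, acroform_field_count: int) -> str:
    """Heuristically classify a PDF form (hypothetical helper).

    has_xfa: True if the /AcroForm dictionary contains an /XFA entry.
    acroform_field_count: number of entries in the /AcroForm /Fields array.
    """
    if has_xfa and acroform_field_count == 0:
        # Only the XML description exists: the PDF is just a container.
        return "dynamic-xfa"
    if has_xfa:
        # The form is described twice: PDF objects plus XML.
        return "hybrid"
    if acroform_field_count > 0:
        return "acroform"
    return "no-form"
```

With this heuristic, a "hybrid" result is the case where dropping the XFA part still leaves usable AcroForm fields behind.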

If you have a dynamic XFA form, dropping the XML will remove the complete form. There won't be anything left.
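One way to check whether anything is left after dropping the XFA is to list the remaining AcroForm fields, for example with `pdftk stripped.pdf dump_data_fields`, and count them. The sketch below parses that text output in Python. The sample text and field names are invented for illustration, and the parser assumes the usual layout of `---` separators followed by `Key: Value` lines.

```python
def parse_dump_data_fields(text: str) -> list[dict[str, str]]:
    """Parse the text printed by `pdftk <file> dump_data_fields`.

    Returns one dict per field. For keys that repeat within a field
    (such as FieldStateOption), only the first value is kept.
    """
    fields: list[dict[str, str]] = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line == "---":
            # A separator line starts a new field record.
            current = {}
            fields.append(current)
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current.setdefault(key.strip(), value.strip())
    # Ignore records that never named a field.
    return [f for f in fields if "FieldName" in f]


# Invented sample output, for illustration only.
SAMPLE = """---
FieldType: Text
FieldName: applicant_name
FieldFlags: 0
FieldValue: Jane Doe
FieldJustification: Left
---
FieldType: Button
FieldName: subscribe
FieldFlags: 0
FieldValue: Off
FieldStateOption: Off
FieldStateOption: Yes
"""

remaining = parse_dump_data_fields(SAMPLE)
```

If the parsed list comes back empty for a file you have just run through drop_xfa, that suggests the original was a dynamic XFA form and the whole form is gone.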

However, it seems that you are confronted with a hybrid form that consists of both AcroForm and XFA syntax. Hybrid forms are a pain because they often lead to confusion. For instance: a viewer that is not XFA-aware will show you the data as stored in the AcroForm, whereas a viewer that is XFA-aware can give preference to the data as stored in the XFA form. What's the problem, you might ask? Aren't both forms equivalent?

Ideally, both versions of the form are indeed equivalent, but:

  • If the form isn't filled out correctly, the AcroForm can be different from the XFA form.
  • XFA has more functionality than AcroForm technology. For instance: a text field in an XFA form can be justified (similar to <p align="justify"> in HTML). However, this option doesn't exist in an AcroForm text field (you can only have left, center or right alignment). Hence, if you have text that is justified in an XFA form but you only look at the AcroForm, the text won't be justified.

This is a long answer to explain that, if you have a hybrid form, it is in most cases OK to throw away the XFA part. You may have small differences, but if you are OK with what the form looks like in Ubuntu Document Viewer (a viewer that doesn't support XFA), then you should be fine.
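For the workflow described in the question, the two pdftk steps (drop the XFA once to get a template, then fill that template) could be scripted roughly as follows. This is only a sketch: the file names in the usage comment are placeholders, the helper names are made up, and it assumes a `pdftk` binary on the PATH that accepts the `drop_xfa`, `fill_form` and `flatten` keywords in this order.

```python
import subprocess


def drop_xfa_command(source_pdf: str, template_pdf: str) -> list[str]:
    """Build the command that writes a copy of the form without its XFA part."""
    return ["pdftk", source_pdf, "output", template_pdf, "drop_xfa"]


def fill_form_command(template_pdf: str, data_file: str, filled_pdf: str,
                      flatten: bool = False) -> list[str]:
    """Build the command that fills the AcroForm fields from an FDF/XFDF file."""
    command = ["pdftk", template_pdf, "fill_form", data_file, "output", filled_pdf]
    if flatten:
        # Flattening merges the values into the page, so the fields can no
        # longer be read back as form data afterwards.
        command.append("flatten")
    return command


def run(command: list[str]) -> None:
    """Run a pdftk command and raise CalledProcessError if it fails."""
    subprocess.run(command, check=True)


# Usage (placeholder file names; requires pdftk to be installed):
# run(drop_xfa_command("hybrid_form.pdf", "template.pdf"))
# run(fill_form_command("template.pdf", "data.fdf", "filled.pdf"))
```

If another system needs to read the field values back out of the filled file, leave `flatten` off so the fields stay in place.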

DISCLAIMER: I am the CEO of the iText Group. Pdftk is a third party tool based on an obsolete and no longer supported version of iText. iText Group does not endorse the use of Pdftk.

Bainter answered 15/4, 2015 at 8:24 Comment(5)
Hey Bruno, thank you so much for the explanation. I use PDFTK because the alternative I found is from setasign or something similar, and their asking price for the form-filling application is too high. I haven't found any other open-source or less expensive alternatives. Do you know of one or two that I don't know about? Thanks in advance. - Procarp
SO should not be used for questions asking to recommend a tool, a library, ... Hence your question should remain unanswered. - Bainter
Bruno, I use PDFTK because it's the only option for Node.js/Meteor applications (JavaScript only on the server side). iText/iTextSharp only works with Java and .NET. PDFTK works great and does everything I can think of and need (sorry, I know you lose money that way). Didn't you just wrap an API around PDFTK for the most part anyhow? - Leija
@Leija Did I just wrap an API around PdfTk? Man, I wrote iText, and somebody compiled my code and named it PdfTk. Please don't turn things upside-down. iText is the original product; PdfTk is derived from it. - Bainter
After looking this up for myself, I stand corrected. Curious: if you're a for-profit company making money off of the pain that is PDF, why in the world did you allow a free derivative of your work? It seems like a racket that we have to pay to do basic 'file' type things with the 'standard' that is PDF. - Leija
