PDF form field in a text editor

To make a long story short: I would like to edit a read-only field of a PDF form using ONLY a text editor. I've succeeded, but I would like to understand why it doesn't work in some cases...

I've noticed the following: if I start from a PDF 1.5 version of my original document (without fields, saved as PDF by Word 2010), add the field with Acrobat Pro XI, and save it using Save As Other... -> Optimized PDF, making it compatible with Acrobat 6.0, my field looks like this in a text editor (Notepad++):

<</AP<</N 28 0 R>>/DA(/Helv 12 Tf 0 g)/DV(mytextfield)/F 4/FT/Tx/Ff 1/MK<<>>/P 3 0 R/Rect[99.4934 686.99 249.493 708.99]/Subtype/Widget/T(%mytextfield)/Type/Annot/V(mytextfield)>>
endobj
28 0 obj
<</BBox[0.0 0.0 150.0 22.0]/FormType 1/Length 88/Matrix[1.0 0.0 0.0 1.0 0.0 0.0]/Resources<</Font<</Helv 20 0 R>>/ProcSet[/PDF/Text]>>/Subtype/Form/Type/XObject>>stream
/Tx BMC 
q
1 1 148 20 re
W
n
BT
/Helv 12 Tf
0 g
2 6.548 Td
(mytextfield) Tj

This is very easy to modify: wherever you see 'mytextfield' (in /V, /DV and the appearance stream), it is the content of my field, and where you see '%mytextfield' (in /T), it is the name of my field.
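The entries described above can also be read programmatically. A minimal Python sketch, assuming an uncompressed field dictionary like the one above whose literal strings contain no parentheses or escape sequences; `inspect_field` is a hypothetical helper, not a library function. Bit 1 of /Ff is the ReadOnly flag in the PDF specification, which is why this field, with /Ff 1, is read-only:

```python
import re

# Field flags common to all field types; bit 1 is the least significant bit.
READ_ONLY, REQUIRED, NO_EXPORT = 1 << 0, 1 << 1, 1 << 2


def inspect_field(field_dict: str) -> dict:
    """Pull name, value, default value and read-only state out of a field dictionary.

    Assumes simple literal strings (no nested or escaped parentheses) and an
    integer /Ff entry, as in the uncompressed example above.
    """
    def literal(key: str):
        # Matches e.g. /V(mytextfield) but not /DV(...) when key is "V".
        m = re.search(r"/%s\s*\(([^()\\]*)\)" % key, field_dict)
        return m.group(1) if m else None

    ff = re.search(r"/Ff\s+(\d+)", field_dict)
    flags = int(ff.group(1)) if ff else 0
    return {
        "name": literal("T"),
        "value": literal("V"),
        "default": literal("DV"),
        "read_only": bool(flags & READ_ONLY),
    }
```

Note that the value is also drawn by the appearance stream (object 28 above), so changing /V and /DV alone may not change what a viewer displays unless the appearance is regenerated.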

On the other hand, if I take my PDF 1.5 (saved by Word 2010), add the field with Acrobat Pro XI and then save it normally (Save As) instead of as an optimized PDF, I get a PDF 1.6 that looks like this (in Notepad++):

<</AcroForm 25 0 R/Lang(fr-CH)/MarkInfo<</Marked true>>/Metadata 3 0 R/Pages 15 0 R/StructTreeRoot 8 0 R/Type/Catalog>>
endobj
19 0 obj
<</Annots 26 0 R/Contents 22 0 R/CropBox[0 0 595.32 841.92]/Group<</CS/DeviceRGB/S/Transparency/Type/Group>>/MediaBox[0 0 595.32 841.92]/Parent 15 0 R/Resources<</ExtGState<</GS0 30 0 R>>/Font<</TT0 33 0 R>>/ProcSet[/PDF/Text]>>/Rotate 0/StructParents 0/Tabs/S/Type/Page>>
endobj
20 0 obj
<</BBox[0.0 0.0 150.0 22.0]/FormType 1/Length 85/Matrix[1.0 0.0 0.0 1.0 0.0 0.0]/Resources<</Font<</Helv 28 0 R>>/ProcSet[/PDF/Text]>>/Subtype/Form/Type/XObject>>stream
/Tx BMC 
q
1 1 148 20 re
W
n
BT
/Helv 12 Tf
0 g
2 6.548 Td
(mytextfield) Tj

In this format the field is not easy to edit (if I change mytextfield, I get a corrupted document!). It would be fine if opening this PDF 1.6 in Acrobat Pro and saving it with the optimized-PDF trick mentioned above turned the field into the first format, but that is not the case: I get exactly the same field format.
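The thread does not establish why the 1.6 file behaves differently. One plausible explanation (an assumption, consistent with the answer's mention of compressed cross-reference streams) is that a normal save of a PDF 1.5+ file may place dictionaries such as the field's widget inside compressed object streams and replace the plain-text xref table with a cross-reference stream, which a text editor can neither show nor safely edit. A rough Python check for these features; `storage_features` is a hypothetical helper and its regex matching is a heuristic, not a parser:

```python
import re


def storage_features(pdf: bytes) -> dict:
    """Heuristically report how a PDF stores its objects and cross references."""
    version = re.match(rb"%PDF-(\d\.\d)", pdf)
    return {
        # Header version only; a /Version entry in the catalog can override it.
        "header_version": version.group(1).decode() if version else None,
        # Object streams (PDF 1.5+) pack several objects into one, usually compressed, stream.
        "object_streams": re.search(rb"/Type\s*/ObjStm\b", pdf) is not None,
        # A cross-reference stream (PDF 1.5+) replaces the plain-text xref table.
        "xref_stream": re.search(rb"/Type\s*/XRef\b", pdf) is not None,
        # A classic table starts with the keyword 'xref' on its own line.
        "classic_xref_table": re.search(rb"[\r\n]xref[ \t]*[\r\n]", pdf) is not None,
    }
```

If `object_streams` is true, the field dictionary may simply not be visible as text at all, which would explain why only the appearance stream shows up in the editor.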

So my questions are the following:

  1. Is there a way to ensure that my PDF form, no matter which PDF version the original is, gets converted to the easily editable format (as in the first example) using Acrobat Pro or any other program?
  2. Is there a way to easily edit the PDF 1.6 fields?
Gynecium asked 27/7, 2014 at 23:16. Comments:
When editing the field contents, do you make sure you don't change their size? If you do change it, do you update the cross references accordingly? If you don't, you certainly create invalid documents. - Epperson
Are you talking about my second question? If so, I tried changing the length in the BBox entry, but it didn't help. I also tried keeping the text the same length, with no more success. Please note that in version 1.5 I don't even need to change the length! PS: what is the "criss reference"? - Gynecium
No, I'm talking about your editing in general. PDF files have a cross reference table (or stream) indicating the respective offset of each indirect object (each nnn 0 obj...endobj). If during your edits you replace something with something longer or shorter, you break those cross references. BTW, how did you test that your edits are OK? Hopefully not by merely opening the file in PDF viewers. PDF viewers often repair PDFs on the fly... - Epperson
Yes, by looking at it in a viewer (Reader) :o) OK, so what exactly should I do? I understand the cross-reference thing, but is there any documentation explaining it? Or could you please tell me step by step what to do? Thanks! - Gynecium
Essentially, editing a PDF manually in a text editor is a sure way to shoot yourself in the foot, so my advice is not to do it. If you still want to try, you'll find the documentation here. - Epperson
You're probably right, but it's the only solution I found for my problem: #24944904 - Gynecium
I came to the conclusion that doing this is doable but not ideal! Anyway, I don't really have a choice... Changing the length and the text is easy in the first example (1.5); updating the xref (cross reference), however, is a bit trickier. I can do it, but do you know any piece of code (JavaScript only) that could do it? I don't want to reinvent the wheel if I have the choice! - Gynecium
I only know server-side solutions. - Epperson
OK, I will try to implement it myself. Thank you very much! Could you please post your comments about the length and the cross references as an answer so I can accept it? - Gynecium

In the comments, the OP made clear that during his edits he replaced PDF data with something longer or shorter.

This in general is a bad idea because PDF files have a cross reference table (or stream) indicating the respective offset of each indirect object (each nnn 0 obj...endobj). Replacing PDF data with data of a different length invalidates the cross reference information for all objects following the edited position.
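To see this effect, one can compare every in-use entry of a classic cross reference table against the bytes it points to. A minimal Python sketch, assuming a single plain-text xref table (not a cross reference stream) and object headers written exactly as `n g obj`; `check_xref` is a hypothetical name:

```python
import re


def check_xref(pdf: bytes) -> list:
    """Return the object numbers whose xref offset does not point at 'n g obj'."""
    start = int(re.findall(rb"startxref\s+(\d+)", pdf)[-1])
    if not pdf.startswith(b"xref", start):
        raise ValueError("startxref does not point at a classic xref table")
    pos = start + len(b"xref")
    # Each subsection: a "first count" line followed by `count` 20-byte entries.
    subsection = re.compile(rb"\s*(\d+) (\d+)[ \t]*\r?\n")
    bad = []
    while True:
        m = subsection.match(pdf, pos)
        if not m:  # reached the 'trailer' keyword
            break
        first, count = int(m.group(1)), int(m.group(2))
        pos = m.end()
        for i in range(count):
            entry = pdf[pos:pos + 20]  # "oooooooooo ggggg n\r\n"
            pos += 20
            offset, gen, kind = int(entry[0:10]), int(entry[11:16]), entry[17:18]
            if kind == b"n" and not pdf.startswith(b"%d %d obj" % (first + i, gen), offset):
                bad.append(first + i)
    return bad
```

An empty list means every entry still lands on its object. After a longer or shorter replacement, the objects following the edit show up in the result even if a viewer opens the file without complaint; if the edit also moved the xref table itself, the stale startxref value makes the function raise instead.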

Thus, to have a valid PDF after editing, one at least has to update the cross reference information, which in a mere text editor is a real hassle (for cross reference tables) or even virtually impossible (for compressed cross reference streams).
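For the table case, the update can be scripted instead of done by hand: rescan the body for object headers, regenerate the 20-byte entries, and fix /Size and startxref. A minimal Python sketch under strong assumptions (a single revision with one classic xref table, no object streams or cross reference streams, and no stream data containing a line that starts like an object header); `rebuild_xref` is a hypothetical helper, not a library function:

```python
import re

# Object headers such as "28 0 obj", assumed to start at the beginning of a line.
OBJ_HEADER = re.compile(rb"(?m)^(\d+) (\d+) obj\b")


def rebuild_xref(pdf: bytes) -> bytes:
    """Regenerate the classic xref table, /Size and startxref of a single-revision PDF."""
    xref_at = pdf.rfind(b"\nxref")
    if xref_at < 0:
        raise ValueError("no classic xref table found (cross reference stream?)")
    trailer_at = pdf.find(b"trailer", xref_at)
    startxref_at = pdf.find(b"startxref", trailer_at)
    if trailer_at < 0 or startxref_at < 0:
        raise ValueError("no trailer/startxref after the xref table")
    body = pdf[: xref_at + 1]  # everything up to and including the newline before 'xref'
    trailer = pdf[trailer_at:startxref_at]

    # Byte offset and generation number of every indirect object in the body.
    objects = {int(m.group(1)): (m.start(), int(m.group(2))) for m in OBJ_HEADER.finditer(body)}
    size = max(objects) + 1
    free = [n for n in range(1, size) if n not in objects]

    entries = []
    for n in range(size):
        if n in objects:
            offset, gen = objects[n]
            entries.append(b"%010d %05d n\r\n" % (offset, gen))
        else:
            # Free entries form a linked list: each points to the next free number, the last to 0.
            nxt = next((f for f in free if f > n), 0)
            entries.append(b"%010d %05d f\r\n" % (nxt, 65535 if n == 0 else 0))

    trailer = re.sub(rb"/Size\s+\d+", b"/Size %d" % size, trailer, count=1)
    xref = b"xref\n0 %d\n" % size + b"".join(entries)
    return body + xref + trailer + b"startxref\n%d\n%%%%EOF\n" % len(body)
```

Regenerating everything from a scan is simpler than shifting offsets by the length difference of each edit, at the cost of trusting the regex; a real PDF library would parse the file properly instead.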

Details can be found in the PDF specification ISO 32000-1.

Furthermore, the OP indicated that he checked the validity of the documents after his edits by opening them in a PDF viewer.

This also is not a good idea because well-known PDF viewers tend to repair invalid PDFs on the fly without necessarily telling the user. Programs that manipulate PDFs, on the other hand, more often require valid input (at least valid in the aspects they manipulate) and, therefore, will probably reject or (even worse) garble the edited PDFs.

The OP indicates that his task is described in this question. Unless there is some appropriate JS library out there, he will essentially have to write one according to his needs.

It might be advantageous to use incremental updates here instead of manipulating the inner information of the source PDF. For this, see section 7.5.6, Incremental Updates, in the specification mentioned above.
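An incremental update leaves the original bytes untouched and appends a new version of the changed object, a cross reference section covering just that object, and a trailer whose /Prev points at the previous section. A minimal Python sketch, assuming the previous revision ends in a classic xref table and trailer; `append_incremental_update` is hypothetical and copies only /Root into the new trailer, whereas a complete writer would also carry over entries such as /Info and /ID:

```python
import re


def append_incremental_update(pdf: bytes, num: int, gen: int, new_body: bytes) -> bytes:
    """Append a new revision of object `num gen` without modifying the existing bytes."""
    prev_xref = int(re.findall(rb"startxref\s+(\d+)", pdf)[-1])
    trailer_at = pdf.rfind(b"trailer")
    if trailer_at < 0:
        raise ValueError("no classic trailer found (cross reference stream?)")
    trailer = pdf[trailer_at:pdf.find(b"startxref", trailer_at)]
    size = int(re.search(rb"/Size\s+(\d+)", trailer).group(1))
    root = re.search(rb"/Root\s+\d+\s+\d+\s+R", trailer).group(0)

    out = pdf if pdf.endswith(b"\n") else pdf + b"\n"
    obj_offset = len(out)
    out += b"%d %d obj\n%b\nendobj\n" % (num, gen, new_body)
    xref_offset = len(out)
    # One xref subsection containing only the replaced object.
    out += b"xref\n%d 1\n%010d %05d n\r\n" % (num, obj_offset, gen)
    # /Prev links the new trailer to the previous cross reference section.
    out += b"trailer\n<</Size %d %b/Prev %d>>\nstartxref\n%d\n%%%%EOF\n" % (
        max(size, num + 1), root, prev_xref, xref_offset)
    return out
```

Because nothing before the appended part moves, none of the existing offsets change, which is what makes this approach attractive for edits that change the length of a value.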

PS: The OP asked:

"Would incremental updates work with read-only fields?"

Incremental updates simply are a different way to organize your changes: everything you can change inside the original file you can also change using incremental updates. Actually, you can do even more using incremental updates: in signed documents, certain changes are often still allowed, but they must be made as incremental updates because otherwise the signature would be structurally broken.

Epperson answered 29/7, 2014 at 8:39. Comments:
Very good answer! Two more questions: would incremental updates work with read-only fields? If so, how can I add an incremental update via Acrobat (or any other program) so I can see how it is structured in a text editor? - Gynecium
And what about the second question? - Gynecium
"How can I add an incremental update?" If you have a signed document that allows annotations to be added, use a current Adobe Reader or Acrobat to add such annotations; this change is automatically saved as an incremental update. In your former question you mentioned iText; if you are OK with Java, construct a PdfStamper with the append argument set to true; this also creates incremental updates. - Epperson
