Convert PDF to PDF/A-3 or PDF/A-1 to PDF/A-3

I'm testing iTextSharp to generate ZUGFeRD files. My first step was to generate a ZUGFeRD-conformant file from an existing PDF/A-3 file. This was successful using PdfACopy and creating the necessary PdfFileSpecification.

The next step is to generate a PDF/A-3 file from an existing PDF or PDF/A-1 file, and this is the hard part.

First, when I try to use PdfACopy with a regular PDF (not PDF/A), I get an error saying that PdfACopy can only be used with PDF/A-conformant files. My first question is: how do I get a PDF/A-3-conformant file from a regular PDF with iTextSharp?

To narrow the gap, I decided to convert the PDF into a PDF/A-1 file with Ghostscript (cf. How to use ghostscript to convert PDF to PDF/A or PDF/X?). This was successful, so I tried again. This time the error "Different PDF/A version." was thrown. It seems that I can't copy from an existing PDF/A-1 file into a new PDF/A-3 file. How can I create this PDF/A-3 file from an existing PDF (or PDF/A-1)? Is that even possible?

Here is my code:

    // Load the invoice XML and get its bytes
    XmlDocument xmlDoc = new XmlDocument();
    xmlDoc.Load(XML);
    byte[] xmlBytes = Encoding.Default.GetBytes(xmlDoc.OuterXml);

    // Source PDF to copy from
    Document doc = new Document();
    PdfReader src_reader = new PdfReader(pdfPath);

    FileStream fs = new FileStream(DEST, FileMode.Create, FileAccess.ReadWrite);

    // Target: a PdfACopy with ZUGFeRD conformance
    PdfACopy aCopy = new PdfACopy(doc, fs, PdfAConformanceLevel.ZUGFeRD);

    doc.AddLanguage("de-DE");
    doc.AddTitle("title");
    doc.SetPageSize(src_reader.GetPageSizeWithRotation(1));

    aCopy.SetTagged();
    aCopy.UserProperties = true;
    aCopy.PdfVersion = PdfCopy.VERSION_1_7;
    aCopy.ViewerPreferences = PdfCopy.DisplayDocTitle;
    aCopy.CreateXmpMetadata();
    aCopy.XmpWriter.SetProperty(PdfAXmpWriter.zugferdSchemaNS, PdfAXmpWriter.zugferdDocumentFileName, "ZUGFeRD-invoice.xml");

    // From here on, no more metadata can be written
    doc.Open();

    // Set the ICC profile as the output intent
    ICC_Profile icc = ICC_Profile.GetInstance(new FileStream(ICM, FileMode.Open));
    aCopy.SetOutputIntents("Custom", "", "http://www.color.org", "sRGB IEC61966-2.1", icc);

    [...add the dictionary to doc..]
    aCopy.AddDocument(src_reader);
    doc.Close();

One more question: addDocument works, but when I use copy.addPage(copy.getImportedPage(src_reader, i)), the error "the document has no pages" is thrown. Why?

Chios answered 11/12, 2015 at 7:59 Comment(0)

1. Can you convert a regular PDF to a PDF/A document?

The answer is: it depends.

PDF/A is a subset of PDF and involves some obligations (e.g. all fonts must be embedded) and restrictions (e.g. no JavaScript is allowed). iText can't "automatically" convert a regular PDF to PDF/A for a number of reasons. For instance: if a font is not embedded, iText doesn't know which font to use as a replacement, nor where to find the necessary font program. Usually this requires human interaction, because replacing one font with an arbitrary other font usually results in very ugly PDFs.

The answer is "it depends" because some people do use iText to convert PDF to PDF/A, but this involves a lot of programming and human decisions. I see that you succeeded with Ghostscript. In that case, Ghostscript makes some decisions in your place. This can lead to acceptable results. In some cases, the result will not be acceptable (e.g. very odd-looking PDFs if the fonts don't match).

2. Can you convert a PDF/A-1 file to a PDF/A-3 file?

The PDF/A standard is written in such a way that old versions of the PDF/A specification are never outdated. Newer versions only add newer functionality. For instance: PDF/A-1 was based on the PDF 1.4 specification. Optional Content functionality (OCG) was introduced in PDF 1.5. The introduction of OCG is one of the differences between PDF/A-2 and PDF/A-1.

This means that every file that conforms to PDF/A-1 automatically conforms to PDF/A-2. However, a PDF/A-2 file could contain functionality that isn't supported in PDF/A-1.
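To see which PDF/A part a file actually claims, you can look at its XMP metadata: a PDF/A file declares its part and conformance level there through the pdfaid:part and pdfaid:conformance properties. Below is a minimal sketch that reads this declaration from an XMP packet given as a string. The class and method names (PdfAInfo, ReadDeclaredPdfA) are made up for this illustration and are not part of iTextSharp. With iTextSharp you could presumably obtain the packet by decoding the bytes returned by PdfReader.Metadata, which can be null when the file has no XMP.

```csharp
using System;
using System.Xml;

public static class PdfAInfo
{
    // Namespace of the PDF/A identification schema in XMP.
    private const string PdfaIdNs = "http://www.aiim.org/pdfa/ns/id/";
    private const string RdfNs = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Hypothetical helper: returns e.g. "1B" or "3B",
    // or null if the XMP packet declares no PDF/A part.
    public static string ReadDeclaredPdfA(string xmp)
    {
        var doc = new XmlDocument();
        // A decoded packet may start with a byte order mark; strip it before parsing.
        doc.LoadXml(xmp.TrimStart('\uFEFF'));

        string part = Find(doc, "part");
        if (part == null) return null;
        string conformance = Find(doc, "conformance") ?? "";
        return part + conformance;
    }

    // XMP allows a simple property to be written either as a child element
    // or as an attribute of rdf:Description, so check both forms.
    private static string Find(XmlDocument doc, string localName)
    {
        XmlNodeList elements = doc.GetElementsByTagName(localName, PdfaIdNs);
        if (elements.Count > 0) return elements[0].InnerText.Trim();

        foreach (XmlElement desc in doc.GetElementsByTagName("Description", RdfNs))
        {
            string value = desc.GetAttribute(localName, PdfaIdNs);
            if (value.Length > 0) return value.Trim();
        }
        return null;
    }
}
```

Running this on the Ghostscript output and on the PDF/A-3 file you started with should show which part each file declares, which is useful when you run into a message such as "Different PDF/A version.".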

3. What is the difference between PDF/A-2 and PDF/A-3?

PDF/A-2 and PDF/A-3 are identical, except for one difference: a PDF/A-3 file can have attachments that aren't PDF/A files. For instance, a PDF/A-3 file can have a Word file, an XLS file, or a plain text file as an attachment. You mention ZUGFeRD: in that case, the PDF/A-3 file has at least an XML file as an attachment.

Summarized:

This is a broad answer to a broad question (your question goes in many different directions, so it's hard to give you a specific answer). Why don't you use the already built-in ZUGFeRD support to create the invoices? Read ZUGFeRD, the future of invoicing for more info.

Lifeanddeath answered 11/12, 2015 at 8:40 Comment(2)
Thank you for the fast response! I have PDF files that always use the same fonts, so there shouldn't be a problem converting them to PDF/A? If a PDF contains its fonts, it's automatically a PDF/A, isn't it? How do you program this? Can't I use PdfCopy to go from a PDF to a PDF/A file? At the moment I create my XML file without C#, so my task is to attach the created XML file to a regular PDF. As you mentioned, I need PDF/A-3 for this, so I need to convert this regular PDF. The built-in solution creates a whole new PDF layout that I don't need. The problem is getting the PDF/A-3. - Chios
I would like to have an all-in-one solution with iText, so that Ghostscript is not needed. - Chios
