How to set the default language in Apache FOP
B

3

8

I'm generating PDF files using Apache FOP 2.1.

For this I am trying to set the default language to be English.
This is supposed to be verified after the creation of the PDF via Adobe Reader's option File/Properties/Advanced/Reading Options. This value currently is empty.

Image showing language is not set

I have tried setting xml:lang="en" on the fo:root element, on the first page-sequence, and on the very first element of the .xsl file... Nothing seems to do the trick.

Any Advice?
Thanks Dimitris.

Update:
I have tried two more options as suggested in the answers; neither of them worked:

  1. <fo:declarations> <pdf:catalog xmlns:pdf="http://xmlgraphics.apache.org/fop/extensions/pdf"> <pdf:string key="Lang">en</pdf:string> </pdf:catalog>
  2. <x:xmpmeta xmlns:x="adobe:ns:meta/"> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/"> <dc:title>the document title</dc:title> <dc:language>en</dc:language>

Update 2
I have started a bounty on this question.
Any help is appreciated and will be rewarded.

Beelzebub answered 18/8, 2016 at 12:0 Comment(6)
Hm, going by the answers to stackoverflow.com/questions/38347687 and the PDF reference, I would use <fo:declarations><pdf:catalog xmlns:pdf="http://xmlgraphics.apache.org/fop/extensions/pdf"><pdf:string key="Lang">en</pdf:string></pdf:catalog>... (no idea where the ; in the xmlns comes from), as well as xml:lang="en" in the fo:root, but that does not show the language in Acrobat either. – Trembly
Regarding my previous comment: exiftool on that PDF file does show Language : en, so the value makes it into the file correctly. The problem may be that the field in the Acrobat properties dialog refers to something different. See also PDF16: Setting the default language using the /Lang entry in the document catalog of a PDF document. – Trembly
My guess is that it just does not work in Reader. If you go to the page above, w3.org/TR/WCAG20-TECHS/PDF16.html, download the exact sample they reference and look at its properties, the field is blank. I also searched my PDFs from FOP, RenderX, Word... some tagged, all kinds. I could not find a single one that had any value when viewed this way. – Areca
Thanks @KevinBrown, looks like a limitation of Adobe Reader... Maybe you need the Pro version. – Beelzebub
Do you have a sample of a PDF, any PDF produced in any way at all, that shows the language in Reader? If you do, perhaps we can look at the PDF and see what is different. If you cannot find one, then the answer to the question is "ask Adobe" (although that would be too short for Stack Overflow as an answer :) ). – Areca
I have downloaded Adobe Acrobat Pro. When xml:lang="en" is set in <fo:root>, the language is displayed in Pro's properties, but not in Adobe Reader (free version). Looks like a limitation/bug. – Beelzebub
T
2

From everything I've tried, the Language field in the Document Properties shown by Adobe Reader has little to do with the document language actually stored in the PDF (the field is always empty).

With FOP 2.1, the xml:lang="en" attribute on fo:root is sufficient for exiftool to list the document language as English, and for the PDFDebugger from PDFBox to show the /Lang entry in the document catalog, which is where the language is specified according to the PDF Reference 1.7, Table 3.25 "Entries in the catalog dictionary".
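For reference, a minimal sketch of a complete XSL-FO file with the attribute in place. Only xml:lang="en" on fo:root comes from the discussion above; the page master name, page size and block text are placeholders chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- xml:lang on fo:root is what ends up as /Lang in the PDF catalog -->
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format" xml:lang="en">
  <fo:layout-master-set>
    <!-- "page" is an arbitrary master name used only in this sketch -->
    <fo:simple-page-master master-name="page"
                           page-height="297mm" page-width="210mm"
                           margin="20mm">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="page">
    <fo:flow flow-name="xsl-region-body">
      <fo:block>Hello, world.</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```

Running the resulting PDF through exiftool or PDFDebugger, as described above, is a way to confirm the entry independently of what Adobe Reader displays.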

The code

<fo:declarations>
  <pdf:catalog
    xmlns:pdf="http://xmlgraphics.apache.org/fop/extensions/pdf">
    <pdf:string key="Lang">en</pdf:string>
  </pdf:catalog>
</fo:declarations>

does exactly the same in the pdf output as the xml:lang.

Additionally, you can also set the language in the metadata (also inside fo:declarations):

<x:xmpmeta
  xmlns:x="adobe:ns:meta/"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:RDF>
      <rdf:Description rdf:about="">
        <dc:language><rdf:Bag><rdf:li>en</rdf:li></rdf:Bag></dc:language>
      </rdf:Description>
    </rdf:RDF>
</x:xmpmeta>

But my FOP 2.1 seems to set that automagically as well when xml:lang is present.

So it would be interesting if someone who can explain what the Language property in Adobe Reader actually shows dropped in.

Trembly answered 18/8, 2016 at 19:4 Comment(3)
Well, they work in that they set the language of the document. You could put your question differently: what is it that Adobe Reader shows in the Language field? Maybe that attracts the people who can answer it. Or maybe check some forums at Adobe. – Trembly
Let's see what comes out of this: Reader Document Property "Language". – Trembly
I see you are also keen to know. Asking the FOP mailing list is also a good idea ;-) – Trembly
T
1

You may need to set language (http://www.w3.org/TR/xsl/#language). See 'language' in http://xmlgraphics.apache.org/fop/compliance.html

You'd think that xml:lang would work, but you say it doesn't. The FOP FAQ has an answer about setting language to control hyphenation, so it's worth a try even though language is defined to apply only to fo:block and fo:character.

You might need to enable accessible PDF output. See https://xmlgraphics.apache.org/fop/2.1/accessibility.html, which has references to the language being set in the PDF (including from xml:lang).
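As a rough sketch of what enabling it can look like: to the best of my knowledge, the accessibility page linked above describes an optional setting in the FOP configuration file (fop.xconf). The fragment below shows only that one setting and omits everything else a real configuration would contain; check it against the documentation for your FOP version.

```xml
<fop version="1.0">
  <!-- Turns on accessible (tagged) PDF output; the rest of the
       configuration (renderers, fonts, ...) is omitted in this sketch -->
  <accessibility>true</accessibility>
</fop>
```

The same page also describes a command-line switch (-a) for this; which of the two fits depends on how FOP is invoked in your setup.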

Thier answered 18/8, 2016 at 13:57 Comment(1)
Thanks Tony, I have already tried these... They don't have any effect. – Beelzebub
S
0
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format" language="en">

With Apache FOP 2.9, the latest release at the time of writing, setting language on fo:root makes the Language field in Reading Options show "English". (I appreciate that this might have been better as a comment on Tony Graham's post, but I have insufficient reputation.)

Schweitzer answered 14/6 at 11:23 Comment(0)
