Pandoc: setting language of exported Word docx

I export Word docx from markdown using Pandoc.

By default, all text in the docx file seems to be marked as English. I tried to override this, either with the command line option pandoc -s -S images.md -o images.docx -V lang=de or in the YAML header:

---
subtitle: <%= @report.name %>
toc-title: <%= t('.toc_title') %>
lang: de
---

Neither seems to work: all content in the exported docx file is underlined in red by the spell checker, which tells me that the words are not found in English.

How can I override the language?

Update

I tried specifying the language in the docx-file, by simply selecting all text (Cmd+A, I'm on OSX) and clicking on the language button on the bottom left.

I also tried using Tools -> Language.

Neither had any effect, though.

Update

Interestingly, when exporting to HTML, the language is set correctly in the <html> attribute.

Tieratierce answered 7/12, 2016 at 21:31 Comment(8)
I think you need to use the "--reference-docx" option, as discussed here. Create a reference docx file, and then override the language there. - Household
I already tried this. But I'm not 100% sure where to specify the language in the docx file; I simply selected all text and clicked on the language button on the bottom left. But maybe there's a general language option for the full document? - Tieratierce
I have set the language through Tools -> Language in Word 365 on OSX. It didn't solve the problem. - Tieratierce
Interestingly, when exporting to HTML, the language is set correctly in the <html> attribute. - Tieratierce
Thanks for voting down without giving a reason. - Tieratierce
reference-docx can only set styles and a few properties (margins, page size, header, and footer), but language is not one of them <pandoc.org/MANUAL.html#options-affecting-specific-writers>. A workaround is to write a doc macro that sets the language and use it to post-process your file. - Jana
Agree with scoa, it seems that some post-processing is the only way for now. That said, it's an issue that has already been discussed on GitHub. It shouldn't be that hard to fix (after all, docx is just a zip with XML files inside), but of course that's easier said than done. - Household
Is there any news on this? Pandoc has undergone quite a few updates since this question was posted (2016). - Tieratierce

I have just checked again, and with Pandoc v 2.9.2.1 it seems to set the language correctly:

[Screenshots: English docx, German docx]

Hooray!! Thanks, Pandoc community! <3

It would be interesting to know exactly when this was added, though (I couldn't find a mention in https://pandoc.org/changelog.txt).
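
For anyone who wants to check the result without opening Word, here is a rough Python sketch that lists the language codes a docx file declares. It assumes the default language is recorded on w:lang elements in word/styles.xml, which is where I would expect to find it but have not confirmed for every Pandoc version. The helper name docx_languages is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the w: prefix in docx XML parts.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_languages(path):
    """Return the set of w:val values found on w:lang elements in word/styles.xml.

    Hypothetical helper: it only looks at styles.xml, so language tags set
    on individual runs in word/document.xml are not reported.
    """
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/styles.xml"))
    langs = set()
    for el in root.iter(f"{{{W_NS}}}lang"):
        val = el.get(f"{{{W_NS}}}val")
        if val:
            langs.add(val)
    return langs


# Usage (images.docx is the file name from the question):
#   print(docx_languages("images.docx"))
```

If the set contains only an English code after exporting with lang: de, the language setting was probably not applied by the Pandoc version in use.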

Tieratierce answered 2/6, 2020 at 13:6 Comment(0)

There is currently no way to set the language of a doc, docx, or odt document output by pandoc. A pandoc GitHub issue discusses this problem (noted in the comments by @Serge Correia).

Localization in other formats goes through templates, but reference files, the doc, docx, and odt equivalent of a template, only set a few selected styles and properties. For instance, for reference-docx (from the pandoc README):

The contents of the reference docx are ignored, but its stylesheets and document properties (including margins, page size, header, and footer) are used in the new docx.

Jana answered 25/1, 2017 at 12:52 Comment(5)
Thank you for explaining. Maybe there is a way to "hack" the Pandoc executable: I guess Pandoc takes a default docx file from somewhere, so maybe I can hack that file to be in a specific language? - Tieratierce
The GitHub issue has a discussion of how to do this; maybe you could try implementing it in your own fork. For now, my workaround has been to write an OpenOffice/Word macro to take care of localization (for French: change the language, change the quotation marks, add a non-breaking space before !?:;). - Jana
I thought about this solution, too. But I didn't manage to do it, as I'm no Visual Basic macro programmer (also, I'm on Mac... Office 365). Would you mind sending me your version of the macro? :) - Tieratierce
@JoshuaMuheim It's a LibreOffice one (it should work with OpenOffice too), and it's very badly written, but here you go: gist.github.com/scoavoux/2ff93f30ec4dedae1a9d087ddec40d5d You'll need to install the LibreOffice Python macro module. - Jana
Thank you, I will take a look at it. - Tieratierce
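
As an alternative to a macro, the post-processing idea from the comments (a docx is just a zip with XML files inside) can be sketched in plain Python. This is only an illustration under assumptions: it assumes the document's default language sits on w:lang elements in word/styles.xml, that the file is UTF-8 encoded, and that rewriting the w:val attribute is enough for Word's spell checker. The function name set_docx_language is hypothetical, not part of Pandoc or any library.

```python
import re
import zipfile

# Matches the w:val attribute of a w:lang element; other attributes such as
# w:eastAsia or w:bidi are left untouched.
_LANG_VAL = re.compile(r'(<w:lang\b[^>]*?\bw:val=")[^"]*(")')


def set_docx_language(src, dst, lang):
    """Copy the docx at src to dst, setting w:val on every w:lang in styles.xml.

    Hypothetical helper. Assumes word/styles.xml is UTF-8; w:lang elements
    without a w:val attribute are not changed.
    """
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/styles.xml":
                text = data.decode("utf-8")
                text = _LANG_VAL.sub(
                    lambda m: m.group(1) + lang + m.group(2), text
                )
                data = text.encode("utf-8")
            # Every other archive member is copied through unchanged.
            zout.writestr(item, data)


# Usage (file names are examples only):
#   set_docx_language("images.docx", "images-de.docx", "de-DE")
```

Text runs that carry their own w:lang in word/document.xml would not be affected by this sketch, and it does none of the typographic fixes (quotation marks, non-breaking spaces) that the macro handles.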