Goal
I have several thousand Khmer-language .docx files and would like to convert them to .pdf format using Pandoc.
Background
I installed Pandoc using MacPorts. Pandoc requires LaTeX for PDF conversion, so I installed MacTeX. The installation appears to have gone properly, and I've been able to convert English-language .docx files into .pdf without difficulty.
Attempt 1
When I try to convert a Khmer-language file (you can find an example at https://briancroxall.net/pandoc/transcription.docx) to PDF, I use the following command:
pandoc transcription.docx -s -o transcript.pdf
I receive the following error:
Error producing PDF.
! Package inputenc Error: Unicode character អ (U+17A2)
(inputenc) not set up for use with LaTeX.
See the inputenc package documentation for explanation.
Type H <return> for immediate help.
...
l.64 ...�នៅសម័យប៉ុល ពត។}
Try running pandoc with --pdf-engine=xelatex.
Attempt 2
Following this suggestion, I use this command:
pandoc --pdf-engine=xelatex transcription.docx -s -o transcript.pdf
Pandoc then emits a warning for every Khmer character in the text:
[WARNING] Missing character: There is no អ in font [lmroman10-bold]:mapping=tex-text;!
[WARNING] Missing character: There is no ្ in font [lmroman10-bold]:mapping=tex-text;!
[WARNING] Missing character: There is no ន in font [lmroman10-bold]:mapping=tex-text;!
...
A PDF is produced by this process (see https://briancroxall.net/pandoc/transcript.pdf), but it is largely empty.
Issue
As best I can tell, this suggests that Khmer characters are not available in the LaTeX engine (or the font it is using) for the conversion. Whether or not that is so, how can I manage this file conversion successfully?
Try setting mainfont to Khmer MN or Khmer Sangam MN; see pandoc.org/MANUAL.html#fonts (and alvinalexander.com/macos/…).... maybe also tex.stackexchange.com/a/234796/33952 – Leannleanna
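The mainfont suggestion above could be tried roughly as follows. This is a minimal sketch, not a confirmed fix: it assumes the font "Khmer MN" is installed and visible to XeLaTeX, and the function name convert_khmer_docx and the override variables PANDOC and KHMER_FONT are hypothetical names introduced only for illustration. It passes the font through Pandoc's -V mainfont variable and loops over every .docx file in a directory, since the goal is to convert several thousand files.

```shell
#!/bin/sh
# Sketch: convert every .docx in a directory to PDF via XeLaTeX with a Khmer font.
# Assumption: "Khmer MN" is installed; swap in "Khmer Sangam MN" if it renders better.
# PANDOC and KHMER_FONT are hypothetical override variables, not Pandoc features.

convert_khmer_docx() {
    dir="${1:-.}"                        # directory to scan, defaults to the current one
    font="${KHMER_FONT:-Khmer MN}"       # font handed to Pandoc's mainfont variable
    for src in "$dir"/*.docx; do
        [ -e "$src" ] || continue        # nothing matched the glob
        out="${src%.docx}.pdf"           # same base name, .pdf extension
        "${PANDOC:-pandoc}" --pdf-engine=xelatex -V mainfont="$font" \
            "$src" -s -o "$out" || echo "failed: $src" >&2
    done
}
```

After defining the function, running convert_khmer_docx . in the folder that holds transcription.docx should produce transcription.pdf beside it. If "Missing character" warnings still appear, the chosen font probably lacks the glyphs in question, and a different Khmer font is worth trying.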