pandoc markdown to pdf: fixing missing character warnings?

I have seen the question 'How do I fix "missing character" warnings when converting from docx to pdf using Pandoc and LaTeX?', but unfortunately the advice there does not seem to apply to this test case:

$ git clone https://github.com/raspberrypi/documentation.git
$ cd documentation/configuration
$ pandoc *.md --pdf-engine=xelatex -o result.pdf
[WARNING] Missing character: There is no ┌ (U+250C) in font [lmmono10-regular]:!
[WARNING] Missing character: There is no ─ (U+2500) in font [lmmono10-regular]:!
[WARNING] Missing character: There is no ─ (U+2500) in font [lmmono10-regular]:!
[WARNING] Missing character: There is no ─ (U+2500) in font [lmmono10-regular]:!
[WARNING] Missing character: There is no ─ (U+2500) in font [lmmono10-regular]:!
...
[WARNING] Missing character: There is no ─ (U+2500) in font [lmmono10-regular]:!
[WARNING] Missing character: There is no ─ (U+2500) in font [lmmono10-regular]:!
[WARNING] Missing character: There is no ┘ (U+2518) in font [lmmono10-regular]:!

So some specific box-drawing glyphs are missing from Latin Modern Mono; since that is the monospace font, they are probably used inside code snippets.

Is there a way to provide a "fallback font" in this case? Or how else could I solve this, so that I can produce a (LaTeX) PDF from these Markdown files via pandoc?


EDIT: found:

... so I tried:

header-includes.yaml:

---
header-includes: |
    \usepackage{combofont}
    \setupcombofont{multiscript-regular}
    {
      {file:lmsans10-regular.otf:\combodefaultfeat} at #1pt,
      {file:DejaVuSans.ttf} at #1pt,
      {file:NotoSansCJK-Regular.ttc(0)} at #1pt
    }
    {
       {} ,
       fallback,
       fallback
    }
    \DeclareFontFamily{TU}{multiscript}{}
    \DeclareFontShape {TU}{multiscript}{m}{n} {<->combo*multiscript-regular}{}
    \fontfamily{multiscript}\selectfont
...

... and then I tried (note: using just a single file from the repo, raspi-config.md, here):

$ pandoc header-includes.yaml ./raspi-config.md --pdf-engine=lualatex -o result.pdf
Error producing PDF.
! Paragraph ended before \setupcombofont  was complete.
<to be read again>
\par
l.61

... so, cannot get this approach to work, either ...

Vibrato answered 7/12, 2020 at 8:4 Comment(5)
I think you need to fix the line breaks in your header include. Try with a - before each command and avoid line breaks inside the commands. - Devi
To find a font that contains these glyphs, you could use the fantastic gitlab.com/islandoftex/albatross - Devi
Thanks @Devi - albatross looks great! I can see it is from someone related to the TeX Users Group, and I was hoping it was already in TeX Live, but I couldn't find it ... So I looked into compiling it myself, but it's in Kotlin and needs gradlew to build, and I'm not really a Java guy, so I cannot do that either. Otherwise, I use the Python approach listed in jdhao.github.io/2018/04/08/matplotlib-unicode-character - Vibrato
AFAIK Albatross was uploaded to CTAN yesterday, so it should appear in TeX Live within the next couple of days. - Devi
@Vibrato You don't need to compile it yourself. Choose the download arrow on the repo page and select the build:linux:jdk8 artifact. The executable JAR is in there. If you're on Windows you probably want to download it from the windows-paths branch, because the first version was not prepared for strange operating system paths ;) - Kipton

You can see what's happening by checking how pandoc parsed the input, e.g. by converting it back to Markdown:

$ pandoc -t native -s -t markdown -V header-includes='' header-includes.yaml

---
header-includes: |
  ```{=tex}
  \usepackage{combofont}
  \setupcombofont{multiscript-regular}
  ```
  { {file:lmsans10-regular.otf:`\combodefaultfeat`{=tex}} at \#1pt,
  {file:DejaVuSans.ttf} at \#1pt, {file:NotoSansCJK-Regular.ttc(0)} at
  \#1pt } { {} , fallback, fallback }
  `\DeclareFontFamily{TU}{multiscript}{}`{=tex}
  `\DeclareFontShape {TU}{multiscript}{m}{n}`{=tex}
  {\<-\>combo\*multiscript-regular}{}
  `\fontfamily{multiscript}`{=tex}`\selectfont`{=tex}
---

You'll note that some parts are not recognized as TeX but as plain text. Force interpretation as a LaTeX block by using raw attribute syntax:

---
header-includes: |
    ```{=latex}
    \usepackage{combofont}
    \setupcombofont{multiscript-regular}
    {
      {file:lmsans10-regular.otf:\combodefaultfeat} at #1pt,
      {file:DejaVuSans.ttf} at #1pt,
      {file:NotoSansCJK-Regular.ttc(0)} at #1pt
    }
    {
       {} ,
       fallback,
       fallback
    }
    \DeclareFontFamily{TU}{multiscript}{}
    \DeclareFontShape {TU}{multiscript}{m}{n} {<->combo*multiscript-regular}{}
    \fontfamily{multiscript}\selectfont
    ```
...

Or you could write the TeX snippets into a file and pass that via the -H option, which will insert the file contents unchanged into the intermediary LaTeX file.
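
A minimal sketch of that -H route, assuming the snippet from the question is saved under the hypothetical name preamble.tex. It only shows how to get the TeX into the header untouched; whether the combofont setup itself then compiles is a separate matter.

```shell
# Write the LaTeX preamble to a file. The quoted 'EOF' delimiter stops the
# shell from interpreting backslashes or '$' inside the snippet.
# preamble.tex is a hypothetical file name chosen for this example.
cat > preamble.tex <<'EOF'
\usepackage{combofont}
\setupcombofont{multiscript-regular}
{
  {file:lmsans10-regular.otf:\combodefaultfeat} at #1pt,
  {file:DejaVuSans.ttf} at #1pt,
  {file:NotoSansCJK-Regular.ttc(0)} at #1pt
}
{
   {} ,
   fallback,
   fallback
}
\DeclareFontFamily{TU}{multiscript}{}
\DeclareFontShape {TU}{multiscript}{m}{n} {<->combo*multiscript-regular}{}
\fontfamily{multiscript}\selectfont
EOF

# -H (long form: --include-in-header) copies the file verbatim into the
# header of the generated LaTeX, so the Markdown parser never sees it.
pandoc -H preamble.tex ./raspi-config.md --pdf-engine=lualatex -o result.pdf
```

Because the file never passes through the Markdown reader, characters such as # and * need no escaping or raw-attribute fences here.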

Demonstration answered 7/12, 2020 at 16:39 Comment(3)
Just wondering if the problem has been resolved... - Auroora
@Auroora What do you mean? Whether this answer solves the question? - Demonstration
Right. I tried the same, but it did not work. Anyway, I bypassed the issue by adding the ASCII-coded tree structure to the Markdown file, which generates a good-looking PDF without issues. - Auroora
