Using Unicode in fancyvrb’s VerbatimOut

Problem

VerbatimOut from the “fancyvrb” package doesn’t play nicely with UTF-8 characters.

Minimal working example:

\documentclass{minimal}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{fancyvrb}

\begin{document}
\begin{VerbatimOut}{\jobname.test}
é
\end{VerbatimOut}

\input{\jobname.test}
\end{document}

Error message

When compiled using pdflatex mini, this gives the error

File ended while scanning use of \UTFviii@three@octets.

A different error occurs when the line containing only é above is replaced by a longer one, e.g. é */:

Package inputenc Error: Unicode char \u8:### not set up for use with LaTeX.

This indicates that in this case LaTeX succeeds in reading a multi-byte UTF-8 character but doesn’t know what to do with it (i.e. it’s the wrong character).

In fact, when I open the produced .test file manually, it contains the character é, but in Latin-1 encoding!

Proof: when I open the files in a hex editor, I get the following:

  • Original file: C3 A9 (corresponds to LATIN SMALL LETTER E WITH ACUTE in UTF-8)
  • Written file: E9 (corresponds to é in Latin-1)
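The two byte sequences are related by the standard UTF-8 rule for code points U+0080 to U+07FF, which stores 11 bits in the pattern 110xxxxx 10xxxxxx. The worked conversion below is background on UTF-8 itself, not part of the original post:

```latex
\begin{align*}
\text{U+00E9} &= \mathtt{E9}_{16} = 000\,1110\,1001_2 \quad (\text{11 bits})\\
\text{UTF-8:}\quad 110\,00011_2 &= \mathtt{C3}_{16}, \qquad 10\,101001_2 = \mathtt{A9}_{16}\\
\text{Latin-1:}\quad \mathtt{E9}_{16} &= 1110\,1001_2
\end{align*}
```

The leading bits 1110 of the Latin-1 byte are exactly what UTF-8 uses to start a three-byte sequence. That plausibly explains both errors: when the file is read back, inputenc treats E9 as the start of a three-octet character, so \UTFviii@three@octets runs out of input if the file ends right after it, and swallows the following characters into an unknown code point if more text follows.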

Question

How do I set up VerbatimOut correctly?

filecontents* (from the “filecontents” package) shows that it can work. Unfortunately, I don’t understand either package’s code well enough to port the logic from filecontents into fancyvrb myself.

I also cannot use filecontents* instead of VerbatimOut because the former doesn’t work within a \newenvironment, while the latter does.

(Oh, by the way: plain Verbatim instead of VerbatimOut works as expected. The error seems to occur when writing the file, not when reading the verbatim input.)

Rickart asked 25/1, 2010 at 14:34

Is your end goal to write symbols and accents in Verbatim? If so, you can do it like this:

\documentclass{article}
\usepackage{fancyvrb}
\begin{document}
\begin{Verbatim}[commandchars=\\\{\}]
\'{e} \~{e} \`{e} \^{e}
\end{Verbatim}
\end{document}

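The commandchars option makes the \ { } characters behave as they normally would (as escape and grouping characters), so accent commands such as \'{e} are executed inside the verbatim environment.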

Source: http://ctan.mirror.garr.it/mirrors/CTAN/macros/latex/contrib/fancyvrb/fancyvrb.pdf

Oppen answered 25/1, 2010 at 15:06
Thanks for the hint, but that solution isn’t usable: the saved verbatim code will be processed further by another program that doesn’t know about LaTeX, so I really need to be able to use Unicode characters directly. – Rickart
Ah, okay. Then I am not quite sure. Good luck. – Oppen
Updated hyperlink: ctan.mirror.garr.it/mirrors/CTAN/macros/latex/contrib/fancyvrb/… – Overvalue

This is still unfixed? I'll take another look. What exactly do you want: your package to use VerbatimOut, or for it not to interfere with it?

Tests

TeX Live 2009’s XeLaTeX compiles it fine. With pdflatex, version

This is pdfTeX, Version 3.1415926-1.40.10 (TeX Live 2009)

I get an error message that is rather more useful than the one you got:


! Argument of \UTFviii@three@octets has an extra }.
 
                \par 
l.8 é

? i \makeatletter\show\UTFviii@three@octets
! Undefined control sequence.
\GenericError  ...                                
                                                    #4  \errhelp \@err@     ...
l.8 é

If I were to make a wild guess, I’d say that inputenc with pdfTeX uses the pdfTeX primitives to do some hairy storing and restoring of character tables, and some table somewhere has a rare mistake in it.
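To look at the definition that the log above tried to \show, a simpler route than typing at the error prompt may be a throwaway file that shows the macro from the preamble, where \makeatletter has already taken effect before the @-name is read. A minimal sketch, assuming the utf8 option of inputenc defines \UTFviii@three@octets (it is the macro named in both error messages):

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\makeatletter
% Prints the macro's parameter text and replacement text on the
% terminal and in the log; in errorstopmode TeX pauses at a "?"
% prompt, press Return to continue.
\show\UTFviii@three@octets
\makeatother
\begin{document}
\end{document}
```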

Possibly related

I saw a post by Vladimir Volovich in the pdfTeX mailing list archives, dating all the way back to 2003, that discusses a conflict between inputenc and fancyvrb and posts a patch to "solve the problem". Who knows, maybe he faced the same problem? It might be worth emailing him.

Religieux answered 26/1, 2010 at 10:34
(Yes, this is still unfixed.) That’s indeed a completely different error, although I’d suspect that the } problem arises solely because the UTF-8 parser has already read one character too many. But why are you getting “undefined control sequence” when trying to show the definition of the macro? – Rickart
@Konrad: I’m afraid I have had bad experiences debugging problems that throw up \GenericError. I plan on trying again sometime, but it won’t be in the next few days. – Religieux
No worries. It’s a pretty big problem, but unfortunately I don’t really have time to spend on it either at the moment. The easiest course would probably be to contact the maintainers of the packages involved (i.e. fancyvrb and inputenc), so I’ll try that once I have more time to spend on this bug. – Rickart
Still unfixed in TeX Live 2016. – Cobber

XeTeX has much better Unicode support. When run through xelatex, the following produces “é” both in \jobname.test and in the output PDF.

\documentclass{minimal}
\usepackage{fontspec}
\tracingonline=1
\usepackage{fancyvrb}

\begin{document}
\begin{VerbatimOut}{\jobname.test}
é
\end{VerbatimOut}

\input{\jobname.test}
\end{document}

fontspec loads the Latin Modern fonts, which have Unicode support. The standard TeX Computer Modern fonts don’t have the right tables for Unicode support.

If you use a character that does not have a glyph in the current font, by default XeTeX writes a blank space to the PDF and prints a warning in the log but not on the terminal. \tracingonline=1 prints the warning to the terminal.

Gaygaya answered 26/1, 2010 at 17:48
Yes, I know about XeTeX and I use it exclusively. But I need this for a general-purpose package, and since accented characters do work in normal LaTeX, I don’t want to break what little Unicode support works. This isn’t a Computer Modern font problem. – Rickart

On http://wiki.portal.chalmers.se/agda/pmwiki.php?n=Main.LiterateAgda, they suggest that you should use

\usepackage{ucs}
\usepackage[utf8x]{inputenc}

in the preamble. I successfully used this to insert Unicode into a verbatim environment.

Walcott answered 30/6, 2011 at 15:23
Not all Unicode works, though. In particular, utf8x is pretty much deprecated in favour of plain utf8, and so is the ucs package. There might be isolated cases where your code works while mine doesn’t, but these will be the exception. Ultimately, the real solution is to bin pdflatex and use xelatex instead. I made the switch two years ago and have never looked back. – Rickart
\documentclass{article}

\usepackage{fancyvrb}

\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\newenvironment{MonVerbatim}{%
\count0=128\relax %
\loop
   \catcode\count0=11\relax
   \advance\count0 by 1\relax 
   \ifnum\count0<256
   \repeat
   \VerbatimOut[commandchars=\\\{\}]{VerbatimText.tex}%
}{\endVerbatimOut}

\newcommand\test{A command producing accented characters éà}

\begin{document}
\begin{MonVerbatim}
     A little bit text in verbatim mode éà_].
     \test
\end{MonVerbatim}
Followed by some accented character éà.
\end{document}

This code works for me with TeX Live 2018 and pdflatex. You should probably avoid changing these catcodes if you are using a Unicode TeX engine (LuaLaTeX or XeLaTeX).

You can use the iftex package to check which TeX engine is in use.
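The iftex suggestion could look roughly like this. It is an untested sketch that reuses the catcode loop from the answer above: \MonVerbatimSetup is a hypothetical name, and the loop uses LaTeX’s scratch counter \@tempcnta instead of \count0, which LaTeX uses as the page counter. Under XeLaTeX or LuaLaTeX the loop is skipped, so only pdflatex gets the catcode change. commandchars is left out because nothing inside the environment calls a macro, and the text avoids characters such as _ so that reading the file back with \input (as in the question’s MWE) does not fail.

```latex
\documentclass{article}
\usepackage{iftex}        % provides \ifPDFTeX
\ifPDFTeX
  \usepackage[T1]{fontenc}
\else
  \usepackage{fontspec}   % XeLaTeX/LuaLaTeX: Unicode-aware fonts
\fi
\usepackage{fancyvrb}

\makeatletter
% Hypothetical helper: under pdfTeX only, make bytes 128-255 letters
% (catcode 11) so VerbatimOut writes the UTF-8 bytes unchanged.
% \repeat is \let to \fi, so the loop is skipped cleanly when
% \ifPDFTeX is false.
\newcommand\MonVerbatimSetup{%
  \ifPDFTeX
    \@tempcnta=128\relax
    \loop
      \catcode\@tempcnta=11\relax
      \advance\@tempcnta by 1\relax
    \ifnum\@tempcnta<256
    \repeat
  \fi}
\makeatother

\newenvironment{MonVerbatim}{%
  \MonVerbatimSetup
  \VerbatimOut{VerbatimText.tex}%
}{\endVerbatimOut}

\begin{document}
\begin{MonVerbatim}
A little bit of text in verbatim mode: éà.
\end{MonVerbatim}
Reading it back: \input{VerbatimText.tex}
\end{document}
```

Since the catcode assignments happen inside the environment’s group, they are undone at \end{MonVerbatim}, so the \input line should be read with the normal UTF-8 handling again.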

Comprehensible answered 15/5, 2019 at 17:30
With TeX Live 2018 and later, \usepackage[utf8]{inputenc} should no longer be necessary: UTF-8 is now the default input encoding for pdflatex. – Snakebird
