Bilingual (English and Portuguese) documentation in an R package

I am writing a package to facilitate importing Brazilian socio-economic microdata sets (Census, PNAD, etc). I foresee two distinct groups of users of the package:

  • Users in Brazil, who may feel more at ease with documentation in Portuguese. They can probably understand English to some extent, but a foreign language would probably make the package feel less "ergonomic".

  • The broader international user community, for whom English documentation may be a necessary condition.

Is it possible to write a package in a way that the documentation is "bilingual" (English and Portuguese), and that the language shown to the user will depend on their country/language settings?

Also,

Is that doable within the roxygen2 documentation framework?

I realise there is a tradeoff between making the package more user-friendly by making it bilingual and the increased complexity and maintenance burden. General comments on this tradeoff from previous experience are also welcome.

EDIT: Following the suggestion in the comments, I cross-posted to the r-package-devel mailing list; the answers are at the bottom of that thread. Duncan Murdoch posted an interesting answer covering some of what @Brandon's answer (below) covers, but also including two additional suggestions that I think are useful:

  • have the package in one language, but the vignettes for different languages. I will follow this advice.

  • have two versions of the package, let's say 1.1 and 1.2, one in each language
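
A minimal sketch of how the first suggestion might look in practice, assuming two hypothetical vignettes named intro (English) and intro-pt (Portuguese). The helper names, the vignette names, and the package name mypkg are made up for illustration; this is not an established convention.

```r
# Guess the user's language from common environment settings.
# LANGUAGE and LANG are usually set on Unix-alikes; on Windows the
# LC_CTYPE locale name (e.g. "Portuguese_Brazil.1252") is used as a fallback.
user_language <- function() {
  lang <- Sys.getenv("LANGUAGE")
  if (!nzchar(lang)) lang <- Sys.getenv("LANG")
  if (!nzchar(lang)) lang <- Sys.getlocale("LC_CTYPE")
  lang
}

# Hypothetical helper: return the Portuguese vignette name for
# Portuguese-looking settings, and the English one otherwise.
pick_vignette <- function(lang = user_language()) {
  if (grepl("^(pt|portuguese)", lang, ignore.case = TRUE)) "intro-pt" else "intro"
}

# Usage inside the (hypothetical) package:
# vignette(pick_vignette(), package = "mypkg")
```

The help pages themselves stay in one language; only the choice of vignette depends on the user's settings.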

Zugzwang answered 18/5, 2016 at 1:39 Comment(4)
If you don't get useful answers here this might be a good question to ask on the r-package-devel mailing list ... - Munitions
@BenBolker you were correct, no answer for days here, so I posted on the mailing list (see edit above). A couple of hours later there was already an answer. Thanks. - Zugzwang
Great question. I feel it would also be beneficial to others to make it available here on SO, so if you wish to assemble an answer based on the response you received on the mailing list that would be great. - Sears
There is also this project funded by the R Consortium: 4dpiecharts.com/2016/03/23/rl10n-let-r-speak-your-language - Colonic

According to rOpenSci, there is no standard mechanism for translating package documentation into non-English languages. They describe the typical process of internationalization/localization as follows:

To create non-English documentation requires manual creation of supplemental .Rd files or package vignettes.

Packages supplying non-English documentation should include a Language field in the DESCRIPTION file.

And some more info on the Language field:

A ‘Language’ field can be used to indicate if the package documentation is not in English: this should be a comma-separated list of standard (not private use or grandfathered) IETF language tags as currently defined by RFC 5646 (https://www.rfc-editor.org/rfc/rfc5646, see also https://en.wikipedia.org/wiki/IETF_language_tag), i.e., use language subtags which in essence are 2-letter ISO 639-1 (https://en.wikipedia.org/wiki/ISO_639-1) or 3-letter ISO 639-3 (https://en.wikipedia.org/wiki/ISO_639-3) language codes.

Care is needed if your package contains non-ASCII text, and in particular if it is intended to be used in more than one locale. It is possible to mark the encoding used in the DESCRIPTION file and in .Rd files.
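
As a rough illustration of the two fields mentioned above, a DESCRIPTION file for a package documented in Brazilian Portuguese might contain something like the following. The package name and title are hypothetical; only the Language and Encoding fields come from the quoted text.

```
Package: mypkg
Title: Import Brazilian Socio-Economic Microdata
Language: pt-BR
Encoding: UTF-8
```

Here pt-BR is an IETF language tag as described in the quote, and Encoding marks how the non-ASCII text in the DESCRIPTION file is encoded.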

Regarding encoding...

First, consider carefully if you really need non-ASCII text. Many users of R will only be able to view correctly text in their native language group (e.g. Western European, Eastern European, Simplified Chinese) and ASCII. Other characters may not be rendered at all, rendered incorrectly, or cause your R code to give an error. For .Rd documentation, marking the encoding and including ASCII transliterations is likely to do a reasonable job. The set of characters which is commonly supported is wider than it used to be around 2000, but non-Latin alphabets (Greek, Russian, Georgian, …) are still often problematic and those with double-width characters (Chinese, Japanese, Korean) often need specialist fonts to render correctly.

On a related note, R does provide support for translating errors and warnings into different languages: "There are mechanisms to translate the R- and C-level error and warning messages. These are only available if R is compiled with NLS support (which is requested by configure option --enable-nls, the default)."
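
A minimal sketch of what a translatable R-level error message could look like, assuming a hypothetical package called mypkg and a hypothetical function read_microdata(). The base R function gettextf() looks up the message in the given translation domain and falls back to the original English text when no translation is installed.

```r
# Hypothetical package function; "mypkg" is a placeholder package name.
read_microdata <- function(path) {
  if (!file.exists(path)) {
    # Look up a translation of the format string in the package's
    # R-level domain; without a catalog the English text is used as-is.
    stop(gettextf("file '%s' does not exist", path, domain = "R-mypkg"),
         call. = FALSE)
  }
  invisible(path)
}

# The message a user sees when no Portuguese catalog is installed:
msg <- tryCatch(read_microdata("no/such/file.txt"),
                error = function(e) conditionMessage(e))
print(msg)
```

The Portuguese strings would then live in a translation catalog such as po/R-pt_BR.po inside the package; tools::update_pkg_po() can help create and update those files, provided the GNU gettext tools are installed.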

Tytybald answered 21/5, 2016 at 15:25 Comment(0)

Besides bilingual documentation, please allow me the following comment: given your two "target" groups, it may be assumed that some of your users will be running a non-English OS (typically, Windows in Portuguese). When importing time series data (or any date entries, as a matter of fact), different date formatting (English vs. non-English) can give different results (i.e. misinterpreted date entries) on English and non-English machines. I have some experience with those issues (I often work with Czech-language OSs) and, other than ad hoc coding, I haven't found a simple solution. (If you find this off-topic, please feel free to delete.)
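
To illustrate the kind of ad hoc coding I mean, here is a rough sketch. In R, the month-name formats %b and %B are matched against the current locale, so a string like "25/mai/2016" may parse on a Portuguese machine and give NA on an English one. The helper parse_pt_date() and the day/month/year layout are assumptions made for this example.

```r
# Purely numeric dates parse the same way on any machine,
# as long as the format is spelled out explicitly.
as.Date("25/05/2016", format = "%d/%m/%Y")

# Locale-independent lookup table for Portuguese month abbreviations.
pt_months <- c(jan = 1L, fev = 2L, mar = 3L, abr = 4L, mai = 5L, jun = 6L,
               jul = 7L, ago = 8L, set = 9L, out = 10L, nov = 11L, dez = 12L)

# Hypothetical helper: parse "dd/mmm/yyyy" strings with Portuguese month
# abbreviations without relying on the machine's LC_TIME setting.
parse_pt_date <- function(x) {
  parts <- strsplit(tolower(x), "/", fixed = TRUE)
  day   <- vapply(parts, `[`, character(1), 1)
  month <- vapply(parts, `[`, character(1), 2)
  year  <- vapply(parts, `[`, character(1), 3)
  # Rebuild an ISO-style string; unknown month names become NA.
  iso <- sprintf("%s-%02d-%s", year, unname(pt_months[month]), day)
  as.Date(iso, format = "%Y-%m-%d")
}

parse_pt_date(c("25/mai/2016", "01/dez/2015"))
```

Mapping the month names by hand avoids changing the session locale, which may not even be possible if the Portuguese locale is not installed on the user's machine.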

Alyssa answered 25/5, 2016 at 15:22 Comment(0)