R users are a global community. From Xiamen to Santiago, Addis Ababa to Tbilisi, Ogallala to Adelaide, R users are legion and their native languages are as well.
If the target audience of your package extends beyond the English-speaking world, or if you want to make the user experience for the non-native English speakers using your tools, you can consider internationalizing your package by translating its user-facing communications (verbose messages, warnings, errors, etc.).
Unfortunately, to do so has some tedious aspects, namely, learning the gettext system of .po
files, .pot
templates, and .mo
binaries – another syntax rife with quirks and idiosyncrasies.
potools
is designed to minimize the friction to translating your package by abstracting away as many details of the .po
system of translations as possible.
The core function of potools
, translate_package
, is a one-stop-shop for interactively setting your package up for translation and providing those translations, all without ever having to touch a .po
file yourself.
potools
is a UTF-8 package – all .po
and .pot
files it produces will be treated as UTF-8.
translate_package()
The primary feature of potools
is the translate_package()
function, which is designed to be your first & only stop for typical experience internationalizing a package
A .pot template can be used by translators to produce translations; just running translate_package()
on your package’s source will produce this file (or files, if your package has both R and C/C++ messages to translated), e.g.
# run from the directory into which potools is cloned, i.e., 'potools' here is a file path
translate_package('potools')
To further add translations in your desired language, include the target language in the translate_package()
call.
Running the following will launch an interactive dialog prompting for translations message-by-message:
# es.po & es.mo Spanish translation files will be produced
translate_package('potools', 'es')
base
R provides several functions for messaging that are natively equipped for translation (they all have a domain
argument): stop()
, warning()
, message()
, gettext()
, gettextf()
, ngettext()
, and packageStartupMessage()
.
While handy, some developers may prefer to write their own functions, or to write wrappers of the provided functions that provide some enhanced functionality (e.g., templating or automatic wrapping). In this case, the default R tooling for translation (xgettext()
, xngettext()
xgettext2pot()
, update_pkg_po()
from tools
) will not work, but potools::translate_package()
and its workhorse potools::get_message_data()
provide an interface to continue building translations for your workflow.
Suppose you wrote a function stopf()
that is a wrapper of stop(gettextf())
used to build templated error messages in R, which makes translation easier for translators (see below), e.g.:
stopf = function(fmt, ..., domain = NULL) {
stop(gettextf(fmt, ...), domain = domain, call. = FALSE)
}
Note that potools
itself uses just such a wrapper internally to build error messages! To extract strings from calls in your package to stopf()
and mark them for translation, use the argument custom_translation_functions
:
get_message_data(
'/path/to/my_package',
custom_translation_functions = list(R = 'stopf:fmt|1')
)
This invocation tells get_message_data()
to look for strings in the fmt
argument in calls to stopf()
. 1
indicates that fmt
is the first argument.
More specifications are possible, including marking custom calls in C/C++; see ?translate_package
for a full explanation.
Note that this is only good for marking such strings for translation – for them to actually be translated during code execution, your custom function will ultimately have to be pass the strings to gettext()
or gettextf()
(as done in the stopf()
example). Without doing so, the string will always just appear in English.
translate_package
also runs some diagnostics that can help make your package more translation-ready (see below).
A cracked message is one like:
In its current state, translators will be asked to translate three messages independently:
The message has been cracked; it might not be possible to translate a string as generic as “There are” into many languages – context is key!
To keep the context, the error message should instead be build with gettextf
like so:
Now there is only one string to translate! Note that this also allows the translator to change the word order as they see fit – for example, in Japanese, the grammatical order usually puts the verb last (where in English it usually comes right after the subject).
translate_package
detects such cracked messages and suggests a gettextf
-based approach to fix them.
cat()
Only strings which are passed to certain base
functions are eligible for translation, namely stop
, warning
, message
, packageStartupMessage
, gettext
, gettextf
, and ngettext
(all of which have a domain
argument that is key for translation).
However, it is common to also produce some user-facing messages using cat
– if your package does so, it must first use gettext
or gettextf
to translate the message before sending it to the user with cat
.
translate_package
detects strings produced with cat
and suggests a gettext
- or gettextf
-based fix.
This diagnostic detects any literal char
arrays provided to common messaging functions in C/C++, namely ngettext()
, Rprintf()
, REprintf()
, Rvprintf()
, REvprintf()
, R_ShowMessage()
, R_Suicide()
, warning()
, Rf_warning()
, error()
, Rf_error()
, dgettext()
, and snprintf()
. To actually translate these strings, pass them through the translation macro _
.
NB: Translation in C/C++ requires some additional #include
s and declarations, including defining the _
macro. See the Internationalization section of Writing R Extensions for details.
potools
is on CRAN as of v0.2.0.
You can also install the latest development version from GitHub. The easiest way to do so:
One observation about offering translated messages is that non-English messages are harder to google. A few suggestions:
+ You can give error messages a unique identifier (e.g. numbering). This may be harder to do for "established" packages since adding identifiers might be a breaking change.
+ End users can switch to an English locale mid-session by running `Sys.setenv(LANGUAGE = 'en')` -- error messages will be produced in English until they set `LANGUAGE` again.
+ You could write a custom error wrapper that produces the error both in English and as a translation.
Technical terms are par for the course in R packages; showing users similar terms for the same concept might lead to needless confusion. R recommends using the ISI Multilingual Glossary of Statistical Terms to help overcome this issue.
What domain should you use when translating Spanish? There’s es_AR
, es_BO
, es_CL
, es_DO
, es_HN
, … do I really need to provide a separate file for my Nicaraguan (es_NI
) users?
No, but you could. Typically, you are best off creating one set of translations under the language’s general domain (here, es
). Once translations exist for es
, users in all of the more specific locales will see the messages for es
whenever they exist. If you really do want to provide more regionally-specific error messages (awesome!), you can either (1) create a whole new set of translations for each region or (2) write translations only for the region-specific messages. The latter is how R handles messages that differ on British/American spelling, for example.
Say a user is running in es_GT
and triggers an error. R will first look for a translation into es_GT
; if none is defined, it will look for a translation into es
. If none is defined again, it will finally fall back to the package’s default language (i.e., whatever language is written in the source code, usually English).
potools
is by no means the first tool for facilitating internationalization; other open-source projects have deeper experience in this domain and as a result there are some relatively mature options for working with gettext/the po ecosystem in general. Here is a smattering of such tools that I’ve come across: