Localization for REST APIs

I am starting this discussion to gather more information on localization practices for APIs. HTTP does not seem to provide sufficient guidance, and the current state of practice falls short as well.

The basic problem is that APIs may need to provide content that depends on the user's culture, country, language and time zone. For example, a German user would like to read messages in German, with European date and number formats, metric units, the euro as currency, and times in Central European Time.

Reading through RFC 7231 Section 5.3.5 (Accept-Language) and further into RFC 4647, one may think Accept-Language is sophisticated enough and is what should be used. There are several notable shortcomings, though:

  1. Language tags may not be precise enough, e.g. the user may request only a language without a country code and thus leave ambiguity, as in "de, en;q=0.8".
  2. Even if the user supplies both language and country preferences, it is not clear how to tie together the selection of the message locale and the value formatting locale. For example, a user requests "hu-HU, en-US;q=0.9" while the application lacks Hungarian messages but is written in Java, which knows how to format dates in Hungarian. Should the app use English messages with Hungarian dates, or English messages with US dates? The actual situation may be more complex.
  3. The time zone is not present in the language tags, and there seems to be no standard HTTP header for it.
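To make shortcomings 1 and 2 concrete, here is a minimal sketch using the RFC 4647 matching built into `java.util.Locale` (Java 8+). The class name and the list of shipped translations are assumptions made up for illustration. It shows that strict lookup finds nothing for a bare "de" when only "de-DE" is available, while filtering does match it, so the server still has to decide which policy it wants.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class AcceptLanguageDemo {
    // Hypothetical: the translations this application ships with.
    static final List<Locale> TRANSLATIONS =
            Arrays.asList(Locale.forLanguageTag("en-US"), Locale.forLanguageTag("de-DE"));

    /** RFC 4647 lookup: returns the single best translation, or null if nothing matches. */
    static Locale pickTranslation(String acceptLanguage) {
        List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguage);
        return Locale.lookup(ranges, TRANSLATIONS);
    }

    /** RFC 4647 filtering: returns every translation matching any range, best first. */
    static List<Locale> candidateTranslations(String acceptLanguage) {
        List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguage);
        return Locale.filter(ranges, TRANSLATIONS);
    }

    public static void main(String[] args) {
        // Hungarian is preferred but not translated, so lookup falls through to en-US.
        System.out.println(pickTranslation("hu-HU, en-US;q=0.9"));
        // Lookup only shortens the requested range, so "de" does not find "de-DE".
        System.out.println(pickTranslation("de, en;q=0.8"));
        // Filtering treats "de" as a prefix and does match "de-DE".
        System.out.println(candidateTranslations("de, en;q=0.8"));
    }
}
```

Note that `Locale.LanguageRange.parse` rejects underscore forms such as "hu_HU" with an `IllegalArgumentException`, so header values need to use hyphenated tags.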

I see Microsoft has thought about #2 in ASP.NET and introduced the notions of Culture and UICulture to separate value formatting from the selection of message language.

In the Java world, Spring has introduced TimeZoneAwareLocaleContext to address #3.

The W3C has issued a guideline, "Accept-Language used for locale setting", which more or less says that Accept-Language alone is not enough.

So what is your thinking?

  1. Do you know of APIs that solve this problem in a comprehensive way? Any pointers?
  2. Should APIs accept multiple values for selecting the message language, the value formatting locale and the time zone?
  3. Should Accept-Language be used at all?
Chiffon answered 27/12, 2018 at 13:48 Comment(1)
Locale-specific formatting of numbers etc. can be derived from the CLDR database (cldr.unicode.org), as many vendors seem to do. However, some problems may still exist: 1. Disparity between available translations and CLDR, i.e. CLDR covers many locales while most APIs will have only a few translations. 2. Is it possible that people want to use their own formatting settings that contradict the CLDR values? - Chiffon

Ok guys,

here is a summary of how I would answer my own question. I hope this helps future API authors.

The fundamental requirements for a UI built on top of an API, excluding currency presentation, seem to be:

  1. Select the best language out of the available product translations using an RFC 4647 list of language ranges.
  2. Select the best data format out of the available formats using an RFC 4647 list of language ranges.
  3. Allow clients to provide distinct preferences for translation and format. There will be cases where people cannot get their preferred translation and yet prefer to see formatting aligned with their culture.
  4. Allow clients to specify a time zone using IANA TZDB identifiers.
  5. Format data elements using Unicode CLDR http://cldr.unicode.org/
  6. Use named placeholders in localization bundles, e.g. "{drive} is corrupt" is easier to translate properly than "{1} is corrupt".
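Requirements 1 to 5 can be sketched in Java roughly as follows. This is only an illustration under assumptions: the class name, the English/German translation list and the fallback locales are hypothetical. The formatting locale is matched against the JDK's own locale data (CLDR-based by default since Java 9), which is typically much broader than the set of translations.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class LocalizedRendering {
    // Hypothetical: the product ships only English and German translations.
    static final List<Locale> TRANSLATIONS =
            Arrays.asList(Locale.forLanguageTag("en"), Locale.forLanguageTag("de"));

    // Formatting locales come from the JDK's locale data.
    static final List<Locale> FORMATS = Arrays.asList(Locale.getAvailableLocales());

    /** RFC 4647 lookup with an explicit fallback when nothing matches. */
    static Locale best(String ranges, List<Locale> available, Locale fallback) {
        Locale match = Locale.lookup(Locale.LanguageRange.parse(ranges), available);
        return match != null ? match : fallback;
    }

    /** Formats an instant for the given formatting locale and IANA time zone ID. */
    static String formatTimestamp(Instant instant, Locale formatLocale, String zoneId) {
        return DateTimeFormatter.ofLocalizedDateTime(FormatStyle.MEDIUM)
                .withLocale(formatLocale)
                .withZone(ZoneId.of(zoneId))
                .format(instant);
    }

    public static void main(String[] args) {
        String prefs = "hu-HU, en-US;q=0.9";
        // No Hungarian translation exists, so messages fall back to English.
        Locale messages = best(prefs, TRANSLATIONS, Locale.ENGLISH);
        // Formatting is matched separately and can still honor the Hungarian preference.
        Locale formats = best(prefs, FORMATS, Locale.ROOT);
        System.out.println("messages: " + messages.toLanguageTag());
        System.out.println("formats:  " + formats.toLanguageTag());
        System.out.println(formatTimestamp(Instant.now(), formats, "Europe/Budapest"));
    }
}
```

Keeping the two lookups separate is what lets a client get English messages with Hungarian date formatting, which is the case described in requirement 3.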

For REST over HTTP, I suggest using three headers:

  1. accept-language - used for selecting the translation, following the guidelines of RFC 7231 https://www.rfc-editor.org/rfc/rfc7231#section-5.3.5
  2. format-locale - a custom (non-standard) header used to select the data formatting style if it differs from the translation language preferences. Again a list of language range elements. Defaults to the value of accept-language if omitted.
  3. timezone - a custom (non-standard) header used to select the time zone for rendering date and time values. This should be a valid time zone ID from the IANA TZDB https://www.iana.org/time-zones
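A minimal sketch of how a server might resolve these three headers, assuming the header names above and a plain map of lower-cased header names to values. The class name and the defaults ("en" and UTC) are assumptions for illustration, not part of any standard.

```java
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class LocalizationHeaders {
    final List<Locale.LanguageRange> translationRanges;
    final List<Locale.LanguageRange> formatRanges;
    final ZoneId zone;

    LocalizationHeaders(List<Locale.LanguageRange> translationRanges,
                        List<Locale.LanguageRange> formatRanges,
                        ZoneId zone) {
        this.translationRanges = translationRanges;
        this.formatRanges = formatRanges;
        this.zone = zone;
    }

    /** Builds the context from lower-cased header names; missing headers use defaults. */
    static LocalizationHeaders fromHeaders(Map<String, String> headers) {
        // Assumed default: English when the client sends no Accept-Language.
        String acceptLanguage = headers.getOrDefault("accept-language", "en");
        // format-locale defaults to the accept-language value when omitted.
        String formatLocale = headers.getOrDefault("format-locale", acceptLanguage);
        // parse() throws IllegalArgumentException on malformed input; a real
        // service would probably map that to a 400 response.
        return new LocalizationHeaders(
                Locale.LanguageRange.parse(acceptLanguage),
                Locale.LanguageRange.parse(formatLocale),
                parseZone(headers.get("timezone")));
    }

    /** Accepts only region IDs known to the JDK's TZDB copy; anything else becomes UTC. */
    static ZoneId parseZone(String value) {
        if (value != null && ZoneId.getAvailableZoneIds().contains(value.trim())) {
            return ZoneId.of(value.trim());
        }
        return ZoneOffset.UTC;
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("accept-language", "hu-HU, en-US;q=0.9");
        headers.put("timezone", "Europe/Berlin");
        LocalizationHeaders ctx = fromHeaders(headers);
        System.out.println(ctx.translationRanges + " / " + ctx.formatRanges + " / " + ctx.zone);
    }
}
```

Checking against `ZoneId.getAvailableZoneIds()` rather than calling `ZoneId.of` directly is a deliberate choice here: `ZoneId.of` also accepts raw offsets such as "+02:00", which are not TZDB identifiers.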

Implementation-wise, Java 8 and later seem to have everything needed to implement a globalized application. Other languages and older Java versions seem to have varying degrees of issues.

Chiffon answered 11/10, 2019 at 21:4 Comment(1)
I can't find Format-Locale or Timezone headers in the HTTP specification. Are these custom headers? - Repentance

I would keep all data in a universal, locale-independent format: numbers using . as the decimal separator, dates and times in ISO 8601 and in UTC, etc.
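In Java, for instance, such a locale-neutral wire format could be produced like this. It is a small sketch with a made-up class name; it relies only on `Instant.toString()` (ISO 8601 in UTC) and `BigDecimal.toPlainString()` (dot as decimal separator, no grouping).

```java
import java.math.BigDecimal;
import java.time.OffsetDateTime;

public class NeutralWireFormat {
    /** ISO 8601 timestamp normalized to UTC. */
    static String timestamp(OffsetDateTime value) {
        return value.toInstant().toString();
    }

    /** Plain decimal string: '.' as separator, no grouping, no exponent. */
    static String number(BigDecimal value) {
        return value.toPlainString();
    }

    public static void main(String[] args) {
        // A local time with a +02:00 offset is sent as the equivalent UTC instant.
        System.out.println(timestamp(OffsetDateTime.parse("2019-10-11T23:52:00+02:00")));
        System.out.println(number(new BigDecimal("1234567.89")));
    }
}
```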

Provide localized text only if it is absolutely necessary. In that case, get the locale from the accept-language header field and, if you have the localized string, return that. If not, fall back to the string you have.

For example, you might have a multilingual product database that contains product data in several languages. When you write an API for the database, you can select the product data in the user's language (if available).

Here is a sample.
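A minimal sketch of that fallback in Java, assuming the product names are held in a map keyed by `Locale` (the class name, method name and data are hypothetical):

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ProductNames {
    /** Returns the stored text best matching Accept-Language, else the fallback-locale text. */
    static String localizedName(Map<Locale, String> names, String acceptLanguage, Locale fallback) {
        Locale match = null;
        if (acceptLanguage != null && !acceptLanguage.trim().isEmpty()) {
            // RFC 4647 lookup against the languages actually stored for this product.
            match = Locale.lookup(Locale.LanguageRange.parse(acceptLanguage), names.keySet());
        }
        return names.get(match != null ? match : fallback);
    }

    public static void main(String[] args) {
        Map<Locale, String> names = new HashMap<>();
        names.put(Locale.ENGLISH, "Chair");
        names.put(Locale.GERMAN, "Stuhl");
        // "de-AT" is shortened to "de" during lookup, so the German name is used.
        System.out.println(localizedName(names, "de-AT, fr;q=0.5", Locale.ENGLISH));
        // No Hungarian name is stored, so the English fallback is returned.
        System.out.println(localizedName(names, "hu", Locale.ENGLISH));
    }
}
```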

Words answered 11/10, 2019 at 21:52 Comment(1)
Very good idea to delay or defer localization from services. In fact, you can go a bit further and only provide the data that will be localized, leaving it to the UI to do the actual localization. This may work better in a micro-service environment where services call each other. Eventually some services will need to cache results from other services and deliver them to multiple end users in different locales. With this idea, an API may return a message key and a JSON object with the arguments in a locale-neutral format instead of a string ready for display. - Chiffon
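As a rough sketch of the client side of that idea: the service returns a message key plus named arguments, and the UI looks up the translated template for the key and fills in the placeholders. The class and method names below are hypothetical, and the tiny `{name}` substitution is a stand-in for a real message formatting library.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MessageEnvelope {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    /** Replaces {name} placeholders in a translated template with argument values. */
    static String render(String template, Map<String, String> args) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Unknown placeholders are left as-is rather than dropped.
            String value = args.getOrDefault(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // What the service might return: a key and locale-neutral arguments.
        String key = "drive.corrupt";
        Map<String, String> arguments = new HashMap<>();
        arguments.put("drive", "C:");

        // What the UI holds: translated templates per key (English bundle shown).
        Map<String, String> bundle = new HashMap<>();
        bundle.put("drive.corrupt", "{drive} is corrupt");

        System.out.println(render(bundle.get(key), arguments));
    }
}
```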
