Parse Accept-Language header in Java
K

7

46

The Accept-Language header in a request is usually a long, complex string, for example:

Accept-Language : en-ca,en;q=0.8,en-us;q=0.6,de-de;q=0.4,de;q=0.2

Is there a simple way to parse it in Java, or an API that can do it for me?

Kirman answered 26/7, 2011 at 0:58 Comment(6)
It is not really that complicated: you split the part after the colon by commas, then look for a semicolon in each group, then parse the language codes and q factors.Malacology
And the language codes tend to correspond to java.util.Locales after you replace the '-'s with '_'s.Lichtenfeld
Do you really need to parse it yourself, or can you use [Http]ServletRequest.getLocale[s] and let the container handle the complexity?Macready
@bkail : please put your comment in an answer, since it is 'right'Ingres
Sure. It wasn't obvious whether this was a servlet question or not, though I guess the presence of java-ee tag suggests the OP might be satisfied using a servlet API.Macready
Actually the best answer is the last one: Locale.forLanguageTag(locale).Kidderminster
M
50

I would suggest using ServletRequest.getLocales() to let the container parse Accept-Language rather than trying to manage the complexity yourself.

Macready answered 27/7, 2011 at 15:17 Comment(4)
Unless you're planning to directly support every possible locale, ServletRequest.getLocales is probably a better choice.Unheardof
The problem is that ServletRequest.getLocales returns the server locale if the user does not provide a valid one. To guard against spam requests with bogus language values you must parse the header yourself, and LanguageRange.parse(String) is convenient for that.Bresnahan
@Bresnahan It's easy enough to just check for the existence of the Accept-Language header when relevant. You're right, though, that newer JDKs have added additional APIs that could be useful (this answer is from 2011!).Macready
If bots abuse Accept-Language to spam your website, the header exists but contains no valid element. An absent header can still be legitimate, for example when some Google bots crawl your page. See webmasters.stackexchange.com/questions/101473/…Bresnahan
H
45

For the record, this is now possible with Java 8:

Locale.LanguageRange.parse()
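
A minimal sketch of what that call gives you, assuming the header value is already available as a plain string. The class and method names are made up for illustration. Each LanguageRange exposes the (lower-cased) range via getRange() and the q-value via getWeight(), and the returned list is ordered by descending priority:

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of any library.
final class LanguageRangeSketch {

    // Parses a raw Accept-Language value; throws IllegalArgumentException on ill-formed input.
    static List<Locale.LanguageRange> parse(String header) {
        return Locale.LanguageRange.parse(header);
    }

    public static void main(String[] args) {
        String header = "en-ca,en;q=0.8,en-us;q=0.6,de-de;q=0.4,de;q=0.2";
        for (Locale.LanguageRange range : parse(header)) {
            // Print each range together with its weight (q-value).
            System.out.println(range.getRange() + " q=" + range.getWeight());
        }
    }
}
```

Note that the result is a list of language ranges, not Locale objects, so a further step is needed to map them to locales.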
Hamster answered 12/3, 2015 at 1:7 Comment(4)
And if you want the list of locales, you can use Locale.LanguageRange.parse(requestedLangs) .stream().sorted(Comparator.comparing(Locale.LanguageRange::getWeight).reversed()).map(range -> new Locale(range.getRange())).collect(Collectors.toList());Therewith
@Alex: According to the javadoc, you don't need to sort the returned List: "Unlike a weighted list, language ranges in a prioritized list are sorted in the descending order based on its priority. The first language range has the highest priority and meets the user's preference most.". So your code could simply be: Locale.LanguageRange.parse(requestedLangs).stream().map(range -> new Locale(range.getRange())).collect(Collectors.toList());Plosion
This doesn't return a properly parsed locale, e.g. "en-GB" will get parsed to a language called "en-gb" with no country.Ashjian
The Locale.filter methods should be used to convert a LanguageRange to a set of matching LocaleAndromeda
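
A rough sketch of the Locale.filter and Locale.lookup approach mentioned in the last comment, assuming the application keeps its own list of supported locales. The class name, method names and the supported list are invented for this example. Locale.filter returns every supported locale that matches one of the ranges, while Locale.lookup returns the single best match or null:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class for illustration only.
final class LocaleMatchSketch {

    // All supported locales matching any range in the header, highest priority first.
    static List<Locale> matching(String header, List<Locale> supported) {
        List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(header);
        return Locale.filter(ranges, supported);
    }

    // The single best supported locale, or null if nothing matches.
    static Locale best(String header, List<Locale> supported) {
        List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(header);
        return Locale.lookup(ranges, supported);
    }

    public static void main(String[] args) {
        // Assumed list of locales the application supports.
        List<Locale> supported = Arrays.asList(
                Locale.ENGLISH, Locale.CANADA, Locale.GERMANY, Locale.FRENCH);
        String header = "en-ca,en;q=0.8,de;q=0.2";
        System.out.println(matching(header, supported));
        System.out.println(best(header, supported));
    }
}
```

Both methods match against locales you supply, which also sidesteps the "en-gb with no country" problem, because the returned locales are well-formed ones from your own list.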
G
16

Here's an alternative way to parse the Accept-Language header which doesn't require a servlet container:

String header = "en-ca,en;q=0.8,en-us;q=0.6,de-de;q=0.4,de;q=0.2";
for (String str : header.split(",")) {
    String[] arr = str.trim().replace("-", "_").split(";");

    // Parse the locale
    Locale locale;
    String[] l = arr[0].split("_");
    switch (l.length) {
        case 2: locale = new Locale(l[0], l[1]); break;
        case 3: locale = new Locale(l[0], l[1], l[2]); break;
        default: locale = new Locale(l[0]); break;
    }

    // Parse the q-value (defaults to 1.0 when absent)
    double q = 1.0;
    for (String s : arr) {
        s = s.trim();
        if (s.startsWith("q=")) {
            q = Double.parseDouble(s.substring(2).trim());
            break;
        }
    }

    // Print the Locale and associated q-value
    System.out.println(q + " - " + arr[0] + "\t " + locale.getDisplayLanguage());
}

You can find an explanation of the Accept-Language header and associated q-values here:

http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

Many thanks to Karl Knechtel and Mike Samuel. Their comments on the original question helped point me in the right direction.

Gidgetgie answered 27/8, 2012 at 11:32 Comment(0)
T
5

We are using Spring Boot and Java 8, and this works for us.

In ApplicationConfig.java, add:

@Bean
public LocaleResolver localeResolver() {
    return new SmartLocaleResolver();
}

The languages we support are listed in a constants class:

List<Locale> locales = Arrays.asList(new Locale("en"),
                                     new Locale("es"),
                                     new Locale("fr"),
                                     new Locale("es", "MX"),
                                     new Locale("zh"),
                                     new Locale("ja"));

The logic lives in the class below:

public class SmartLocaleResolver extends AcceptHeaderLocaleResolver {
    @Override
    public Locale resolveLocale(HttpServletRequest request) {
        String header = request.getHeader("Accept-Language");
        if (StringUtils.isBlank(header)) {
            return Locale.getDefault();
        }
        // Parse the header sent by the client, e.g. "da,es-MX;q=0.8"
        List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(header);
        // Locale.lookup returns null when no supported locale matches
        Locale locale = Locale.lookup(ranges, locales);
        return locale != null ? locale : Locale.getDefault();
    }
}
Topknot answered 21/4, 2016 at 19:17 Comment(0)
P
4

ServletRequest.getLocale() is certainly the best option if it is available and not overridden, as some frameworks do.

For all other cases Java 8 offers Locale.LanguageRange.parse(), as previously mentioned by Quiang Li. This, however, only gives back language range strings, not Locale objects. To parse the language strings you can use Locale.forLanguageTag() (available since Java 7):

    final List<Locale> acceptedLocales = new ArrayList<>();
    final String userLocale = request.getHeader("Accept-Language");
    if (userLocale != null) {
        final List<LanguageRange> ranges = Locale.LanguageRange.parse(userLocale);

        if (ranges != null) {
            ranges.forEach(languageRange -> {
                final String localeString = languageRange.getRange();
                final Locale locale = Locale.forLanguageTag(localeString);
                acceptedLocales.add(locale);
            });
        }
    }
    return acceptedLocales;
Pantywaist answered 31/8, 2015 at 14:19 Comment(1)
This still allows for a nonsense Locale instance like "test", as used by spam requests, since LanguageRange.parse only checks syntax and not IANA language rules. You need to check the locale against valid locales like Locale.getAvailableLocales() to be sure it is valid.Bresnahan
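
The check suggested in the comment above could be sketched roughly as follows, using only the JDK. The class and method names are hypothetical, and using Locale.getAvailableLocales() as the whitelist is just one assumption; a site-specific list of supported locales would work the same way. The root locale is excluded explicitly, since ill-formed tags and the "*" wildcard turn into it:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper class for illustration only.
final class KnownLocaleFilter {

    // Assumption: locales known to the running JVM serve as the whitelist.
    private static final Set<Locale> KNOWN =
            new HashSet<>(Arrays.asList(Locale.getAvailableLocales()));

    // Returns the requested locales that are on the whitelist, in header priority order.
    static List<Locale> acceptedKnownLocales(String header) {
        List<Locale> result = new ArrayList<>();
        if (header == null || header.trim().isEmpty()) {
            return result;
        }
        List<Locale.LanguageRange> ranges;
        try {
            ranges = Locale.LanguageRange.parse(header);
        } catch (IllegalArgumentException e) {
            return result; // syntactically ill-formed header
        }
        for (Locale.LanguageRange range : ranges) {
            Locale locale = Locale.forLanguageTag(range.getRange());
            // Skip the root locale (empty language) and anything not whitelisted.
            if (!locale.getLanguage().isEmpty()
                    && KNOWN.contains(locale)
                    && !result.contains(locale)) {
                result.add(locale);
            }
        }
        return result;
    }
}
```

An empty result then means the header contained nothing usable, and the caller can decide whether to fall back to a default or reject the request.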
B
2

The solutions above lack any kind of validation. ServletRequest.getLocale() returns the server locale if the user does not provide a valid one.

Our websites lately received spam requests with various Accept-Language headers like:

  1. secret.google.com
  2. o-o-8-o-o.com search shell is much better than google!
  3. Google officially recommends o-o-8-o-o.com search shell!
  4. Vitaly rules google ☆*:。゜゚・*ヽ(^ᴗ^)ノ*・゜゚。:*☆ ¯\_(ツ)_/¯(ಠ益ಠ)(ಥ‿ಥ)(ʘ‿ʘ)ლ(ಠ_ಠლ)( ͡° ͜ʖ ͡°)ヽ(゚Д゚)ノʕ•̫͡•ʔᶘ ᵒᴥᵒᶅ(=^ ^=)oO

This implementation can optionally check against a supported list of valid Locales. Without this check, a simple request with "test" or (2, 3, 4) still bypasses the syntax-only validation of LanguageRange.parse(String).

It can optionally treat empty and null values as valid, so that search engine crawlers are not rejected.

Servlet Filter

final String headerAcceptLanguage = request.getHeader("Accept-Language");

// check valid
if (!HttpHeaderUtils.isHeaderAcceptLanguageValid(headerAcceptLanguage, true, Locale.getAvailableLocales()))
    return;

Utility

/**
 * Checks if the given Accept-Language request header can be parsed.<br>
 * <br>
 * Optionally, the parsed LanguageRanges can be checked against the provided
 * <code>locales</code>, so that at least one locale must match.
 *
 * @see LanguageRange#parse(String)
 *
 * @param acceptLanguage The raw Accept-Language header value
 * @param isBlankValid Set to <code>true</code> if blank values are also
 *            valid
 * @param locales Optional array of valid Locales; if given, at least one
 *            parsed range must match one of them
 *
 * @return <code>true</code> if it can be parsed
 */
public static boolean isHeaderAcceptLanguageValid(final String acceptLanguage, final boolean isBlankValid,
    final Locale[] locales)
{
    // allow null or empty
    if (StringUtils.isBlank(acceptLanguage))
        return isBlankValid;

    try
    {
        // check syntax
        final List<LanguageRange> languageRanges = Locale.LanguageRange.parse(acceptLanguage);

        // wrong syntax
        if (languageRanges.isEmpty())
            return false;

        // no valid locale's to check against
        if (ArrayUtils.isEmpty(locales))
            return true;

        // check if any valid locale exists
        for (final LanguageRange languageRange : languageRanges)
        {
            final Locale locale = Locale.forLanguageTag(languageRange.getRange());

            // validate available locale
            if (ArrayUtils.contains(locales, locale))
                return true;
        }

        return false;
    }
    catch (final Exception e)
    {
        return false;
    }
}
Bresnahan answered 20/12, 2016 at 1:47 Comment(0)
M
0
Locale.forLanguageTag("en-ca,en;q=0.8,en-us;q=0.6,de-de;q=0.4,de;q=0.2")
Minion answered 29/9, 2015 at 12:56 Comment(3)
It works only for a single tag, for example: Locale.forLanguageTag("en")Hamiltonian
@Hamiltonian how did you check it? I double checked it works for me.Minion
en-ca isn't even a valid language tag... call getCountry() back on that and see what you get.Jobless
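
A small check along the lines the comments suggest, as a sketch only (the class and method names are invented here). As far as the Javadoc for Locale.forLanguageTag describes it, the first ill-formed subtag and everything after it are ignored, so passing the whole header should leave just the leading "en" and no country:

```java
import java.util.Locale;

// Hypothetical helper class for illustration only.
final class WholeHeaderCheck {

    // Feeds the given string to forLanguageTag unchanged.
    static Locale fromWholeHeader(String header) {
        return Locale.forLanguageTag(header);
    }

    public static void main(String[] args) {
        Locale whole = fromWholeHeader("en-ca,en;q=0.8,en-us;q=0.6,de-de;q=0.4,de;q=0.2");
        // Compare with a single, well-formed tag.
        Locale single = fromWholeHeader("en-ca");
        System.out.println("whole header: language=" + whole.getLanguage()
                + " country=" + whole.getCountry());
        System.out.println("single tag:   language=" + single.getLanguage()
                + " country=" + single.getCountry());
    }
}
```

So the call does not fail, which may be why it appears to work, but the q-values and every alternative language in the header are silently dropped.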
