What <html lang=""> attribute value should I use for a mixed language page?

I usually use this: <html lang="en">.

However, I am working on a website that will use two languages and mix them up sometimes in the same sentence or heading.

How would the above code look in this case? Can I use <html lang="lang1 lang2">?

Hyperon answered 16/8, 2011 at 10:35 Comment(0)

As far as I can tell from reading the HTML5 spec the lang attribute:

value must be a valid BCP 47 language tag, or the empty string

Source: http://www.w3.org/TR/html5/dom.html#the-lang-and-xml:lang-attributes

There's no mention in the spec of an array of language strings and every example I've found uses a single language string.

This makes sense since really a given section can only be in one language unless we're creating a new hybrid language.

Since the lang attribute is valid on all HTML elements, you can wrap language-specific content in its own element in order to indicate its language.

<html lang="en">
[...]
<body>
<h1>I am a heading <span lang="de-DE">Eine Überschrift</span></h1>
</body>
</html>
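
To see how that inheritance plays out, here is a minimal Python sketch (not part of the spec, just an illustration) that walks markup like the above and reports which language each piece of text ends up in. The names `LangTracker` and `text_languages` are made up for this example, and it only handles the lang attribute, not xml:lang or HTTP headers.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class LangTracker(HTMLParser):
    """Collect (language, text) pairs, inheriting lang from the nearest ancestor."""

    def __init__(self):
        super().__init__()
        self.stack = []  # (tag, effective language) for each open element
        self.runs = []   # (language, text) for each non-empty text node

    def current_lang(self):
        return self.stack[-1][1] if self.stack else ""

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        lang = dict(attrs).get("lang")
        # An element without its own lang inherits the language of its parent.
        self.stack.append((tag, lang if lang is not None else self.current_lang()))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag (tolerates minor nesting errors).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.runs.append((self.current_lang(), text))


def text_languages(markup):
    """Return a list of (language, text) pairs for the given HTML string."""
    parser = LangTracker()
    parser.feed(markup)
    parser.close()
    return parser.runs


sample = ('<html lang="en"><body>'
          '<h1>I am a heading <span lang="de-DE">Eine Überschrift</span></h1>'
          '</body></html>')
print(text_languages(sample))
```

The heading text is reported as en because it inherits from the html element, while the span's own lang="de-DE" overrides that for its contents.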
Choirmaster answered 16/8, 2011 at 10:48 Comment(6)
+1, wtf is a BCP 47 language tag? Is it like the Posix locales (lang_ISOCC)?Swedenborgianism
BCP stands for "Best Current Practice" and the BCP 47 Track is where the Internet Engineering Task Force (IETF) outlines its IETF language tags. You can read more about it here en.wikipedia.org/wiki/IETF_language_tagChoirmaster
FYI - your w3.org link is broken. Tried briefly to find a replacement, but couldn't... Wish they followed their own advice and didn't break their URIs.Candlestick
New Link: w3.org/TR/html5/dom.html#the-lang-and-xml:lang-attributes It probably changed because it's a working draft, @Zach LInerrable
You're right. In addition, there is a mention of a list of languages in BCP47 (tools.ietf.org/html/bcp47#section-4.3) but it is defined as being the association of "more than one language tag".Araminta
The assumption that a section can only be in one language is incorrect. See en.wikipedia.org/wiki/Code-switching. "Hybrid languages" are realVolitive

As I understand it, you should be able to use <html lang="mul"> to indicate multiple languages.

Choose subtags from the IANA Language Subtag Registry.

Source: https://www.w3.org/TR/2007/NOTE-i18n-html-tech-lang-20070412/#ri20030112.224623362

There is a subtag in the list named Subtag: mul

Source: http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry

However, I don't think you can specify exactly which languages you're mixing in the html element. As Jamie wrote, though, you can specify different lang attributes for different elements on the page.

Four special language codes exist within ISO 639-3, and all of them are also valid within the IANA subtag registry: https://en.wikipedia.org/wiki/ISO_639-3#Special_codes

Still, I doubt this has good support from search engines such as Google.

Baghdad answered 18/12, 2016 at 3:12 Comment(4)
Despite mul's existence, it might have a negative impact not only on robots but also on assistive technology. Imagine screen reader software trying to provide voice with correct pronunciation. mul would probably be handled like an empty value?!Bushwhacker
Yes, but what other tag can be used for content with mixed languages? If you mark content in multiple languages as just one of those languages, that will not help assistive technology either... I think, hopefully, if you have content with mixed languages, you will also have a translation of that content into clean "one language versions" of it, and point rel="alternate" hreflang="" to itBaghdad
Seems the best option forward is, you don't! Use the dominant language in a document as the root lang attribute value and override where necessary. You would have to use the appropriate value anyways to encapsulate any specific language section.Bushwhacker
But what if there is no "dominant language", it is exactly 50/50 or 33/33/33?Busiek

Adding this answer in April 2020 to provide the latest guidance from the W3C (W3.org) ...

Firstly, no, you cannot use <html lang="lang1 lang2">, since it will not validate. Below is the result of validating with the W3C's Nu Html Checker when the lang attribute of the html tag contains more than one language (English and Swahili). The same error occurs with or without commas:

Error: Bad value en swh for attribute lang on element html: The language subtag en swh is not a valid language subtag.

<html lang="en swh">↩</html>
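
The validator's complaint comes down to the value not being a single tag. As a rough illustration only, the Python sketch below checks whether a value is at least shaped like one language tag. The pattern is a deliberate simplification that I made up for this example: it is not the full BCP 47 grammar, it rejects some legal tags (private-use tags starting with x-, for instance), and it does not look anything up in the IANA registry.

```python
import re

# Very rough shape check: a primary subtag of 2-8 letters followed by optional
# hyphen-separated alphanumeric subtags. NOT the full BCP 47 grammar, and no
# registry lookup is performed.
SINGLE_TAG = re.compile(r"[A-Za-z]{2,8}(-[A-Za-z0-9]{1,8})*")


def looks_like_single_tag(value):
    """Return True if value is empty or shaped like one language tag."""
    return value == "" or SINGLE_TAG.fullmatch(value) is not None


for candidate in ["en", "de-DE", "zh-Hans", "en swh", "en, swh"]:
    print(repr(candidate), looks_like_single_tag(candidate))
```

Anything containing a space or a comma fails the check, which matches the behaviour described above: the attribute holds one tag (or the empty string), never a list.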

Below is the latest based on the W3C's Declaring language in HTML if you want to declare the language of polyglot web pages with more than one language:

QUICK ANSWER

Always use a language attribute on the html tag to declare the default language of the text in the page. When the page contains content in another language, add a language attribute to an element surrounding that content.

Use the lang attribute for pages served as HTML, and the xml:lang attribute for pages served as XML. For XHTML 1.x and HTML5 polyglot documents, use both together.

Use language tags from the IANA Language Subtag Registry. You can find subtags using the unofficial Language Subtag Lookup tool.

Use nested elements to take care of content and attribute values on the same element that are in different languages.

What if element content and attribute values are in different languages?

In the image below from the W3C's site, the link text shows the language of the target page (Spanish) using the language of the target page ("Español"), but an associated title attribute contains a hint in the language of the current page ("Spanish" in English):

[Image: a link labeled "Español" whose tooltip reads "Spanish"]

The markup for the above should look as follows, where the span element inherits the default en setting of the html element:

<span title="Spanish"><a lang="es" href="qa-html-language-declarations.es">Español</a></span>

What if there's no element to hang your attribute on?

If you want to specify the language of some content but there is no markup around it, use an element such as span or div around the content. Here is an example:

<p>You'd say that in Chinese as <span lang="zh-Hans">中国科学院文献情报中心</span>.</p>

How can you specify metadata for more than one audience language?

Get the server to send the information in the HTTP Content-Language header. If your intended audience speaks more than one language, the HTTP header allows you to use a comma-separated list of languages.

Here is an example of an HTTP header that declares the resource to be a mixture of English, Hindi and Punjabi:

Content-Language: en, hi, pa

Note that this approach is not effective if your page is accessed from a hard drive, disk or other non-server based location. There is currently no widely recognized way of using this kind of metadata inside the page.
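
If you need to read such a header on the receiving side, a value like en, hi, pa can be split into individual tags. This is a minimal Python sketch; the function name `parse_content_language` is made up for this example, and it does no validation of the tags themselves.

```python
def parse_content_language(header_value):
    """Split a Content-Language header value into a list of language tags."""
    # Tags are comma-separated; surrounding whitespace is not significant.
    return [tag.strip() for tag in header_value.split(",") if tag.strip()]


print(parse_content_language("en, hi, pa"))
```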

In the past, many people used a meta element with the http-equiv attribute set to Content-Language. Due to long-standing confusion and inconsistent implementations of this element, the HTML5 specification made this non-conforming in HTML, so you should no longer use it.

Kirkkirkcaldy answered 3/4, 2020 at 20:8 Comment(0)