Is lang="unknown" a valid attribute value?

2

7

Given an HTML document written in a specific language (English).

I have defined a lang attribute on the <html> tag:

<html lang="en">

Some text on the page is written in another language (for example, French):

<span lang="fr">
  blabla...
</span>

But if I cannot identify the language and only know that it is NOT English, can I use "unknown" as a valid value for the lang attribute?

<span lang="unknown">
  blabla...
</span>

I read this in the W3C documentation, but I am not sure whether "the default value is [...] unknown" means that "unknown" is an actual value:

http://www.w3.org/TR/html4/struct/dirlang.html

lang = language-code [CI] This attribute specifies the base language of an element's attribute values and text content. The default value of this attribute is unknown.

Cecrops answered 14/3, 2013 at 17:54 Comment(2)
I don't know what 'lang' is or does, but perhaps the answer is to not use the attribute at all if the language is unknown? – Castle
The value is inherited. <body lang="en"> This is english. I don't know what language this quote is in: <q>Se trata de españoles</q></body>. Leaving the attribute off claims that the unknown language is English. – Inessive
9

The wording in the HTML 4.01 specification is obscure; the value unknown is not a valid language tag, and the spec uses the word “unknown” as a normal English word. That is, the default value is a value that indicates that the language is not known, but this value is not explicitly specified.

The spec is partly outdated in this area, as it refers to a superseded RFC on language tags. The current RFC is RFC 5646, Tags for Identifying Languages, also known as BCP (Best Current Practice) 47. It refers, among other things, to ISO 639-2 for primary language subtags, and that standard contains the code und for “undetermined”. So technically you could use lang=und, but the RFC says: “This subtag SHOULD NOT be used unless a language tag is required and language information is not available or cannot be determined. Omitting the language tag (where permitted) is preferred.”
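
For illustration only, markup using the und subtag might look like the sketch below. The surrounding paragraph and its text are made up for the example; given the RFC wording quoted above, this form is probably best kept for cases where some language tag is required.

```html
<!-- Sketch: "und" is the ISO 639-2 code for "undetermined" -->
<p lang="en">
  The caption reads:
  <span lang="und">blabla...</span>
</p>
```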

And this is the approach adopted in HTML5 CR (Candidate Recommendation), which says about lang: “Setting the attribute to the empty string indicates that the primary language is unknown. [BCP47]”

Thus, for text in an unidentifiable language you can use e.g. <span lang="">...</span>.

This is, in principle, useful when you have indicated the language at a higher level of element nesting. Setting lang="" may mean that user agents disable spelling checks and language-specific formatting, for example, though this is still rather theoretical.
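
As a minimal sketch of that situation, assume a page declared as English that contains one French quote and one quote whose language cannot be identified. The page title and sample sentences are invented for the example.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Language example</title>
</head>
<body>
  <!-- No lang attribute here: the paragraph inherits "en" from <html> -->
  <p>This paragraph is English.</p>
  <p>
    <!-- Known language: declare it explicitly -->
    A quote in French: <q lang="fr">Bonjour tout le monde</q>.
    <!-- Unknown language: the empty string overrides the inherited "en" -->
    A quote in a language I cannot identify: <q lang="">blabla...</q>
  </p>
</body>
</html>
```

Without lang="" on the last quote, it would inherit "en" from the html element and so be declared as English.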

Observant answered 14/3, 2013 at 20:48 Comment(1)
I agree; if that's what the HTML 4.01 spec says, then it's badly worded. If "unknown" is being used as an English word, then to me it says "We don't know what the default value is". I say there should be no statement of a default value at all, as section 8.1.2 covers it, so the language is only "unknown" if there's no lang attribute on the element or any ancestor element, no Content-Language header, and no language set in the UA config. – Glaze
0

I would prefer not setting it at all if you don't have to. Note that the lang= value will be derived from its enclosing (parent) element if it is not set.

If you are somehow forced to set it, setting it to the default unknown as you already suggested seems perfectly legit according to the specification.

Boost answered 14/3, 2013 at 17:55 Comment(5)
If it's not set, it's taken from the parent element, which is known. But what if the language of an HTML fragment is not known? – Chun
Which is not good if you have something like this: <body lang="en"> This is english. I don't know what language this quote is in: <q>Se trata de españoles</q></body> – Inessive
Oswald: right, but does a box containing elements in several different languages have a language itself? It seems to be unknown, or merely undefined, or multiple as well. – Boost
The lang attribute applies specifically to text content, i.e. not to child elements. However, if a child element does not define its own lang attribute, the one from the closest ancestor that defines one is assumed. I see no problem with that. – Chun
The value unknown is not allowed. – Lionhearted
