What are language codes in Chrome's implementation of the HTML5 speech recognition API?

Chrome implements the HTML5 speech recognition API, and many languages are supported. I want to know which languages are supported and the code for each one, as used in the HTML element's lang attribute.

For instance:

  • Polish (pl-PL)
  • Turkish (tr-TR)

Thank you!

Student answered 10/1, 2013 at 12:7 Comment(1)
I could not find enough information about the languages that API supports. Read this: lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Feb/… - Materiality

OK, if the list is not published, we can at least try to figure it out. Let me post this table as a starting point, and we can refine it if someone has more information.

I'm assuming that the supported languages are similar to those supported by voice search, and that Google uses standard language codes consistently across its services.

I looked up the languages supported by voice search on Wikipedia.

I found the language codes here, on the Google language settings page, and here.

EDIT: I've experimented with the backend voice recognition service. I ran a series of tests in which I passed the same English speech sample to the API but specified a different dialect each time. It looks like:

  • If a language is not supported, recognition falls back to en-US (it seems to recognize that the sample is in English).
  • If a dialect is not supported (or doesn't exist), recognition falls back to the main dialect, or to en-US in some cases.
  • The main dialect can be specified by just the first part of the identifier, so 'en-US' and 'en' give the same results.
  • Recognition for some languages, such as Chinese and Japanese, gives results in English that differ from the en-US ones, which is strange. Probably the sample is so different from Chinese that the service is clever enough to figure that out.
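
The fallback behaviour described in these bullets can be modelled with a small helper. This is only a sketch of what the tests suggest, not Chrome's actual logic: `resolveLang` and the `supported` set are hypothetical names, and the set's contents are made up for illustration.

```javascript
// Hypothetical model of the observed fallback: exact tag first, then the
// main language (first subtag), then en-US. Not Chrome's real logic.
function resolveLang(requested, supported) {
  if (supported.has(requested)) return requested;
  const main = requested.split("-")[0]; // "pl-PL" -> "pl"
  if (supported.has(main)) return main;
  return "en-US";
}

// Illustrative set only; the real list is not published.
const supported = new Set(["en", "en-US", "en-AU", "pl", "tr"]);

console.log(resolveLang("en-AU", supported)); // en-AU
console.log(resolveLang("pl-PL", supported)); // pl
console.log(resolveLang("xx-YY", supported)); // en-US
```

The tests also showed cases where an unsupported dialect fell back to en-US rather than to the main dialect, so a real model would need per-language exceptions that this sketch does not capture.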

I treat a dialect as supported if recognition gives a result different from both en-US and the main dialect for the language. Still, to verify this 100% we would need to run samples for each language.
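
As a rough illustration, that rule could be written as a comparison of transcripts. The function name `looksSupported` and the transcripts below are invented for the example; real transcripts would come from running the same audio sample against each language code.

```javascript
// Rule from the answer: a dialect counts as supported when its transcript
// differs from both the en-US transcript and the main-dialect transcript.
// Intended for dialect codes such as "es-AR", not bare codes such as "es".
function looksSupported(transcripts, code) {
  const main = code.split("-")[0];
  const own = transcripts[code];
  return own !== transcripts["en-US"] && own !== transcripts[main];
}

// Made-up transcripts keyed by language code, purely for illustration.
const transcripts = {
  "en-US": "hello world",
  "es": "jelou uorld",
  "es-AR": "jelo guorld",
  "es-CL": "jelou uorld",
};

console.log(looksSupported(transcripts, "es-AR")); // true
console.log(looksSupported(transcripts, "es-CL")); // false
```

A `false` here does not prove the dialect is unsupported; an identical transcript can also be a coincidence of the chosen sample.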

Legend

  • + Most likely supported, because the test gives a result different from en-US and the main dialect.
  • .+ Absent from Wikipedia but most likely supported, because the test gives a result different from en-US and the main dialect.
  • +? Most likely supported, because it is listed on Wikipedia, but the test on my sample gives a result identical to the main dialect. So either this is a coincidence or the language code is wrong.
  • .+? Not listed on Wikipedia but appears to be supported, because the test gives a result different from en-US and the main dialect.

Languages

  • + Afrikaans af
  • + Basque eu
  • + Bulgarian bg
  • + Catalan ca
  • + Arabic (Egypt) ar-EG
  • +? Arabic (Jordan) ar-JO
  • + Arabic (Kuwait) ar-KW
  • +? Arabic (Lebanon) ar-LB
  • + Arabic (Qatar) ar-QA
  • + Arabic (UAE) ar-AE
  • .+ Arabic (Morocco) ar-MA
  • .+ Arabic (Iraq) ar-IQ
  • .+ Arabic (Algeria) ar-DZ
  • .+ Arabic (Bahrain) ar-BH
  • .+ Arabic (Libya) ar-LY
  • .+ Arabic (Oman) ar-OM
  • .+ Arabic (Saudi Arabia) ar-SA
  • .+ Arabic (Tunisia) ar-TN
  • .+ Arabic (Yemen) ar-YE
  • + Czech cs
  • + Dutch nl-NL
  • + English (Australia) en-AU
  • +? English (Canada) en-CA
  • + English (India) en-IN
  • + English (New Zealand) en-NZ
  • + English (South Africa) en-ZA
  • + English (UK) en-GB
  • + English (US) en-US
  • + Finnish fi
  • + French fr-FR
  • + Galician gl
  • + German de-DE
  • + Hebrew he
  • + Hungarian hu
  • + Icelandic is
  • + Italian it-IT
  • + Indonesian id
  • + Japanese ja
  • + Korean ko
  • + Latin la
  • + Mandarin Chinese zh-CN
  • + Traditional Taiwan zh-TW
  • +? Simplified China zh-CN ?
  • + Simplified Hong Kong zh-HK
  • + Yue Chinese (Traditional Hong Kong) zh-yue
  • + Malaysian ms-MY
  • + Norwegian no-NO
  • + Polish pl
  • +? Pig Latin xx-piglatin
  • + Portuguese pt-PT
  • .+ Portuguese (Brazil) pt-BR
  • + Romanian ro-RO
  • + Russian ru
  • + Serbian sr-SP
  • + Slovak sk
  • + Spanish (Argentina) es-AR
  • + Spanish (Bolivia) es-BO
  • +? Spanish (Chile) es-CL
  • +? Spanish (Colombia) es-CO
  • +? Spanish (Costa Rica) es-CR
  • + Spanish (Dominican Republic) es-DO
  • + Spanish (Ecuador) es-EC
  • + Spanish (El Salvador) es-SV
  • + Spanish (Guatemala) es-GT
  • + Spanish (Honduras) es-HN
  • + Spanish (Mexico) es-MX
  • + Spanish (Nicaragua) es-NI
  • + Spanish (Panama) es-PA
  • + Spanish (Paraguay) es-PY
  • + Spanish (Peru) es-PE
  • + Spanish (Puerto Rico) es-PR
  • + Spanish (Spain) es-ES
  • + Spanish (US) es-US
  • + Spanish (Uruguay) es-UY
  • + Spanish (Venezuela) es-VE
  • + Swedish sv-SE
  • + Turkish tr
  • + Zulu zu
Slew answered 13/1, 2013 at 9:11 Comment(6)
Will the supported languages be the same as those of Android's voice recognition service? - Student
I don't know. According to numerous sources on the internet, the speech input API uses a private Google endpoint, so the recognition is done on Google's servers. My assumption is that it would be logical (though not necessarily true) for the same service to do recognition for both voice search and the speech input API. - Slew
The Arabic ones all sound the same to me. - Mantua
The way to do this would probably be to write a script that downloads a sample for each code and hashes the result, then checks which ones are distinct. - Mantua
@Sergey Zyuzin: How about Persian? - Orchestra
The last link is now personal.psu.edu/ejp10/symbolcodes/bylanguage/index.html - Grater

I know this is an old post, but since this information is annoyingly hard to find I thought I'd post a list for anyone who might be looking. Please leave a note if you find any errors or omissions.

{
  "Afrikaans": [
    ["South Africa", "af-ZA"]
  ],
  "Arabic" : [
    ["Algeria","ar-DZ"],
    ["Bahrain","ar-BH"],
    ["Egypt","ar-EG"],
    ["Israel","ar-IL"],
    ["Iraq","ar-IQ"],
    ["Jordan","ar-JO"],
    ["Kuwait","ar-KW"],
    ["Lebanon","ar-LB"],
    ["Morocco","ar-MA"],
    ["Oman","ar-OM"],
    ["Palestinian Territory","ar-PS"],
    ["Qatar","ar-QA"],
    ["Saudi Arabia","ar-SA"],
    ["Tunisia","ar-TN"],
    ["UAE","ar-AE"]
  ],
  "Basque": [
    ["Spain", "eu-ES"]
  ],
  "Bulgarian": [
    ["Bulgaria", "bg-BG"]
  ],
  "Catalan": [
    ["Spain", "ca-ES"]
  ],
  "Chinese Mandarin": [
    ["China (Simp.)", "cmn-Hans-CN"],
    ["Hong Kong SAR (Trad.)", "cmn-Hans-HK"],
    ["Taiwan (Trad.)", "cmn-Hant-TW"]
  ],
  "Chinese Cantonese": [
    ["Hong Kong", "yue-Hant-HK"]
  ],
  "Croatian": [
    ["Croatia", "hr-HR"]
  ],
  "Czech": [
    ["Czech Republic", "cs-CZ"]
  ],
  "Danish": [
    ["Denmark", "da-DK"]
  ],
  "English": [
    ["Australia", "en-AU"],
    ["Canada", "en-CA"],
    ["India", "en-IN"],
    ["Ireland", "en-IE"],
    ["New Zealand", "en-NZ"],
    ["Philippines", "en-PH"],
    ["South Africa", "en-ZA"],
    ["United Kingdom", "en-GB"],
    ["United States", "en-US"]
  ],
  "Farsi": [
    ["Iran", "fa-IR"]
  ],
  "French": [
    ["France", "fr-FR"]
  ],
  "Filipino": [
    ["Philippines", "fil-PH"]
  ],
  "Galician": [
    ["Spain", "gl-ES"]
  ],
  "German": [
    ["Germany", "de-DE"]
  ],
  "Greek": [
    ["Greece", "el-GR"]
  ],
  "Finnish": [
    ["Finland", "fi-FI"]
  ],
  "Hebrew" :[
    ["Israel", "he-IL"]
  ],
  "Hindi": [
    ["India", "hi-IN"]
  ],
  "Hungarian": [
    ["Hungary", "hu-HU"]
  ],
  "Indonesian": [
    ["Indonesia", "id-ID"]
  ],
  "Icelandic": [
    ["Iceland", "is-IS"]
  ],
  "Italian": [
    ["Italy", "it-IT"],
    ["Switzerland", "it-CH"]
  ],
  "Japanese": [
    ["Japan", "ja-JP"]
  ],
  "Korean": [
    ["Korea", "ko-KR"]
  ],
  "Lithuanian": [
    ["Lithuania", "lt-LT"]
  ],
  "Malaysian": [
    ["Malaysia", "ms-MY"]
  ],
  "Dutch": [
    ["Netherlands", "nl-NL"]
  ],
  "Norwegian": [
    ["Norway", "nb-NO"]
  ],
  "Polish": [
    ["Poland", "pl-PL"]
  ],
  "Portuguese": [
    ["Brazil", "pt-BR"],
    ["Portugal", "pt-PT"]
  ],
  "Romanian": [
    ["Romania", "ro-RO"]
  ],
  "Russian": [
    ["Russia", "ru-RU"]
  ],
  "Serbian": [
    ["Serbia", "sr-RS"]
  ],
  "Slovak": [
    ["Slovakia", "sk-SK"]
  ],
  "Slovenian": [
    ["Slovenia", "sl-SI"]
  ],
  "Spanish": [
    ["Argentina", "es-AR"],
    ["Bolivia", "es-BO"],
    ["Chile", "es-CL"],
    ["Colombia", "es-CO"],
    ["Costa Rica", "es-CR"],
    ["Dominican Republic", "es-DO"],
    ["Ecuador", "es-EC"],
    ["El Salvador", "es-SV"],
    ["Guatemala", "es-GT"],
    ["Honduras", "es-HN"],
    ["México", "es-MX"],
    ["Nicaragua", "es-NI"],
    ["Panamá", "es-PA"],
    ["Paraguay", "es-PY"],
    ["Perú", "es-PE"],
    ["Puerto Rico", "es-PR"],
    ["Spain", "es-ES"],
    ["Uruguay", "es-UY"],
    ["United States", "es-US"],
    ["Venezuela", "es-VE"]
  ],
  "Swedish": [
    ["Sweden", "sv-SE"]
  ],
  "Thai": [
    ["Thailand", "th-TH"]
  ],
  "Turkish": [
    ["Turkey", "tr-TR"]
  ],
  "Ukrainian": [
    ["Ukraine", "uk-UA"]
  ],
  "Vietnamese": [
    ["Viet Nam", "vi-VN"]
  ],
  "Zulu": [
    ["South Africa", "zu-ZA"]
  ]
}
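
To use a table shaped like the one above in a page, it helps to flatten it and to check that every code is a well-formed language tag. The following is a minimal sketch under stated assumptions: `flattenLanguages` and `isWellFormedTag` are hypothetical helper names, and the `languages` object is a shortened copy of the table for brevity.

```javascript
// Shortened copy of the table above; paste the full object in practice.
const languages = {
  "Polish": [["Poland", "pl-PL"]],
  "Portuguese": [["Brazil", "pt-BR"], ["Portugal", "pt-PT"]],
  "Turkish": [["Turkey", "tr-TR"]],
};

// Flatten { language: [[region, code], ...] } into [{ language, region, code }].
function flattenLanguages(table) {
  return Object.entries(table).flatMap(([language, regions]) =>
    regions.map(([region, code]) => ({ language, region, code }))
  );
}

// True when the tag is a structurally valid BCP 47 language tag.
// Intl.getCanonicalLocales throws a RangeError for malformed tags.
function isWellFormedTag(tag) {
  try {
    Intl.getCanonicalLocales(tag);
    return true;
  } catch (e) {
    return false;
  }
}

const flat = flattenLanguages(languages);
console.log(flat.map((entry) => entry.code).join(", ")); // pl-PL, pt-BR, pt-PT, tr-TR
console.log(flat.every((entry) => isWellFormedTag(entry.code))); // true
console.log(isWellFormedTag("pl_PL")); // false (underscore instead of hyphen)
```

A well-formed tag is only syntactically valid; whether the recognizer actually supports it still has to be tested. In a browser, the chosen code would be assigned to the `lang` property of a `SpeechRecognition` instance (exposed in Chrome as `webkitSpeechRecognition`) before calling `start()`.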

Edit: I also found this list, which is probably more current: https://cloud.google.com/speech-to-text/docs/languages

Edit 2: Adding this list of sample voices as well: https://cloud.google.com/text-to-speech/docs/voices

Evictee answered 30/1, 2017 at 20:34 Comment(0)
A
5

Use the following code to get all available voices for the speech API in your browser:

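// getVoices() can return an empty list until the voiceschanged event has fired.
var voices = speechSynthesis.getVoices();
for (var i = 0; i < voices.length; i++) {
  // The standard property is voiceURI; reading "uri" yields undefined.
  console.log("Voice " + i + " " + voices[i].name + " " + voices[i].voiceURI);
}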

At this time only Chrome and Safari support the Web Speech API (although Safari supports only the text-to-speech functionality). Curiously, Firefox OS supports TTS but the browser version does not.

The list of languages depends on what browser you are on according to both the documentation and my tests (user agent dependent).
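
To compare browsers, the voices can be grouped by their language code. This is a sketch only, and it covers speech synthesis voices, not recognition languages. `groupVoicesByLang` is a hypothetical helper; the browser-only part assumes the standard `speechSynthesis` global and is skipped where it does not exist.

```javascript
// Group SpeechSynthesisVoice-like objects by their BCP 47 `lang` value.
function groupVoicesByLang(voices) {
  const byLang = new Map();
  for (const voice of voices) {
    if (!byLang.has(voice.lang)) byLang.set(voice.lang, []);
    byLang.get(voice.lang).push(voice.name);
  }
  return byLang;
}

// Browser-only part: voices may load asynchronously, so getVoices() can be
// empty on the first call and filled once "voiceschanged" has fired.
if (typeof speechSynthesis !== "undefined") {
  const report = () => console.log(groupVoicesByLang(speechSynthesis.getVoices()));
  report();
  speechSynthesis.onvoiceschanged = report;
}
```

Running this in each browser's console gives one entry per language code, which makes the per-browser differences easier to see than a flat list of voice names.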

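In Safari you also get lots of languages available (I believe over 40). In Chrome, at this time, you get the following list (this output was captured by reading a non-existent uri property, hence the trailing undefined; the standard property is voiceURI):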

Voice 0 Google US English undefined

Voice 1 Google UK English Male undefined

Voice 2 Google UK English Female undefined

Voice 3 Google Español undefined

Voice 4 Google Français undefined

Voice 5 Google Italiano undefined

Voice 6 Google Deutsch undefined

Voice 7 Google 日本人 undefined

Voice 8 Google 한국의 undefined

Voice 9 Google 中国的 undefined

Voice 10 native undefined

Arnie answered 12/8, 2014 at 14:36 Comment(4)
The question was asking about speech recognition... this only gives a list for speech synthesis / TTS - Evictee
At the time, I thought it could only recognize the same languages it could synthesize. Is it not so? - Arnie
I don't think they are connected, although I'm sure they are similar. I've found the drop-down list in Google's Cloud Speech API fairly accurate for recognition in the Web Speech API: cloud.google.com/speech - Evictee
Cool. I don't think this existed back then though ;) - Arnie

Here is @TimHayes's list in a LinkedHashMap from which you can fetch the values. I'm using a LinkedHashMap so that the entries keep their insertion order. The map is keyed by language code, because several country names occur more than once and a repeated key would overwrite the earlier entry.

    // Requires: import java.util.LinkedHashMap;
    // Key: language code (unique). Value: country or region name.
    LinkedHashMap<String,String> country = new LinkedHashMap<String,String>();

    country.put("af-ZA", "South Africa");
    country.put("ar-DZ", "Algeria");
    country.put("ar-BH", "Bahrain");
    country.put("ar-EG", "Egypt");
    country.put("ar-IL", "Israel");
    country.put("ar-IQ", "Iraq");
    country.put("ar-JO", "Jordan");
    country.put("ar-KW", "Kuwait");
    country.put("ar-LB", "Lebanon");
    country.put("ar-MA", "Morocco");
    country.put("ar-OM", "Oman");
    country.put("ar-PS", "Palestinian Territory");
    country.put("ar-QA", "Qatar");
    country.put("ar-SA", "Saudi Arabia");
    country.put("ar-TN", "Tunisia");
    country.put("ar-AE", "UAE");
    country.put("eu-ES", "Spain");
    country.put("bg-BG", "Bulgaria");
    country.put("ca-ES", "Spain");
    country.put("cmn-Hans-CN", "China (Simp.)");
    country.put("cmn-Hans-HK", "Hong Kong SAR (Trad.)");
    country.put("cmn-Hant-TW", "Taiwan (Trad.)");
    country.put("yue-Hant-HK", "Hong Kong");
    country.put("hr-HR", "Croatia");
    country.put("cs-CZ", "Czech Republic");
    country.put("da-DK", "Denmark");
    country.put("en-AU", "Australia");
    country.put("en-CA", "Canada");
    country.put("en-IN", "India");
    country.put("en-IE", "Ireland");
    country.put("en-NZ", "New Zealand");
    country.put("en-PH", "Philippines");
    country.put("en-ZA", "South Africa");
    country.put("en-GB", "United Kingdom");
    country.put("en-US", "United States");
    country.put("fa-IR", "Iran");
    country.put("fr-FR", "France");
    country.put("fil-PH", "Philippines");
    country.put("gl-ES", "Spain");
    country.put("de-DE", "Germany");
    country.put("el-GR", "Greece");
    country.put("fi-FI", "Finland");
    country.put("he-IL", "Israel");
    country.put("hi-IN", "India");
    country.put("hu-HU", "Hungary");
    country.put("id-ID", "Indonesia");
    country.put("is-IS", "Iceland");
    country.put("it-IT", "Italy");
    country.put("it-CH", "Switzerland");
    country.put("ja-JP", "Japan");
    country.put("ko-KR", "Korea");
    country.put("lt-LT", "Lithuania");
    country.put("ms-MY", "Malaysia");
    country.put("nl-NL", "Netherlands");
    country.put("nb-NO", "Norway");
    country.put("pl-PL", "Poland");
    country.put("pt-BR", "Brazil");
    country.put("pt-PT", "Portugal");
    country.put("ro-RO", "Romania");
    country.put("ru-RU", "Russia");
    country.put("sr-RS", "Serbia");
    country.put("sk-SK", "Slovakia");
    country.put("sl-SI", "Slovenia");
    country.put("es-AR", "Argentina");
    country.put("es-BO", "Bolivia");
    country.put("es-CL", "Chile");
    country.put("es-CO", "Colombia");
    country.put("es-CR", "Costa Rica");
    country.put("es-DO", "Dominican Republic");
    country.put("es-EC", "Ecuador");
    country.put("es-SV", "El Salvador");
    country.put("es-GT", "Guatemala");
    country.put("es-HN", "Honduras");
    country.put("es-MX", "México");
    country.put("es-NI", "Nicaragua");
    country.put("es-PA", "Panamá");
    country.put("es-PY", "Paraguay");
    country.put("es-PE", "Perú");
    country.put("es-PR", "Puerto Rico");
    country.put("es-ES", "Spain");
    country.put("es-UY", "Uruguay");
    country.put("es-US", "United States");
    country.put("es-VE", "Venezuela");
    country.put("sv-SE", "Sweden");
    country.put("th-TH", "Thailand");
    country.put("tr-TR", "Turkey");
    country.put("uk-UA", "Ukraine");
    country.put("vi-VN", "Viet Nam");
    country.put("zu-ZA", "South Africa");
Womanhater answered 8/5, 2017 at 2:42 Comment(0)
