The right way to use SSML with Web Speech API
K

4

20

The Web Speech API specification says:

text attribute
This attribute specifies the text to be synthesized and spoken for this utterance. This may be either plain text or a complete, well-formed SSML document. For speech synthesis engines that do not support SSML, or only support certain tags, the user agent or speech engine must strip away the tags they do not support and speak the text.

It does not provide an example of using text with an SSML document.

I tried the following in Chrome 33:

var msg = new SpeechSynthesisUtterance();
msg.text = '<?xml version="1.0"?>\r\n<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">ABCD</speak>';
speechSynthesis.speak(msg);

It did not work -- the voice attempted to narrate the XML tags. Is this code valid?
Do I have to provide an XMLDocument object instead?

I am trying to understand whether Chrome violates the specification (which should be reported as a bug), or whether my code is invalid.

Krystlekrystyna answered 22/2, 2014 at 10:5 Comment(7)
Did you ever solve this? The closest thing I can find on SSML and Chrome is the documentation for Chrome plugin speech synthesis: developer.chrome.com/extensions/tts - Killingsworth
Also, are you using Linux? It appears that there may be problems there: code.google.com/p/chromium/issues/detail?id=88072 - Killingsworth
@Killingsworth all I found was that bug (I have commented there). By the way, as I read the description, it is not implemented on Mac/Win either. - Krystlekrystyna
It seems fair to say from that bug thread and others that SSML is simply not yet supported in this Chrome API, and it looks like it's not a high priority for anyone. Hope it's added some time, so that speech synthesis can be made more responsive. - Taiga
@AndreyShchekin ah yes, my mistake, it does appear to be Mac/Win too. Back to doing my TTS server-side for now; I need SSML for pitching my singing voice hack. - Killingsworth
var xmldoc = new DOMParser().parseFromString(text, 'text/xml') does not help either, so I think matt's point is correct. - Quincy
If you're still interested in this at all, I know Chrome's TTS API will work for Mac prosody commands, e.g. the square root of [[pbas +4]] 2 [[char LTRL]]a[[char NORM]] to the [[pbas +4]] 14 [[char LTRL]]x[[char NORM]] . I do not know if this is only for Mac native voices, though. developer.apple.com/library/mac/documentation/UserExperience/… - Frangos
M
5

In Chrome 46 on Windows, the text is interpreted properly as an XML document when the language is set to en; however, I see no evidence that the tags actually do anything. I heard no difference between the sentence with <emphasis> and the one without it in this SSML:

var msg = new SpeechSynthesisUtterance();
msg.text = '<?xml version="1.0"?>\r\n<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US"><emphasis>Welcome</emphasis> to the Bird Seed Emporium.  Welcome to the Bird Seed Emporium.</speak>';
msg.lang = 'en';
speechSynthesis.speak(msg);

The <phoneme> tag was also completely ignored, which made my attempt to speak IPA fail.

var msg = new SpeechSynthesisUtterance();
// Note: &aelig; is an HTML entity, not an XML one, so the numeric reference &#230; (æ) is used to keep the document well-formed.
msg.text = '<?xml version="1.0" encoding="ISO-8859-1"?> <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis/synthesis.xsd" xml:lang="en-US"> Pavlova is a meringue-based dessert named after the Russian ballerina Anna Pavlova. It is a meringue cake with a crisp crust and soft, light inside, usually topped with fruit and, optionally, whipped cream.  The name is pronounced <phoneme alphabet="ipa" ph="p&#230;v&#712;lo&#650;v&#601;">...</phoneme> or <phoneme alphabet="ipa" ph="p&#593;&#720;v&#712;lo&#650;v&#601;">...</phoneme>, unlike the name of the dancer, which was <phoneme alphabet="ipa" ph="&#712;p&#593;&#720;vl&#601;v&#601;">...</phoneme> </speak>';
msg.lang = 'en';
speechSynthesis.speak(msg);

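This is despite the fact that the Microsoft speech API does handle SSML correctly. Here is a C# snippet, suitable for use in LINQPad (it needs a reference to System.Speech and the System.Speech.Synthesis and System.Text.RegularExpressions namespaces):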

var str = "Pavlova is a meringue-based dessert named after the Russian ballerina Anna Pavlova. It is a meringue cake with a crisp crust and soft, light inside, usually topped with fruit and, optionally, whipped cream.  The name is pronounced /pævˈloʊvə/ or /pɑːvˈloʊvə/, unlike the name of the dancer, which was /ˈpɑːvləvə/.";
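// Match each IPA transcription written between slashes, e.g. /pævˈloʊvə/.
var regex = new Regex("/([^/]+)/");
if (regex.IsMatch(str))
{
    // Wrap every transcription in an SSML <phoneme> element.
    str = regex.Replace(str, "<phoneme alphabet=\"ipa\" ph=\"$1\">word</phoneme>");
    str.Dump(); // LINQPad helper: show the generated markup
}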
SpeechSynthesizer synth = new SpeechSynthesizer();
PromptBuilder pb = new PromptBuilder();
pb.AppendSsmlMarkup(str);
synth.Speak(pb);
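
For anyone who wants to build the same markup for the Web Speech API, the regex substitution above can be sketched in JavaScript roughly as follows. The helper names escapeXml and ipaToSsml are hypothetical, not part of any API, and the sketch only produces the SSML string; as described above, there is no guarantee that a given browser or voice will honor <phoneme>.

```javascript
// Escape characters that are special in XML text and attribute values.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: wrap every /.../ transcription in a <phoneme> element
// and put the result inside a complete <speak> document.
// Like the C# regex, it treats ANY pair of slashes as a transcription.
function ipaToSsml(text, lang = 'en-US') {
  const body = escapeXml(text).replace(
    /\/([^\/]+)\//g,
    '<phoneme alphabet="ipa" ph="$1">word</phoneme>'
  );
  return '<?xml version="1.0"?>\n' +
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${lang}">${body}</speak>`;
}

// Browser-only usage; skipped where the Web Speech API is unavailable.
if (typeof speechSynthesis !== 'undefined') {
  const ipaMsg = new SpeechSynthesisUtterance();
  ipaMsg.text = ipaToSsml('The name is pronounced /pævˈloʊvə/.');
  ipaMsg.lang = 'en';
  speechSynthesis.speak(ipaMsg);
}
```

Escaping the text before the substitution keeps the generated document well-formed even if the input contains &, < or quotes.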
Macnamara answered 21/11, 2015 at 18:32 Comment(5)
Same problem here. - Lovejoy
With the current Chrome 55.0, it isn't even recognizing the XML. My speak(msg) is saying things like "less than questionmark ex em el version equal quote one point zero quote..." - Macnamara
I don't think SSML is supported yet :( - Mallarme
It works fine now as of May 2020. Version 81.0.4044.138 - Lierne
I am still on build 81.0.4044.129 due to my company restrictions, and it does NOT work correctly for me. <emphasis> and <phoneme> are still ignored on my system. - Macnamara
T
4

There are bugs for this issue currently open with Chromium.

  • 88072: Extension TTS API platform implementations need to support SSML
  • 428902: speechSynthesis.speak() doesn't strip unrecognized tags. This bug has been fixed in Chrome as of Sept 2016.
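
On browsers where the tags are read aloud instead of being stripped, one possible workaround is to strip them yourself before calling speak(). This is a minimal sketch, not what Chrome does internally: ssmlToPlainText is a hypothetical helper, it uses simple regexes rather than a real XML parser, it decodes only the five predefined XML entities (numeric references such as &#712; are left as-is), and it discards all SSML semantics.

```javascript
// Hypothetical helper (not part of the Web Speech API): reduce an SSML
// string to the plain text that a non-SSML engine should speak.
function ssmlToPlainText(ssml) {
  return ssml
    .replace(/<\?xml[^?]*\?>/g, '')    // drop the XML declaration
    .replace(/<!--[\s\S]*?-->/g, '')   // drop comments
    .replace(/<[^>]+>/g, '')           // drop every remaining tag
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, '&')            // decode &amp; last to avoid double decoding
    .replace(/\s+/g, ' ')              // collapse runs of whitespace
    .trim();
}

const sampleSsml =
  '<?xml version="1.0"?>\r\n' +
  '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">' +
  '<emphasis>Welcome</emphasis> to the Bird Seed Emporium.</speak>';

// Browser-only usage; skipped where the Web Speech API is unavailable.
if (typeof speechSynthesis !== 'undefined') {
  const plainMsg = new SpeechSynthesisUtterance(ssmlToPlainText(sampleSsml));
  plainMsg.lang = 'en';
  speechSynthesis.speak(plainMsg);
}
```

Because tags are removed without inserting a space, adjacent elements with no whitespace between them (for example two <s> sentences back to back) will run together; adjust the tag replacement if your markup relies on that.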
Terrilynterrine answered 19/3, 2015 at 13:28 Comment(2)
And 428902 regressed :/ It's still here. - Sweven
The bug is happening in Windows, but not MacOS. - Bohannan
B
0

I have tested this, and XML parsing seems to work properly on Windows; however, it does not work properly on macOS.

Bohannan answered 27/12, 2018 at 22:41 Comment(1)
I'm just trying to get XML parsing to work in Windows 10, without success in Firefox, Chrome, or Edge. I would be grateful for any pointers to working examples. - Vassily
C
-2

I've tried this using Chrome 104.0.5112.101 (on Linux). Didn't work. When checking the debugging console I got the message:

speechSynthesis.speak() without user activation is deprecated and will be removed

Adding a button, as suggested in "The question of whether speechSynthesis is allowed to run without user interaction", does work for me, at least for speaking plain text; SSML-formatted text still does not work.
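
A minimal sketch of the button approach, assuming a page that contains an element with the id speak (that id and the helper name speakOnClick are my own, not from the linked question). The point is only that speak() is called from inside a click handler, so it runs with user activation:

```javascript
// Hypothetical helper: speak `text` only in response to a click on `button`.
// `synth` and `Utterance` default to the browser globals; they are parameters
// so the wiring can also be exercised outside a browser.
function speakOnClick(
  button,
  text,
  synth = globalThis.speechSynthesis,
  Utterance = globalThis.SpeechSynthesisUtterance
) {
  button.addEventListener('click', () => {
    // Runs inside a user-initiated event, so user activation is present.
    synth.speak(new Utterance(text));
  });
}

// Browser-only usage; '#speak' is an assumed element id.
if (typeof document !== 'undefined') {
  const speakButton = document.querySelector('#speak');
  if (speakButton) {
    speakOnClick(speakButton, 'Welcome to the Bird Seed Emporium.');
  }
}
```

This only addresses the user-activation warning; it does not change how SSML markup in the text is handled.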

Cycloplegia answered 21/9, 2022 at 17:36 Comment(0)
