Is there a way to force the Google Speech API to return only words in the response?
I am using this Google API:

https://www.google.com/speech-api/v2/recognize?output=json&lang="+ language_code+"&key="My key"

for speech recognition and it's working very well.

The issue is with numbers: if I say "one two three four", the result is 1234, and if I say "one thousand two hundred thirty four", the result is still 1234.

Another issue is with other languages. For example, the German word elf means eleven; if you say "elf", the result is 11 instead of elf.

I know we have no control over the API, but are there any parameters or hacks we can add to force it to return only words?

The response sometimes contains the correct result, but not always.

These are sample responses:

1) When I say "one two three four"

{"result":[{"alternative":[{"transcript":"1234","confidence":0.47215959},{"transcript":"1 2 3 4","confidence":0.25},{"transcript":"one two three four","confidence":0.25},{"transcript":"1 2 34","confidence":0.33333334},{"transcript":"1 to 34","confidence":1}],"final":true}],"result_index":0}

2) When I say "one thousand two hundred thirty four"

{"result":[{"alternative":[{"transcript":"1234","confidence":0.94247383},{"transcript":"1.254","confidence":1},{"transcript":"1284","confidence":1},{"transcript":"1244","confidence":1},{"transcript":"1230 4","confidence":1}],"final":true}],"result_index":0}

What I have done:

I check whether the result is a number, then separate its digits with spaces and look for the same sequence among the alternatives. For example, the result 1234 becomes 1 2 3 4; if a matching alternative exists, I use it and convert it to words. In the second case there is no 1 2 3 4 alternative, so I stick with the original result.

This is the code:

  // Only attempt the workaround when the top result consists solely of digits.
  if (output.matches("[0-9]+")) {
      // Turn "1234" into "1 2 3 4".
      StringBuilder spaced = new StringBuilder();
      for (char c : output.toCharArray()) {
          if (spaced.length() > 0) {
              spaced.append(' ');
          }
          spaced.append(c);
      }
      String digits = spaced.toString();

      // jsonArray2 is the "alternative" array; index 0 is the top result itself.
      for (int i = 1; i < jsonArray2.length(); i++) {
          String value = jsonArray2.getJSONObject(i).getString("transcript");
          if (digits.equals(value.trim())) {
              output = digits;
              break;
          }
      }
  }

So the issue is that when I say "thirteen four eight", this method splits 13 into "one three", so it is not a reliable solution.
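A possible variation on this workaround, sketched here under assumptions: instead of rebuilding the digit sequence, scan the alternatives for a transcript that contains no digits at all and prefer it. The class and method names are hypothetical, and the alternatives are assumed to have already been extracted from the JSON into a list of strings, in the order the API returned them.

```java
import java.util.Arrays;
import java.util.List;

class DigitFreeAlternative {
    // Returns the first alternative that contains no digit characters,
    // or the top-ranked alternative when every candidate contains a digit.
    static String pick(List<String> transcripts) {
        for (String t : transcripts) {
            if (!t.isEmpty() && t.chars().noneMatch(Character::isDigit)) {
                return t;
            }
        }
        return transcripts.isEmpty() ? "" : transcripts.get(0);
    }

    public static void main(String[] args) {
        // Alternatives taken from the first sample response above.
        List<String> spokenDigits = Arrays.asList(
                "1234", "1 2 3 4", "one two three four", "1 2 34", "1 to 34");
        System.out.println(pick(spokenDigits));
    }
}
```

This only helps when the API happens to include a word-only alternative, as in the first sample response. In the second sample every alternative contains digits, so the top result is returned unchanged.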

Update

I tried the new Cloud Speech API (https://cloud.google.com/speech/) and it's a little better than v2. For "one two three four" the result now comes back in words, and my workaround handles that case as well. But when I say "thirteen four eight", the result is the same as in v2.

Also, elf is still 11 in German.

I also tried speech_context, but that didn't work either.

Zagazig answered 14/3, 2017 at 11:30 Comment(1)
In what way is speech_context not working? If anything is going to help achieve the result you want, it's the speech context.Lylalyle

Take a look at this question and answer.

You can give the API "speech context" hints, like this:

"speech_context": {
  "phrases":["zero", "one", "two", ... "nine", "ten", "eleven", ... "twenty", "thirty", ..., "ninety"]
 }

I imagine this could work for other languages too, like German.

"speech_context": {
  "phrases":["eins", "zwei", "drei", ..., "elf", "zwölf" ... ]
 }
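
As a rough sketch of how such a hint list might be assembled in Java, the snippet below builds the "speech_context" fragment from a list of unique number words. The class name, the ENGLISH list, and the toJson helper are hypothetical; only the "speech_context" and "phrases" field names come from the fragment above, and how the fragment is attached to the request is left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SpeechContextHints {
    // Unique English number words; compounds such as "twenty one"
    // are covered by their parts, so they are not listed separately.
    static final List<String> ENGLISH = Arrays.asList(
            "zero", "one", "two", "three", "four", "five", "six", "seven",
            "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
            "fifteen", "sixteen", "seventeen", "eighteen", "nineteen",
            "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
            "eighty", "ninety", "hundred", "thousand", "million");

    // Renders the phrases as a JSON fragment, escaping backslashes and quotes.
    static String toJson(List<String> phrases) {
        String items = phrases.stream()
                .map(p -> "\"" + p.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
                .collect(Collectors.joining(", "));
        return "\"speech_context\": {\"phrases\": [" + items + "]}";
    }

    public static void main(String[] args) {
        System.out.println(toJson(ENGLISH));
    }
}
```

A separate list per language (for example one starting with "eins", "zwei", "drei") could be passed to the same helper.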
Lylalyle answered 15/3, 2017 at 13:3 Comment(2)
I am not using the Cloud Speech API, and Speech API v2 doesn't have this parameter, but I am OK with switching to the Cloud API. However, this is not practical: I cannot give all these numbers as hints. The user can say any number. Also, there are 20 more languages in my app.Zagazig
Oh, I didn't notice you're using v2. Not sure if it helps, but you don't need to give all numbers, just the unique words. You wouldn't need to pass "twenty one" because you'll already have "twenty" and "one" separately. This would keep the number of phrases to below 50 and you can send up to 500 phrases.Lylalyle

You may have to convert numbers (not digits) to words yourself. Since number words follow regular rules in most languages (e.g. English, German), you can do this algorithmically.

See How to convert number to words in java
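
A minimal sketch of such an algorithmic conversion for English, assuming values from 0 to 999999 and plain spacing without hyphens or "and". The class and method names are hypothetical, and other languages would need their own rules.

```java
class NumberWords {
    private static final String[] ONES = {
        "zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"};
    private static final String[] TENS = {
        "", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"};

    // Converts n to English words by peeling off thousands, hundreds and tens.
    static String toWords(int n) {
        if (n < 0 || n > 999_999) {
            throw new IllegalArgumentException("supported range is 0..999999");
        }
        if (n < 20) {
            return ONES[n];
        }
        if (n < 100) {
            String tens = TENS[n / 10];
            return n % 10 == 0 ? tens : tens + " " + ONES[n % 10];
        }
        if (n < 1000) {
            String hundreds = ONES[n / 100] + " hundred";
            return n % 100 == 0 ? hundreds : hundreds + " " + toWords(n % 100);
        }
        String thousands = toWords(n / 1000) + " thousand";
        return n % 1000 == 0 ? thousands : thousands + " " + toWords(n % 1000);
    }

    public static void main(String[] args) {
        System.out.println(toWords(1234)); // one thousand two hundred thirty four
    }
}
```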

Plasticizer answered 25/3, 2017 at 11:14 Comment(1)
I have no problem converting numbers to words. The only issue is that I can't differentiate between '1' '2' '3' '4' and 1234: in both cases the result from Google is 1234, so it will be converted to 'one thousand two hundred thirty four'.Zagazig
