Export the result of cloud speech API to JSON file using Python

I am trying to convert an audio file to text in an Indian language via the Google Cloud Speech API. The API returns an object of type "google.cloud.speech_v1.types.SpeechRecognitionAlternative", and I want to export the result to a .json file. I am very new to Python; this is my first Python project.

    # Imports the Google Cloud client library
    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types

    client = speech.SpeechClient()

    audio = types.RecognitionAudio(uri="gs://storage-staples-canada/client-data/TapTapTap.wav")
    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code='hi-IN',
        enable_word_time_offsets=True,
        # Phrase hints that help the recognizer with the expected text
        speech_contexts=[speech.types.SpeechContext(phrases=[
            'एक जंगल था।',
            'ख़ूब घना जंगल।',
            'एक दिन झमाझम पानी बरसा।',
            'थोड़ी देर बाद बारिश रुक गई। सारे जानवर बाहर निकले।',
            'तभी आवाज़ आई टप टप टपक सब घबरा गए।',
            'सबने सोचा यह कैसी आवाज़ है? सारे बड़े जानवर डर कर भाग गए।',
            'चींटी ने कहा- चलो साथियो, हम टप- टप- टपक को ढूँढें। चूहा बोला- शाबाश चींटी बहन!',
            'जुगनू बोला- मैं आगे रहूँगा। बिल्ली बोली- मैं साथ हूँ।',
            'ख़रगोश ने कहा- मैं भी पीछे नहीं रहूँगा। सब टप- टप- टपक को ढूँढने चले।',
            'फिर सबने देखा, बरसात का पानी पत्तों पर टपक रहा था',
            'टप- टप- टपक।',
        ])],
    )

    operation = client.long_running_recognize(config, audio)
    print('Waiting for operation to complete...')
    response = operation.result(timeout=90)

    # Gets the time offsets of each of the words in the audio
    for result in response.results:
        # The first alternative is the most likely one for this portion.
        alternative = result.alternatives[0]
        print('Transcript: {}'.format(alternative.transcript))
        print('Confidence: {}'.format(alternative.confidence))
        for word_info in alternative.words:
            word = word_info.word
            start_time = word_info.start_time
            end_time = word_info.end_time
            print('Word: {}, start_time: {}, end_time: {}'.format(
                word,
                start_time.seconds + start_time.nanos * 1e-9,
                end_time.seconds + end_time.nanos * 1e-9))

When I try to convert the API result (stored in the response variable in the code above) to a dictionary, I get the error "TypeError: 'SpeechRecognitionAlternative' object is not iterable". Can you help me convert the result and export it to a .json file?

Fokine answered 14/2, 2018 at 12:29 Comment(2)
Like the message says, you are trying to loop through an object that does not support iteration. Go back to the API docs and find out what this object is and how you are supposed to use it. Don't just read your code; debug it! Go through the code and make sure you understand every line before going to the next. You should also show the exact text of the error message, with traceback and line numbers. - Mistymisunderstand
I read the documentation, but it says nothing very concrete. Here is the link: cloud.google.com/speech/reference/rest/v1/speech/recognize. I also debugged my code and I am getting the correct output. I just need help exporting this object to a .json file. When I try to convert it to a dictionary, it gives the error "The object is not iterable". - Fokine
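
The TypeError most likely comes from passing a message object to dict(), which expects an iterable of key/value pairs. One way around it is to copy the needed fields into plain Python types by hand. The sketch below does this for the fields the question prints. The SimpleNamespace objects are hypothetical stand-ins shaped like the API response (not the real classes), so the example runs without credentials; the helper names and the output path are also assumptions.

```python
import json
import os
import tempfile
from types import SimpleNamespace


def to_seconds(duration):
    """Convert a Duration-like object (seconds + nanos) to float seconds."""
    return duration.seconds + duration.nanos * 1e-9


def alternative_to_dict(alternative):
    """Copy the fields of one recognition alternative into a plain dict."""
    return {
        "transcript": alternative.transcript,
        "confidence": alternative.confidence,
        "words": [
            {
                "word": info.word,
                "start_time": to_seconds(info.start_time),
                "end_time": to_seconds(info.end_time),
            }
            for info in alternative.words
        ],
    }


def response_to_dict(response):
    """Keep only the first (most likely) alternative of each result."""
    return {
        "results": [alternative_to_dict(r.alternatives[0]) for r in response.results]
    }


def fake_duration(seconds, nanos):
    """Hypothetical stand-in for a protobuf Duration."""
    return SimpleNamespace(seconds=seconds, nanos=nanos)


# Hypothetical stand-in for the object returned by operation.result()
fake_response = SimpleNamespace(results=[
    SimpleNamespace(alternatives=[
        SimpleNamespace(
            transcript="एक जंगल था।",
            confidence=0.9,
            words=[
                SimpleNamespace(word="एक",
                                start_time=fake_duration(0, 0),
                                end_time=fake_duration(0, 500000000)),
                SimpleNamespace(word="जंगल",
                                start_time=fake_duration(0, 500000000),
                                end_time=fake_duration(1, 200000000)),
            ],
        )
    ])
])

data = response_to_dict(fake_response)

# Written to the temp directory here; any writable path works.
out_path = os.path.join(tempfile.gettempdir(), "transcript.json")
with open(out_path, "w", encoding="utf-8") as json_file:
    # ensure_ascii=False keeps the Hindi text readable in the file
    json.dump(data, json_file, ensure_ascii=False, indent=2)
```

With the real response object, only the response_to_dict(response) call and the json.dump step would be needed.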
T
0

For this task you can use the gcloud command-line tool. For example:

    gcloud ml speech recognize-long-running \
      gs://storage-staples-canada/client-data/TapTapTap.wav \
      --language-code=hi-IN --encoding=linear16 --sample-rate=16000 \
      --include-word-time-offsets \
      --hints="एक जंगल था।,ख़ूब घना जंगल।" \
      --format=json

You can add the --log-http flag to see the API interactions, which can help you fix your Python code.

Tableware answered 14/2, 2018 at 16:30 Comment(0)
S
6

I'd recommend using the protobuf↔JSON converters from Google's protobuf library:

    from google.protobuf.json_format import MessageToJson

    # The line below is taken from the question's code; response holds the Google API results
    response = operation.result(timeout=90)
    result_json = MessageToJson(response)

MessageToJson returns a JSON-formatted string, so result_json can be written to a file as-is; there is no need to pass it through json.dump, which would encode it a second time. Please see How do I write JSON data to a file?
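
A minimal sketch of the write step, with a hand-written string standing in for the real MessageToJson output (the sample content and the output path are assumptions). It also shows what happens when an already-serialized string goes through json.dumps again:

```python
import json
import os
import tempfile

# Stand-in for: result_json = MessageToJson(response)
result_json = (
    '{"results": [{"alternatives": '
    '[{"transcript": "hello", "confidence": 0.9}]}]}'
)

# Written to the temp directory here; any writable path works.
out_path = os.path.join(tempfile.gettempdir(), "result.json")

# The string already contains JSON text, so write it directly.
with open(out_path, "w", encoding="utf-8") as json_file:
    json_file.write(result_json)

# Reading the file back gives a dict, as expected.
with open(out_path, encoding="utf-8") as json_file:
    loaded = json.load(json_file)

# Serializing the string again wraps the whole document in one quoted string,
# so parsing it gives back a str rather than a dict.
double_encoded = json.dumps(result_json)
reloaded = json.loads(double_encoded)
```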

Siebert answered 27/7, 2018 at 16:52 Comment(1)
Thank you. I have been trying to debug this for hours. There is very little documentation on the web about how to actually convert the results of the Google STT API to something useful. There are LOTS of examples that no longer work (including my old code), as Google seems to have changed the API. And to @Mistymisunderstand, who suggested just debugging the code... well, duh! - Telephone
L
2

    import json

    from google.protobuf.json_format import MessageToDict

    # MessageToDict returns a plain dict, which json.dump can serialize
    result_dict = MessageToDict(response)
    print(result_dict)

    with open("result.json", "w") as json_file:
        json.dump(result_dict, json_file, indent=4, sort_keys=True)

Linearity answered 21/12, 2018 at 18:34 Comment(0)
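
In the dictionary produced by MessageToDict, field names are converted to camelCase by default, and Duration fields such as the word offsets come out as strings like "1.300s" rather than numbers. If numeric offsets are more convenient, they can be converted before dumping. A rough sketch, run on a hand-written sample dict standing in for the MessageToDict output (the sample values and helper names are assumptions):

```python
import json


def duration_to_seconds(text):
    """Convert a protobuf JSON duration such as '1.300s' to float seconds."""
    if not text.endswith("s"):
        raise ValueError("not a protobuf JSON duration: {!r}".format(text))
    return float(text[:-1])


def convert_word_offsets(result_dict):
    """Return a copy of result_dict with startTime/endTime as floats."""
    # Round-tripping through JSON is a cheap deep copy for JSON-like data.
    converted = json.loads(json.dumps(result_dict))
    for result in converted.get("results", []):
        for alternative in result.get("alternatives", []):
            for word in alternative.get("words", []):
                for key in ("startTime", "endTime"):
                    if key in word:
                        word[key] = duration_to_seconds(word[key])
    return converted


# Hand-written stand-in for MessageToDict(response)
sample = {
    "results": [{
        "alternatives": [{
            "transcript": "एक जंगल था।",
            "confidence": 0.9,
            "words": [
                {"word": "एक", "startTime": "0s", "endTime": "0.500s"},
                {"word": "जंगल", "startTime": "0.500s", "endTime": "1.200s"},
            ],
        }]
    }]
}

numeric = convert_word_offsets(sample)
# ensure_ascii=False keeps the Hindi text readable instead of \u escapes
print(json.dumps(numeric, ensure_ascii=False, indent=4, sort_keys=True))
```

With newer releases of the client library the response may be a proto-plus wrapper rather than a raw protobuf message; in that case the converters are usually applied to response._pb, which is worth verifying against the installed version.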
