I am trying to convert an audio file to text in an Indian language via Google Speech API. The API returns an object of the type "google.cloud.speech_v1.types.SpeechRecognitionAlternative". I am trying to export the result to .json file. I am very new to python. This is the first project I am doing in python.
import io
import os
import pickle
# Imports the Google Cloud client library
from google.cloud
import speech
from google.cloud.speech
import enums
from google.cloud.speech
import types
client = speech.SpeechClient()
audio = types.RecognitionAudio(uri = "gs://storage-staples-canada/client-data/TapTapTap.wav")
config = types.RecognitionConfig(
encoding = enums.RecognitionConfig.AudioEncoding.LINEAR16,
sample_rate_hertz = 16000,
language_code = 'hi-IN',
enable_word_time_offsets = True,
speech_contexts = [speech.types.SpeechContext(phrases = ['एक जंगल था।', 'ख़ूब घना जंगल।', 'एक दिन झमाझम पानी बरसा।', 'थोड़ी देर बाद बारिश रुक गई। सारे जानवर बाहर निकले।', 'तभी आवाज़ आई टप टप टपक सब घबरा गए।', 'सबने सोचा यह कैसी आवाज़ है? सारे बड़े जानवर डर कर भाग गए।', 'चींटी ने कहा- चलो साथियो, हम टप- टप- टपक को ढूँढें। चूहा बोला- शाबाश चींटी बहन!', 'जुगनू बोला- मैं आगे रहूँगा। बिल्ली बोली- मैं साथ हूँ।', 'ख़रगोश ने कहा- मैं भी पीछे नहीं रहूँगा। सब टप- टप- टपक को ढूँढने चले।', 'फिर सबने देखा, बरसात का पानी पत्तों पर टपक रहा था', 'टप- टप- टपक।'])], )
operation = client.long_running_recognize(config, audio)
print('Waiting for operation to complete...')
response = operation.result(timeout = 90)
# Gets the time - offsets of each of the words in the audio
for result in response.results:
alternative = result.alternatives[0]# The first alternative is the most likely one
for this portion.
print('Transcript: {}'.format(result.alternatives[0].transcript))
print('Confidence: {}'.format(result.alternatives[0].confidence))
for word_info in alternative.words:
word = word_info.word
start_time = word_info.start_time
end_time = word_info.end_time
print('Word: {}, start_time: {}, end_time: {}'.format(
word,
start_time.seconds + start_time.nanos * 1e-9,
end_time.seconds + end_time.nanos * 1e-9))
When I try to convert the results of the API(stored in response variable in the code above) to the dictionary. The error message I get "TypeError: 'SpeechRecognitionAlternative' object is not iterable". Can you help me convert and export the result to a .json file?