This is the Google Speech API docs: https://cloud.google.com/speech/docs/sync-recognize
I have tried this API for two weeks, but I still can't achieve my main goal: transcribing a live stream.
I'm using PHP. (Suggestions in other languages are welcome; I can port them myself.)
What I have managed to do in these two weeks:
Synchronous Speech Recognition (<=1min)
Asynchronous Speech Recognition (>1min and <=80min). Note: I can modify this to accept a 3-hour video.
Live speech recognition from mic : https://www.google.com/intl/en/chrome/demos/speech.html
UPDATE: Streaming recognition through the API, but only with audio shorter than 6 seconds.
What I can't do:
Transcribe a live stream, e.g. a radio stream (a delay is acceptable).
Transcribe video/audio while it is playing (a delay is acceptable).
UPDATE:
I also asked this question on Google's GitHub, but since I got no answer there, I am asking here.
Summary:
I can perform streaming speech recognition, but only with 6 seconds of audio. This is not what I expected. I expected to be able to recognize audio of unlimited duration, since we don't know when a radio stream will end.
Thanks for any help. I really appreciate it.
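To make the "unlimited duration" requirement concrete, here is a minimal sketch of one way an open-ended stream could be cut into bounded pieces, so that each piece could be sent as its own streaming session. It is written in Python only for illustration. The name `split_into_sessions`, the raw 16 kHz mono 16-bit PCM format, and the idea that a new session has to be started periodically are all assumptions of this sketch, not something confirmed by the Speech API docs quoted here. No Speech API call is made.

```python
from itertools import islice
from typing import Iterable, Iterator, List

# Assumption: raw PCM at 16 kHz, mono, 16-bit samples (2 bytes per sample).
BYTES_PER_SECOND = 16000 * 2


def split_into_sessions(chunks: Iterable[bytes],
                        max_seconds: float) -> Iterator[List[bytes]]:
    """Group an open-ended chunk stream into sessions of bounded duration.

    Each yielded list holds at most max_seconds of audio, except that a
    single chunk larger than the whole budget is yielded on its own.
    """
    budget = int(max_seconds * BYTES_PER_SECOND)
    session: List[bytes] = []
    used = 0
    for chunk in chunks:
        # Close the current session before it would exceed the budget.
        if session and used + len(chunk) > budget:
            yield session
            session, used = [], 0
        session.append(chunk)
        used += len(chunk)
    if session:
        yield session


def endless_silence() -> Iterator[bytes]:
    """Hypothetical stand-in for a radio stream: 0.1 s of silence per chunk."""
    while True:
        yield b"\x00" * 3200


# Take only the first three sessions from a source that never ends.
for number, session in enumerate(
        islice(split_into_sessions(endless_silence(), 1.0), 3), start=1):
    print(number, len(session), sum(len(c) for c in session))
```

Because the function is a generator, it never needs to know when the source ends; whatever consumes the sessions decides how long to keep going.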
UPDATE:
To prove that I can't use audio longer than 6 seconds, I ran this test:
I took the video interview.mp4 and converted it to interview.flac with ffmpeg, using this command:
ffmpeg -i interview.mp4 -c:a flac -ar 16000 -ac 1 -sample_fmt s16 interview.flac
I then used this library to transcribe the audio with this command:
php speech.php transcribe --encoding FLAC --language-code en-US --sample-rate 16000 --stream interview.flac
The result is:
[Google\GAX\ApiException]
Invalid 'audio_content': too long.
It can't be too long, because the video is only 48 seconds long. This is the output from ffmpeg:
Output #0, flac, to 'interview.flac':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf57.72.101
    Stream #0:0(und): Audio: flac, 16000 Hz, mono, s16, 128 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
      encoder         : Lavc57.92.100 flac
size= 810kB time=00:00:48.01 bitrate= 138.1kbits/s speed= 108x
video:0kB audio:801kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.019650%
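Since the clip is only 48 seconds long, one unconfirmed possibility is that the "too long" error refers to the size of a single request payload (the whole 810 kB file sent at once) rather than to the duration. The sketch below, in Python for illustration, shows only the splitting step: reading the audio in small pieces that could each be sent as a separate streaming request. The function name `iter_audio_chunks` and the 32 KiB chunk size are hypothetical choices of this sketch, not values taken from the Speech API or from the PHP library, and no API call is made.

```python
import io
from typing import BinaryIO, Iterator


def iter_audio_chunks(stream: BinaryIO,
                      chunk_size: int = 32 * 1024) -> Iterator[bytes]:
    """Yield successive chunks of at most chunk_size bytes until EOF."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # An empty read means the stream is exhausted.
            break
        yield chunk


# In-memory stand-in for the audio file, so the sketch runs without
# any file on disk; with a real file, pass open("interview.flac", "rb").
fake_audio = io.BytesIO(b"\x00" * 100_000)
sizes = [len(chunk) for chunk in iter_audio_chunks(fake_audio)]
print(sizes)  # [32768, 32768, 32768, 1696]
```

If the limit really is per request, each chunk would go into its own request on the same stream; if the error persists with small chunks, this explanation is wrong and the limit lies elsewhere.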