Currently, the Google Assistant SDK accepts voice input, so my question is fairly simple: I want to converse with the Google Assistant using text chat rather than voice. This is certainly possible elsewhere, for instance in Google Allo. Has Google exposed an API for text input?
It is now supported in the v1alpha2 version of the Google Assistant SDK Service.
It doesn't look like the SDK accepts text, but it does accept an audio file as input. It can even write the response out as an audio file.
python -m pushtotalk -i somefile.wav -o outputfile.wav
This got me thinking and I wrote a script:
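#!/bin/bash
# Write the question to a file (overwrite, so a stale query.txt is never reused)
echo "$1" > query.txt
# Synthesize the question as speech
espeak -f query.txt -w audio_query.wav
# Send the audio to the Assistant and save the spoken reply
python -m pushtotalk -i audio_query.wav -o audio_response.wav &> pushtotalk.log
# Transcribe the reply back to text and print it
pocketsphinx_continuous -infile audio_response.wav 2> pocketsphinx.log > response.txt
cat response.txt
# Clean up the intermediate files
rm response.txt query.txt audio_query.wav audio_response.wav pocketsphinx.log pushtotalk.log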
This is just a shell script, but it could likely be converted to Python too. To use it, save the script as pushtotalk_script.sh, make it executable with chmod +x pushtotalk_script.sh, and run ./pushtotalk_script.sh "how tall is mount kilimanjaro?". I'm using espeak to turn the text into a WAV file, then the Assistant SDK to get a response. You could stop here and play the response. Pocketsphinx is an audio transcription engine created by CMU. You can find packages for these tools using apt-get, but if you're on OS X, the pocketsphinx package doesn't work and you'll need to tap these formulas. Also, here's a Python module to use espeak. And there's a repo for pocketsphinx as a Python module, but I can't link more than two links.
Google's Assistant doesn't seem to have much trouble understanding the output from espeak. Pocketsphinx, however, usually has a bit of trouble transcribing the response audio, though it works well for simple responses. Depending on the length of the question and of the response audio, the whole process takes about 5 to 10 seconds.
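For anyone who would rather have this in Python, here is a minimal sketch of the same three-step pipeline using subprocess. It assumes espeak, the pushtotalk sample, and pocketsphinx_continuous are all installed and on the PATH, exactly as the shell script does. The function names build_commands and ask are made up for illustration, and the code has not been run against the real Assistant service.

```python
import os
import subprocess
import tempfile


def build_commands(query_path, audio_query, audio_response):
    """Return the three commands from the shell script as argument lists."""
    return [
        # espeak: read the question from a text file, write speech to a WAV file
        ["espeak", "-f", query_path, "-w", audio_query],
        # Assistant SDK sample: send the WAV file, save the spoken reply
        ["python", "-m", "pushtotalk", "-i", audio_query, "-o", audio_response],
        # pocketsphinx: transcribe the reply; the text is printed on stdout
        ["pocketsphinx_continuous", "-infile", audio_response],
    ]


def ask(query):
    """Run the pipeline for one question and return the transcribed reply."""
    # A temporary directory replaces the explicit rm at the end of the script.
    with tempfile.TemporaryDirectory() as workdir:
        query_path = os.path.join(workdir, "query.txt")
        audio_query = os.path.join(workdir, "audio_query.wav")
        audio_response = os.path.join(workdir, "audio_response.wav")

        with open(query_path, "w") as f:
            f.write(query + "\n")

        espeak, pushtotalk, sphinx = build_commands(
            query_path, audio_query, audio_response
        )
        quiet = {"stdout": subprocess.DEVNULL, "stderr": subprocess.DEVNULL}
        subprocess.run(espeak, check=True, **quiet)
        subprocess.run(pushtotalk, check=True, **quiet)

        # Keep stdout (the transcript) and discard the engine's log output.
        result = subprocess.run(
            sphinx,
            check=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
        )
        return result.stdout.strip()
```

With the tools installed, calling ask("how tall is mount kilimanjaro?") should return whatever pocketsphinx manages to transcribe from the Assistant's reply. Unlike the shell script, the log files are discarded rather than written and deleted, and any step that fails raises subprocess.CalledProcessError instead of continuing silently.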