Python having trouble accessing a USB microphone using GStreamer to perform speech recognition with Pocketsphinx on a Raspberry Pi

So Python is acting like it can't hear ANYTHING from my microphone at all.

Here's the problem. I have a Python (2.7) script that is supposed to use GStreamer to access my microphone and do speech recognition for me via Pocketsphinx. I'm using PulseAudio and my device is a Raspberry Pi. My microphone is a PlayStation 3 Eye.

Right off the bat: I have already gotten pocketsphinx_continuous to run correctly and recognize the words I have defined in my .dic and .lm files. The accuracy is around 85-90% after a couple of trial runs. So I know my microphone is picking up sound normally via pocketsphinx + PulseAudio.

FYI, I ran the following:

pocketsphinx_continuous -lm /home/pi/dev/scarlettPi/config/speech/lm/scarlett.lm -dict /home/pi/dev/scarlettPi/config/speech/dict/scarlett.dic -hmm /home/pi/dev/scarlettPi/config/speech/model/hmm/en_US/hub4wsj_sc_8k -silprob  0.1 -wip 1e-4 -bestpath 0

In my Python code I'm attempting to do the same thing, but I'm using GStreamer to access the microphone. (Note: I'm a bit new to Python.)

Here is my code (thanks to Josip Lisec for getting me this far):

import pi
from pi.becore import ScarlettConfig
from recorder import Recorder
from brain import Brain

import os
import json
import tempfile
#import sys

import pygtk
pygtk.require('2.0')
import gtk
import gobject
import pygst
pygst.require('0.10')
gobject.threads_init()
import gst

scarlett_config=ScarlettConfig()

class Listener:
  def __init__(self, gobject, gst):
    self.failed = 0

    self.pipeline = gst.parse_launch(' ! '.join(['pulsesrc',
                                               'audioconvert',
                                               'audioresample',
                                               'vader name=vader auto-threshold=true',
                                               'pocketsphinx lm=' + scarlett_config.get('LM') + ' dict=' + scarlett_config.get('DICT') + ' hmm=' + scarlett_config.get('HMM') + ' name=listener',
                                               'fakesink']))
    listener = self.pipeline.get_by_name('listener')
    listener.connect('result', self.__result__)
    listener.set_property('configured', True)
    print "KEYWORDS WE'RE LOOKING FOR: " + scarlett_config.get('ourkeywords')

    bus = self.pipeline.get_bus()
    bus.add_signal_watch()
    bus.connect('message::application', self.__application_message__)
    self.pipeline.set_state(gst.STATE_PLAYING)

  def result(self, hyp, uttid):
    if hyp in scarlett_config.get('ourkeywords'):
      self.failed = 0
      self.listen()
    else:
      self.failed += 1
      if self.failed > 4:
        pi.speak("" + scarlett_config.get('scarlett_owner') + ", if you need me, just say my name.")
        self.failed = 0

  def listen(self):
    self.pipeline.set_state(gst.STATE_PAUSED)
    pi.play('pi-listening')
    Recorder(self)

  def cancel_listening(self):
    pi.play('pi-cancel')
    self.pipeline.set_state(gst.STATE_PLAYING)

  # question - sound recording
  def answer(self, question):
    pi.play('pi-cancel')

    print " * Contacting Google"
    destf = tempfile.mktemp(suffix='piresult')
    os.system('wget --post-file %s --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7" --header="Content-Type: audio/x-flac; rate=16000" -O %s -q "https://www.google.com/speech-api/v1/recognize?client=chromium&lang=en-US"' % (question, destf))
    #os.system("speech2text %s > %s" % (question, destf))
    b = open(destf)
    result = b.read()
    b.close()

    os.unlink(question)
    os.unlink(destf)

    if len(result) == 0:
      print " * nop"
      pi.play('pi-cancel')
    else:
      brain = Brain(json.loads(result))
      if brain.think() == False:
        print " * nop2"
        pi.play('pi-cancel')

    self.pipeline.set_state(gst.STATE_PLAYING)

  def __result__(self, listener, text, uttid):
    struct = gst.Structure('result')
    struct.set_value('hyp', text)
    struct.set_value('uttid', uttid)
    listener.post_message(gst.message_new_application(listener, struct))

  def __application_message__(self, bus, msg):
    msgtype =  msg.structure.get_name()
    if msgtype == 'result':
      self.result(msg.structure['hyp'], msg.structure['uttid'])

The application is supposed to match on the keyword "Scarlett" and then perform an action.
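Separately from the microphone problem: the `print "KEYWORDS WE'RE LOOKING FOR: " + scarlett_config.get('ourkeywords')` line concatenates the value with a str, which in Python 2 raises a TypeError for a list, so `ourkeywords` appears to be a string such as `"[ 'scarlett', 'SCARLETT' ]"` (matching the last line of the output below). If so, `hyp in scarlett_config.get('ourkeywords')` in `result()` is a substring test, and an empty hypothesis or a fragment like "car" would also count as a match. A minimal sketch of stricter, whole-word matching, assuming the value is a Python list literal; `parse_keywords` and `matches_keyword` are hypothetical helpers, not part of the original project:

```python
import ast


def parse_keywords(raw):
    """Return the configured keywords as a set of lower-case words.

    Hypothetical helper: assumes `raw` is either a real list or a string
    holding a Python list literal such as "[ 'scarlett', 'SCARLETT' ]".
    """
    if isinstance(raw, (list, tuple, set)):
        words = raw
    else:
        words = ast.literal_eval(raw)  # evaluates literals only, never code
    return set(word.strip().lower() for word in words)


def matches_keyword(hyp, keywords):
    """True only if a whole word of the hypothesis is one of the keywords."""
    if not hyp:
        return False  # an empty hypothesis never counts as a match
    return any(word.lower() in keywords for word in hyp.split())
```

Inside `result()` this would read `if matches_keyword(hyp, parse_keywords(scarlett_config.get('ourkeywords'))):`.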

When I run my application, I get the following output:

pi@scarlettpi ~/dev/scarlettPi/scripts/pi/bin $ ./pi 
/usr/lib/python2.7/dist-packages/gtk-2.0/gtk/__init__.py:57: GtkWarning: could not open display
  warnings.warn(str(e), _gtk.Warning)
INFO: cmd_ln.c(691): Parsing command line:
gst-pocketsphinx \
    -samprate 8000 \
    -cmn prior \
    -fwdflat no \
    -bestpath no \
    -maxhmmpf 2000 \
    -maxwpf 20 

Current configuration:
[NAME]      [DEFLT]     [VALUE]
-agc        none        none
-agcthresh  2.0     2.000000e+00
-alpha      0.97        9.700000e-01
-ascale     20.0        2.000000e+01
-aw     1       1
-backtrace  no      no
-beam       1e-48       1.000000e-48
-bestpath   no      no
-bestpathlw 9.5     9.500000e+00
-bghist     no      no
-ceplen     13      13
-cmn        current     prior
-cmninit    8.0     8.0
-compallsen no      no
-debug              0
-dict               
-dictcase   no      no
-dither     no      no
-doublebw   no      no
-ds     1       1
-fdict              
-feat       1s_c_d_dd   1s_c_d_dd
-featparams         
-fillprob   1e-8        1.000000e-08
-frate      100     100
-fsg                
-fsgusealtpron  yes     yes
-fsgusefiller   yes     yes
-fwdflat    yes     no
-fwdflatbeam    1e-64       1.000000e-64
-fwdflatefwid   4       4
-fwdflatlw  8.5     8.500000e+00
-fwdflatsfwin   25      25
-fwdflatwbeam   7e-29       7.000000e-29
-fwdtree    yes     yes
-hmm                
-input_endian   little      little
-jsgf               
-kdmaxbbi   -1      -1
-kdmaxdepth 0       0
-kdtree             
-latsize    5000        5000
-lda                
-ldadim     0       0
-lextreedump    0       0
-lifter     0       0
-lm             
-lmctl              
-lmname     default     default
-logbase    1.0001      1.000100e+00
-logfn              
-logspec    no      no
-lowerf     133.33334   1.333333e+02
-lpbeam     1e-40       1.000000e-40
-lponlybeam 7e-29       7.000000e-29
-lw     6.5     6.500000e+00
-maxhmmpf   -1      2000
-maxnewoov  20      20
-maxwpf     -1      20
-mdef               
-mean               
-mfclogdir          
-min_endfr  0       0
-mixw               
-mixwfloor  0.0000001   1.000000e-07
-mllr               
-mmap       yes     yes
-ncep       13      13
-nfft       512     512
-nfilt      40      40
-nwpen      1.0     1.000000e+00
-pbeam      1e-48       1.000000e-48
-pip        1.0     1.000000e+00
-pl_beam    1e-10       1.000000e-10
-pl_pbeam   1e-5        1.000000e-05
-pl_window  0       0
-rawlogdir          
-remove_dc  no      no
-round_filters  yes     yes
-samprate   16000       8.000000e+03
-seed       -1      -1
-sendump            
-senlogdir          
-senmgau            
-silprob    0.1     1.000000e-01
-smoothspec no      no
-svspec             
-tmat               
-tmatfloor  0.0001      1.000000e-04
-topn       4       4
-topn_beam  0       0
-toprule            
-transform  legacy      legacy
-unit_area  yes     yes
-upperf     6855.4976   6.855498e+03
-usewdphones    no      no
-uw     1.0     1.000000e+00
-var                
-varfloor   0.0001      1.000000e-04
-varnorm    no      no
-verbose    no      no
-warp_params            
-warp_type  inverse_linear  inverse_linear
-wbeam      7e-29       7.000000e-29
-wip        1e-4        1.000000e-04
-wlen       0.025625    2.562500e-02

INFO: cmd_ln.c(691): Parsing command line:
\
    -nfilt 20 \
    -lowerf 1 \
    -upperf 4000 \
    -wlen 0.025 \
    -transform dct \
    -round_filters no \
    -remove_dc yes \
    -svspec 0-12/13-25/26-38 \
    -feat 1s_c_d_dd \
    -agc none \
    -cmn current \
    -cmninit 56,-3,1 \
    -varnorm no 

Current configuration:
[NAME]      [DEFLT]     [VALUE]
-agc        none        none
-agcthresh  2.0     2.000000e+00
-alpha      0.97        9.700000e-01
-ceplen     13      13
-cmn        current     current
-cmninit    8.0     56,-3,1
-dither     no      no
-doublebw   no      no
-feat       1s_c_d_dd   1s_c_d_dd
-frate      100     100
-input_endian   little      little
-lda                
-ldadim     0       0
-lifter     0       0
-logspec    no      no
-lowerf     133.33334   1.000000e+00
-ncep       13      13
-nfft       512     512
-nfilt      40      20
-remove_dc  no      yes
-round_filters  yes     no
-samprate   16000       8.000000e+03
-seed       -1      -1
-smoothspec no      no
-svspec             0-12/13-25/26-38
-transform  legacy      dct
-unit_area  yes     yes
-upperf     6855.4976   4.000000e+03
-varnorm    no      no
-verbose    no      no
-warp_params            
-warp_type  inverse_linear  inverse_linear
-wlen       0.025625    2.500000e-02

INFO: acmod.c(246): Parsed model-specific feature parameters from /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/feat.params
INFO: feat.c(713): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(167): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(517): Reading model definition: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/mdef
INFO: mdef.c(528): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/mdef
INFO: bin_mdef.c(513): 50 CI-phone, 143047 CD-phone, 3 emitstate/phone, 150 CI-sen, 5150 Sen, 27135 Sen-Seq
INFO: tmat.c(205): Reading HMM transition probability matrices: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/transition_matrices
INFO: acmod.c(121): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/means
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size: 
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/variances
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size: 
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(354): 0 variance values floored
INFO: s2_semi_mgau.c(903): Loading senones from dump file /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/sendump
INFO: s2_semi_mgau.c(927): BEGIN FILE FORMAT DESCRIPTION
INFO: s2_semi_mgau.c(1022): Using memory-mapped I/O for senones
INFO: s2_semi_mgau.c(1296): Maximum top-N: 4 Top-N beams: 0 0 0
INFO: dict.c(317): Allocating 4120 * 20 bytes (80 KiB) for word entries
INFO: dict.c(332): Reading main dictionary: /home/pi/dev/scarlettPi/config/speech/dict/scarlett.dic
INFO: dict.c(211): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(335): 13 words read
INFO: dict.c(341): Reading filler dictionary: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/noisedict
INFO: dict.c(211): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(344): 11 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(404): Allocating 50^3 * 2 bytes (244 KiB) for word-initial triphones
INFO: dict2pid.c(131): Allocated 30200 bytes (29 KiB) for word-final triphones
INFO: dict2pid.c(195): Allocated 30200 bytes (29 KiB) for single-phone word triphones
INFO: ngram_model_arpa.c(477): ngrams 1=12, 2=18, 3=17
INFO: ngram_model_arpa.c(135): Reading unigrams
INFO: ngram_model_arpa.c(516):       12 = #unigrams created
INFO: ngram_model_arpa.c(195): Reading bigrams
INFO: ngram_model_arpa.c(533):       18 = #bigrams created
INFO: ngram_model_arpa.c(534):        3 = #prob2 entries
INFO: ngram_model_arpa.c(542):        3 = #bo_wt2 entries
INFO: ngram_model_arpa.c(292): Reading trigrams
INFO: ngram_model_arpa.c(555):       17 = #trigrams created
INFO: ngram_model_arpa.c(556):        2 = #prob3 entries
INFO: ngram_search_fwdtree.c(99): 12 unique initial diphones
INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 12 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 12 single-phone words
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 152
INFO: ngram_search_fwdtree.c(338): after: 12 root, 24 non-root channels, 11 single-phone words
KEYWORDS WE'RE LOOKING FOR: [ 'scarlett', 'SCARLETT' ]    

But it fails to match on anything. I almost think Python cannot hear anything from the microphone; there aren't even any attempts to recognize anything. pocketsphinx_continuous usually prints out a READY state when it's prepared to start listening... shouldn't I expect the same in Python?

Here are my Python packages:
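One way to check whether any audio reaches GStreamer at all is the stripped-down test suggested in the comments below: connect the source straight to `fakesink dump=1`, which dumps every buffer it receives to the console. A minimal sketch that builds that pipeline in Python so it can be run with `subprocess`; the helper names are hypothetical, and `gst-launch-0.10` is assumed to be on the PATH:

```python
import shlex


def debug_pipeline(source="pulsesrc"):
    """Smallest useful test pipeline: capture audio and dump every buffer.

    `source` is a gst-launch element description, e.g. "pulsesrc" or
    "alsasrc device=hw:1" (hypothetical helper, not a GStreamer API).
    """
    return " ! ".join([source, "fakesink dump=1"])


def gst_launch_argv(description, launcher="gst-launch-0.10"):
    """Split a pipeline description into argv entries for subprocess.

    Each element, property and '!' becomes its own argument, i.e. what a
    POSIX shell would pass if you typed the pipeline on the command line.
    """
    return [launcher] + shlex.split(description)
```

Running `subprocess.call(gst_launch_argv(debug_pipeline()))` and then `subprocess.call(gst_launch_argv(debug_pipeline("alsasrc device=hw:1")))` lets you compare the two sources: if buffer dumps scroll past while you speak, audio is arriving through that source.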

pi@scarlettpi ~/dev/scarlettPi/scripts/pi/bin $ dpkg -l | grep -i python
ii  idle                                  2.7.3-4                              all          IDE for Python using Tkinter (default version)
ii  idle-python2.7                        2.7.3-6                              all          IDE for Python (v2.7) using Tkinter
rc  idle3                                 3.2.3-6                              all          IDE for Python using Tkinter (default version)
ii  libpyside1.1:armhf                    1.1.1-3                              armhf        Python bindings for Qt 4 (base files)
ii  libpython2.6                          2.6.8-1.1                            armhf        Shared Python runtime library (version 2.6)
ii  libpython2.7                          2.7.3-6                              armhf        Shared Python runtime library (version 2.7)
ii  libshiboken1.1:armhf                  1.1.1-1                              armhf        CPython bindings generator for C++ libraries - shared library
ii  python                                2.7.3-4                              all          interactive high-level object-oriented language (default version)
ii  python-alsaaudio                      0.5+svn36-1                          armhf        Alsa bindings for Python
ii  python-cairo                          1.8.8-1                              armhf        Python bindings for the Cairo vector graphics library
ii  python-dbg                            2.7.3-4                              all          debug build of the Python Interpreter (version 2.7)
ii  python-dbus                           1.1.1-1                              armhf        simple interprocess messaging system (Python interface)
ii  python-dbus-dev                       1.1.1-1                              all          main loop integration development files for python-dbus
ii  python-dev                            2.7.3-4                              all          header files and a static library for Python (default)
ii  python-gi                             3.2.2-2                              armhf        Python 2.x bindings for gobject-introspection libraries
ii  python-gi-dbg                         3.2.2-2                              armhf        Python bindings for the GObject library (debug extension)
ii  python-gi-dev                         3.2.2-2                              all          development headers for GObject Python bindings
ii  python-gobject                        3.2.2-2                              all          Python 2.x bindings for GObject - transitional package
ii  python-gobject-2                      2.28.6-10                            armhf        deprecated static Python bindings for the GObject library
ii  python-gobject-2-dbg                  2.28.6-10                            armhf        deprecated static Python bindings for the GObject library (debug extension)
ii  python-gobject-2-dev                  2.28.6-10                            all          development headers for the static GObject Python bindings
ii  python-gobject-dbg                    3.2.2-2                              all          Python 2.x debugging modules for GObject - transitional package
ii  python-gobject-dev                    3.2.2-2                              all          Python 2.x development headers for GObject - transitional package
ii  python-gst0.10                        0.10.22-3                            armhf        generic media-playing framework (Python bindings)
ii  python-gst0.10-dbg                    0.10.22-3                            armhf        generic media-playing framework (Python debug bindings)
ii  python-gst0.10-dev                    0.10.22-3                            armhf        generic media-playing framework (Python bindings)
ii  python-gst0.10-rtsp                   0.10.8-3                             armhf        GStreamer RTSP server plugin (Python bindings)
ii  python-gtk2                           2.24.0-3                             armhf        Python bindings for the GTK+ widget set
ii  python-iplib                          1.1-3                                all          Python library to convert amongst many different IPv4 notations
ii  python-libxml2                        2.8.0+dfsg1-7+nmu1                   armhf        Python bindings for the GNOME XML library
ii  python-minimal                        2.7.3-4                              all          minimal subset of the Python language (default version)
ii  python-numpy                          1:1.6.2-1.2                          armhf        Numerical Python adds a fast array facility to the Python language
ii  python-pexpect                        2.4-1                                all          Python module for automating interactive applications
ii  python-pip                            1.1-3                                all          alternative Python package installer
ii  python-pkg-resources                  0.6.24-1                             all          Package Discovery and Resource Access using pkg_resources
ii  python-pyalsa                         1.0.25-1                             armhf        Official ALSA Python binding library
ii  python-pyside                         1.1.1-3                              all          Python bindings for Qt4 (big metapackage)
ii  python-pyside.phonon                  1.1.1-3                              armhf        Qt 4 Phonon module - Python bindings
ii  python-pyside.qtcore                  1.1.1-3                              armhf        Qt 4 core module - Python bindings
ii  python-pyside.qtdeclarative           1.1.1-3                              armhf        Qt 4 Declarative module - Python bindings
ii  python-pyside.qtgui                   1.1.1-3                              armhf        Qt 4 GUI module - Python bindings
ii  python-pyside.qthelp                  1.1.1-3                              armhf        Qt 4 help module - Python bindings
ii  python-pyside.qtnetwork               1.1.1-3                              armhf        Qt 4 network module - Python bindings
ii  python-pyside.qtopengl                1.1.1-3                              armhf        Qt 4 OpenGL module - Python bindings
ii  python-pyside.qtscript                1.1.1-3                              armhf        Qt 4 script module - Python bindings
ii  python-pyside.qtsql                   1.1.1-3                              armhf        Qt 4 SQL module - Python bindings
ii  python-pyside.qtsvg                   1.1.1-3                              armhf        Qt 4 SVG module - Python bindings
ii  python-pyside.qttest                  1.1.1-3                              armhf        Qt 4 test module - Python bindings
ii  python-pyside.qtuitools               1.1.1-3                              armhf        Qt 4 UI tools module - Python bindings
ii  python-pyside.qtwebkit                1.1.1-3                              armhf        Qt 4 WebKit module - Python bindings
ii  python-pyside.qtxml                   1.1.1-3                              armhf        Qt 4 XML module - Python bindings
ii  python-rpi.gpio                       0.5.3a-1                             armhf        Python GPIO module for Raspberry Pi
ii  python-setuptools                     0.6.24-1                             all          Python Distutils Enhancements (setuptools compatibility)
ii  python-simplejson                     2.5.2-1                              armhf        simple, fast, extensible JSON encoder/decoder for Python
ii  python-support                        1.0.15                               all          automated rebuilding support for Python modules
ii  python-tk                             2.7.3-1                              armhf        Tkinter - Writing Tk applications with Python
ii  python-yaml                           3.10-4                               armhf        YAML parser and emitter for Python
ii  python-yaml-dbg                       3.10-4                               armhf        YAML parser and emitter for Python (debug build)
ii  python2.6                             2.6.8-1.1                            armhf        Interactive high-level object-oriented language (version 2.6)
ii  python2.6-minimal                     2.6.8-1.1                            armhf        Minimal subset of the Python language (version 2.6)
ii  python2.7                             2.7.3-6                              armhf        Interactive high-level object-oriented language (version 2.7)
ii  python2.7-dbg                         2.7.3-6                              armhf        Debug Build of the Python Interpreter (version 2.7)
ii  python2.7-dev                         2.7.3-6                              armhf        Header files and a static library for Python (v2.7)
ii  python2.7-minimal                     2.7.3-6                              armhf        Minimal subset of the Python language (version 2.7)
pi@scarlettpi ~/dev/scarlettPi/scripts/pi/bin $

Also, just to confirm that pocketsphinx is compiled against the right libraries:

pi@scarlettpi ~ $ ldd /usr/local/bin/pocketsphinx_continuous 
    /usr/lib/arm-linux-gnueabihf/libcofi_rpi.so (0xb6f9b000)
    libpocketsphinx.so.1 => /usr/local/lib/libpocketsphinx.so.1 (0xb6f5a000)
    libsphinxad.so.0 => /usr/local/lib/libsphinxad.so.0 (0xb6f4e000)
    libsphinxbase.so.1 => /usr/local/lib/libsphinxbase.so.1 (0xb6f07000)
    libpulse.so.0 => /usr/lib/arm-linux-gnueabihf/libpulse.so.0 (0xb6ea8000)
    libpulse-simple.so.0 => /usr/lib/arm-linux-gnueabihf/libpulse-simple.so.0 (0xb6e9c000)
    libpthread.so.0 => /lib/arm-linux-gnueabihf/libpthread.so.0 (0xb6e7d000)
    libm.so.6 => /lib/arm-linux-gnueabihf/libm.so.6 (0xb6e0c000)
    libc.so.6 => /lib/arm-linux-gnueabihf/libc.so.6 (0xb6cdd000)
    libjson.so.0 => /lib/arm-linux-gnueabihf/libjson.so.0 (0xb6ccd000)
    libpulsecommon-2.0.so => /usr/lib/arm-linux-gnueabihf/pulseaudio/libpulsecommon-2.0.so (0xb6c6b000)
    libdbus-1.so.3 => /lib/arm-linux-gnueabihf/libdbus-1.so.3 (0xb6c29000)
    libcap.so.2 => /lib/arm-linux-gnueabihf/libcap.so.2 (0xb6c1e000)
    librt.so.1 => /lib/arm-linux-gnueabihf/librt.so.1 (0xb6c0f000)
    libdl.so.2 => /lib/arm-linux-gnueabihf/libdl.so.2 (0xb6c04000)
    libgcc_s.so.1 => /lib/arm-linux-gnueabihf/libgcc_s.so.1 (0xb6bdb000)
    /lib/ld-linux-armhf.so.3 (0xb6fa8000)
    libX11-xcb.so.1 => /usr/lib/arm-linux-gnueabihf/libX11-xcb.so.1 (0xb6bd2000)
    libX11.so.6 => /usr/lib/arm-linux-gnueabihf/libX11.so.6 (0xb6abe000)
    libxcb.so.1 => /usr/lib/arm-linux-gnueabihf/libxcb.so.1 (0xb6a9f000)
    libICE.so.6 => /usr/lib/arm-linux-gnueabihf/libICE.so.6 (0xb6a82000)
    libSM.so.6 => /usr/lib/arm-linux-gnueabihf/libSM.so.6 (0xb6a73000)
    libXtst.so.6 => /usr/lib/arm-linux-gnueabihf/libXtst.so.6 (0xb6a67000)
    libwrap.so.0 => /lib/arm-linux-gnueabihf/libwrap.so.0 (0xb6a57000)
    libsndfile.so.1 => /usr/lib/arm-linux-gnueabihf/libsndfile.so.1 (0xb69ee000)
    libasyncns.so.0 => /usr/lib/arm-linux-gnueabihf/libasyncns.so.0 (0xb69e2000)
    libattr.so.1 => /lib/arm-linux-gnueabihf/libattr.so.1 (0xb69d4000)
    libXau.so.6 => /usr/lib/arm-linux-gnueabihf/libXau.so.6 (0xb69ca000)
    libXdmcp.so.6 => /usr/lib/arm-linux-gnueabihf/libXdmcp.so.6 (0xb69be000)
    libuuid.so.1 => /lib/arm-linux-gnueabihf/libuuid.so.1 (0xb69b1000)
    libXext.so.6 => /usr/lib/arm-linux-gnueabihf/libXext.so.6 (0xb699b000)
    libXi.so.6 => /usr/lib/arm-linux-gnueabihf/libXi.so.6 (0xb6986000)
    libnsl.so.1 => /lib/arm-linux-gnueabihf/libnsl.so.1 (0xb696a000)
    libFLAC.so.8 => /usr/lib/arm-linux-gnueabihf/libFLAC.so.8 (0xb691f000)
    libvorbisenc.so.2 => /usr/lib/arm-linux-gnueabihf/libvorbisenc.so.2 (0xb67b2000)
    libvorbis.so.0 => /usr/lib/arm-linux-gnueabihf/libvorbis.so.0 (0xb6782000)
    libogg.so.0 => /usr/lib/arm-linux-gnueabihf/libogg.so.0 (0xb6775000)
    libresolv.so.2 => /lib/arm-linux-gnueabihf/libresolv.so.2 (0xb6761000)
pi@scarlettpi ~ $

And in case you need any information about my microphone (PS3 Eye):

I had to throw this in a pastebin because I ran out of room in this post.

http://pastebin.com/gSDZwRHc

Does anyone have any ideas why this isn't working? Please let me know if my question needs any clarification or if I can provide any more information to aid with debugging.

Thanks.

Singles answered 6/8, 2013 at 18:25 Comment(9)
When debugging GStreamer pipelines, it always pays off to make the pipeline as small as possible. So try linking only pulsesrc to fakesink, with dump=1 on the fakesink (the same way you set name=vader on the vader element). If there is sound, you should see a lot of ASCII flying by every time you speak. - Quesenberry
This may seem stupid, but ALSA on the Raspberry Pi has historically been broken. Are you sure you're getting audio data from your mic at all? What happens with arecord? - Doings
@Harvard Graff: Thanks for the comment. I added that to my code to see the ASCII debugging output, and that helped a lot. One thing I did not realize is that you can use Pulse and ALSA at the same time. I changed to alsasrc device=hw:1 and that worked. My next issue is that after the keyword is recognized, the app is supposed to record the next incoming sound and send it up to Google to translate to text. It begins recording, but the recording has no sound, which makes me think the device is already busy? Any thoughts? More details here: blacktonystark.tumblr.com/starkjournal - Singles
@Idrumm, hey man, thanks for the comment. I actually AM able to get audio data using arecord; I did a couple of tests to make sure that's the case. Super strange. I've posted some updates above; any ideas? - Singles
@MalcolmJones: I would need to see some code to be able to help you with your next issue, but in general, since you are already accessing the microphone using an alsasrc, you should tee out those buffers and have one path going into vader and the other into a wavenc, possibly with a valve in between to be able to start and stop recording. - Quesenberry
@HarvardGraff: Interesting. Here are some pastebins of where I got to last night. Here's my listener: pastebin.com/V9zZYjNb and my recorder: pastebin.com/uCutLfHg Please excuse any repeated Python code etc.; I'm still a bit new at Python. Based on this, could you help me understand how I should go about tee-ing out the buffers (vader/wavenc)? I think I need to read up on how to properly use a valve in between? Please let me know if anything here isn't clear. - Singles
Maybe you could ask a separate question, so I could use a bit more space and nicer formatting for an answer to this? - Quesenberry
@HarvardGraf: Good call, will do. Give me a couple of minutes to put something together (I'll split this question into two). - Singles
@HavardGraff Sorry it took so long, here you go: #18220629 - Singles
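The tee/valve layout suggested in the comments above can be sketched as a single pipeline description: one alsasrc feeds a `tee`, one branch goes through vader into the recognizer, and the other goes through a `valve` into `wavenc`/`filesink`, so only one element ever opens the ALSA device. This is a sketch under assumptions: `tee_pipeline` and the valve name `recvalve` are hypothetical, it only builds the string, and it assumes the `valve`, `wavenc` and `filesink` elements are available in your GStreamer install:

```python
def tee_pipeline(source, recognizer, wav_path, valve_name="recvalve"):
    """Describe a pipeline that feeds one audio source to two branches.

    Hypothetical helper. Branch 1: queue -> vader -> recognizer -> fakesink.
    Branch 2: queue -> valve (starts closed, drop=true) -> wavenc -> filesink.
    `recognizer` is an element description such as "pocketsphinx ... name=listener".
    """
    return " ".join([
        source, "! audioconvert ! audioresample ! tee name=t",
        # first branch: speech recognition, exactly as before
        "t. ! queue ! vader name=vader auto-threshold=true !", recognizer, "! fakesink",
        # second branch: recording, gated by the valve's "drop" property
        "t. ! queue ! valve name=%s drop=true ! wavenc ! filesink location=%s"
        % (valve_name, wav_path),
    ])
```

With pygst you would pass the result to `gst.parse_launch()`, then fetch the valve with `pipeline.get_by_name('recvalve')` and call `set_property('drop', False)` to start recording and `set_property('drop', True)` to stop, instead of opening the microphone a second time.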

So I finally got this guy working.

A couple of key things I needed to realize:

1. Even if you're using PulseAudio on your Raspberry Pi, as long as ALSA is still installed you can still use it. (This might seem like a no-brainer to others, but I honestly didn't realize I could use both of these at the same time.) Hint via (syb0rg).

2. When it comes to sending large amounts of raw audio data (.wav format in my case) to Pocketsphinx via GStreamer, queues are your friend.

After messing around with gst-launch-0.10 on the command line for a while, I came across something that actually worked:

gst-launch-0.10 alsasrc device=hw:1 ! queue ! audioconvert ! audioresample ! queue ! vader name=vader auto-threshold=true ! pocketsphinx lm=/home/pi/dev/scarlettPi/config/speech/lm/scarlett.lm dict=/home/pi/dev/scarlettPi/config/speech/dict/scarlett.dic hmm=/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k name=listener ! fakesink dump=1

So what's happening here?

  • GStreamer is listening to device hw:1 (which is my PS3 Eye USB device). The device number might vary; you can determine it by running:
pi@scarlettpi ~ $ pacmd dump
Welcome to PulseAudio! Use "help" for usage information.
....
load-module module-alsa-card device_id="0" name="platform-bcm2835_AUD0.0" card_name="alsa_card.platform-bcm2835_AUD0.0" namereg_fail=false tsched=yes fixed_latency_range=no ignore_dB=no deferred_volume=yes card_properties="module-udev-detect.discovered=1"
load-module module-udev-detect
load-module module-bluetooth-discover
load-module module-esound-protocol-unix
load-module module-native-protocol-unix
load-module module-gconf
load-module module-default-device-restore
load-module module-rescue-streams
load-module module-always-sink
load-module module-intended-roles
load-module module-console-kit
load-module module-systemd-login
load-module module-position-event-sounds
load-module module-role-cork
load-module module-filter-heuristics
load-module module-filter-apply
load-module module-dbus-protocol
load-module module-switch-on-port-available
load-module module-cli-protocol-unix
load-module module-alsa-card device_id="1" name="usb-OmniVision_Technologies__Inc._USB_Camera-B4.09.24.1-01-CameraB409241" card_name="alsa_card.usb-OmniVision_Technologies__Inc._USB_Camera-B4.09.24.1-01-CameraB409241" namereg_fail=false tsched=yes fixed_latency_range=no ignore_dB=no deferred_volume=yes card_properties="module-udev-detect.discovered=1"
....

The important line to notice is:

load-module module-alsa-card device_id="1" name="usb-OmniVision_Technologies__Inc._USB_Camera-B4.09.24.1-01-CameraB409241" card_name="alsa_card.usb-OmniVision_Technologies__Inc._USB_Camera-B4.09.24.1-01-CameraB409241" namereg_fail=false tsched=yes fixed_latency_range=no ignore_dB=no deferred_volume=yes card_properties="module-udev-detect.discovered=1"

That's my PlayStation 3 Eye, and it's on device_id="1", hence hw:1.

  • The audio data coming in from the PS3 Eye is converted, resampled and passed through GStreamer queues, and then has to go through a vader element before moving on to pocketsphinx. With vader's auto-threshold=true flag on, vader determines the background noise level itself, which can be important if you have a lousy sound card or a far-field microphone. This is how the pocketsphinx element knows when an utterance starts and ends.

  • Add the regular pocketsphinx arguments (lm, dict and hmm) that we already determined earlier with pocketsphinx_continuous.

  • Pass everything into a fakesink, since we don't need to hear anything right now; we only need pocketsphinx to listen. The dump=1 flag dumps each buffer to the console, so we can see what's being processed and whether audio is being accepted at all.
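The device lookup described above can be automated by scanning the `pacmd dump` output for `module-alsa-card` lines and turning `device_id="N"` into `hw:N` (`arecord -l` lists the same card numbers if you prefer to check by hand). A minimal sketch, assuming the dump format shown above with `device_id` followed directly by `name` on one line; `find_alsa_device` is a hypothetical helper:

```python
import re

# Matches lines such as:
#   load-module module-alsa-card device_id="1" name="usb-OmniVision..." ...
_ALSA_CARD = re.compile(
    r'load-module module-alsa-card device_id="(\d+)" name="([^"]*)"')


def find_alsa_device(dump_text, name_fragment):
    """Return "hw:N" for the first ALSA card whose name contains
    name_fragment (case-insensitive), or None if no card matches.

    Hypothetical helper; assumes the `pacmd dump` line format shown above.
    """
    for card, name in _ALSA_CARD.findall(dump_text):
        if name_fragment.lower() in name.lower():
            return "hw:" + card
    return None
```

For example, `find_alsa_device(subprocess.check_output(["pacmd", "dump"]).decode("utf-8"), "OmniVision")` should give `hw:1` on the setup above, and the result can be used as the alsasrc `device` value.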

After getting that to run successfully, the new Python code looks like this:

self.pipeline = gst.parse_launch(' ! '.join(['alsasrc device=' + scarlett_config.gimmie('audio_input_device'),
                                           'queue',
                                           'audioconvert',
                                           'audioresample',
                                           'queue',
                                           'vader name=vader auto-threshold=true',
                                           'pocketsphinx lm=' + scarlett_config.gimmie('LM') + ' dict=' + scarlett_config.gimmie('DICT') + ' hmm=' + scarlett_config.gimmie('HMM') + ' name=listener',
                                           'fakesink dump=1']))
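A slightly more reusable variant of the same pipeline string, as a sketch: it takes a plain dict in place of `scarlett_config` (an assumption, since the `gimmie()` helper isn't shown) and only enables `dump=1` when debugging, because dumping every buffer is very noisy once everything works. `listener_pipeline` is a hypothetical name:

```python
def listener_pipeline(cfg, debug=False):
    """Build the answer's pipeline description from a plain dict.

    Hypothetical helper: `cfg` stands in for scarlett_config and must
    provide 'audio_input_device', 'LM', 'DICT' and 'HMM'.
    """
    elements = [
        "alsasrc device=" + cfg["audio_input_device"],
        "queue",  # decouples capture from conversion (new streaming thread)
        "audioconvert",
        "audioresample",
        "queue",
        "vader name=vader auto-threshold=true",
        "pocketsphinx lm=%s dict=%s hmm=%s name=listener"
        % (cfg["LM"], cfg["DICT"], cfg["HMM"]),
        # dump=1 prints every buffer: handy while debugging, noisy otherwise
        "fakesink dump=1" if debug else "fakesink",
    ]
    return " ! ".join(elements)
```

In the listener this would become `self.pipeline = gst.parse_launch(listener_pipeline(cfg, debug=True))` while testing, and `debug=False` afterwards.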

Hope this helps someone.

NOTE: Please excuse me if my GStreamer pipeline uses excessive elements. I'm fairly new to GStreamer, and I'm open to more efficient ways of doing this.

Singles answered 13/8, 2013 at 22:48 Comment(2)
You probably just have PulseAudio misconfigured on your system, so you can't properly use pulsesrc. - Haymaker
@NikolayShmyrev Hey Nikolay, I'm really curious about your thoughts on the next post I made; I know you pay attention to the cmusphinx tags. Do you have any thoughts? #18220629 - Singles
