Wake-Up-Word Speech Recognition on a Raspberry Pi with PocketSphinx

In this article, I describe how I implemented Wake-Up-Word Speech Recognition on a Raspberry Pi with PocketSphinx.

Why do you need a wake-up word?

  • So you can wake up your device by talking to it out loud, without having to press any buttons. You see this feature in the Amazon Echo (“Alexa”), Apple devices (“Hey Siri”), and Google devices (“OK Google”). I did it for my candy machine; I’d say “Hey Candy.”

Why not use online speech recognition?

Two reasons: cost and connectivity.

  1. Cost: online speech recognition, such as the IBM Watson Speech to Text service, costs $0.02/min. If you run it 24/7 for a month, your bill would be $864/mo.
    • 30 days/month * 24 hours/day * 60 min/hour = 43,200 minutes/month; 43,200 minutes * $0.02/min = $864/mo
  2. Connectivity
    • What if the internet is down? You may still want some barebones functionality in your device. Consider registering words such as ‘help.’
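
The arithmetic behind the cost estimate can be reproduced with a few lines of Python. This is only a sketch: the function name and the 30-day month are assumptions for illustration, and the rate is the per-minute price quoted above.

```python
# Sketch: monthly cost of streaming audio 24/7 to a per-minute speech service.
PRICE_PER_MINUTE = 0.02  # USD per minute, the rate quoted in the text


def monthly_cost(price_per_minute, days=30, hours_per_day=24):
    """Return (minutes in the month, total cost in USD)."""
    minutes = days * hours_per_day * 60
    return minutes, minutes * price_per_minute


minutes, cost = monthly_cost(PRICE_PER_MINUTE)
print(f"{minutes} minutes/month -> ${cost:.2f}/month")
```

Changing hours_per_day shows how quickly the bill drops if the device only listens part of the day.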

Well, enough background. Here’s how to get it working:

Step 1:

  • The mic must be the “first” audio device so that it works with PocketSphinx.
  • Most instructions out there were written for the Wheezy release. If you have the newer Jessie release, follow this to set up the microphone order:
  • http://raspberrypi.stackexchange.com/questions/40831/how-do-i-configure-my-sound-for-jasper-on-raspbian-jessie
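
One way to check the device order is to look at what `arecord -l` reports. The sketch below parses that listing and tells you whether a capture device sits on card 0. The helper names and the sample text are made up for illustration, and the regular expression assumes the usual "card N: Name [Description], device M:" line format.

```python
import re
import subprocess

# Matches lines such as:
#   card 1: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]
CARD_LINE = re.compile(r"^card (\d+): (\S+) \[(.*?)\], device (\d+):")


def parse_capture_cards(arecord_output):
    """Return (card_index, short_name, description) for each capture device."""
    cards = []
    for line in arecord_output.splitlines():
        match = CARD_LINE.match(line)
        if match:
            cards.append((int(match.group(1)), match.group(2), match.group(3)))
    return cards


def mic_is_first(arecord_output):
    """True if some capture device is on card 0."""
    return any(index == 0 for index, _, _ in parse_capture_cards(arecord_output))


def read_arecord_listing():
    """Run `arecord -l` on the Pi and return its text output."""
    result = subprocess.run(["arecord", "-l"], capture_output=True, text=True)
    return result.stdout


# Illustrative listing (not captured from a real device).
SAMPLE = """**** List of CAPTURE Hardware Devices ****
card 1: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
"""

print(parse_capture_cards(SAMPLE))
print("mic is first:", mic_is_first(SAMPLE))
```

On the Pi itself, call mic_is_first(read_arecord_listing()) after applying the configuration from the link above.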

Step 2:

  • Now that you have the mic priority configured, you can follow these instructions:
  • http://cmusphinx.sourceforge.net/wiki/raspberrypi
  • Update: before you run make on the Pi, do this (reference):
    • $ sudo apt-get install python-dev swig
    • $ sudo apt-get install autoconf libtool automake
    • $ ./autogen.sh

Step 3:

  • Here’s how to run it:
    • pocketsphinx_continuous -inmic yes -keyphrase "hey candy" -kws_threshold 1e-20
  • If you need to tweak those params, check out this blog.
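
To react to the wake-up word from a program, you can launch the same command and watch its output. This is a minimal sketch, not a tested integration: it assumes pocketsphinx_continuous prints each detected keyphrase as a line on stdout and sends its log messages to stderr. The function names (build_command, detections, listen) are hypothetical helpers.

```python
import subprocess


def build_command(keyphrase, threshold="1e-20"):
    """Argument list for the pocketsphinx_continuous call shown above."""
    return ["pocketsphinx_continuous", "-inmic", "yes",
            "-keyphrase", keyphrase, "-kws_threshold", threshold]


def detections(lines, keyphrase):
    """Yield each output line that contains the keyphrase (case-insensitive)."""
    target = keyphrase.lower()
    for line in lines:
        if target in line.lower():
            yield line.strip()


def listen(keyphrase, on_wake):
    """Run the recognizer and call on_wake() once per detection."""
    proc = subprocess.Popen(build_command(keyphrase),
                            stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL,  # drop log output
                            text=True)
    try:
        for _ in detections(proc.stdout, keyphrase):
            on_wake()
    finally:
        proc.terminate()
```

For example, listen("hey candy", lambda: print("awake")) would print a message each time the phrase is heard. Keeping the threshold as a parameter makes it easy to experiment with detection sensitivity.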

 

Lessons learned / misc notes:

  • These steps were done on 5/11/2016; some instructions may be outdated.
  • ‘omxplayer’ cannot output audio to USB devices. [to do: add reference]
  • Detection performance was inconsistent. To do: add more testing on performance

Next Steps:

  • Trim the dictionary file to see if performance improves.
  • Post a Node-RED flow showing this feature as part of a bigger solution.

 

References:

#1 Blog: Raspberry Pi 2 – Speech Recognition on device – https://wolfpaulus.com/journal/embedded/raspberrypi2-sr/

#2 Wake-on-Voice-Keyword