Configure the Intel® Edison Board to Speak with a Scottish Accent

After seeing the Amazon Echo* talk at a recent local event, I bought one for my birthday. The Echo* is a fun device, often described as the Star Trek computer, but it's still very new and has its limitations.

While developing demos for the first IoT Roadshow in 2016, I decided we needed to refresh the accessories we gave developers to use with the Intel® Edison module, as a large number of the 2015 projects could arguably have been built with an Arduino* and a Wi-Fi shield.

Therefore, inspired by this webinar and the Echo, I decided to add USB audio capabilities, and espeak was a reasonably easy choice. Using a Plantronics USB headset was almost plug and play, and it was soon talking, even with a Scottish accent. I've added a placeholder here for my YouTube* link, but that may have to be posted/edited later. This is obviously a live document, so please subscribe for updates.

Swapping in a set of USB-connected speakers, using a separate microphone, and getting something that could be easily replicated 20 times at a roadshow proved a bit more work.

Getting Started Guides are all over the Intel® Developer Zone, and this link will spoil you with choices.

Because of the proliferation and history of these guides, I shall break a tradition in the Intel® Edison module community and actually tell you that I started my development with version 149 of the Intel Edison firmware. You can check yours with the following console command.

$ configure_edison --version

The next step is to use one of the online guides for updating the opkg repos to add the highly recommended AlexT’s repo. This will allow you not only to add the programs below, but also to add nano, a much easier editor than vi. Ironically, you need to use vi to update the repos to install nano.

Stephanie Moyerman’s Make book has a good set of instructions for getting the original example running using a headset. In addition to adding audio h/w to the roadshow mix, I also decided to add USB webcams, and I make use of the microphone on the camera to record audio. To get a jump-start on the video piece, while you install the audio packages, you may as well add OpenCV. This has the added advantage of getting your system into the same configuration as my demo. Many sites state that Linux audio is finicky and they are not joking, so starting from a known configuration is highly recommended.

$ opkg install python-opencv espeak alsa-utils

Finding devices and validating h/w is not intuitive (aka finicky), but the following methods worked for me.

1: $ lsusb : Check that the device is recognized.

     If it is not listed, then reboot with a complete power down. I found that I could lock up the USB speakers, cause drivers to crash, etc. while experimenting with values as I added the devices. You may or may not get a message in the console.

2: Use a (powered) USB hub : The "powered" feature is a recommendation from Stephanie's book.

    When I demo this feature I always use a hub; that way I can plug in a camera and the speakers. This h/w buffer seems to “insulate” the speakers from s/w issues. At least in my experience, the devices “seem” more stable and work more often. I shall leave a diagnosis of this as an exercise for the user.
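
The check in step 1 can be scripted. This is a minimal sketch, not part of the original demo: usb_device_listed is a hypothetical helper name, and the search text you pass it (a vendor or product name as it appears in your own lsusb output) is an assumption you need to fill in.

```shell
#!/bin/sh
# Hypothetical helper: succeed if any line of an lsusb listing read from
# stdin contains the given text (case-insensitive, matched literally).
usb_device_listed() {
    grep -qiF -- "$1"
}

# Usage on the board (the device name here is only an example):
#   if lsusb | usb_device_listed "Plantronics"; then
#       echo "headset found"
#   else
#       echo "headset missing: power down completely, then reboot"
#   fi
```

Reading the listing from stdin rather than calling lsusb inside the helper keeps it easy to try against saved output from a board that misbehaved.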

The ALSA command-line programs each seem to have their own syntax: some call devices by name, some don't recognize that name, some prefer cards (a hangover from motherboards and plug-in soundcards), and some prefer device numbers. Even the Linux filesystem seems to work this way. Plug in a few USB devices and explore the /proc/asound directory structure. You will see device names, cards, card1, card2, and a few other directories that are duplicates or alternates. These multiple ways to address the same h/w are the reason for some of the seeming confusion in the following section.

Alsamixer is a text-mode graphical interface to the sound devices, and by selecting the correct device and modifying the volume I could hear audio, either via a wav file or by using espeak. Unfortunately, amixer, which is a command-line interface to the audio devices, was not so compliant, and I had to try multiple variants to get it working. Ultimately I decided to use the card/device number to ensure I could run my demo, but that makes it somewhat dependent on the USB installation order.

To align device numbers with the real devices, use the following command.

$ cat /proc/asound/pcm

00-00: Loopback PCM : Loopback PCM : playback 8 : capture 8

00-01: Loopback PCM : Loopback PCM : playback 8 : capture 8

01-00: 14 :  : playback 1 : capture 1

01-01: ((null)) :  : playback 1 : capture 1

01-02: ((null)) :  : playback 1 : capture 1

02-00: USB Audio : USB Audio : playback 1

03-00: USB Audio : USB Audio : capture 1

Device 02-00 is the loudspeaker, the playback device, and device 03-00 is the microphone in the USB web camera. To record from the microphone I use the arecord command. After a lot of experimentation and reading of blogs and wikis, the following syntax works for this demo.

The command format is

arecord -f (format) -c (channels) -D (device) filename.wav

ALSA has some predefined formats, and I chose CD quality. On its own this implies stereo (audio CDs are stereo, after all), and there is not a stereo microphone in the USB camera, so I override this with -c (channels) and choose 1 for mono. The -D (device) number comes from the output of the $ cat /proc/asound/pcm command above, in this case hw:3,0, the capture device. Therefore, to make a mono recording from the USB webcam to a file called mono.wav, use the following command.

$ arecord -f CD -c 1 -D hw:3,0 mono.wav
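
Because the hw:3,0 name depends on the order in which the USB devices were plugged in, the card number can instead be looked up from the /proc/asound/pcm listing at run time. The following is a minimal sketch under that assumption, not the method used in the demo: find_usb_pcm is a hypothetical helper, and it assumes the devices are labelled "USB Audio" as in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: read a /proc/asound/pcm style listing on stdin and
# print the hw:CARD,DEVICE name of the first "USB Audio" entry that supports
# the requested direction ("playback" or "capture").
find_usb_pcm() {
    grep 'USB Audio' | grep -- "$1" | head -n 1 |
        sed 's/^0*\([0-9][0-9]*\)-0*\([0-9][0-9]*\):.*/hw:\1,\2/'
}

# Demonstration using the two USB lines from the listing above.
sample='02-00: USB Audio : USB Audio : playback 1
03-00: USB Audio : USB Audio : capture 1'
printf '%s\n' "$sample" | find_usb_pcm capture    # prints hw:3,0
printf '%s\n' "$sample" | find_usb_pcm playback   # prints hw:2,0

# On the board, the same lookup can feed arecord:
#   mic=$(find_usb_pcm capture < /proc/asound/pcm)
#   arecord -f CD -c 1 -D "$mic" mono.wav
```

If two USB devices offer the same direction, for example a headset and a webcam that both capture, this sketch simply takes the first one listed, so the result still depends on plug order in that case.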

To hear the recording, use $ aplay mono.wav. If you don't hear anything, then check the volume using alsamixer.

Once the initial setup has been verified, using espeak is very straightforward. The following script sets the volume and then speaks two phrases.

$ amixer -q -c 2 set PCM,0 5000 unmute

$ espeak -a 200 -s 120 -v en-sc " Sean Connery, Haggis, Loch Ness Monster,Whisky "

$ espeak -a 200 -s 120 -v en-sc " HELLO, THIS IS EDISON, WELCOME TO OUR AUSTIN ROADSHOW"

The syntax for amixer is not in any way similar to the arecord format, but after a lot of playing around it was determined that the setup above can be used to set the loudspeaker volume, in this case to 5000. The other parameters are -q for quiet and -c 2 for card 2, which is the loudspeaker (device 02-00 in the listing above). Note that -c selects a card in amixer, not a channel count as it does in arecord. PCM,0 names the mixer control (PCM, index 0) rather than a device like the hw:3,0 parameter in the arecord command. If there was some consistency I should be able to use hw:2,0 for the speaker and hw:3,0 for the microphone, but alas ALSA.
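
The volume and speech commands above can be wrapped in a small script so that the card number is passed in rather than hard-coded. This is a sketch, not the script used at the roadshow: say_scottish, run, and the DRY_RUN switch are hypothetical names introduced here, and the volume value 5000 is simply carried over from the command above.

```shell
#!/bin/sh
# Hypothetical wrapper: with DRY_RUN=1 print each command instead of running
# it, which is handy for checking the arguments without audio h/w attached.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# say_scottish CARD PHRASE...
# Unmute the PCM control on the given card, then speak the phrase with the
# en-sc voice, using the same amplitude and speed as the commands above.
say_scottish() {
    card="$1"
    shift
    run amixer -q -c "$card" set PCM,0 5000 unmute
    run espeak -a 200 -s 120 -v en-sc "$*"
}

# Usage on the board, with the loudspeaker on card 2:
#   say_scottish 2 "Hello, this is Edison"
# To preview the commands without running them:
#   DRY_RUN=1 say_scottish 2 "Hello, this is Edison"
```

Keeping the card number as an argument means the same script still works if the speakers come up on a different card after a reboot or a change in plug order.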

There’s a lot more to explore in audio record and playback, especially in the realm of user interfaces and speech recognition, and the references below are a good starting point.

And if you have read all the way to here, then the way to get a Scottish accent is to invite me to your meetup, or to use the -v command option in espeak. This sets the voice option, in this case to the somewhat oddly named english-scottish (en-sc) option. en-rp is very English, imho, but there are lots of languages to try. Adding a modifier, for example -v en-sc+f2 (for female 2), yields a weirdly robotic Scottish lass.
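
To compare voices side by side, the -v option can be driven from a loop. This is a minimal sketch: try_voices is a hypothetical helper, the ESPEAK variable is an assumption added so that a different command can be substituted, and the voice names are the ones mentioned above.

```shell
#!/bin/sh
# try_voices PHRASE VOICE...
# Speak the same phrase once per voice, announcing each voice name first.
# ESPEAK (hypothetical override) defaults to the espeak binary.
try_voices() {
    phrase="$1"
    shift
    for voice in "$@"; do
        echo "voice: $voice"
        "${ESPEAK:-espeak}" -a 200 -s 120 -v "$voice" "$phrase"
    done
}

# Usage on the board:
#   try_voices "Hello, this is Edison" en-sc en-rp en-sc+f2
# espeak can also list the English voices it knows about:
#   espeak --voices=en
```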


PocketSphinx: an offline speech recognition engine

IBM Watson* Speech to Text

Amazon Echo

Google Speech Recognition using Python*

Follow me on Twitter @intel_stewart or check the #intelmaker hashtag for updates.
