Enabling Speech on Smart Home Devices

Overview of Market and Ecosystem

The technological landscape of the home is undergoing an immense shift, similar in scope and impact to the advent of the refrigerator or the proliferation of telephones. Our living spaces are becoming smarter and better able to assist with running the home, providing peace of mind, and enriching daily life. From TVs to speakers, alarms to sprinkler systems, this explosion of smart home devices will transform the way that we live and interact with our domestic environment. Our homes, which have long been the centers of our personal universe, are on the verge of becoming responsive, autonomous, and perceptive.
Central to this domestic revolution is the voice-enabled Personal Assistant. Personal Assistants such as Alexa*, Siri*, Cortana*, and Google* allow users to schedule dinner reservations, set reminders, order household supplies, and much more. While voice-enabled Personal Assistants have been available on mobile devices for a while, we are just scratching the surface of devices in the home that can be controlled with the natural, intuitive power of voice. When devices such as alarm clocks and traditional appliances such as dishwashers and washing machines can understand and respond to verbal commands, our homes can act as an extension of ourselves.
The trend toward voice enablement in the home shows no signs of slowing down. Voice interface adoption on consumer devices has increased sharply in the last three years. In the last quarter of 2016 alone, Amazon Echo*, Amazon Echo Dot*, and Amazon* Tap devices contributed to 15.3 million units in sales, according to Parks Associates Research. Research firm eMarketer* predicts that the number of active U.S. users of voice-controlled speakers will double this year. The voice recognition market is expected to grow from $3.73 billion in 2015 to $9.97 billion in 2022. Voice recognition is more than just a novelty; it has become a clear competitive advantage.

Bring Voice-Enabled Personal Assistant Devices to Life

Intel® Smart Home Developer Kits allow product developers to add voice to a range of form factors, enabling capabilities such as far-field voice, speech recognition, and amazing acoustics on low-power devices. With Intel, you can provide your consumers with seamless, intuitive experiences that unlock the value of the Smart Home.

Personal Assistants – Overview

Personal Assistant software and Artificial Intelligence algorithms form the backbone of voice interface development. Personal Assistant software is broken down into two parts:
  1. Automatic Speech Recognition (ASR)
  2. Dialogue/Conversation manager
Some components of the Personal Assistant software may run locally on the host and the rest on cloud servers. The split between modules that execute on the local system and those that execute in the cloud varies by provider, with Alexa, Cortana, and Google Home all drawing the line in different places. For Alexa, the hardware is expected to pre-process the audio before providing it to the Personal Assistant cloud. Google Home and Cortana expect no pre-processing of audio and perform these operations in the cloud.
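The local-versus-cloud split described above can be captured as a small lookup that a device application consults before streaming audio. The following Python sketch is illustrative only: the profile table simply restates the paragraph above, and the names AssistantProfile, PROFILES, and prepare_for_cloud, as well as the caller-supplied preprocess function, are hypothetical and not part of any vendor SDK.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Audio = List[float]  # one channel of audio samples (assumed representation)


@dataclass(frozen=True)
class AssistantProfile:
    """Where a provider expects audio pre-processing to happen."""
    name: str
    needs_local_preprocessing: bool


# Restates the split described in the text; not taken from any vendor API.
PROFILES: Dict[str, AssistantProfile] = {
    "alexa": AssistantProfile("Alexa", True),
    "google_home": AssistantProfile("Google Home", False),
    "cortana": AssistantProfile("Cortana", False),
}


def prepare_for_cloud(provider: str, raw_audio: Audio,
                      preprocess: Callable[[Audio], Audio]) -> Audio:
    """Return the audio that would be sent to the provider's cloud.

    `preprocess` stands in for the device's far-field processing chain
    and is applied only when the provider expects it to run locally.
    """
    profile = PROFILES[provider]
    if profile.needs_local_preprocessing:
        return preprocess(raw_audio)
    return list(raw_audio)
```

Keeping this decision in one table means that adding another assistant changes data rather than control flow.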

Far-Field Audio – System Considerations

Personal Assistant effectiveness is contingent on the capability of the device to clearly recognize user commands from a reasonable conversational distance. This raises some considerations for product developers when adding personal assistant capabilities to new devices.
It is important to first understand some basics of far-field audio, which is defined as the ability to interact with a device from more than four meters away. Additionally, most devices require 360-degree coverage so users can utter commands from anywhere. This comprehensive level of coverage requires a microphone array and adds complexity in noise mitigation and beam forming.
Furthermore, far-field audio algorithms are complicated and dependent on both ambient noise and room conditions. For example, if the device is in the kitchen, noise from the microwave, dishwasher, sink, and other appliances can make it difficult to hear a user’s utterance. From a technical perspective, this means that microphones and algorithms need to be sensitive enough to pick out and interpret commands amid all this ambient noise.
Meeting the performance requirements of Personal Assistant vendors requires fine-tuning the algorithms to match the system's industrial design.

Far-Field Audio Algorithms

Three prominent far-field audio algorithms are used in voice-enabled devices:
  1. Beam Forming identifies the location of the person speaking and then favors the microphone input corresponding to that location. Beam forming also helps in noise mitigation and ambient noise suppression.
  2. Acoustic Echo Cancellation (AEC) prevents the audio coming out of the device's loudspeakers from interfering with the microphone input. It allows for “barge-in” commands when music is playing. The AEC algorithm requires the loudspeaker output as a reference signal into the DSP for audio suppression.
  3. Key Word Spotting (KWS) detects the utterance of wake-up keywords – “Alexa”, “Hey Cortana”, “OK Google” – and notifies the host system. Firmware implementation in the DSP enables low-power operation of Wake on Voice for Personal Assistants.
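To make the beam forming item above more concrete, the following Python sketch shows delay-and-sum, a textbook beam forming technique. It is not Intel's DSP firmware. It assumes a circular microphone array, a far-field (plane-wave) source, whole-sample delays, and a speed of sound of 343 m/s. All function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def circular_array(num_mics, radius_m):
    """Return (x, y) positions of microphones evenly spaced on a circle."""
    return [(radius_m * math.cos(2 * math.pi * i / num_mics),
             radius_m * math.sin(2 * math.pi * i / num_mics))
            for i in range(num_mics)]


def steering_delays(mic_positions, angle_rad, sample_rate):
    """Arrival delay (in whole samples) at each mic for a plane wave
    coming from direction angle_rad, relative to the earliest mic."""
    ux, uy = math.cos(angle_rad), math.sin(angle_rad)
    # A mic located farther toward the source hears the wavefront earlier.
    raw = [-(x * ux + y * uy) / SPEED_OF_SOUND * sample_rate
           for x, y in mic_positions]
    earliest = min(raw)
    return [round(d - earliest) for d in raw]


def delay_and_sum(channels, delays):
    """Time-align the channels using the given delays and average them."""
    length = len(channels[0]) - max(delays)
    return [sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
            for n in range(length)]


def estimate_direction(channels, mic_positions, sample_rate, steps=36):
    """Scan candidate directions; return the angle with the most output power."""
    best_angle, best_power = 0.0, -1.0
    for k in range(steps):
        angle = 2 * math.pi * k / steps
        delays = steering_delays(mic_positions, angle, sample_rate)
        out = delay_and_sum(channels, delays)
        power = sum(s * s for s in out) / len(out)
        if power > best_power:
            best_angle, best_power = angle, power
    return best_angle
```

Signals arriving from the steered direction add coherently, while sound from other directions is averaged across misaligned copies and attenuated. That is how the same operation both locates the talker and suppresses ambient noise. The whole-sample delays used here are a simplification that limits angular resolution on small arrays.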
Intel® far-field audio IP implements these algorithms in the DSP firmware for Intel® dev kits, enabling better performance and low-power implementation so the processor is free to do other things.
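As an illustration of why AEC needs the loudspeaker output as a reference, the following Python sketch uses a normalized least-mean-squares (NLMS) adaptive filter, one common way to build an echo canceller. This is an assumption made for illustration and not a description of the Intel firmware. The function name, tap count, and step size are hypothetical choices.

```python
def nlms_echo_canceller(mic, reference, taps=8, step=0.5, eps=1e-6):
    """Remove an adaptive estimate of the loudspeaker echo from the mic signal.

    mic       -- microphone samples (talker plus echo of the loudspeaker)
    reference -- samples sent to the loudspeaker (the AEC reference signal)
    Returns the echo-suppressed samples.
    """
    weights = [0.0] * taps   # running estimate of the speaker-to-mic echo path
    history = [0.0] * taps   # most recent reference samples, newest first
    output = []
    for m, r in zip(mic, reference):
        history = [r] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = m - echo_estimate  # what is left once the echo is removed
        energy = sum(x * x for x in history) + eps
        # NLMS update: move the weights toward the true echo path, scaled
        # by the reference energy so loud playback does not destabilize it.
        weights = [w + step * error * x / energy
                   for w, x in zip(weights, history)]
        output.append(error)
    return output
```

Because the filter learns the echo path from the reference, whatever remains in the output is mostly sound that did not come from the loudspeaker, such as a user's barge-in command. A real room would need a much longer filter than the eight taps assumed here.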
 

Considerations for Prototyping and Productizing 

Far-field audio tuning is more complicated than, say, enabling traditional PC technologies, because it depends on the following factors:
  • Mic array (configurations, performance, geometry) 
  • System form factor
  • Type and location of the speaker (for AEC purposes)
  • Acoustic isolation provided by the platform
  • Audio performance levels expected by the customer
Most Personal Assistant vendors (Amazon, Microsoft, Google) have specifications for far-field audio performance, and final systems must meet these specifications to qualify for certification. System audio tuning generally requires specialized lab infrastructure and tuning facilities.
These requirements pose a challenge for developers building Smart Home devices; however, Intel is working on next-gen tools to ease the burden of audio tuning for the product.

Build the Solutions of the Future with Intel® Smart Home Developer Kits

Intel is introducing Intel® Smart Home Developer Kits to empower hardware and software developers to quickly bring new voice-enabled products to market. The primary technology in these kits is a dual DSP with an inference engine, which uses the Intel®-developed Gaussian Network Accelerator (GNA) as its hardware accelerator. The dual DSP provides silicon, algorithms, and a reference-design microphone array designed for far-field signal processing algorithms such as beamforming, echo cancellation, and noise reduction. This simplifies the addition of far-field voice, speech recognition, and amazing acoustics to multiple kinds of products.
The first developer kit, the Intel® Speech Enabling Developer Kit, will be available for sale in October 2017. This kit contains the dual DSP with inference engine, mic arrays, speaker mount, and a Raspberry Pi* connector cable so you can quickly start prototyping with the Alexa Voice Service. Future developer kits will enable additional features, including imaging and sensors.

Conclusion 

The Smart Home of tomorrow is autonomous, with smart and connected devices that are perceptive and responsive – using artificial intelligence to understand how we engage with our homes and delivering seamless experiences. We have just scratched the surface of what we can do and enable with speech recognition. For developers, this new technological frontier presents an incredible opportunity to provide users with devices that can improve their lives in significant, tangible ways.  
Intel® Smart Home Developer Kits provide the building blocks for creating innovative new devices for consumers. Whether you are creating a smart speaker or adding voice recognition to traditional appliances, the sections in this article outline some of the considerations for enabling voice. For more information on Intel® Smart Home Developer Kits, check out the additional resources below.

Additional Information

1 https://9to5mac.com/2017/05/08/apple-speaker-market-value/
2 http://www.marketsandmarkets.com/PressReleases/speech-voice-recognition.asp

 
