Harnessing The Potential Of AI At The Edge: Acoustical Evaluation Update October 2023

Over the summer, we gave you an inside peek into how we are deploying AI techniques at the edge. In June, we gave an update on our work delivering real-time acoustical analysis and a use case example.

Recently, our AI in Acoustics work has centered on speeding up neural network inference so the system can quickly identify the location of noises within our own studio space.

What are the benefits of improving the speed of the neural net performance?

Let’s start with the benefits of a faster neural net.

  • Since sound is a temporal (time-sensitive) phenomenon, performance improvements let us pinpoint more precisely where each sound originates within our studio.

  • We can track a moving sound source through the space closer to real time and with improved accuracy.

  • The system can deliver real-time voice separation, recognizing multiple simultaneous voice commands and enhancing recognition accuracy in noisy environments.
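To make the first point concrete, here is a toy sketch (our own illustration, not the production system) of why timing is everything in acoustic localization: with two microphones, the time-difference-of-arrival (TDOA) between channels can be estimated by cross-correlation, and the faster the pipeline runs, the more often this estimate can be refreshed for a moving source. The sample rate, delay, and signal below are all made-up values.

```python
import numpy as np

rng = np.random.default_rng(42)
fs = 16_000                          # sample rate (Hz), illustrative
delay = 12                           # true inter-mic delay, in samples
src = rng.standard_normal(fs // 10)  # 100 ms of a noise-like source signal

mic1 = src
mic2 = np.concatenate([np.zeros(delay), src])[: len(src)]  # delayed copy

# Cross-correlate the two channels; the peak lag is the TDOA estimate,
# which (with the mic spacing) maps to a direction of arrival.
corr = np.correlate(mic2, mic1, mode="full")
lag = corr.argmax() - (len(mic1) - 1)
print(lag)  # recovers the 12-sample delay, i.e. 12 / fs seconds
```

Each such estimate is a snapshot in time, which is why faster inference translates directly into tighter tracking of a source as it moves.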

What can this look like in a real-world setting?  

Here are two examples:

Example #1: Imagine you have a home theater setup. 

Temporal acoustics AI can help localize sound effects, making movie-watching more immersive: when a car approaches from the left side of the screen, the sound comes from the same direction in the room (precise identification of the sound’s origin).

Example #2: Imagine you’re with your family at home, and it’s a busy evening. 

You have family members chatting, the TV playing, and other sounds in the background. 

Thanks to the system's ability to deliver real-time voice separation, your Amazon Echo can accurately distinguish and act upon multiple voice commands spoken simultaneously. 

For instance, while your kids ask the Echo to play their favorite song, you can ask it to adjust the thermostat and inquire about the weather forecast. 

The system can process all these commands efficiently and execute them without confusion, making your smart home experience seamless, even in a noisy household.

What does this look like from a technical perspective? 

To enhance performance in noisy environments, the sound/noise localization system integrates traditional DSP algorithms with modern neural network techniques.

For instance, a generative network produces the coefficients for the MIMO filtering system, which is implemented with traditional DSP. Unlike conventional adaptive MIMO systems, no adaptive training is required at run time, even when the noise source is in motion (i.e., moving around the room).
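A minimal sketch of that data path, under our own assumptions (this is not the production architecture): a small network maps a frame of acoustic features to per-microphone FIR coefficients, and a classic DSP filter-and-sum stage applies them. The network here is a single untrained dense layer with random weights, standing in for the real trained generator; all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N_MICS, N_TAPS, N_FEAT = 4, 16, 32

# Stand-in "generative net": one dense layer mapping acoustic features
# to per-microphone FIR coefficients. (Random weights for illustration;
# the real network would be trained, but the data path is the same.)
W = rng.standard_normal((N_MICS * N_TAPS, N_FEAT)) * 0.1

def predict_coefficients(features):
    """Net output reshaped into one FIR filter per microphone."""
    return (W @ features).reshape(N_MICS, N_TAPS)

def mimo_filter_and_sum(mics, coeffs):
    """Classic DSP stage: filter each channel with its taps and sum.
    No adaptive training loop runs here -- the coefficients come
    straight from the network, frame by frame."""
    out = np.zeros(mics.shape[1])
    for channel, taps in zip(mics, coeffs):
        out += np.convolve(channel, taps, mode="same")
    return out

features = rng.standard_normal(N_FEAT)     # e.g. spectral features per frame
mics = rng.standard_normal((N_MICS, 512))  # one frame from 4 microphones
coeffs = predict_coefficients(features)
enhanced = mimo_filter_and_sum(mics, coeffs)
```

Because the network re-emits coefficients every frame, the DSP stage can follow a moving source without the convergence lag of a classic adaptive filter.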

This fusion enables the system to deliver real-time voice separation and directional localization. 

With this capability, the system can aid smart voice terminals such as Amazon Echo (as in the example above) in recognizing multiple voice commands simultaneously and enhancing recognition accuracy in noisy environments.

Designing a DSP and neural network fusion system is an exciting experience, especially given how little similar work exists in the community.

For instance, ensuring smooth gradient backpropagation through the fused data-processing path requires careful DSP algorithm design to avoid vanishing gradients.
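As a toy illustration of that concern (ours, not the production code): backpropagating through a chain of identical FIR stages multiplies the gradient spectrum by the filter's conjugate frequency response at each stage. A passive filter whose response magnitude stays below 1 therefore shrinks the gradient geometrically with depth, while normalizing the response peak to 1 keeps it alive; the taps and depths below are arbitrary.

```python
import numpy as np

def grad_through_chain(taps, depth, n=256):
    """Propagate a unit gradient backward through `depth` identical FIR
    stages. Backprop through y = conv(x, h) multiplies the gradient
    spectrum by conj(H), so its magnitude scales like |H|**depth."""
    H = np.fft.rfft(taps, n)
    g = np.ones_like(H)          # unit upstream gradient (spectrum)
    for _ in range(depth):
        g = g * np.conj(H)
    return np.max(np.abs(g))

taps = np.array([0.2, 0.2, 0.2, 0.2])   # passive filter, peak |H| = 0.8
print(grad_through_chain(taps, depth=1))    # 0.8
print(grad_through_chain(taps, depth=30))   # ~1.2e-3: gradient has vanished

# Normalizing the peak response to 1 preserves gradient flow.
normalized = taps / np.max(np.abs(np.fft.rfft(taps, 256)))
print(grad_through_chain(normalized, depth=30))  # 1.0
```

This is the kind of analysis that motivates keeping each DSP stage well-conditioned before asking gradients to flow through it.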

Overcoming these hurdles was demanding but rewarding, and implementing the signal-processing code in TensorFlow was a great journey in its own right.

Stay tuned for more insights and updates from our “Harnessing the Potential of AI at the Edge” series!