Providing Reliable Signal Processing Solutions Since 2001

BEAMFORMING MICROPHONE ARRAYS


Beamforming Explained

Beamforming microphone arrays are spatial filters that take multiple microphone signals as input and combine them into a single output signal. Usually the combined output is calculated by filtering each microphone signal through a digital FIR filter and summing the outputs of all filters, as shown in the figure. The figure shows a beamforming microphone array constructed using N microphones, N FIR filters labeled FIR-1 to FIR-N, and a summer. The filters are designed so that their outputs add constructively when sound comes from a specific direction (the main lobe) and add destructively when sound comes from all other directions. This creates the spatial filtering effect of focusing on the sound that arrives from the main lobe direction while attenuating sounds arriving from all other directions. Since the microphones are usually placed a few centimeters apart, the FIR filters need not be long; 32 to 64 coefficients per filter are sufficient.
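As a minimal sketch of the filter-and-sum structure described above, the following Python code passes each microphone signal through its own FIR filter and sums the results. The function names fir_filter and filter_and_sum, and the two-microphone coefficients in the usage note, are illustrative assumptions rather than part of any product API.

```python
def fir_filter(signal, coeffs):
    """Direct-form FIR filter: out[n] = sum over k of coeffs[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * signal[n - k]
        out.append(acc)
    return out


def filter_and_sum(mic_signals, fir_coeffs):
    """Filter each of the N microphone signals with its own FIR filter, then sum."""
    if len(mic_signals) != len(fir_coeffs):
        raise ValueError("one coefficient set is needed per microphone")
    filtered = [fir_filter(x, h) for x, h in zip(mic_signals, fir_coeffs)]
    # The summer: add the N filter outputs sample by sample.
    return [sum(samples) for samples in zip(*filtered)]
```

For example, if a sound reaches microphone 2 one sample later than microphone 1, the assumed coefficient sets [[0.0, 0.5], [0.5, 0.0]] delay microphone 1 by one sample so that both copies line up and add constructively. A sound arriving from the opposite side comes out as two half-amplitude copies two samples apart instead of one full-amplitude pulse.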


Beamformer Applications

Beamformers improve the speech signal-to-noise ratio in many challenging situations where other methods simply fail. One such situation is speech recording in crowded places, such as restaurants and bars, where many people talk and laugh at the same time, making it impossible to record the useful voice cleanly. In such situations, the noise changes quickly over time, has spectral content similar to the useful voice, and comes from all directions around the user. Using a single-microphone device that employs speech recognition technology in such an environment is frustrating. A beamformer with two or more microphones, on the other hand, can improve the speech recognition rate considerably. In fact, any application that relies on microphones to record speech can benefit from beamforming microphone arrays in the same way that speech recognition does.

Beamformer Implementations

Beamforming microphone arrays may be implemented in many different ways. In its simplest form, a beamformer may be designed to "listen" to a specific direction (or directions) by designing the FIR filters accordingly. Alternatively, the FIR filters may be implemented as adaptive filters that automatically "listen" in the direction of the user's speech and track the user as they walk around the room. Three different beamformers are directly available.

Fixed Single Main Lobe (FSML) Beamformer

In the FSML beamformer, the coefficients of each FIR filter are calculated at design time and saved in Read Only Memory (ROM). At system initialization, the calculated coefficients are loaded from ROM and used for filtering during real-time operation. This makes the beamformer implementation no more complex than simple FIR filtering, which can be implemented easily on any low-end DSP or CPU.
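The text does not say how the coefficients are calculated at design time. As one illustrative possibility only, the sketch below uses a plain delay-and-sum design for a linear array with delays rounded to whole samples. The helper name design_delay_and_sum, the speed of sound value, the 5 cm spacing, and the steering angle are all assumptions made for the example.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature


def design_delay_and_sum(mic_positions_m, steer_deg, sample_rate_hz, num_taps=32):
    """Return one FIR coefficient list per microphone (hypothetical helper).

    mic_positions_m: microphone coordinates along the array axis, in metres.
    steer_deg: main lobe direction measured from broadside (0 = straight ahead).
    """
    sin_steer = math.sin(math.radians(steer_deg))
    # Relative arrival time of a plane wave at each microphone, in samples.
    arrivals = [-x * sin_steer / SPEED_OF_SOUND_M_S * sample_rate_hz
                for x in mic_positions_m]
    latest = max(arrivals)
    table = []
    for arrival in arrivals:
        # Delay the earlier microphones so every copy lines up with the latest one.
        delay = int(round(latest - arrival))
        if delay >= num_taps:
            raise ValueError("filter too short for this spacing and sample rate")
        coeffs = [0.0] * num_taps
        coeffs[delay] = 1.0 / len(mic_positions_m)
        table.append(coeffs)
    return table


# Design-time result that would be stored in ROM and loaded at initialization.
ROM_TABLE = design_delay_and_sum([0.0, 0.05], steer_deg=90.0, sample_rate_hz=48000)
```

With the assumed 5 cm spacing at 48 kHz, the largest delay is 7 samples, which is consistent with short filters being sufficient. A production design would likely use fractional delays or optimized filter shapes rather than single-tap delays.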

Fixed 360° Beamformer

In the Fixed 360° beamformer, the space around the beamformer is divided into M separate main lobe directions, and a separate set of N FIR filters is designed for each direction. This is equivalent to implementing M Fixed Single Main Lobe beamformers and selecting only one of them during real-time operation, depending on the user's position relative to the beamformer. An additional algorithm is needed, however, to select one of the M sets of filters and to switch smoothly between sets when the user moves around (or when another user starts talking), so that the beamformer output is not distorted.
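The selection and switching step can be sketched as follows. The text does not specify the selection criterion, so this example assumes a hypothetical per-direction score supplied by some direction estimator, and it uses a linear crossfade over one block as one simple way to switch smoothly. Both function names are illustrative.

```python
def select_direction(direction_scores):
    """Index of the best-scoring of the M directions (the scoring is assumed)."""
    return max(range(len(direction_scores)), key=direction_scores.__getitem__)


def crossfade(old_block, new_block):
    """Fade linearly from the old filter set's output to the new one's."""
    n = len(old_block)
    out = []
    for i, (old, new) in enumerate(zip(old_block, new_block)):
        gain = (i + 1) / n  # ramps up to 1.0 at the last sample of the block
        out.append((1.0 - gain) * old + gain * new)
    return out
```

During a switch, both the old and the new filter sets run for a single block and their outputs are blended; after that block only the new set is needed, which keeps the extra processing small.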

The selection and switching algorithm usually consumes little processing, so the real-time processing load of the Fixed 360° beamformer is only slightly higher than that of the Fixed Single Main Lobe beamformer. The main difference is the much larger ROM space needed to store all sets of filter coefficients. If the required resolution is, for example, 10 degrees, then M = 360/10 = 36 Fixed Single Main Lobe beamformers are needed. If, say, 6 microphones and 32-coefficient FIR filters are used, then the total number of FIR coefficients that must be stored in ROM equals 6 x 32 x 36 = 6912, which easily fits in the internal memory of any of today's low-end DSPs or CPUs.
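The storage arithmetic above can be reproduced with a few lines of Python. The function name is illustrative, and the bytes-per-coefficient figure is an assumption, since the text does not state the coefficient word length.

```python
def rom_coefficients(num_mics, taps_per_filter, resolution_deg):
    """Total FIR coefficients stored for a Fixed 360-degree beamformer."""
    num_directions = 360 // resolution_deg  # M separate main lobe directions
    return num_mics * taps_per_filter * num_directions


total = rom_coefficients(num_mics=6, taps_per_filter=32, resolution_deg=10)
BYTES_PER_COEFFICIENT = 4  # assumption: 32-bit coefficients
print(total, "coefficients,", total * BYTES_PER_COEFFICIENT, "bytes")
```

Under the assumed 32-bit word length, the 6912 coefficients occupy 27648 bytes; 16-bit coefficients would halve that.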

Adaptive Beamformer

Adaptive beamformers function similarly to the Fixed 360° beamformer in that they automatically track the user's voice in the listening space. Unlike the Fixed 360° beamformer, however, adaptive beamformers do not use M sets of beamformer filters and do not switch between sets. Instead, a single set of FIR filters is used, and the coefficients of those filters are adjusted in real time during operation by an adaptive algorithm whose goal is to optimize a specific cost function. The filter coefficients adapt gradually and smoothly, much like the smooth switching between two sets of filters when the direction of the user's voice changes. Adaptation runs continuously during operation, so the beamformer is always listening in all directions and adapts quickly but smoothly to a new direction when needed. The price for this elegant solution, however, is the higher real-time processing load required to adapt all filter coefficients simultaneously to optimize the cost function.
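The text does not name the adaptive algorithm or the cost function. To illustrate the general idea only, the sketch below uses a constrained LMS update (in the style of Frost's algorithm) that minimizes output power while holding the gain toward an assumed target direction at one. For brevity it uses a single coefficient per microphone instead of a full FIR filter, and the function name, steering gains, and step size are all assumptions.

```python
def constrained_lms(mic_signals, steering, step_size):
    """Constrained LMS beamformer with one coefficient per microphone.

    mic_signals: list of N equal-length sample lists, one per microphone.
    steering: N gains describing how the target reaches each microphone;
              the weights are held to sum(w[i] * steering[i]) == 1.
    Returns (output_samples, final_weights).
    """
    norm = sum(c * c for c in steering)
    # Start from the minimum-norm weights that satisfy the constraint.
    weights = [c / norm for c in steering]
    output = []
    for frame in zip(*mic_signals):
        y = sum(w * x for w, x in zip(weights, frame))
        output.append(y)
        # Gradient step that lowers the cost function, here the output power y**2.
        stepped = [w - step_size * y * x for w, x in zip(weights, frame)]
        # Project back onto the constraint so the target gain stays at one.
        excess = sum(s * c for s, c in zip(stepped, steering)) - 1.0
        weights = [s - excess * c / norm for s, c in zip(stepped, steering)]
    return output, weights
```

With steering set to [1.0, 1.0], a target that reaches both microphones equally passes through unchanged, while the weights drift gradually toward values that cancel an interferer reaching the microphones with different gains. Every coefficient is updated on every sample, which is where the additional processing load comes from.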

Beamformers Comparison

Fixed beamformers require little real-time processing, equivalent to N short FIR filters, and can therefore be implemented easily on a low-end DSP, CPU, or microcontroller. The disadvantage of fixed beamformers, however, is that they usually need calibration after manufacturing because of microphone tolerances and placement errors. This calibration can be very costly and time consuming, especially in the case of the Fixed 360° beamformer.

Adaptive beamformers, on the other hand, design themselves in the field during real-time operation and therefore require no calibration whatsoever. In fact, we use our RAY adaptive beamformer to design and calibrate the coefficients of the fixed beamformer variants. The disadvantage of adaptive beamformers, however, is the higher processing load compared to fixed beamformers. But since the beamformer FIR filters are usually very short, the processing load is still moderate.

To place the processing load of adaptive beamformers in context, consider the CANEC demonstrator on the OCEAN-ADSP21489 platform. This demonstrator processes 6 microphone signals through 6 Acoustic Echo Cancellers, each with an echo tail length of 250 ms, at a 48 kHz sampling rate, then combines the outputs of the 6 AEC channels using the RAY adaptive beamformer. The RAY beamformer is implemented with 256-coefficient FIR filters to cover larger microphone separation at the 48 kHz sampling rate. Even with such large filters, the RAY beamformer consumes only as much processing load as a single Acoustic Echo Canceller channel.
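For a rough sense of scale, the filtering part of that load can be counted in multiply-accumulate (MAC) operations. The sketch below counts only the FIR filtering itself; the cost of adapting the coefficients is not given in the text and is left out. The function name is illustrative, and the comparison case of 32-coefficient filters at 48 kHz is an assumption.

```python
def fir_macs_per_second(num_mics, taps_per_filter, sample_rate_hz):
    """MAC operations per second for N FIR filters run sample by sample."""
    return num_mics * taps_per_filter * sample_rate_hz


ray_filtering = fir_macs_per_second(6, 256, 48000)   # figures from the text
short_filtering = fir_macs_per_second(6, 32, 48000)  # assumed comparison case
print(ray_filtering, short_filtering, ray_filtering // short_filtering)
```

The 256-coefficient filters need 73,728,000 MACs per second for filtering alone, eight times the 9,216,000 of the assumed 32-coefficient case.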

Evaluating Beamforming Microphone Arrays

A real-time demonstrator with a 6-element microphone array is directly available on the OCEAN-ADSP21489 platform. This demonstrator also includes the CANEC speech enhancement. To request this demo, please fill in the CANEC demonstration request form.

We will be glad to give you more information on beamforming microphone arrays. Please take a moment to contact us.

Copyright © DSP ALGORITHMS, All rights reserved.