Beamforming microphone arrays are spatial filters that take multiple microphone signals
as input and combine them into a single output signal. Usually the combined output is calculated
by filtering each microphone signal through a digital FIR filter and summing the output
of all filters as shown in the figure. The figure shows a beamforming microphone array
constructed using N microphones, N FIR filters labeled FIR-1 to FIR-N and a summer.
The filters are designed so that their outputs
add constructively when sound is coming from a specific direction (main lobe)
and add destructively when sound is coming from all other directions. This creates the
spatial filtering effect of focusing on the sound that is coming from the main lobe
direction while attenuating sounds coming from all other directions.
Since the microphones are usually placed a few centimeters apart, the FIR
filters need not be long; 32 to 64 coefficients for each filter are sufficient.
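
The filter-and-sum structure described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function names and the two-microphone coefficients are hypothetical, and the coefficients are chosen by hand so that a pulse arriving one sample later at the second microphone lines up with the first.

```python
def fir_filter(signal, coeffs):
    """Direct-form FIR filter: y[n] = sum over k of coeffs[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * signal[n - k]
        out.append(acc)
    return out


def filter_and_sum(mic_signals, fir_coeffs):
    """Filter each microphone signal with its own FIR filter and sum the results."""
    if len(mic_signals) != len(fir_coeffs):
        raise ValueError("need one FIR filter per microphone")
    filtered = [fir_filter(x, h) for x, h in zip(mic_signals, fir_coeffs)]
    return [sum(samples) for samples in zip(*filtered)]


# Two microphones: the same pulse reaches microphone 2 one sample later.
mic1 = [0.0, 1.0, 0.0, 0.0, 0.0]
mic2 = [0.0, 0.0, 1.0, 0.0, 0.0]

# Hypothetical coefficients: delay microphone 1 by one sample so both pulses
# align, and weight each channel by 0.5 so the aligned sum has unit gain.
filters = [[0.0, 0.5], [0.5, 0.0]]

output = filter_and_sum([mic1, mic2], filters)
```

With these inputs the two filtered pulses add constructively at the same sample, which is the main-lobe behavior; a pulse arriving with a different inter-microphone delay would be spread over two samples at half amplitude instead.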
Beamformers succeed in improving the speech signal-to-noise ratio in many challenging situations
where other methods simply fail. One such situation is speech recording in crowded and noisy places,
such as restaurants and bars, where many people talk and laugh at the same time, making it difficult to record
the useful voice. In such situations, the noise changes quickly over time, has spectral
content similar to the useful voice, and comes from all directions. Trying to use a
single-microphone device that employs speech recognition technology in such an environment has proved
to be frustrating. On the other hand, a beamformer with two or more microphones can improve
speech recognition considerably. In fact, any application that relies on microphones to record
speech can benefit from beamforming microphone arrays in the same way that speech recognition does.
Beamforming microphone arrays may be implemented in many different ways. In its simplest
form, a beamformer may be designed to "listen" to a specific direction
(or directions) by designing the FIR filters accordingly. Alternatively, the
FIR filters may be implemented as adaptive filters that automatically "listen" to the
active user direction, tracking the user as he or she walks around the
room and switching over when the active user stops talking and another user starts talking. Three different
beamformer implementations are directly available.
Fixed Single Main Lobe (FSML) Beamformer
In the FSML beamformer, the coefficients of each FIR filter are calculated at design
time and saved in Read Only Memory (ROM). At system initialization, the calculated
coefficients are loaded from ROM and used for filtering during real-time operation.
This makes the beamformer implementation no more complex than simple FIR filtering,
which can be implemented easily on any low-end DSP or CPU.
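
As an illustration of what a design-time coefficient calculation could look like, the sketch below builds the simplest possible fixed design: a delay-and-sum beamformer for a uniform linear array, with delays rounded to whole samples. This is an assumed textbook technique, not the design method used for the FSML product, and the function name, the angle convention, and the speed-of-sound value are all assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def delay_and_sum_coeffs(num_mics, spacing_m, angle_deg, fs_hz, num_taps):
    """Return one FIR coefficient list per microphone of a uniform linear array.

    angle_deg is measured from broadside (0 = straight in front of the array);
    positive angles lie on the side of microphone 0, so microphone 0 hears the
    wave first. Each filter is a single tap of gain 1/num_mics, delayed so that
    a plane wave from angle_deg lines up across all channels.
    """
    # Arrival time at each microphone relative to microphone 0.
    arrival = [m * spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND
               for m in range(num_mics)]
    latest = max(arrival)
    coeffs = []
    for t in arrival:
        delay = round((latest - t) * fs_hz)  # wait for the latest arrival
        if delay >= num_taps:
            raise ValueError("num_taps too small for this geometry")
        taps = [0.0] * num_taps
        taps[delay] = 1.0 / num_mics
        coeffs.append(taps)
    return coeffs


# Computed once at design time; a real product would store this table in ROM.
ROM_TABLE = delay_and_sum_coeffs(num_mics=2, spacing_m=0.05, angle_deg=90.0,
                                 fs_hz=48000, num_taps=32)
```

For this end-fire example the filter for microphone 0 carries its single tap at index 7 and the filter for microphone 1 at index 0, because sound needs about 7 samples at 48 kHz to travel 5 cm. At run time nothing is recomputed; the table is simply loaded and used for FIR filtering.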
Fixed Multiple Main Lobes (FMML) Beamformer
An FMML beamformer consists of several FSML beamformers. A set of spatial FIR
filters is added for each main lobe direction. The outputs of all filters from all lobes are
summed together, making the FMML beamformer listen to several directions at the same time.
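
Because filtering and summing are linear operations, summing the outputs of several filter sets gives the same result as adding the coefficients per microphone and running a single filter set. The sketch below shows that merge; the function name and the tiny coefficient sets are hypothetical, and it assumes every lobe uses the same number of microphones and taps.

```python
def combine_lobes(lobe_filter_sets):
    """Merge several single-lobe filter sets into one multi-lobe filter set.

    lobe_filter_sets[l][m] is the coefficient list of microphone m for lobe l.
    Adding the coefficients per microphone is equivalent to running every lobe
    separately and summing all outputs, since FIR filtering is linear.
    """
    num_mics = len(lobe_filter_sets[0])
    num_taps = len(lobe_filter_sets[0][0])
    combined = [[0.0] * num_taps for _ in range(num_mics)]
    for lobe in lobe_filter_sets:
        for m in range(num_mics):
            for k in range(num_taps):
                combined[m][k] += lobe[m][k]
    return combined


# Two hypothetical single-lobe designs for a two-microphone array.
lobe_a = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]]
lobe_b = [[0.0, 0.0, 0.5], [0.5, 0.0, 0.0]]

fmml_filters = combine_lobes([lobe_a, lobe_b])
```

Merging at design time keeps the run-time cost at N FIR filters regardless of how many lobes are combined.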
Fixed 360° Beamformer
The Fixed 360° beamformer is basically an FMML beamformer that covers the full 360°
space. The difference is that instead of listening to several directions at the same time,
only one direction (the current active user direction) is selected.
The space around the beamformer is divided into M separate
main lobes, each with a resolution of (360/M)°, and a separate set of FIR filters is
designed for each lobe. An additional algorithm is needed to select one of the M outputs
and to switch smoothly from one direction to another when the user moves around (or when
another user starts talking), so as to avoid any distortion in the beamformer output.
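
One simple way such a selection algorithm could work is sketched below: pick the lobe whose current output frame has the highest energy, and crossfade linearly over one frame whenever the selected lobe changes. The selection rule, the crossfade length, and all names here are assumptions for illustration; a practical selector would likely add smoothing or hysteresis so that it does not jump between lobes on short noise bursts.

```python
def select_lobe(lobe_frames):
    """Return the index of the lobe whose current frame has the most energy."""
    energies = [sum(s * s for s in frame) for frame in lobe_frames]
    return energies.index(max(energies))


def crossfade(old_frame, new_frame):
    """Fade linearly from old_frame to new_frame across one frame."""
    n = len(old_frame)
    out = []
    for i in range(n):
        w = (i + 1) / n  # weight of the new lobe rises from 1/n to 1
        out.append((1.0 - w) * old_frame[i] + w * new_frame[i])
    return out


def fixed_360_output(lobe_frames, previous_lobe):
    """Return (output frame, selected lobe) for one frame of the M lobe outputs."""
    current = select_lobe(lobe_frames)
    if current == previous_lobe:
        return list(lobe_frames[current]), current
    return crossfade(lobe_frames[previous_lobe], lobe_frames[current]), current


M = 8                       # hypothetical number of lobes
resolution_deg = 360 / M    # width of each lobe in degrees

# One 4-sample frame per lobe; only lobe 3 carries signal in this frame.
frames = [[0.0] * 4 for _ in range(M)]
frames[3] = [1.0, 1.0, 1.0, 1.0]

out, lobe = fixed_360_output(frames, previous_lobe=0)
```

Here the output ramps up from the silent lobe 0 to lobe 3 within the frame instead of jumping, which is the kind of smooth switch the text calls for.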
Adaptive beamformers function similarly to the Fixed 360° beamformer in that they automatically
track the user's voice in the listening space. Unlike Fixed 360° beamformers,
however, adaptive beamformers do not use M sets of beamformer filters and do not switch between
such sets. Only a single set of FIR filters is used, and the coefficients
of those filters are adjusted in real time during operation by an adaptive algorithm
to focus on the current active user direction. When the direction of the user's voice changes, the filter
coefficients adapt gradually and smoothly, much like the smooth switching between two sets of
filters in the fixed case. The adaptation process runs
continuously during operation, so the beamformer is always listening in all directions and
adapts quickly but smoothly to a new direction when needed.
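
To give a feel for how FIR coefficients can be adjusted sample by sample, the sketch below uses the normalized LMS (NLMS) update, a standard adaptive filtering technique. It is not the RAY algorithm, which this text does not describe. In this assumed two-microphone setup, the filter learns the delay that aligns the microphone that hears the source first with the one that hears it later; the signal, the 3-sample delay, and the step size are made up for the example.

```python
import random


def nlms_align(reference, other, num_taps, mu=0.5, eps=1e-8):
    """Adapt FIR coefficients so that the filtered `other` tracks `reference`.

    Uses the normalized LMS update: after every sample the coefficients move a
    small step in the direction that reduces the current error.
    """
    w = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent input sample first
    for d, x in zip(reference, other):
        history = [x] + history[:-1]
        y = sum(wk * xk for wk, xk in zip(w, history))
        err = d - y
        norm = sum(xk * xk for xk in history) + eps
        w = [wk + mu * err * xk / norm for wk, xk in zip(w, history)]
    return w


random.seed(0)
source = [random.uniform(-1.0, 1.0) for _ in range(2000)]

delay = 3                                    # assumed inter-microphone delay
mic_near = source                            # hears the source first
mic_far = [0.0] * delay + source[:-delay]    # hears it 3 samples later

w = nlms_align(reference=mic_far, other=mic_near, num_taps=8)
peak = max(range(len(w)), key=lambda k: abs(w[k]))
```

After adaptation the largest coefficient sits at index 3, the delay between the two microphones. If the source moved and the delay changed, the same update would shift the coefficients gradually toward the new delay without any explicit switching.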
Fixed beamformers require little real-time processing, equivalent to N short FIR
filters, and can therefore be implemented easily on a low-end DSP, CPU, or
microcontroller. The disadvantage of fixed beamformers, however, is that they usually need
calibration after manufacturing because of microphone tolerances, which
can be very costly and time-consuming, especially in the case of the Fixed 360° beamformer.
Adaptive beamformers, on the other hand, design themselves in the field during operation
and therefore require no calibration whatsoever. In fact, we use our RAY adaptive
beamformer to design and calibrate the coefficients of fixed beamformers. The
disadvantage of adaptive beamformers, however, is the higher processing load
that the adaptive algorithm requires compared to fixed beamformers. But since the beamformer
FIR filters are usually very short, the processing load is still moderate.
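
The load of a fixed beamformer can be estimated with simple arithmetic: a direct-form FIR filter needs one multiply-accumulate (MAC) per coefficient per sample. The sketch below assumes straightforward time-domain filtering and uses hypothetical example figures of 6 microphones at 48 kHz; how MACs translate into processor cycles depends on the processor and is not modeled here.

```python
def fir_bank_macs_per_second(num_mics, num_taps, fs_hz):
    """Multiply-accumulates per second for num_mics direct-form FIR filters."""
    return num_mics * num_taps * fs_hz


# Hypothetical example: 6 microphones at 48 kHz with 32-tap and 64-tap filters.
load_32_taps = fir_bank_macs_per_second(num_mics=6, num_taps=32, fs_hz=48000)
load_64_taps = fir_bank_macs_per_second(num_mics=6, num_taps=64, fs_hz=48000)
```

This works out to roughly 9.2 million and 18.4 million MACs per second respectively, and the figure scales linearly with the number of microphones, the filter length, and the sampling rate.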
To place the processing load of adaptive beamformers in context, take the case of the
CANEC demonstrator on the OCEAN-ADSP21489 platform. This demonstrator processes 6
microphone signals through 6 Acoustic Echo Cancellers (AEC), each with an echo tail length of 250
ms, at a 48 kHz sampling rate, then combines the outputs of the 6 AEC channels using the RAY
adaptive beamformer. The RAY beamformer is implemented with 256-coefficient FIR filters
to cover larger microphone separation at the 48 kHz sampling rate. Even with such long filters,
the RAY beamformer consumes 50 MIPS, which is as much processing load as a single Acoustic Echo Canceller channel.
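
A rough way to see how filter length relates to microphone separation and sampling rate is to count how many samples sound needs to travel across the array; the filters must at least span that delay. The sketch below assumes a uniform linear array, a speed of sound of 343 m/s, and, as an example, the 6-element array at 5 cm spacing mentioned in this article. The function name is hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def max_delay_samples(num_mics, spacing_m, fs_hz):
    """Largest inter-microphone delay, in samples, of a uniform linear array.

    The worst case is sound travelling along the array (end-fire), where it
    must cross the full aperture between the first and last microphone.
    """
    aperture_m = (num_mics - 1) * spacing_m
    return aperture_m / SPEED_OF_SOUND * fs_hz


span = max_delay_samples(num_mics=6, spacing_m=0.05, fs_hz=48000)
```

For this geometry the worst-case delay is about 35 samples. Wider spacing or a higher sampling rate increases the figure proportionally, which is one reason longer filters are needed to cover larger microphone separation.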
It always comes down to asking what we can expect from investing in a beamformer.
The answer depends, of course, on the beamformer design, algorithm, number of microphone elements,
array shape, and microphone spacing. Here we try to show what you can expect by sharing recordings made
with our OCEAN-ADSP21489 demonstrator, using a 6-element linear microphone array at 5 cm spacing and a 48 kHz sampling rate.
The first example below has been recorded in a quiet office. The only background noise is that of a normal
office environment. The second example has been recorded in the same office while music is playing
in the background. In both cases, the recording is a stereo audio file. The LEFT (top) channel is a single
microphone output and the RIGHT (bottom) channel is the RAY adaptive beamformer output. The two channels
have been adjusted to give the same loudness for fair comparison.
In each recording, two users have been simulated by playing audio through two different loudspeakers. One loudspeaker, placed one meter directly in front of the array,
played a female voice, while the other loudspeaker, placed two meters away at 45° from the array, played a male voice.
The recordings should show how the RAY algorithm switches between the two users smoothly and unnoticeably, how fast and smooth the adaptation is, and how successful the beamformer is at separating the user's voice from background noise.
As a human listener, you need to listen to these audio files over good-quality headphones (never through loudspeakers)
to appreciate the difference. Repeatedly playing one phrase while switching between LEFT and
RIGHT should show that the microphone array effectively brings the user's voice to the front of the audio scene, pushing
everything else far into the background. If you listen to the two channels playing at the same time through headphones,
the sound should seem to come more from the right ear, although the two channels are at equal loudness.
For speech recognition systems, customers have reported that recognition improves (lower word error rate) by 70% on average when
the microphone array output is used instead of a single microphone.
Evaluating Beamforming Microphone Arrays
A 6-element microphone array real-time demonstrator is directly available on the
This demonstrator also includes the CANEC speech enhancement. To request this demo,
please fill in the CANEC demonstration request form.
We will be glad to give you more information on beamforming microphone arrays. Please
take a moment to contact us.