Air Force

Detecting background music audio contamination

Improved audio processing by identifying interfering music

Software & Information Technology

Air Force researchers have invented a valuable tool for cleaning audio recordings. Described below, the technology is available to businesses that would develop it into new products or services.

The first step in many audio processing techniques is to clean the audio stream by detecting where speech is and is not present. This step, called voice activity detection, usually relies on the energy of the signal and the harmonic structure of speech (or the lack of harmonic structure in noise).
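To make the energy cue concrete, here is a minimal sketch of an energy-only voice activity detector. It is an illustration, not the Air Force method: the function names, the frame length, and the `threshold_ratio` value are all assumptions chosen for the example, and the harmonic-structure cue mentioned above is left out entirely.

```python
import math

def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies

def energy_vad(samples, frame_len=160, threshold_ratio=0.1):
    """Flag frames whose energy exceeds a fraction of the loudest frame.

    threshold_ratio is a hypothetical tuning value, not taken from the source.
    """
    energies = frame_energies(samples, frame_len)
    if not energies:
        return []
    peak = max(energies)
    if peak == 0.0:
        return [False] * len(energies)
    return [e > threshold_ratio * peak for e in energies]

# Synthetic input: three silent frames, then three frames of a 200 Hz tone
# sampled at 8 kHz (the tone stands in for an active signal).
rate = 8000
silence = [0.0] * 480
tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(480)]
flags = energy_vad(silence + tone)
print(flags)
```

A detector like this flags any loud frame as active, including music, which is why a further contamination check is useful.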

An additional step in voice activity detection is identifying signals, such as music, that contaminate a speech signal. If audio contaminated with background music is fed to an automated process, such as language identification, the results may be degraded. Music detection is a difficult task because the structure of music can be similar to that of speech.

Additionally, music genres, recording quality, and languages vary widely, which complicates the process.

Most approaches for music detection compute several features and feed the features to a classifier. Air Force scientists have developed an approach that avoids the pitfall of needing to provide music and non-music examples to train a classifier, such as a neural network.

Instead, their patented approach defines what music is, which makes it robust to varying recording settings, contaminating signals, and various artifacts. The outcome of this work is an accurate music detection algorithm that operates in poor conditions and also succeeds in clean recording environments.

The invention provides a way to detect music in speech processing by decomposing an audio signal into component signals in one or more bandwidths.

The invention detects energy levels across preselected time and frequency windows within the narrowest bandwidth components. A predetermined number of detections at predetermined detection levels will result in a characterization of any music in the window.
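The counting idea described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the patented algorithm: it uses the Goertzel recurrence to measure energy in a few narrow frequency bins per time frame, and the bin centers, the `level` threshold, the `min_detections` count, and all function names are hypothetical values picked for the example.

```python
import math
import random

def goertzel_power(frame, freq_hz, rate):
    """Squared DFT magnitude of `frame` at freq_hz via the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2

def count_detections(samples, rate, frame_len, band_centers_hz, level):
    """Count (time frame, narrow band) cells whose normalized power reaches `level`."""
    count = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        for freq in band_centers_hz:
            power = goertzel_power(frame, freq, rate) / (frame_len ** 2)
            if power >= level:
                count += 1
    return count

def looks_like_music(samples, rate=8000, frame_len=200,
                     band_centers_hz=(440.0, 880.0),
                     level=0.1, min_detections=4):
    """Characterize the window as music if enough cells are detected.

    All default values are illustrative assumptions, not from the source.
    """
    hits = count_detections(samples, rate, frame_len, band_centers_hz, level)
    return hits >= min_detections

# A sustained 440 Hz tone (five frames) versus low-level broadband noise.
rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(1000)]
random.seed(0)
noise = [random.uniform(-0.3, 0.3) for _ in range(1000)]

tone_is_music = looks_like_music(tone)
noise_is_music = looks_like_music(noise)
print(tone_is_music, noise_is_music)
```

The sustained tone concentrates energy in one narrow bin frame after frame, so detections accumulate, while the noise spreads its energy thinly and never reaches the level. A rule of this kind is fixed in advance and needs no labeled music and non-music examples, which matches the motivation given earlier.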
