ASU Learning Sparks

Interpreting Sound Waves Through Audio Analysis

Written by Kimberlee Swisher | May 30, 2023 4:12:02 PM

Sound waves are vibrations that travel through a medium until our ears convert them into signals the brain can understand. A sound's frequency, amplitude, and timbre determine what we hear. Computers capture this information by converting sound waves into digital signals through a process called analog-to-digital conversion (ADC). Audio analysis is a powerful tool that has enabled advances in speech recognition, app development, and more.

Sound is a vibration. Sound waves travel through a medium, usually air but also water, solids, even human bodies: through pretty much everything.

Your eardrum is a membrane that waves back and forth in response to incoming sound vibrations. That movement vibrates the ossicles, tiny bones in the middle ear, and eventually the vibrations are transferred to the fluid-filled cochlea. Movement in this fluid activates cilia, tiny hair cells in the inner ear with a direct neural connection to the brain. The cilia send electrical pulses through the auditory nerve to the brain, which processes these incoming electrical signals into sounds we can understand.

A microphone is similar: a membrane (often called a diaphragm) vibrates back and forth, and this movement creates electrical signals in a pattern that matches the incoming sound waves. A microphone is a transducer, something that converts mechanical wave energy (a sound wave) into electrical energy. A computer can interpret these electrical signals just as the brain's auditory cortex interprets the pulses from the cilia.

There are three physical properties of the sound vibrations themselves that determine what we hear (illustrated in the short sketch after this list):

  • Frequency, which humans hear as "pitch": whether a sound is high or low
  • Amplitude, which is the strength of the wave and which humans (sort of) hear as volume, although the relationship isn't linear: loudness is a perceptual quality, describing how we experience a sound, and it depends on frequency as well as amplitude
  • and Timbre, which is the quality of a sound: how we tell a song from a construction truck, and how we identify the specific voices of friends and family.
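As a minimal sketch of how these three properties show up in code (everything here is synthetic: the sample rate, note frequencies, and amplitudes are arbitrary illustrative choices), a few lines of Python with numpy can generate tones that differ in pitch, strength, and character:

```python
import numpy as np

SAMPLE_RATE = 44100  # samples per second; a common but arbitrary choice

def tone(frequency_hz, amplitude, duration_s=1.0):
    """A pure sine tone: frequency sets the pitch, amplitude sets the strength."""
    t = np.linspace(0.0, duration_s, int(SAMPLE_RATE * duration_s), endpoint=False)
    return amplitude * np.sin(2 * np.pi * frequency_hz * t)

# Frequency: a 220 Hz tone sounds lower in pitch than a 440 Hz tone.
low_note = tone(220.0, 0.5)
high_note = tone(440.0, 0.5)

# Amplitude: same pitch, but more energy in the second tone.
# (Perceived loudness is not linearly related to this number.)
quiet = tone(440.0, 0.1)
loud = tone(440.0, 0.8)

# Timbre: adding overtones changes the character of the sound without
# changing its pitch; a crude stand-in for what makes voices distinct.
richer = tone(440.0, 0.5) + tone(880.0, 0.25) + tone(1320.0, 0.125)
```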

When we digitize sound into audio, we can analyze the audio for these physical properties: we can use mathematical and computational processes to figure out the mix of frequencies and amplitudes in an audio signal.
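As a hedged sketch of what that looks like in practice (the signal below is synthetic, and numpy's FFT is just one common tool for this job), we can recover the frequencies and amplitudes that were mixed into a signal:

```python
import numpy as np

SAMPLE_RATE = 8000  # Hz; arbitrary for this illustration
t = np.linspace(0.0, 1.0, SAMPLE_RATE, endpoint=False)

# A signal with two known ingredients: 440 Hz at amplitude 1.0
# and 1000 Hz at amplitude 0.5.
signal = 1.0 * np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

# The discrete Fourier transform decomposes the signal into frequencies.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1.0 / SAMPLE_RATE)
amplitudes = np.abs(spectrum) * 2 / len(signal)  # rescale to sine amplitudes

# The two strongest components should match what we mixed in.
strongest = np.argsort(amplitudes)[-2:]
for i in sorted(strongest):
    print(f"{freqs[i]:.0f} Hz at amplitude {amplitudes[i]:.2f}")
```

Running this prints back the two ingredients we mixed in: 440 Hz at amplitude 1.00 and 1000 Hz at amplitude 0.50.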

The process of converting the analog electrical signal from a transducer like a microphone into digital values the computer can understand is called analog-to-digital conversion, often shortened to its acronym ADC.
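At its core, ADC does two things: it samples the signal at regular moments in time, and it quantizes each measurement into a number the computer can store. Here is a rough sketch of that math on a synthetic stand-in for an analog signal (real converters are hardware, and the sample rate and bit depth below are illustrative choices):

```python
import numpy as np

SAMPLE_RATE = 8000   # samples per second; CD audio uses 44100
BIT_DEPTH = 8        # bits per sample; CD audio uses 16

def analog_signal(t):
    """Stand-in for a continuous voltage from a microphone: a 440 Hz sine."""
    return np.sin(2 * np.pi * 440 * t)

# Sampling: measure the signal at evenly spaced moments in time.
t = np.arange(0, 0.01, 1.0 / SAMPLE_RATE)
samples = analog_signal(t)

# Quantization: round each measurement to one of 2**BIT_DEPTH levels.
levels = 2 ** BIT_DEPTH
digital = np.round((samples + 1.0) / 2.0 * (levels - 1)).astype(int)

print(digital[:10])  # the sound, now just a list of integers
```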

Once the signal is inside a computer, we have all kinds of ways to alter it using software built specifically for audio editing. We can speed it up, slow it down, mix it with other sounds, or apply filters to make certain parts of the sound louder and other parts softer, as in the brief sketch below.
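As a minimal sketch of those manipulations (again on synthetic signals; real audio editors are far more sophisticated), mixing digital audio is just adding numbers, and a crude filter is just averaging neighboring samples:

```python
import numpy as np

SAMPLE_RATE = 8000
t = np.linspace(0.0, 1.0, SAMPLE_RATE, endpoint=False)
voice = np.sin(2 * np.pi * 300 * t)        # stand-in for one recording
hiss = 0.3 * np.sin(2 * np.pi * 3000 * t)  # stand-in for another

# Mixing: digital audio is just numbers, so mixing is addition.
mix = voice + hiss

# Slowing down: playing the same samples back at half the rate
# doubles the duration (and halves the pitch).
slowed_rate = SAMPLE_RATE // 2

# A crude low-pass filter: averaging neighboring samples makes the
# high-frequency hiss softer while leaving the low voice mostly intact.
window = np.ones(8) / 8
filtered = np.convolve(mix, window, mode="same")
```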

The reverse process, which takes the digital audio and turns it back into an analog electrical signal that a speaker converts into physical vibrations, is called digital-to-analog conversion, or DAC. A common way of visualizing digital audio is a spectrogram, which plots time on the x-axis and frequency on the y-axis, with color showing amplitude; here is a spectrogram of a bird call.
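As a sketch of how you might draw a spectrogram yourself (using scipy and matplotlib on a synthetic rising chirp, since the bird recording itself isn't included here):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import spectrogram, chirp

SAMPLE_RATE = 8000
t = np.linspace(0.0, 2.0, 2 * SAMPLE_RATE, endpoint=False)

# A rising chirp as a stand-in for a bird call: the frequency
# sweeps from 500 Hz up to 2000 Hz over two seconds.
signal = chirp(t, f0=500, f1=2000, t1=2.0, method="linear")

# Break the signal into short windows and measure the frequency
# content of each one.
freqs, times, power = spectrogram(signal, fs=SAMPLE_RATE)

plt.pcolormesh(times, freqs, power, shading="gouraud")  # color = amplitude
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.show()
```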

We analyze audio for all sorts of reasons. Some are common and practical, like speech recognition, transcription, and recognizing specific voices. Some are artistic, like building an understanding of sounds that aids in the production of music.

One ecologist, David Dunn, even recorded tree trunks to hear whether they were infested with bark beetles, enabling uninfested trees to be saved from destruction.

In my field of interactive media arts, we create and produce music and sounds, but we also analyze audio so that we can transform it into something else: something expressive, meaningful, aesthetic.

Artificial intelligence (AI) and machine learning techniques have brought even more power to audio analysis. Now we can analyze not only the frequency and amplitude in a single audio file, but also patterns of these characteristics across many thousands of audio files. This has enabled incredibly effective speech recognition, song-recognition apps such as Shazam, and even AI algorithms that can compose music alone or in collaboration with a human composer.
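As one hedged illustration of the kind of per-file feature extraction that feeds such systems (librosa is a widely used Python audio library, the file path below is a placeholder, and MFCCs are just one popular choice of feature, not necessarily what any particular app uses):

```python
import librosa
import numpy as np

# Placeholder path: point this at any real audio file.
y, sr = librosa.load("recording.wav")

# MFCCs summarize the spectral shape (roughly, the timbre) of short
# slices of audio; stacks of these per file are a common ML input.
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# One tiny per-file fingerprint: the average of each coefficient over time.
fingerprint = np.mean(mfccs, axis=1)
print(fingerprint.shape)  # (13,)
```

Computing a compact fingerprint like this for thousands of files turns a pile of recordings into a dataset a machine learning model can learn patterns from.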