We all use acoustic cues in speech perception. These cues shape how we interpret the emotion in speech and what a speaker is trying to convey. Deficits in speech emotion perception occur in conditions such as autism and Parkinson's disease, reflecting the brain structures and communication pathways that underlie emotion recognition. Some people also have difficulty conveying emotion through their own speech, which can hamper social interactions and professional success. Speech analysis and clinical algorithms may help such individuals with both emotion perception and expression.
From the time we are infants, we understand and differentially respond to the acoustics of a spoken message - the way speech sounds.
Imagine a mother holding a baby. Her lilting infant-directed speech, with attention-grabbing variations in pitch and words spoken through a smiling mouth, is positively perceived by the baby, reinforcing interaction, connection, and bonding. In contrast, Mom's stern admonishment, spoken abruptly, loudly, and at a low pitch, evokes a negative response: a frown, even tears, as the infant recognizes that Mom is not happy. This innate ability to recognize human emotion from voice is critical for survival and for navigating our social environments. And as the baby grows up, these skills of perceiving emotion from voice become more and more refined, such that the subtle cues associated with irony, sarcasm, and humor can be discerned from the acoustics of speech.
Humans, and even other mammals, are exceptionally sensitive to acoustic cues of emotion. How many times have you asked a friend or family member how they’re doing and distrusted their “I’m fine, thanks” because of how they say it? Maybe their intonation is flat or falling, or the interval between “I’m” and “fine” is a few milliseconds too long. It’s just enough for you to perceive a discrepancy between their positive verbal message and their actual state of mind.
Being able to read the emotion of others from the sound of their voices relies on the fact that it’s hard for most of us to fake emotions convincingly. Much as I’d like my voice to stop shaking while public speaking, I can’t make it stop. And while I can try to fake enthusiasm for a sushi dinner, those who know me well will sense my insincerity from how I say “I’m in!” Actors are the rare individuals who master the art and science of convincingly conveying emotions they don’t genuinely feel.
But there are others of us who are unable to use these acoustic cues to infer emotion. Autism is marked by such deficits, as are mental health disorders such as schizophrenia. In fact, there is evidence that people who develop Parkinson’s disease become less proficient at inferring another person’s emotional state from the way they speak. This all points back to the neural substrates underlying emotion perception. Structures deep within the brain, such as the amygdala, hippocampus, and insula, send key input to interpretive centers in the frontal, temporal, parietal, and occipital lobes, which draw the inferences. When this communication is absent or impeded, inferring emotion isn’t possible.
These same people who have difficulty perceiving emotion from speech often have difficulty conveying emotion through their own speech. Others may perceive them as bored, disengaged, uncaring, or aloof when their speech sounds monotone or abrupt. Imagine the difficulty this creates in establishing and maintaining friendships, or in being successful in one’s career.
By analyzing speech acoustics, such as pitch patterns, speaking rate, and rhythm, we can estimate how much emotional expression a voice carries. There are even mobile apps designed to do just that, though they’re more for entertainment than for clinical use. Clinical-grade algorithms are being developed to help those who have difficulty perceiving and expressing emotion in their voice, by providing opportunities for practice and feedback.
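To make the idea concrete, here is a minimal sketch of the kind of acoustic analysis described above. It assumes the open-source librosa library; the particular features (pitch variability, onset rate, loudness variability), the function name, and the audio file name are illustrative choices for this sketch, not any specific clinical algorithm.

```python
# A rough sketch of extracting prosodic features linked to vocal emotion.
# Assumes librosa and numpy are installed; feature choices are illustrative.
import numpy as np
import librosa

def expressiveness_features(path):
    """Return a few coarse prosodic features often associated with vocal emotion."""
    y, sr = librosa.load(path, sr=16000)              # mono audio at 16 kHz
    duration = librosa.get_duration(y=y, sr=sr)

    # Pitch (F0) contour: variability in F0 is a rough proxy for intonation range.
    f0, voiced, _ = librosa.pyin(y,
                                 fmin=librosa.note_to_hz('C2'),
                                 fmax=librosa.note_to_hz('C6'),
                                 sr=sr)
    f0_voiced = f0[voiced]                            # keep only voiced frames
    pitch_mean = float(np.nanmean(f0_voiced))
    pitch_sd = float(np.nanstd(f0_voiced))            # flat, monotone speech -> small SD

    # Speaking rate: onset density is a crude stand-in for syllables per second.
    onsets = librosa.onset.onset_detect(y=y, sr=sr)
    onsets_per_sec = len(onsets) / duration

    # Loudness variability from the RMS energy envelope.
    rms = librosa.feature.rms(y=y)[0]
    loudness_sd = float(np.std(rms))

    return {"pitch_mean_hz": pitch_mean,
            "pitch_sd_hz": pitch_sd,
            "onsets_per_sec": onsets_per_sec,
            "loudness_sd": loudness_sd}

if __name__ == "__main__":
    # "sample_utterance.wav" is a hypothetical recording for illustration.
    print(expressiveness_features("sample_utterance.wav"))
```

In a real application, features like these would feed a model trained to map them onto perceived emotion, and that output could drive the kind of practice-and-feedback tools mentioned above.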