When a startup called Beyond Verbal began mapping the patterns and frequency of voices expressing various emotions, they noticed something odd: Love and hate looked almost identical. Yet despite the apparent similarities, our human minds can easily grasp the difference, and now machines are learning how to do it, too. In May, the startup revealed a technology that it says can identify hundreds of emotions in real time based on a voice alone.
The key to algorithmically deciphering the difference between love and hate? Beyond Verbal CEO Yuval Mor demonstrates the huffy tone of hatred:
“When you hate someone you really push the air and speak out.”
Then he switches to a kinder, cooing quality:
“When you love someone, this is the ‘pull’ kind of method.”
While to human ears the two tones are obviously different, on paper their sound patterns look quite similar, except for these “push” and “pull” qualities, which the company built into its algorithm.
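Beyond Verbal has not published how it quantifies these qualities, but as a rough, purely illustrative sketch, a “push” delivery might be approximated as a louder signal with a sharper energy attack than a soft “pull” delivery. The features and thresholds below are assumptions for the sake of the example, not the company’s method.

```python
# Purely illustrative: Beyond Verbal has not published its algorithm; the
# heuristic and thresholds here are guesses, not the company's method.
import numpy as np

def short_time_energy(signal, frame_len=1024, hop=512):
    """Root-mean-square energy per frame."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len, hop)]
    return np.array([np.sqrt(np.mean(f ** 2)) for f in frames])

def push_or_pull(signal):
    """Label a voiced clip 'push' or 'pull' from its energy envelope (toy heuristic)."""
    energy = short_time_energy(signal)
    attack = np.max(np.diff(energy))   # steepness of the loudest onset
    loudness = np.mean(energy)
    # Hypothetical thresholds, chosen only to make the toy example work.
    return "push" if attack > 0.1 and loudness > 0.2 else "pull"

if __name__ == "__main__":
    sr = 16000
    t = np.linspace(0, 0.75, 3 * sr // 4, endpoint=False)
    # An abrupt, loud clip ("pushing the air") vs. a quiet, gently tapered one.
    harsh = np.concatenate([np.zeros(sr // 4),
                            0.8 * np.sign(np.sin(2 * np.pi * 180 * t))])
    soft = 0.1 * np.sin(2 * np.pi * 180 * np.linspace(0, 1, sr)) * np.hanning(sr)
    print(push_or_pull(harsh), push_or_pull(soft))  # expected: push pull
```

In a real system these acoustic cues would presumably be only two of many features feeding a trained classifier rather than a hand-set threshold.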
Yoram Levanon (Top) and Lan Lossos (Bottom)
Beyond Verbal’s technology is based on research that two PhD students, Yoram Levanon and Lan Lossos, began in 1995. At the time, they were investigating emotion’s role in decision-making. That developed into gigs as advertising consultants, and eventually, after realizing the connection between intonation and emotion, they started a business that improved call centers by identifying caller emotions. That company went into liquidation, but others saw opportunities for the researchers' data-driven emotion identifier beyond the narrow scope of call centers.
Dominance - This diagram represents the relative location of a voice filled with dominance in each aspect of basic acoustics
If, say, Apple’s Siri understood how you were feeling in addition to what you were saying, it could pull up not just any playlist, but one that matched your mood. Politicians could use the technology to practice enhancing qualities such as leadership in their voices while giving speeches. People with Asperger's syndrome, who often have communication difficulties, could use it to understand verbal cues that extend beyond literal words. It could even help air traffic controllers identify when pilots were under stress.
Excitement - This diagram represents the relative location of a voice filled with excitement in each aspect of basic acoustics
Since a tiny startup couldn’t pursue all of these options at once, the researchers helped start a new company, Beyond Verbal, that focuses on making their technology available for developers to use in their own apps. The company has not yet announced its first customers.
Desire - This diagram represents the relative location of a voice filled with desire in each aspect of basic acoustics
During a demonstration in Fast Company’s offices, VP of Marketing & Strategic Accounts Dan Emodi played a video of President Obama talking about Mitt Romney during the 2012 presidential election. Beyond Verbal’s test app, Moodies, identified emotions in his voice like “provocation” and “cynicism.”
While listening to Princess Diana talk about her troubled marriage in a 1995 interview with the BBC, it noticed “love,” “grief,” and “feeling of loss.”
You can try it on your own bit of monologue here.
The diagram below represents the relative location of a voice filled with fear in each aspect of basic acoustics. The color represents the degree and location of the tested emotion, in this case fear, within that grid.
Emodi says:
"You think about fear, and fear is cold. We didn't touch this...it so happened that fear is represented by blue colors [in our system for mapping the relationship between voice and emotion]."
The app is not attempting to understand what you say, but how you say it. Researchers based the technology on an analysis of more than 1,000 speeches and on questionnaires from several thousand study participants who listened to sound clips and chose corresponding emotions. Beyond Verbal has tested it by asking about 60,000 people online in 26 countries whether the machine had correctly paired sound clips and emotions. That research is continuing, says Emodi; humans agree with the machine’s analysis 75% to 80% of the time, with agreement slightly lower in countries, such as Vietnam, where tonal languages are spoken.
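The article reports only those headline figures, but as a hedged illustration of how such a crowd-sourced validation might be scored country by country, a sketch like the following could be used. The data structure, labels, and numbers are invented for the example, not taken from Beyond Verbal’s study.

```python
# Hypothetical scoring sketch: the article gives only aggregate agreement
# (75-80%), so the inputs below are invented for illustration.
from collections import defaultdict

def agreement_by_country(responses):
    """responses: list of (country, human_label, machine_label) tuples."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for country, human, machine in responses:
        totals[country] += 1
        hits[country] += int(human == machine)
    return {c: hits[c] / totals[c] for c in totals}

# Toy example input, not real study data.
sample = [
    ("US", "anger", "anger"),
    ("US", "love", "grief"),
    ("Vietnam", "fear", "fear"),
    ("Vietnam", "love", "desire"),
]
print(agreement_by_country(sample))   # e.g. {'US': 0.5, 'Vietnam': 0.5}
```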
Is a machine that understands emotions creepy? A bit. Will politicians use it to practice their false apologies until they sound genuine? Probably. But, argues Mor, you could just as easily use the technology to practice sounding confident as a leader or loving as a parent, and by practicing these qualities, a la the “fake it till you make it” method, you may actually achieve them. In any case, you’ll be more in tune with the importance of how you say things.
Mor says:
“It’s not that you are necessarily [just] talking differently. You’re listening. You’re listening to yourself.”
COMMENTARY: In a blog post dated June 21, 2012, I commented on the neuroscience technology developed by Berkeley, California-based NeuroFocus, which measures the emotional responses of the human brain to external stimuli and how these differ between men and women. What is even more significant is how the two sexes neurologically react to advertising and which aspects of an advertisement stimulate them, creating positive responses that ultimately lead to a purchasing decision. This research into the human brain has led to the development of the new science of neuromarketing, which explores the dynamics of consumer purchasing decisions, customer brand affinity, and customer brand engagement.
Neuroscience research is making it increasingly clear that many of the decisions people make about what products they will buy or what services they will use are a result of intuition and unconscious mental processes rather than analysis or reasoning. Consumers have both emotional and logical responses to marketing and advertising, and both types of responses can occur on a conscious or an unconscious basis.
Experts at NeuroFocus suggest that approximately 11 million bits of sensory information are processed by the human brain every second. But only about 40 bits are processed consciously (roughly 0.0004% of the total, which is all the conscious mind can handle); the rest is processed unconsciously.
Advertising and marketing efforts are part of the universe of sensory information that the brain processes every day in a typical environment in the developed world. Images used in advertising, colors and designs used in packaging, music and fragrances associated with services, and the rich attribute sets of products themselves are all subject to the conscious and unconscious sorting that happens in the marketplace.
I believe that building emotion detection into voice-activated apps that respond to our voice patterns could eventually lead to customized marketing messages that increase the effectiveness of a brand's advertising. I could easily see a situation where I make a phone call to order something and, by decoding the emotional patterns of my voice, a salesperson or automated system customizes its sales pitch to better serve my needs and increase the likelihood of a purchase.
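As a purely hypothetical sketch of that idea, an automated system might map a detected emotion to a pitch strategy along these lines. The emotion labels and routing rules below are my own invention, not Beyond Verbal’s API.

```python
# Hypothetical sketch only: the article does not describe Beyond Verbal's
# developer API, so these labels and rules are invented for illustration.
PITCH_BY_MOOD = {
    "frustration": "Apologize first, keep the pitch short, offer to resolve the issue.",
    "excitement": "Mirror the caller's energy and suggest a premium upgrade.",
    "hesitation": "Slow down, answer objections, and offer a trial or guarantee.",
}

def pick_pitch(detected_emotion: str) -> str:
    """Return a sales-pitch strategy for the emotion detected in a caller's voice."""
    return PITCH_BY_MOOD.get(detected_emotion, "Use the standard neutral script.")

print(pick_pitch("frustration"))
```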
I think that Beyond Verbal's voice pattern technology still has a way to go, and a lot of data still has to be gathered and analyzed, before generalizations can be drawn about the voice patterns of the population as a whole. As NeuroFocus has discovered, there are distinct differences between how males and females respond to marketing messages. The same may be true of our voice patterns.
Courtesy of an article dated July 17, 2013 appearing in Fast Company