The State of Play in Emotion AI

Emotion AI, also known as Affective Computing or Artificial Emotional Intelligence, is an interdisciplinary field at the intersection of Behavioral Science, Cognitive Computing, Computer Science, Machine Learning, Neuroscience, Psychology, and Signal Processing, among others. It is one of the most rapidly evolving areas of AI research today.

At the basic level, Emotion AI refers to the development of intelligent machines that recognize and interpret non-verbal human cues (such as body language, facial expressions, gestures, skin temperature, and tone of voice), infer the emotional state behind those cues, and respond accordingly. These machines might even simulate these emotions, if designed to do so. While the field has been ideated, discussed, and researched for several decades, it is only in recent years, especially since 2015-16, that we have witnessed tangible progress in terms of practical application.

Key Applications of Emotion AI

Emotion AI has tremendous potential for application in fields like Advertising, Automotive, Gaming, Healthcare and Robotics. Below are some important use cases.

  • Advertisers use Emotion AI to measure and analyze the emotional engagement that existing (and potential) consumers may have with specific products or brands. These insights are then leveraged to create more effective advertising campaigns.
  • AI-based vocal biomarkers serve as cost-effective, scalable, and easily deployable screening tools for conditions such as autism spectrum disorder (ASD), attention deficit hyperactivity disorder (ADHD), anxiety, depression, brain injuries, and cardiovascular disease. These pattern detectors have the potential to significantly transform the preventive healthcare industry.
  • Automotive systems can detect the emotional state of human drivers and take proactive action if needed, such as taking back control (in autonomous driving) or notifying the drivers’ families or local traffic authorities.
  • Digital games with characters that understand and react to emotions in real time (providing a more natural experience for users) are among the next wave of innovations in the gaming industry.
  • Robots that can understand and respond to human emotions significantly enhance the entire human-machine interaction process. For instance, emotion-expressing robots have a big role to play in patient and elderly care.
  • Virtual Assistants that can decode, predict, and respond to users’ emotional states at every point of interaction provide a richer experience.

How Does Emotion AI Work?

In theory, Emotion AI machines are expected to operate at two levels:

  • An Emotion Perception level, where the goal is to recognize/understand and interpret the complex emotional dynamics of the human or the non-human subject(s)
  • An Emotion Synthesis and Response level, where the goal is to assimilate the knowledge created at the perception level, and generate appropriate emotional responses to be relayed back to the subject(s)

In practice, much of modern AI research has focused on the first part, that is, detecting the emotions of human or non-human subjects. The second part is still a long way off. Having said that, several researchers, particularly in the robotics domain, have been conducting critical research to solve this hard problem.

Emotion Detection is carried out through two approaches: Categorical and Dimensional.

  • Categorical – In this approach, emotional expressions are classified into discrete categories, e.g., anger, disgust, fear, happiness, sadness, surprise, and neutral (Ekman’s six basic emotions plus a neutral class are a common choice).
  • Dimensional – Emotional expressions are mapped onto a continuous spectrum in multi-dimensional space and studied along two variables – Valence (the degree of emotional positivity or negativity) and Arousal (the intensity of that emotion). The short sketch after this list illustrates how the two views relate.
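
Here is a minimal Python sketch of the relationship between the two approaches: discrete categories are placed at (valence, arousal) coordinates, and a continuous estimate is mapped back to its nearest category. The coordinates below are purely illustrative assumptions, not taken from any standard taxonomy.

```python
# Hypothetical (valence, arousal) coordinates in [-1, 1] x [-1, 1]
# for a handful of discrete emotion categories.
EMOTION_COORDINATES = {
    "happiness": (0.8, 0.6),    # positive valence, fairly high arousal
    "anger":     (-0.7, 0.8),   # negative valence, high arousal
    "sadness":   (-0.6, -0.4),  # negative valence, low arousal
    "fear":      (-0.8, 0.7),
    "surprise":  (0.2, 0.9),
    "neutral":   (0.0, 0.0),
}

def nearest_emotion(valence: float, arousal: float) -> str:
    """Map a continuous (valence, arousal) estimate to the closest discrete label."""
    return min(
        EMOTION_COORDINATES,
        key=lambda e: (EMOTION_COORDINATES[e][0] - valence) ** 2
                      + (EMOTION_COORDINATES[e][1] - arousal) ** 2,
    )

print(nearest_emotion(0.7, 0.5))  # -> "happiness"
```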

In short, a complete Emotion AI system needs both detection and generation capabilities, and detection is where nearly all practical progress has been made, as the systems described next illustrate.

Types of Emotion AI Systems

AI researchers, from both academia and industry, have been developing new prototypes and applications to address different aspects of Emotion AI, most of which are at the level of detection and recognition. Here are some key systems that are being developed today.

  • Vision-based Emotion Detection – This is the most prevalent type of Emotion AI system today, in which facial expressions and gestures are studied to detect the emotional state of the subject(s). Recent advances in Deep Learning have tremendously boosted the development of these systems.
  • Audio-based Emotion Detection – Acoustic features are analyzed to understand the emotional state of the subject. These include prosodic features (such as loudness, pitch, rhythm, and tempo) and spectral or frequency-based features (such as Mel Frequency Cepstral Coefficients, or MFCCs). A short extraction sketch appears after this list.
  • EEG-based Emotion Detection – EEG signals are decomposed into frequency bands through a wavelet transform. Spectral features are extracted from these bands and then analyzed to determine the emotional state of the subject(s); a decomposition sketch also follows this list.
  • Emotion Recognition in Conversations (ERC) – ERC is a relatively new area of AI research in which emotions are extracted from conversational (text) data such as social media or chat interactions. Sentiment analysis is often considered a basic form of ERC, as sketched after this list.
  • Multimodal Emotion Detection – This involves the fusion of audio, visual and text data to get a holistic understanding of the emotional state of the subject(s).
  • Fully Affective Computing Systems – This refers to the development of natural human-machine interaction systems (e.g., friendly robots) that can deeply understand the emotional state of humans and respond accordingly.
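
For the audio-based approach, here is a minimal sketch using the librosa library to extract MFCCs and a simple loudness proxy. The file path is a placeholder, and the specific choices (13 MFCC coefficients, mean pooling over time) are illustrative rather than prescriptive.

```python
import librosa
import numpy as np

# Load a speech clip; "speech_sample.wav" is a placeholder path.
y, sr = librosa.load("speech_sample.wav", sr=None)

# Spectral features: Mel Frequency Cepstral Coefficients (MFCCs)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape: (13, frames)

# A simple prosodic proxy for loudness: per-frame root-mean-square energy
rms = librosa.feature.rms(y=y)                      # shape: (1, frames)

# Summarize each feature over time; an emotion classifier would be
# trained on fixed-length vectors like this one.
feature_vector = np.concatenate([mfcc.mean(axis=1), rms.mean(axis=1)])
print(feature_vector.shape)  # (14,)
```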
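
The EEG pipeline can be sketched with PyWavelets. The signal here is synthetic noise standing in for one EEG channel, and the sampling rate (128 Hz), wavelet ('db4'), and decomposition depth (5 levels) are common but arbitrary assumptions.

```python
import numpy as np
import pywt

fs = 128                         # assumed sampling rate (Hz)
eeg = np.random.randn(fs * 10)   # stand-in for 10 s of one EEG channel

# wavedec returns [approximation, detail_level5, ..., detail_level1].
# At fs = 128 Hz, the detail levels roughly track the classic
# delta/theta/alpha/beta/gamma frequency bands.
coeffs = pywt.wavedec(eeg, "db4", level=5)

# Energy of each sub-band as a simple spectral feature vector
# that an emotion classifier could consume.
band_energy = np.array([np.sum(c ** 2) for c in coeffs])
print(band_energy)
```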
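
And for ERC, sentiment analysis at the utterance level is the natural starting point. The sketch below uses the Hugging Face transformers pipeline with its default English sentiment model (downloaded on first use); the two-line conversation is invented, and a full ERC system would additionally model the conversational context across turns.

```python
from transformers import pipeline

# Default sentiment model; a basic, per-utterance form of ERC.
classifier = pipeline("sentiment-analysis")

conversation = [
    "I finally got the promotion!",
    "Oh. I thought they had promised it to me.",
]
for utterance in conversation:
    result = classifier(utterance)[0]
    print(f"{utterance!r} -> {result['label']} ({result['score']:.2f})")
```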

Most of these systems are at the early stages of their respective maturity cycles. Despite the hype and noise created by their developers, they are likely to take at least two to five years (depending on the specific use case) to fully evolve for large-scale production usage.

Technical Challenges

Emotion AI is an evolving field, both in terms of the scope of the discipline, and the maturity of the underlying technologies. Here are some key challenges that researchers and developers face today.

  • Contextual awareness is key to the study of emotions, and context development in Emotion AI is usually more complex and layered than in most other AI use cases. The challenges multiply when multiple humans (or characters) interact with one another, due to factors like emotional inertia and emotional shift, the inter-personal dynamics of the characters, and the use of sarcasm in conversation.
  • Emotion detection works best when data from multiple modes are integrated to understand the overall emotional state. This means that data from different modalities (like facial expressions, vocal signals, and conversations) must first be individually encoded as representations, and then merged into a single representation. The second step is extremely difficult to achieve, particularly in live environments; a minimal fusion sketch follows this list.
  • Multiple taxonomies for categorical and dimensional modeling exist today, e.g., Ekman’s approach or Plutchik’s wheel, and there is no universally accepted approach to classifying emotions. Consequently, it is often challenging for AI teams, particularly those operating without the support of neuroscience or psychology functions, to settle on a distinct set of emotion types.
  • Annotating emotions into labelled classes is practically challenging. Firstly, labelling teams may not fully understand certain classes of emotions, particularly those expressed for short durations or overlapping in nature. Secondly, the annotators’ personal biases and opinions may creep into the labelling process. Both factors lead to low-quality labels, which, in turn, lead to poor AI models.
  • Real-time emotion detection requires sophisticated engineering, particularly in large-scale environments. It calls for advanced hardware and software architectures, and for applications that meet extreme processing and latency requirements. Such systems are not easy to design, develop, and manage.
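
To make the fusion challenge concrete, here is a minimal late-fusion sketch in PyTorch: each modality is encoded separately, and the encodings are concatenated into one representation. The small linear encoders stand in for real CNN/RNN encoders, and all dimensions (including the 7-class output) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, face_dim=512, audio_dim=128, text_dim=768, n_emotions=7):
        super().__init__()
        # One small encoder per modality (stand-ins for real encoders)
        self.face_enc = nn.Linear(face_dim, 64)
        self.audio_enc = nn.Linear(audio_dim, 64)
        self.text_enc = nn.Linear(text_dim, 64)
        # The fusion head operates on the merged, single representation
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(3 * 64, n_emotions))

    def forward(self, face, audio, text):
        fused = torch.cat(
            [self.face_enc(face), self.audio_enc(audio), self.text_enc(text)],
            dim=-1,
        )
        return self.head(fused)

model = LateFusionClassifier()
logits = model(torch.randn(1, 512), torch.randn(1, 128), torch.randn(1, 768))
print(logits.shape)  # torch.Size([1, 7])
```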

Recent Advances

The broader field of Human-Computer Interaction (HCI) has witnessed remarkable progress over the past few years, and Emotion AI has been an important part of this development.

Pre-built models covering different aspects of Emotion AI are increasingly available, and organizations leverage them through Transfer Learning (sketched below). New and innovative forms of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are becoming the state of the art in this field. Multimodal detection systems, particularly those that integrate behavioral and physiological patterns, are being prioritized over stand-alone detection systems.
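
The transfer-learning pattern can be sketched with torchvision (recent versions): a pre-trained backbone is reused and only a new classification head is trained. ImageNet weights serve here as a stand-in for an emotion-specific pre-built model, and the 7 emotion classes are an assumption.

```python
import torch.nn as nn
from torchvision import models

# Load a ResNet-18 with pre-trained ImageNet weights
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in backbone.parameters():
    param.requires_grad = False  # freeze the pre-trained features

# Replace the final layer with a head for an assumed 7 emotion categories.
# Only this new layer's parameters are trainable, so fine-tuning is cheap.
backbone.fc = nn.Linear(backbone.fc.in_features, 7)
```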

Emotion AI, in general, is characterized by the availability of large amounts of data. Hence, Deep Learning is fast becoming the norm in this field, particularly in areas like bio-sensing research and affective engineering. Advances in other AI areas, like Adversarial Learning, Object Detection, Natural Language Generation, and Deep Reinforcement Learning, have had a significant spill-over effect on this field. A key area of focus has been the development of highly context-aware systems. Innovative model development, coupled with advances in High Performance Computing, has provided a significant boost to the field.

Closing Comments

Emotion AI is a good area to invest in for companies seriously considering entering the AI race. The technical knowledge gap between the established operators in this space and new entrants – an important entry barrier in advanced technology industries – may not be as wide as in other AI areas. Having said that, it is important to focus on very narrow use cases in Emotion AI, and to prepare for short-term failures, as the underlying technologies are not yet mature. A robust AI strategy, an effective governance model, and a core of premium talent can potentially close the gap between market leaders and new entrants within two or three years.

Emotion AI offers a great opportunity for companies to get into the AI race, and help shape our shared future.