
Physiological signals could be the key to 'emotionally intelligent' AI, scientists say

Posted on: Apr 04, 2022

A multimodal neural network predicts user sentiment from multimodal features such as text, audio, and visual data. In a new study, researchers from Japan also account for physiological signals, recorded while the user talks with the system, in sentiment estimation, greatly improving the system's performance. Credit: Shogo Okada from JAIST.

Speech and language recognition technology is a rapidly developing field, which has led to the emergence of novel speech dialog systems, such as Amazon Alexa and Siri. A significant milestone in the development of dialog artificial intelligence (AI) systems is the addition of emotional intelligence. A system able to recognize the emotional states of the user, in addition to understanding language, would generate a more empathetic response, leading to a more immersive experience for the user.

"Multimodal sentiment analysis" is a group of methods that constitute the gold standard for an AI dialog system with sentiment detection. These methods can automatically analyze a person's psychological state from their speech, voice color, facial expression, and posture and are crucial for human-centered AI systems. The technique could potentially realize an emotionally intelligent AI with beyond-human capabilities, which understands the user's sentiment and generates a response accordingly.

However, current emotion estimation methods focus only on observable information and do not account for unobservable signals, such as physiological signals. Such signals are a potential gold mine of emotional information that could substantially improve sentiment estimation performance.

In a new study published in the journal IEEE Transactions on Affective Computing, researchers from Japan added physiological signals to multimodal sentiment analysis for the first time. The collaborative team comprised Associate Professor Shogo Okada from Japan Advanced Institute of Science and Technology (JAIST) and Prof. Kazunori Komatani from the Institute of Scientific and Industrial Research at Osaka University. "Humans are very good at concealing their feelings. The internal emotional state of a user is not always accurately reflected by the content of the dialog, but since it is difficult for a person to consciously control their biological signals, such as heart rate, it may be useful to use these for estimating their emotional state. This could make for an AI with sentiment estimation capabilities that are beyond human," explains Dr. Okada.

The team analyzed 2,468 exchanges with a dialog AI, obtained from 26 participants, to estimate the level of enjoyment experienced by the user during the conversation. The user was then asked to assess how enjoyable or boring they found the conversation to be. The team used the multimodal dialog data set named "Hazumi1911," which uniquely combines speech recognition, voice color sensors, and facial expression and posture detection with skin potential, a form of physiological response sensing.

"On comparing all the separate sources of information, the biological signal information proved to be more effective than voice and facial expression. When we combined the language information with biological signal information to estimate the self-assessed internal state while talking with the system, the AI's performance became comparable to that of a human," comments an excited Dr. Okada.