Extracting Emotions from Voice Signals

In this project, we have developed an adaptable system that processes human voice and returns a set of emotions and their intensity levels. The system can easily be plugged into Embodied Conversational Agents (ECAs) or other interactive systems to enrich the user experience. As research into ECAs matures, conversations with ECAs are increasingly perceived as natural, or at least 'believable'. An important requirement for effective ECAs is the ability to react to the trainee's behaviour much as a human interlocutor would; otherwise, there is a risk that the system reinforces the wrong behaviour. For instance, a virtual agent that only listens to you if you address it with a submissive attitude is probably not very useful for leadership training. Hence, making an ECA show the appropriate response to the appropriate behaviour of the trainee is crucial.

Although most ECAs respond to what the user says, they often do not respond to how the user says it. This is a serious limitation, as the style of a person's speech is very important during social interactions. Humans rely heavily on vocal cues (such as volume or speaking rate) to infer other people's emotions. For example, the phrase 'sorry sir, we cannot accept 100 Euro bills' can be perceived as very friendly when uttered calmly and gently, but as offensive when uttered in a quick and monotone voice. Taking such differences into account is especially important for communication training, as it allows professionals to learn not only what to say during their job, but also how to say it.

In this project, such non-verbal aspects of communication are taken into account by inferring a user's emotion from voice signals. This ability is also very useful for other types of interactive systems, extending their capacity to provide better and more precise services.
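To make the notion of vocal cues concrete, the following is a minimal sketch of how two simple cues, volume and voicing, might be extracted from a raw waveform. It is an illustrative example using NumPy only, not the feature set used in the project; the function name and frame length are assumptions.

```python
import numpy as np

def vocal_cues(signal, sample_rate, frame_ms=25):
    """Compute two simple per-frame vocal cues from a mono waveform:
    RMS energy (a rough proxy for volume) and zero-crossing rate
    (loosely related to pitch and voicing)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    # Split the signal into non-overlapping frames of frame_ms milliseconds.
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # Fraction of consecutive samples whose sign flips within each frame.
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return rms, zcr

# Example: a quiet tone followed by a louder one at the same pitch.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
quiet = 0.1 * np.sin(2 * np.pi * 220 * t)
loud = 0.8 * np.sin(2 * np.pi * 220 * t)
rms, zcr = vocal_cues(np.concatenate([quiet, loud]), sr)
print(rms[0] < rms[-1])  # True: the second half is louder
```

A real system would typically add features such as pitch contour and speaking rate, but even these two cues separate calm from agitated delivery surprisingly often.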
Often, context conveys crucial information that is neglected by systems and Human-Computer Interaction applications. In this work, context was incorporated in the form of a Context Awareness module. The proposed approach combines two algorithms in a complementary way: Support Vector Machines (SVMs) handle most of the common cases, while decision trees are used to resolve borderline situations, reducing the likelihood of mistakes. The combination of the two algorithms is tuned to exploit the best of both worlds, thus minimising mistakes and increasing accuracy.
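The exact tuning of the SVM/decision-tree combination is not detailed here, but one plausible reading is that the decision tree takes over when the SVM's decision margin is small. The sketch below illustrates that idea with scikit-learn on synthetic data; the class name, the margin threshold, and the tree depth are assumptions for illustration, not the project's actual configuration.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

class HybridClassifier:
    """Illustrative hybrid: the SVM predicts confident cases, and a
    decision tree resolves borderline ones, i.e. samples whose absolute
    SVM decision margin falls below `threshold` (an assumed value)."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.svm = SVC()  # default RBF kernel
        self.tree = DecisionTreeClassifier(max_depth=5, random_state=0)

    def fit(self, X, y):
        self.svm.fit(X, y)
        self.tree.fit(X, y)
        return self

    def predict(self, X):
        margin = np.abs(self.svm.decision_function(X))
        preds = self.svm.predict(X)
        # Defer borderline samples (small margin) to the decision tree.
        borderline = margin < self.threshold
        preds[borderline] = self.tree.predict(X[borderline])
        return preds

# Toy demonstration on synthetic binary data.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = HybridClassifier().fit(X_tr, y_tr)
acc = (clf.predict(X_te) == y_te).mean()
```

The appeal of this arrangement is that each model compensates for the other's weakness: the SVM generalises well in dense regions of the feature space, while the tree's axis-aligned rules can be easier to tune for the ambiguous cases near the decision boundary.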

Useful links