Editor-Curated Robotics Industry Briefing - RobotToday

Vision Language Models Train Robots to Read Human Emotions

A recent study led by Seung Chan Hong at the University of Melbourne examines how collaborative robots can read human emotions as they increasingly work alongside people. Published on May 18 in IEEE Robotics and Automation Letters, the research investigates how robots can infer emotions from contextual cues rather than from facial expressions alone. The study involved 40 volunteers and trained a vision language model (VLM) to interpret emotions from videos of interactions in which robots handed objects to humans. The VLM outperformed conventional AI methods, scoring 0.86 for accuracy in recognizing emotions compared to 0.77. The researchers attribute this improvement to the VLM's ability to consider the entire context of an interaction rather than isolated facial expressions. In a follow-up experiment, participants interacted with a robot that was programmed to make an error and then received either an emotionally adaptive apology or a standard one. The majority preferred the adaptive response, but trust in the robot still diminished after it failed to complete its task, showing that emotional responses cannot compensate for a lack of functionality. While the VLM recognized emotions well when judged from a third-party perspective, its accuracy dropped when measured against participants' self-reported feelings, indicating that robots still struggle to fully understand human emotions. The findings suggest that emotional adaptivity is valuable, but that users' primary concern remains the robot's competence in performing tasks.