21st AIAI 2025, 26 - 29 June 2025, Limassol, Cyprus

Integrative Analysis of Video and Audio for Micro-Expression Recognition

Sayed Rehan, Pankaj Pratyush, Sharma Aryan, Srivastava Pragya, Sharath Shylaja S.

Abstract:

  Micro-expression recognition is a challenging task because micro-expressions are subtle and short-lived, which makes relying solely on visual cues insufficient. This paper presents a method that fuses visual and audio cues using a weighted fusion approach. Visual features are extracted using InceptionResNetV1, while the audio sentiment analysis uses a fine-tuned RoBERTa model. The audio is transcribed with Whisper so that RoBERTa can classify the emotions expressed in the transcribed text. Peak expressions are identified by detecting the apex frame with LBP-TOP and optical flow, which capture spatio-temporal deviations from the neutral baseline. The weighted fusion mechanism dynamically balances the contributions of visual and audio features to improve recognition accuracy. Experiments on the combined SAMM and CASME datasets show that our framework outperforms state-of-the-art methods, particularly in complex emotional scenarios, highlighting the value of weighted fusion in micro-expression recognition.
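
The weighted fusion step can be illustrated with a minimal sketch. The abstract does not specify how the weights are computed, so everything here is an assumption: the function names (`weighted_fusion`, `confidence_weight`), the entropy-based weighting rule, and the premise that both branches output probabilities over the same set of emotion classes are hypothetical and are not taken from the paper.

```python
from math import log


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * log(p) for p in probs if p > 0)


def confidence_weight(visual_probs, audio_probs):
    """Hypothetical dynamic weight for the visual branch.

    Assumption: a modality with a lower-entropy (more peaked) prediction
    is treated as more confident and receives a larger weight.
    """
    n = len(visual_probs)
    if n < 2:
        raise ValueError("need at least two classes")
    max_h = log(n)  # entropy of the uniform distribution over n classes
    # Confidence in [0, 1]; clamp to guard against floating-point error.
    c_v = max(0.0, 1.0 - entropy(visual_probs) / max_h)
    c_a = max(0.0, 1.0 - entropy(audio_probs) / max_h)
    total = c_v + c_a
    # If both modalities are uniform, fall back to equal weighting.
    return 0.5 if total == 0 else c_v / total


def weighted_fusion(visual_probs, audio_probs, w_visual=None):
    """Convex combination of two class-probability vectors.

    If w_visual is None, the weight is chosen per sample by
    confidence_weight (an assumed rule, not the paper's).
    """
    if len(visual_probs) != len(audio_probs):
        raise ValueError("both modalities must share the same label set")
    if w_visual is None:
        w_visual = confidence_weight(visual_probs, audio_probs)
    if not 0.0 <= w_visual <= 1.0:
        raise ValueError("w_visual must lie in [0, 1]")
    return [w_visual * v + (1.0 - w_visual) * a
            for v, a in zip(visual_probs, audio_probs)]
```

Because the fused vector is a convex combination of two probability vectors, it remains a valid probability distribution, and the predicted class is simply its argmax. A fixed `w_visual` gives static late fusion; leaving it as `None` gives the per-sample variant sketched above.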
