1. Title: Enhancing Zero-Shot Emotional Voice Conversion via Speaker Adaptation and Duration Prediction
Authors: 王世炎 (#), 齐天铧 (#), 路成, 罗兆杰, 郑文明 (*)
Abstract: Zero-shot Emotional Voice Conversion (EVC) aims to transform a speaker’s emotional state to match a target emotion, even for speakers and emotion categories that were not encountered during training, thereby enhancing the generalization ability of traditional EVC systems. Despite advancements in the field, existing methods often struggle to preserve speaker identity and to ensure the naturalness of emotional expression, particularly in rhythm modeling. To this end, we propose the Zero-Shot Emotional Voice Conversion (ZSEVC) model, which leverages self-supervised learning for speaker adaptation and duration prediction. To adjust speech rhythm in alignment with the target emotional state, we introduce a rhythm-aware content encoder that captures and refines discrete speech units at a finer granularity. Additionally, a hierarchical emotion fusion scheme is employed to integrate emotional features with content features, enhancing both pronunciation accuracy and emotional expressiveness. Moreover, a residual speaker-emotion fusion module is incorporated to better adapt speaker characteristics to emotional prosodic variation. Experimental results show ZSEVC’s superior performance in terms of naturalness and speaker similarity in zero-shot scenarios, successfully generating emotional speech for unseen emotions and speakers. Speech samples are available at https://wosyoo.github.io/ZSEVC.
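For readers curious how a residual speaker-emotion fusion might be realized, below is a minimal PyTorch sketch under stated assumptions: the module name, embedding dimensions, and MLP design are illustrative, not the actual ZSEVC implementation. The speaker embedding travels along a skip connection, and only an emotion-conditioned offset is learned, so identity is perturbed rather than replaced.

```python
import torch
import torch.nn as nn

class ResidualSpeakerEmotionFusion(nn.Module):
    """Hypothetical residual fusion: adapt a speaker embedding to
    emotional prosodic variation without overwriting identity."""

    def __init__(self, spk_dim: int = 256, emo_dim: int = 128):
        super().__init__()
        self.offset = nn.Sequential(
            nn.Linear(spk_dim + emo_dim, spk_dim),
            nn.ReLU(),
            nn.Linear(spk_dim, spk_dim),
        )

    def forward(self, spk: torch.Tensor, emo: torch.Tensor) -> torch.Tensor:
        # spk: (B, spk_dim) speaker embedding; emo: (B, emo_dim) emotion embedding.
        # Only the emotion-conditioned offset is learned; the skip
        # connection keeps the original speaker identity intact.
        return spk + self.offset(torch.cat([spk, emo], dim=-1))


spk, emo = torch.randn(4, 256), torch.randn(4, 128)
fused = ResidualSpeakerEmotionFusion()(spk, emo)  # (4, 256)
```

The skip connection is the key design choice: identity information survives even when the learned offset is small.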
2. Title: Unsupervised Motion-Robust Self-Distillation Framework for Remote Physiological Measurement
Authors: 刘安邦, 萧善霖, 郑文明 (*)
Abstract: Remote photoplethysmography (rPPG) holds great potential for medical monitoring. However, head movements commonly encountered in real-world scenarios often degrade physiological estimation performance, particularly for unsupervised learning methods based on physiological frequency-band priors, which usually struggle to detect dynamic signals occurring within this band, limiting their performance ceiling. In this paper, we propose a novel strategy to endow unsupervised learning methods with motion robustness. Specifically, we introduce a simple motion simulation technique, Sliding Crop, to incorporate dynamic signals. Building on this, we develop an unsupervised motion-robust self-distillation framework (UMoRo) on top of an existing unsupervised learning method, in which the model uses its own high-quality mappings of simple samples as pseudo-labels to guide the suppression of simulated motion artifacts in challenging samples, thereby enhancing robustness to real-world movements. Experimental results on three public datasets show that our method achieves superior or competitive performance compared to state-of-the-art supervised methods, demonstrating outstanding motion robustness.
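The abstract names Sliding Crop without spelling out its mechanics; one plausible reading, sketched below in plain NumPy, is a fixed-size crop window that drifts over the face clip as a random walk, injecting simulated head motion into an otherwise static video. The function name, window size, and step size are assumptions, not the paper's exact procedure.

```python
import numpy as np

def sliding_crop(frames: np.ndarray, crop: int = 128,
                 max_step: int = 2, seed: int = 0) -> np.ndarray:
    """Simulate head motion: slide a crop-by-crop window over a face
    clip along a random walk. frames: (T, H, W, C) -> (T, crop, crop, C)."""
    rng = np.random.default_rng(seed)
    t, h, w, c = frames.shape
    y, x = (h - crop) // 2, (w - crop) // 2   # start at the frame center
    out = np.empty((t, crop, crop, c), dtype=frames.dtype)
    for i in range(t):
        # Random-walk step, clipped so the window stays inside the frame.
        y = int(np.clip(y + rng.integers(-max_step, max_step + 1), 0, h - crop))
        x = int(np.clip(x + rng.integers(-max_step, max_step + 1), 0, w - crop))
        out[i] = frames[i, y:y + crop, x:x + crop]
    return out


clip = np.random.rand(150, 160, 160, 3).astype(np.float32)  # ~5 s at 30 fps
moving = sliding_crop(clip)  # (150, 128, 128, 3)
```

Under the self-distillation framing, the model's prediction on the clean, static view of a clip would serve as the pseudo-label supervising its prediction on the motion-augmented view.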
3. Title: Heterogeneous Graph Convolutional Neural Networks for EEG-fNIRS Bimodal Emotion Recognition
Authors: 赵涛, 薛云龙, 朱洁, 郑文明 (*)
Abstract: Leveraging multimodal brain signals, such as electroencephalogram (EEG) and functional near-infrared spectroscopy (fNIRS), for the objective detection of brain activity is regarded as a promising approach for affective brain-computer interfaces. Existing EEG-fNIRS bimodal methods primarily focus on data alignment and basic feature fusion, neglecting the intrinsic connections between different signals and brain regions. To this end, we propose a novel graph-based method, the Heterogeneous Graph Convolutional Neural Network (HGCN), which integrates multimodal complementary information and constructs a heterogeneous EEG-fNIRS interaction graph within a coordinated hyperspace to model brain networks. Specifically, various directed edge types and node feature aggregation strategies are employed to dynamically update and enhance the representations of brain signals and emotional states, thereby improving the coherence and consistency of cross-modal signals. Furthermore, this approach provides a robust framework for exploring cross-modal spatial connectivity. We also construct a novel EEG-fNIRS emotion database collected from 17 subjects, with emotions elicited by video stimuli. Extensive experimental results demonstrate the superiority of our method and the effectiveness of the introduced heterogeneous EEG-fNIRS interaction graph.
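As a rough illustration of heterogeneous message passing over an EEG-fNIRS graph, a minimal PyTorch sketch follows, in which each directed edge type gets its own projection and messages are summed at destination nodes. The coordinated hyperspace and the paper's specific aggregation strategies are not reproduced here; channel counts and dimensions are invented.

```python
import torch
import torch.nn as nn

class HeteroGraphConv(nn.Module):
    """One heterogeneous message-passing layer. Each directed edge type
    (EEG->EEG, fNIRS->fNIRS, EEG->fNIRS, fNIRS->EEG) has its own linear
    projection; incoming messages are summed at the destination nodes."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.w = nn.ModuleDict({
            etype: nn.Linear(dim, dim)
            for etype in ("eeg_eeg", "fnirs_fnirs", "eeg_fnirs", "fnirs_eeg")
        })

    def forward(self, x: dict, adj: dict) -> dict:
        # x: {"eeg": (N_e, dim), "fnirs": (N_f, dim)} node features.
        # adj["src_dst"]: row-normalized adjacency of shape (N_dst, N_src)
        # routing messages from source nodes to destination nodes.
        out = {k: torch.zeros_like(v) for k, v in x.items()}
        for etype, lin in self.w.items():
            src, dst = etype.split("_")
            out[dst] = out[dst] + adj[etype] @ lin(x[src])
        return {k: torch.relu(v) for k, v in out.items()}


# Toy example: 62 EEG channels and 24 fNIRS channels as graph nodes.
x = {"eeg": torch.randn(62, 64), "fnirs": torch.randn(24, 64)}
adj = {"eeg_eeg": torch.rand(62, 62), "fnirs_fnirs": torch.rand(24, 24),
       "eeg_fnirs": torch.rand(24, 62), "fnirs_eeg": torch.rand(62, 24)}
h = HeteroGraphConv()(x, adj)  # {"eeg": (62, 64), "fnirs": (24, 64)}
```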
4. Title: Enhancing Task-Specific Feature Learning with LLMs for Multimodal Emotion and Intent Joint Understanding [Winner of the MEIJU'25 Challenge, Track #2 (English Sub-track)]
Authors: 李兆阳 (#), 路成 (#), 徐啸林, 张凯飞, 顾予佳, 李邦华, 宗源 (*), 郑文明 (*)
Abstract: This paper introduces our solution, the Task-Specific Feature Learning (TSFL) method, designed for the second track of the MEIJU Challenge at ICASSP 2025, namely Imbalanced Emotion and Intent Recognition (English). TSFL incorporates three core components: the use of LLM features to represent multimodal signals, coarse-grained task-specific feature decomposition, and fine-grained task-specific feature learning. Together, these components enable the effective joint learning of emotion-discriminative and intent-discriminative features. As a result, our method achieved a JRBM score of 0.6230, significantly outperforming the official baseline and surpassing all other competing teams to win the championship.
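A minimal PyTorch sketch of what a coarse-grained task-specific feature decomposition could look like, assuming a pooled LLM-derived utterance feature that is projected into separate emotion and intent subspaces; the dimensions, class counts, and branch designs are hypothetical, not the TSFL architecture.

```python
import torch
import torch.nn as nn

class TaskSpecificDecomposition(nn.Module):
    """Hypothetical coarse-grained decomposition: split one shared
    multimodal (LLM-derived) feature into emotion- and intent-specific
    subspaces, then classify each task from its own branch."""

    def __init__(self, in_dim: int = 4096, hid: int = 256,
                 n_emotions: int = 7, n_intents: int = 8):
        super().__init__()
        self.emo_branch = nn.Sequential(nn.Linear(in_dim, hid), nn.GELU())
        self.int_branch = nn.Sequential(nn.Linear(in_dim, hid), nn.GELU())
        self.emo_head = nn.Linear(hid, n_emotions)
        self.int_head = nn.Linear(hid, n_intents)

    def forward(self, feat: torch.Tensor):
        # feat: (B, in_dim) pooled LLM feature of one utterance.
        return (self.emo_head(self.emo_branch(feat)),
                self.int_head(self.int_branch(feat)))


feat = torch.randn(8, 4096)
emo_logits, int_logits = TaskSpecificDecomposition()(feat)  # (8, 7), (8, 8)
```

Because the track targets imbalanced recognition, each head would typically be trained with a class-weighted objective, e.g. nn.CrossEntropyLoss(weight=...).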
5. Title: Reliable Learning From LLM Features for Multimodal Emotion and Intent Joint Understanding [Winner of the MEIJU'25 Challenge, Track #2 (Chinese Sub-track)]
Authors: 徐啸林 (#), 路成 (#), 李兆阳, 刘宇韵, 马英豪, 罗嘉豪, 宗源 (*), 郑文明 (*)
Abstract: This paper describes a Reliable Learning Framework (RLF) for the 1st Multimodal Emotion and Intent Joint Understanding (MEIJU) Challenge at ICASSP 2025. Our proposed RLF comprises a Hierarchical Interaction Network and a Reliable Fusion Strategy. The former mines emotion and intent cues from the high-level semantic features of multimodal data (video, audio, and text) generated by pretrained Large Language Models (LLMs) to enhance their representations, while the latter reliably integrates multiple predictions to further improve the robustness of emotion and intent understanding. Our RLF method achieved first place on Track 2 (Mandarin) of MEIJU, with performance scores for emotion, intent, and joint recognition reaching 0.7285, 0.7456, and 0.7370, respectively.
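The abstract leaves the Reliable Fusion Strategy unspecified; as a generic stand-in only, the sketch below fuses several models' softmax outputs by confidence (negative-entropy) weighting, so uncertain predictions contribute less. This is a plainly substituted technique, not necessarily the RLF strategy.

```python
import torch

def reliable_fusion(prob_list):
    """Fuse predictions from several models/branches, weighting each by
    its own confidence (negative entropy) per sample.
    prob_list: list of (B, C) softmax outputs; returns (B, C)."""
    probs = torch.stack(prob_list)                             # (M, B, C)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1)   # (M, B)
    weights = torch.softmax(-entropy, dim=0)                   # low entropy -> high weight
    return (weights.unsqueeze(-1) * probs).sum(0)              # rows still sum to 1


p1 = torch.softmax(torch.randn(4, 7), dim=-1)
p2 = torch.softmax(torch.randn(4, 7), dim=-1)
fused = reliable_fusion([p1, p2])  # (4, 7)
```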
Congratulations to students 王世炎, 齐天铧, 刘安邦, 赵涛, 李兆阳, 徐啸林, and their collaborators!