Multimodal emotion recognition plays a crucial role in medicine and human-computer interaction. With the rapid development of pre-trained models, researchers increasingly apply them across various domains to enhance model performance. However, some studies have reported that pre-trained models can degrade performance in certain scenarios. This paper therefore explores how pre-trained models affect the performance of fine-tuned models from the perspective of multimodal emotion recognition.
We unified multiple datasets into a common format, performed preprocessing, and merged them for subsequent model training. We then trained six base models and fine-tuned their corresponding pre-trained counterparts, and compared the results. For the pre-trained models, we examined the specific impacts of three common data issues (noisy labels, class imbalance, and modality competition) on the fine-tuned models.