Facial expression recognition is currently one of the research hotspots in the field of artificial intelligence. Beyond spontaneous natural expressions, in certain scenarios people deliberately alter their facial expressions to mask or conceal their true emotions. Research on the recognition of such masked expressions remains scarce: masked expressions are more complex than ordinary expressions, and the number of masked-expression datasets available for model training is limited. The automatic recognition of masked expressions therefore poses a new challenge.
This study utilized the Masked Facial Expression Database (MFED) from the Institute of Psychology, Chinese Academy of Sciences, and built an end-to-end automatic masked-expression recognition approach based on two-dimensional static images. The study comprised two parts. First, convolutional neural networks such as GoogLeNet, ResNet, and MobileNet were used; each model was appropriately modified to suit the multi-class masked-expression task. Second, transfer learning was applied: weights pre-trained on the ImageNet dataset were transferred to the network models for masked-expression recognition. To ensure data quality, the images in the MFED dataset were preprocessed to remove interference from unrelated factors. Data augmentation and regularization were also employed to enhance network performance and to improve the robustness and generalization ability of the models. For performance evaluation, leave-one-subject-out cross-validation was used, with accuracy, precision, recall, F1-score, and the confusion matrix as evaluation metrics. The study further examined the recognition performance of different frames (start frame, peak frame, end frame) and different expression categories in the dataset, and compared the recognition performance of different models and network structures.
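The leave-one-subject-out protocol mentioned above can be sketched as follows. This is a minimal, dependency-free illustration of the splitting logic, assuming each sample carries a subject ID; the sample IDs here are hypothetical and not taken from MFED.

```python
# Sketch of leave-one-subject-out (LOSO) cross-validation: in each fold,
# all samples from one subject form the test set and the rest form the
# training set, so no subject appears in both sets.

def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per fold."""
    subjects = sorted(set(subject_ids))
    for held_out in subjects:
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx

# Example: 6 samples from 3 subjects -> 3 folds (hypothetical IDs)
ids = ["s1", "s1", "s2", "s2", "s3", "s3"]
folds = list(loso_splits(ids))
# First fold holds out subject "s1": test = [0, 1], train = [2, 3, 4, 5]
```

In practice the same split would drive model training and evaluation in each fold, with metrics averaged over all subjects.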
The recognition results for required expression classification (6R), experienced emotion classification (6E), and mixed expression classification (6R×6E = 36) were as follows. With convolutional neural networks, GoogLeNet achieved recognition accuracies of 63.62%, 39.97%, and 22.11%, respectively, relative improvements of 33.19%, 48.99%, and 96.52% over the LBP-TOP+SVM method. With transfer learning, ResNet18 achieved recognition accuracies of 64.78%, 42.16%, and 21.21%, respectively, relative improvements of 35.61%, 57.14%, and 88.52% over the LBP-TOP+SVM method.
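The evaluation metrics reported above (accuracy, precision, recall, F1-score) can all be derived from a confusion matrix. The sketch below shows the standard macro-averaged computation; the 2×2 matrix is illustrative only and is not taken from the MFED results.

```python
# Sketch: accuracy and macro-averaged precision, recall, and F1-score
# from a confusion matrix, where cm[i][j] counts samples of true class i
# predicted as class j.

def metrics_from_confusion(cm):
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        pred_k = sum(cm[i][k] for i in range(n))  # column sum: predicted as k
        true_k = sum(cm[k])                        # row sum: truly class k
        p = tp / pred_k if pred_k else 0.0
        r = tp / true_k if true_k else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    macro = lambda xs: sum(xs) / len(xs)
    return accuracy, macro(precisions), macro(recalls), macro(f1s)

# Illustrative 2-class confusion matrix (not actual MFED data)
cm = [[8, 2],
      [1, 9]]
acc, prec, rec, f1 = metrics_from_confusion(cm)
# acc = 17/20 = 0.85; macro recall = (0.8 + 0.9)/2 = 0.85
```

The same routine extends unchanged to the 6-class and 36-class settings, where the confusion matrix additionally reveals which expression categories are most often confused.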
The results demonstrate that the recognition rate of masked expressions has been further improved, and they confirm the impact of different frames on expression recognition. Building on the static-image results, a preliminary exploration of automatic masked-expression recognition in dynamic images was also conducted. This study deepens the understanding of automatic masked-expression recognition, provides a potential technical means for detecting deception and false statements, and offers valuable insights for future related research.