Recognition of Masked Expressions Based on Deep Learning
Original title: 基于深度学习对伪装表情的识别
Liu Yonggang (刘永刚)
Advisor: Zhao Ke (赵科)
2023-12
Abstract

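Facial expression recognition is an active research topic in artificial intelligence. Beyond natural expressions, people in certain situations deliberately alter their facial expressions to mask or hide their true emotions. Research on recognizing such masked expressions remains scarce. Masked expressions are more complex than ordinary expressions, and the masked-expression datasets available for model training contain few samples, which makes this a challenging research area.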

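This study uses the Masked Facial Expression Database (MFED) of the Institute of Psychology, Chinese Academy of Sciences, to perform end-to-end automatic recognition of masked expressions from two-dimensional static images. The work has two stages. First, convolutional neural networks such as GoogLeNet, ResNet, and MobileNet are each modified to suit the multi-class masked-expression task. Second, a transfer learning strategy applies weights pre-trained on the ImageNet dataset to these models to recognize masked expressions. To ensure data quality, the dataset is preprocessed to remove interference from irrelevant factors. Data augmentation and regularization are applied to improve network performance and the robustness and generalization of the models. Models are evaluated with leave-one-subject-out cross-validation, using accuracy, precision, recall, F1-score, and the confusion matrix as metrics. The study also examines recognition performance for different frames in the dataset (start frame, peak frame, end frame) and for different expression categories, and compares the performance of different models and network structures.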
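The leave-one-subject-out protocol named above can be sketched in a few lines. This is a minimal illustration, not the thesis code: the function name `leave_one_subject_out` and the toy subject IDs are hypothetical, and the only assumption is that every image carries the ID of the participant it shows.

```python
from collections import defaultdict


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject.

    All samples of the held-out subject form the test set, so no person
    appears in both the training and the test split of a fold.
    """
    by_subject = defaultdict(list)
    for index, subject in enumerate(subject_ids):
        by_subject[subject].append(index)

    for held_out in sorted(by_subject):
        test_idx = by_subject[held_out]
        train_idx = sorted(
            i for subject, indices in by_subject.items()
            if subject != held_out
            for i in indices
        )
        yield held_out, train_idx, test_idx


# Hypothetical toy data: six images from three subjects.
subject_ids = ["s1", "s1", "s2", "s3", "s3", "s3"]
folds = list(leave_one_subject_out(subject_ids))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

Splitting by subject rather than by image keeps a model from scoring well simply because it has already seen the same face during training.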

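Recognition results for the required expression categories (6R), the video-elicited emotion categories (6E), and the mixed expression categories (6R × 6E = 36) were as follows. With convolutional neural networks, GoogLeNet reached accuracies of 63.62%, 39.97%, and 22.11%, respectively, proportional improvements of 33.19%, 48.99%, and 96.52% over the LBP-TOP+SVM method. With the transfer learning strategy, ResNet18 reached accuracies of 64.78%, 42.16%, and 21.21%, respectively, proportional improvements of 35.61%, 57.14%, and 88.52% over the LBP-TOP+SVM method.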
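One way to picture the 36-way mixed task is as a pair (required expression, elicited emotion) flattened into a single class index. The sketch below is illustrative only: the abstract does not list the six categories or say how labels were encoded, so integer indices 0 to 5 stand in for them, and `encode_mixed`, `decode_mixed`, and `chance_accuracy` are hypothetical helpers. The chance level further assumes uniform guessing over balanced classes, which the source does not state.

```python
N_REQUIRED = 6  # required expression categories (6R)
N_ELICITED = 6  # video-elicited emotion categories (6E)


def encode_mixed(required, elicited):
    """Map a (required, elicited) pair to one of 6 x 6 = 36 class indices."""
    if not (0 <= required < N_REQUIRED and 0 <= elicited < N_ELICITED):
        raise ValueError("category index out of range")
    return required * N_ELICITED + elicited


def decode_mixed(label):
    """Invert encode_mixed: return the (required, elicited) pair."""
    return divmod(label, N_ELICITED)


def chance_accuracy(n_classes):
    """Accuracy in percent of uniform random guessing over balanced classes."""
    return 100.0 / n_classes


all_labels = [encode_mixed(r, e)
              for r in range(N_REQUIRED) for e in range(N_ELICITED)]
print(len(set(all_labels)))            # number of distinct mixed classes
print(round(chance_accuracy(6), 2))    # chance level for 6R or 6E
print(round(chance_accuracy(36), 2))   # chance level for the mixed task
```

Under that uniform-guess assumption, the reported mixed-category accuracies of about 21% to 22% sit well above the 36-class chance level even though they look low in absolute terms.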

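The results show that the recognition rate for masked expressions was further improved, and they clarify how different frames affect expression recognition. Building on the static-image analysis, the study makes a preliminary exploration of automatic masked-expression recognition in dynamic images. The work deepens understanding of automatic masked-expression recognition, offers a potential technical means of detecting fraud and false statements, and provides a useful reference for future research.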

Other abstract

Facial expression recognition is currently one of the research hotspots in artificial intelligence. In addition to natural expressions, people in certain scenarios intentionally change their facial expressions to mask or conceal their true emotions. There is currently little research on the recognition of masked expressions. Masked expressions are more complex than ordinary expressions, and the masked-expression datasets that can be used for model training contain a limited number of samples. The automatic recognition of masked expressions is therefore a new challenge.

This study used the Masked Facial Expression Database (MFED) from the Institute of Psychology, Chinese Academy of Sciences. An end-to-end automatic recognition pipeline for masked expressions was built on two-dimensional static images. The study was divided into two parts. First, convolutional neural networks such as GoogLeNet, ResNet, and MobileNet were used, and each model was modified to suit the multi-class classification of masked expressions. Second, transfer learning was used: weights pre-trained on the ImageNet dataset were transferred to the network models for recognition of masked expressions. To ensure data quality, the images in the MFED dataset were preprocessed to remove interference from unrelated factors. Data augmentation and regularization were also used to enhance network performance and improve the robustness and generalization ability of the models. For evaluation, leave-one-subject-out cross-validation was used, with accuracy, precision, recall, F1-score, and the confusion matrix as metrics. The study also explored recognition performance for different frames (start frame, peak frame, end frame) and different expression categories in the dataset, and compared the recognition performance of different models and network structures.
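The evaluation metrics listed above can all be derived from the confusion matrix. The following is a minimal sketch under stated assumptions: the abstract does not say how per-class precision, recall, and F1 were averaged, so macro-averaging is assumed here, and the function names and toy labels are hypothetical.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Count predictions: rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for truth, pred in zip(y_true, y_pred):
        matrix[truth][pred] += 1
    return matrix


def macro_metrics(matrix):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[k][k] for k in range(n)) / total

    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = matrix[k][k]
        predicted_k = sum(matrix[i][k] for i in range(n))  # column sum
        actual_k = sum(matrix[k])                          # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical toy labels for a 3-class problem.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
matrix = confusion_matrix(y_true, y_pred, n_classes=3)
scores = macro_metrics(matrix)
print(matrix)
print(scores)
```

In a leave-one-subject-out setting, one common choice is to sum the per-fold confusion matrices and compute the metrics once on the total; whether the thesis did this is not stated in the abstract.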

The recognition results for the required expression classification (6R), experienced emotion classification (6E), and mixed expression classification (6R × 6E = 36) were as follows. Using convolutional neural networks, the recognition accuracy of GoogLeNet was 63.62%, 39.97%, and 22.11%, respectively, improvements of 33.19%, 48.99%, and 96.52% over the LBP-TOP+SVM method. Using transfer learning, the recognition accuracy of ResNet18 was 64.78%, 42.16%, and 21.21%, respectively, improvements of 35.61%, 57.14%, and 88.52% over the LBP-TOP+SVM method.
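The improvement figures can be sanity-checked with a little arithmetic. The sketch below assumes they are relative gains, that is, (new − baseline) / baseline; the abstract does not report the LBP-TOP+SVM accuracies themselves, so the baselines computed here are back-calculated under that reading, not quoted from the thesis. The function `implied_baseline` is a hypothetical helper.

```python
def implied_baseline(accuracy_pct, relative_gain_pct):
    """Baseline accuracy implied by new = baseline * (1 + gain / 100)."""
    return accuracy_pct / (1 + relative_gain_pct / 100)


# (accuracy %, improvement %) as reported in the abstract, per task.
reported = {
    "GoogLeNet": {"6R": (63.62, 33.19), "6E": (39.97, 48.99), "36": (22.11, 96.52)},
    "ResNet18":  {"6R": (64.78, 35.61), "6E": (42.16, 57.14), "36": (21.21, 88.52)},
}

baselines = {
    model: {task: implied_baseline(acc, gain) for task, (acc, gain) in tasks.items()}
    for model, tasks in reported.items()
}

for task in ("6R", "6E", "36"):
    print(task,
          round(baselines["GoogLeNet"][task], 2),
          round(baselines["ResNet18"][task], 2))
```

Under this reading both models imply the same baseline for each task, roughly 47.77% (6R), 26.83% (6E), and 11.25% (36 classes), which suggests the percentages are indeed relative gains rather than percentage-point differences.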

The results demonstrate that the recognition rate of masked expressions was further improved and that different frames affect expression recognition in distinct ways. Based on the static-image results, a preliminary exploration was conducted on automatic recognition of masked expressions in dynamic images. The study deepens the understanding of automatic masked-expression recognition, provides a potential technical means of detecting deception and false statements, and offers valuable insights for future related studies.

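Keywords: expression recognition; masked expressions; convolutional neural networks; transfer learning; data augmentation
Degree type: Master's (continuing education)
Language: Chinese
Degree name: Master of Science
Major: Applied Psychology
Degree-granting institution: University of Chinese Academy of Sciences
Place of conferral: Institute of Psychology, Chinese Academy of Sciences
Document type: Thesis
Item identifier: http://ir.psych.ac.cn/handle/311026/48188
Collection: Health and Genetic Psychology Laboratory (健康与遗传心理学研究室)
Recommended citation
GB/T 7714
刘永刚. 基于深度学习对伪装表情的识别[D]. 中国科学院心理研究所. 中国科学院大学,2023.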
Files in this item
File name: 刘永刚-同等学力论文.pdf (3088 KB)
Document type: Thesis
Access: Restricted
License: CC BY-NC-SA