Research on Anxiety Recognition Technology Based on Speech Features


Title	Research on Anxiety Recognition Technology Based on Speech Features
Original title	基于语音特征的焦虑识别技术研究
Author	Liu Ying (刘莹)
Advisor	Liu Xiaoqian (刘晓倩)
Date	2024-12
Abstract	According to the China National Mental Health Development Report (2021-2022), released in 2023, 15.8% of China's population is at risk of anxiety. Long-term anxiety markedly reduces quality of life: mental health scores, physical health scores, and health utility scores are all clearly lower than in non-anxious groups. Early identification and intervention are therefore crucial for mental health services. Traditional anxiety identification, however, relies on patient self-report and diagnosis by professional clinicians, which is costly and cannot meet the demand for large-scale early screening. Efficient and convenient anxiety recognition technology is therefore important for advancing intelligent mental health services.
Speech is one of the most natural channels for conveying information and expressing emotion in human life and social activity, and it is a common, easily obtained form of behavioral data. In recent years, with advances in machine learning, speech emotion recognition has progressed considerably and has shown great potential for assisting the identification of mental health problems such as depression. For anxiety, research has mostly focused on correlation analysis between acoustic features and anxiety. Existing speech-based anxiety recognition systems, although promising, do not make full use of speech features, which limits recognition performance. Moreover, neither acoustic features nor deep learning features, when used alone, can fully express the rich emotional information in speech.
To address these problems, this study investigates the extraction of multidimensional speech features that accurately reflect an individual's anxiety state, with the goal of developing an automated, machine-learning-based anxiety measurement method that assesses anxiety efficiently and accurately. The main research content is as follows.
Study One analyzes the correlation between acoustic features of speech and anxiety. Drawing on an extensive literature review, it also examines the link between speech features and anxiety from physiological, psychological, and neurobiological perspectives, together with gender differences in these correlations. Through this multi-angle, multi-level analysis, we hope to provide a richer and more nuanced view of the acoustic manifestations of anxiety. The results indicate that individuals with different anxiety levels differ significantly in their speech features, which provides key feature support for the subsequent model construction.
Study Two builds a machine learning model for anxiety recognition based on acoustic features, with the sample's anxiety scale scores as the dependent variable and acoustic features as the independent variables. Feature selection and dimensionality reduction are first used to identify the feature subset that contributes most to anxiety recognition. The study then compares the effect of different regression algorithms on prediction accuracy and selects the best-performing regressor to build an automated regression model based on acoustic features. In the overall sample, the Pearson correlation coefficient between predicted scores and actual anxiety scale scores reached 0.663, indicating relatively high predictive accuracy, with a corresponding reliability index of 0.658, which supports the robustness and reliability of the model. In the female sample, the correlation coefficient reached 0.708. These results provide preliminary evidence that an automated anxiety recognition model based on acoustic features is feasible.
Study Three further optimizes the model by fusing deep learning features, improving recognition accuracy. Specifically, the study uses ECAPA-TDNN, an advanced deep learning model based on the Time Delay Neural Network (TDNN), to better capture temporal information in the speech signal. Integrating the deep learning features extracted by ECAPA-TDNN into the automated anxiety level recognition model markedly improved model performance. In the overall sample, the Pearson correlation coefficient between predicted scores and actual anxiety scale scores reached 0.706, with a corresponding reliability index of 0.629. In the female sample, the Pearson correlation coefficient rose to 0.759. These results show that the model built by fusing deep learning features with acoustic features achieves good overall recognition performance.
In summary, by capturing a broad range of acoustic features in speech and combining them with deep learning features, this study established an automated anxiety recognition model based on multidimensional feature fusion. The Pearson correlation coefficient between predicted values and actual anxiety scale scores reached 0.706 overall and 0.759 in the female sample, a significant correlation. The split-half reliability index was 0.629, indicating that the model is stable. The study improves the practical value and accuracy of anxiety recognition models and provides new technical support for early intervention in mental health and for psychological monitoring of workers in high-risk industries.
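The feature-fusion and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the data are synthetic random numbers, the feature dimensions are arbitrary, and ridge regression is chosen only for illustration because the abstract does not name the selected regressor. The names `acoustic`, `embedding`, `fit_ridge`, and `pearson_r` are hypothetical. The sketch concatenates two feature sets, fits a regressor on one part of the data, and reports the Pearson correlation between predicted and actual scores on the rest.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression on centered data; returns (weights, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(gram, Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w


# Synthetic stand-ins (assumptions, not real data):
rng = np.random.default_rng(0)
n = 200
acoustic = rng.normal(size=(n, 20))   # placeholder for hand-crafted acoustic features
embedding = rng.normal(size=(n, 32))  # placeholder for ECAPA-TDNN embeddings

# Feature fusion by simple concatenation along the feature axis.
fused = np.hstack([acoustic, embedding])

# Synthetic "scale scores": a linear signal plus noise.
true_w = rng.normal(size=fused.shape[1])
scores = fused @ true_w + rng.normal(scale=2.0, size=n)

# Fit on the first 150 samples, evaluate on the remaining 50.
train, test = slice(0, 150), slice(150, n)
w, b = fit_ridge(fused[train], scores[train], alpha=1.0)
pred = fused[test] @ w + b

r = pearson_r(pred, scores[test])
print(f"Pearson r on held-out synthetic data: {r:.3f}")
```

With real recordings, the two placeholder arrays would be replaced by extracted acoustic features and model embeddings, and a cross-validated split would usually be preferable to a single hold-out set.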
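Keywords	anxiety recognition; acoustic features; machine learning; deep learning; ECAPA-TDNN model
Degree type	Master's (continuing education)
Language	Chinese
Degree	Master of Science
Major	Applied Psychology
Degree-granting institution	University of Chinese Academy of Sciences (中国科学院大学)
Place of conferral	Institute of Psychology, Chinese Academy of Sciences (中国科学院心理研究所)
Document type	Thesis
Item identifier	https://ir.psych.ac.cn/handle/311026/49642
Collection	Social and Engineering Psychology Laboratory (社会与工程心理学研究室)
Recommended citation (GB/T 7714)	刘莹. 基于语音特征的焦虑识别技术研究[D]. 中国科学院心理研究所. 中国科学院大学,2024.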

Files in this item
File name/size	Document type	Version type	Access type	License
刘莹-硕士学位论文.pdf (2745KB)	Thesis		Open access	CC BY-NC-SA