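自然情境语音识别抑郁症的研究 (Recognizing Depression from Speech Collected in Natural Settings)
Alternative Title: Depression Recognition with Audios Collected under Natural Environment
Author: 隋小芸 (Sui Xiaoyun)
Subtype: Master's thesis
Thesis Advisor: 朱廷劭 (Zhu Tingshao)
Date: 2017-05
Degree Grantor: University of Chinese Academy of Sciences
Place of Conferral: Institute of Psychology, Chinese Academy of Sciences
Degree Name: Master of Science
Degree Discipline: Health Psychology
Keyword: speech recognition; depression; natural setting; age; region; speech duration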
Abstract

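The incidence of depression is rising year by year. Traditional diagnosis is easily influenced by subjective factors and requires the patient's active cooperation, so its accuracy has remained low. Combining the acoustic features of patients' speech with machine learning algorithms to build an automated depression recognition model does not depend on the specific content of the speech, and it supports early diagnosis and intervention. Most existing studies collect speech in a quiet laboratory environment and use specific interview topics to elicit emotional responses from subjects, which adds to the complexity of the experimental design and the cost of depression recognition. In addition, because speech from depressed patients is difficult to collect, all previous studies have had small sample sizes. This prevents researchers from analyzing covariates that may be present in the sample, such as age, region, and speech duration, and it raises doubts about whether diagnostic methods built on small samples apply more broadly. Research in China on recognizing depression from speech is still at an early stage, and more in-depth studies are needed.

To address these problems in previous research (small samples, non-natural environments, and a lack of covariate analysis), this thesis sets its research direction and organizes its experiments around two themes:

(1) Recognizing depressed patients and healthy people using Chinese speech collected in natural settings. First, difference tests on the speech features of the depressed and healthy groups found evidence consistent with previous studies, such as differences in fundamental frequency (F0), loudness, and MFCC features. Second, classification of the two groups reached accuracy above 60% on the sample sets for all 7 demographic questions, and 70% on individual questions. This result shows that speech collected in natural settings can be used to recognize depression.

(2) Analysis of covariates: subject age and region, sample duration, and sample size. Two classification experiments first confirmed that age and region do affect speech. Then, in the depression recognition experiments, the sample set was re-partitioned by age, region, or both into more homogeneous subsets, and depression recognition was performed within each subset. Across the series of experiments, recognition was better for southern subjects than for northern subjects, and better for younger subjects (30-44 years) than for middle-aged and older subjects (45-60 years). This may be related to differences between southern and northern dialects or to differences in voice quality between groups. Samples from Jiangsu province gave the best classification result: 74.83%. A series of experiments also examined the effect of sample size and sample duration. The results show that sample size is positively correlated with classification accuracy under certain conditions, and that longer speech samples carry richer speech features and also help improve classification accuracy.

On the technical side, all classification experiments in this thesis use a machine learning model based on decision fusion. The model uses SVM as its main classification algorithm and a naive Bayes classifier to fuse the results of 12 classifiers and output the final prediction. The 12 classifiers each use a different feature selection algorithm and so emphasize different features. Decision fusion balances their results and yields a classification result that is relatively stable and near the better end of the range.

Unlike the complete subject recordings used in previous studies, the speech data in this thesis were cut from interview dialogues, which breaks the continuity of the speech to some extent. For this reason the thesis did not use features such as speech rate and pause duration, and the handling of background noise did not give the expected results. In future work, besides ensuring speech continuity and improving noise reduction, sample collection should balance the number of subjects across ages, regions, and genders to reduce the influence of covariates. Speech samples should not be too short and should last at least 5 seconds. Fusing speech features with other features such as text, body movements, and facial expressions is also expected to improve the accuracy of depression recognition.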

Other Abstract

The incidence of depression has been rising year by year, while traditional diagnostic methods are susceptible to subjective factors and depend on the patient's cooperation. An automated depression recognition model can be built by applying machine learning algorithms to the acoustic features of patients' speech. The speech sample is not restricted to particular text content, and such a method could be deployed quickly in primary healthcare institutions or run by patients themselves on a self-service device. It could help considerably in the early diagnosis and intervention of depression. Most previous studies are based on audio collected in a laboratory environment, and some designed deliberate interviews to elicit particular emotions from the subjects. Such designs add to the complexity of the experiment and the cost of depression recognition. In addition, because depressive speech data are difficult to collect, all previous studies were based on small samples, which prevents researchers from analyzing covariates such as age, region, and duration, and limits how widely their results apply. Domestic research on depression recognition from audio is still at an early stage and needs further work.

To address these drawbacks (small sample size, non-natural environment, and lack of covariate analysis), the study is organized around two main topics:

I. We use Chinese speech material collected in a natural environment to study speech-based recognition of depressed and healthy subjects.

Comparing the audio features of the depressed and healthy groups, we found evidence consistent with previous studies, such as differences in F0, loudness, and MFCC. In the binary classification tests, accuracy exceeded 60% on all 7 demographic questions and reached 70% on some questions. This confirms that it is feasible to recognize depression from audio collected in a natural environment.

II. The audio comes from more than 1600 subjects, which makes it possible to analyze covariates.

First, two experiments examined the relationship between the subjects' age and region and their audio features. We then reorganized the samples by splitting them by age, region, or both, to obtain several sub-datasets with higher homogeneity, and ran the classification within each split. Classification results were better for subjects from the south of China than for those from the north, and better for younger subjects (30-44) than for older subjects (45-60). These two findings may be caused by phonetic differences between southern and northern dialects, or by physical differences in voice quality among groups. The best accuracy was 74.83%, on samples from Jiangsu province.
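The re-partitioning step can be illustrated with a short Python sketch. It is only a schematic under stated assumptions: the field names (`age`, `region`, `label`), the `south`/`north` region coding, the toy records, and the helper names `age_band`, `split_samples`, and `evaluate_per_subset` are all hypothetical and do not come from the thesis. The age bands follow the two ranges given in the abstract.

```python
from collections import defaultdict

def age_band(age):
    # Age bands taken from the abstract: 30-44 and 45-60.
    if 30 <= age <= 44:
        return "30-44"
    if 45 <= age <= 60:
        return "45-60"
    return "other"

def split_samples(samples, key):
    """Partition samples into subsets that share the same key value."""
    subsets = defaultdict(list)
    for sample in samples:
        subsets[key(sample)].append(sample)
    return dict(subsets)

def evaluate_per_subset(subsets, evaluate):
    """Apply `evaluate` (any function of one subset) to every subset."""
    return {name: evaluate(rows) for name, rows in subsets.items()}

# Toy records (hypothetical): label 1 = depressed, 0 = healthy.
samples = [
    {"age": 32, "region": "south", "label": 1},
    {"age": 41, "region": "north", "label": 0},
    {"age": 50, "region": "south", "label": 0},
    {"age": 58, "region": "north", "label": 1},
    {"age": 36, "region": "south", "label": 0},
]

by_age = split_samples(samples, lambda s: age_band(s["age"]))
by_both = split_samples(samples, lambda s: (age_band(s["age"]), s["region"]))

# Stand-in evaluator: counts rows. A real run would train and test a
# classifier inside each subset and return its accuracy instead.
sizes = evaluate_per_subset(by_both, len)
```

Passing the evaluator as a function keeps the split logic independent of the classifier, so the same partition can be reused for splits by age, by region, or by both.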

A series of experiments was also carried out to analyze the effect of sample size and duration. Sample size is positively correlated with classification accuracy under certain conditions, and longer audio samples, which carry richer features, also improve classification.

All classifications in our study are based on a decision-fusion machine learning model that uses the SVM algorithm for classification. A naive Bayes classifier fuses the results of 12 classifiers and outputs the final prediction. The 12 classifiers use different feature reduction algorithms, each emphasizing different features, while decision fusion balances them and gives a more stable result at an upper-middle level.
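The fusion step can be sketched in a few lines of Python. This is a minimal illustration, not the thesis implementation: the class name `NaiveBayesFusion`, the toy votes, the Laplace smoothing, and the use of three base classifiers instead of twelve are all assumptions. The base classifiers (SVMs with different feature selection in the thesis) are represented here only by their 0/1 predictions.

```python
import math

class NaiveBayesFusion:
    """Fuse 0/1 votes from several base classifiers with Bernoulli naive Bayes."""

    def fit(self, votes, labels, alpha=1.0):
        # votes: one row per sample, one 0/1 prediction per base classifier.
        # labels: true class per sample (0 = healthy, 1 = depressed).
        n_clf = len(votes[0])
        self.classes = sorted(set(labels))
        self.log_prior = {}
        self.p_vote1 = {}  # p_vote1[c][j] = P(classifier j votes 1 | class c)
        for c in self.classes:
            rows = [v for v, y in zip(votes, labels) if y == c]
            self.log_prior[c] = math.log(len(rows) / len(votes))
            # Laplace smoothing keeps every probability strictly inside (0, 1).
            self.p_vote1[c] = [
                (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                for j in range(n_clf)
            ]
        return self

    def predict_one(self, row):
        best, best_score = None, -math.inf
        for c in self.classes:
            # Log posterior up to a constant: prior plus independent vote terms.
            score = self.log_prior[c]
            for j, v in enumerate(row):
                p = self.p_vote1[c][j]
                score += math.log(p if v == 1 else 1.0 - p)
            if score > best_score:
                best, best_score = c, score
        return best

    def predict(self, votes):
        return [self.predict_one(r) for r in votes]

# Toy votes from three hypothetical base classifiers on held-out samples.
train_votes = [
    [1, 1, 0], [1, 1, 1], [1, 0, 1], [0, 1, 1],  # true label 1
    [0, 0, 1], [0, 0, 0], [1, 0, 0], [0, 1, 0],  # true label 0
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

fusion = NaiveBayesFusion().fit(train_votes, train_labels)
fused = fusion.predict([[1, 1, 0], [0, 0, 1]])
```

In this toy data every base classifier is equally reliable, so the fused decision coincides with a majority vote. When reliabilities differ, the learned per-classifier probabilities give the more reliable classifiers more weight.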

Unlike the full recordings used in past studies, the audio clips in our study were extracted from interview recordings, which breaks the continuity of the audio to a certain degree. Consequently, speech rate and pause duration were not used as features in our experiments. The prediction results did not improve after an attempt at denoising. In further studies, besides preserving audio continuity and improving denoising, the number of subjects across ages, regions, and genders should be balanced to minimize the effect of covariates. Audio samples should not be too short, and at least 5 seconds is preferable. Fusing audio features with text, body gestures, and facial features is also a promising way to improve recognition accuracy.

Pages: 84
Language: Chinese
Document Type: Thesis
Identifier: http://ir.psych.ac.cn/handle/311026/28666
Collection: 社会与工程心理学研究室 (Laboratory of Social and Engineering Psychology)
Recommended Citation
GB/T 7714
隋小芸. 自然情境语音识别抑郁症的研究[D]. 中国科学院心理研究所. 中国科学院大学,2017.
Files in This Item:
File Name/Size: 隋小芸-硕士学位论文.pdf (9725KB)
DocType: Thesis
Access: Restricted
License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.