Identification of Mild Cognitive Impairment Based on Machine Learning Algorithms


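Title	Identification of Mild Cognitive Impairment Based on Machine Learning Algorithms (基于机器学习算法鉴别轻度认知障碍)
Alternative title	Identification of mild cognitive impairment based on machine learning algorithm
Author	Guo Shuhan (郭书含)
Advisor	Han Buxin (韩布新)
Date	2020-07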
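Abstract	China has an aging population. Because treatment outcomes are poor, Alzheimer's disease (AD) has become the "fourth major threat" to older adults, after cardiovascular disease, cancer, and stroke. Yet the cost of caring for AD patients equals that of those three diseases combined, which places a heavy burden on patients and their families. Mild cognitive impairment (MCI), the prodromal stage of AD, can be identified in time and treated effectively, which lowers the incidence of AD. Existing research on identifying MCI has two limitations. (1) Data: neuropsychological tests cannot effectively capture patients' daily lives, and their sensitivity and specificity are limited; MRI and biomarker data are difficult to obtain and require specialized staff and equipment, consuming considerable labor and material resources. (2) Methods: traditional analytic methods cannot probe deeply into the relationships within the data and, constrained by their own performance, handle imbalanced samples poorly and are sensitive to missing values. Machine learning algorithms have been applied successfully to the early detection of many diseases. If such algorithms can identify MCI sensitively and effectively from older adults' neuropsychological and daily lifestyle survey data, screening would no longer depend on specialized equipment, personnel, and material resources. That would make early MCI screening and intervention easier and would have significant economic and social value. This thesis therefore comprises two studies: the first uses CiteSpace analysis to identify machine learning algorithms that classify MCI relatively effectively; the second uses these algorithms to examine whether MCI can be screened for among community-dwelling older adults by analyzing lifestyle data. The methods and results are as follows. Study 1: Using CiteSpace 5.5 as an auxiliary tool, 538 publications that met the inclusion criteria were analyzed, yielding the three models with relatively high accuracy in identifying MCI: support vector machine (SVM), random forest (RF), and artificial neural network (NN). The CiteSpace keyword map showed that research on machine-learning-based identification of MCI focuses on fMRI and biomarkers. Study 2: Based on the results of Study 1, the RF, SVM, and NN models were applied to 1,033 questionnaires collected in 2011 (91 older adults with MCI and 942 normal controls, NC). Accuracy (ACC) was 0.92 for RF, 0.84 for SVM, and 0.90 for NN. However, the area under the ROC curve (AUC) of the SVM model was 0.5, indicating no predictive ability in classification, so SVM was dropped from the subsequent analyses. ROC curves were plotted to compare the classification results of the RF model with those of logistic regression; the difference was significant (p < 0.01). To test separately how accurately lifestyle data predict MCI, the 41 lifestyle items were extracted from 2,151 questionnaires collected in 2019 (374 older adults with MCI and 1,777 NC) and tested for group differences; 25 items differed significantly. Lifestyle data therefore have some ability to predict MCI. When the 2019 lifestyle data were entered into the RF and NN models, ACC was 0.78 for RF and 0.83 for NN; the NN model outperformed the RF model, and both models showed some ability to classify MCI. In summary, this thesis used CiteSpace to select machine learning algorithms that are effective for identifying MCI and confirmed the feasibility of using them to screen for MCI from the lifestyle data of community-dwelling older adults, offering an economical approach to early MCI screening in large community populations of older adults.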
Alternative abstract	China is an aging country. Owing to poor treatment outcomes, Alzheimer's disease (AD) has become the "fourth major harm" to the elderly, after cardiovascular disease, cancer, and stroke. Yet the cost of nursing care for AD patients equals the sum of the costs for those three diseases, which causes great difficulty for patients and their families. Mild cognitive impairment (MCI), the precursor stage of AD, can be identified in time and treated effectively, reducing the incidence of AD. Existing research on MCI identification has two limitations. 1. In terms of data, neuropsychological tests cannot effectively assess patients' daily lives, and their sensitivity and specificity are very limited; MRI and biomarker data are difficult to acquire and require professionals and equipment, consuming a great deal of manpower and material resources. 2. In terms of methods, traditional analysis cannot deeply mine the relationships within the data and, limited by its own performance, cannot effectively handle problems such as imbalanced samples and sensitivity to missing values. Machine learning algorithms have been applied successfully to the early recognition of various diseases. If machine learning algorithms can use neuropsychological and daily lifestyle data from the elderly to identify MCI sensitively and effectively, screening will no longer be limited by professional equipment, manpower, and material resources. This would facilitate early screening and intervention for MCI and would have significant economic and social value. This thesis therefore addresses two questions. 1. Through CiteSpace analysis, we summarize the machine learning algorithms that can classify MCI relatively effectively. 2. Using these algorithms, we investigate the feasibility of screening for MCI among the elderly in the community by analyzing lifestyle data.
The specific research methods and results are as follows. Study 1: CiteSpace 5.5 was used as an auxiliary tool to analyze 538 publications that met the inclusion criteria of this thesis, yielding three models with high accuracy in identifying MCI: support vector machine (SVM), random forest (RF), and artificial neural network (NN). The CiteSpace keyword map showed that research on machine-learning-based MCI identification focuses on fMRI and biomarkers. Study 2: Based on the results of Study 1, 1,033 questionnaires collected in 2011 (91 elderly people with MCI and 942 elderly normal controls, NC) were analyzed with the RF, SVM, and NN models. The accuracy (ACC) was 0.92 for the RF model, 0.84 for the SVM model, and 0.90 for the NN model. However, the AUC of the SVM model was 0.5, which indicates no predictive ability in classification, so this algorithm was removed from the follow-up analyses. ROC curves were drawn to compare the classification results of the RF model with those of logistic regression; the difference was significant (p < 0.01). To test separately the accuracy of lifestyle data in predicting MCI, the 41 lifestyle items in 2,151 questionnaires collected in 2019 (374 elderly people with MCI and 1,777 elderly NC) were extracted for difference testing, and 25 items showed significant differences. Lifestyle data therefore have some ability to predict MCI. The 2019 lifestyle data were then entered into the RF and NN models. The ACC was 0.78 for the RF model and 0.83 for the NN model, so the NN model performed better than the RF model, and both models showed some ability to classify MCI. To sum up, this thesis used CiteSpace to select machine learning algorithms that are effective for MCI recognition and confirmed the feasibility of using these algorithms to screen for MCI by analyzing the lifestyle data of the elderly in the community, which provides an economical route to early MCI screening in large community elderly populations.
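As context for the ACC and AUC figures reported above, the following minimal Python sketch (not taken from the thesis) computes two reference points from the class counts given in the abstract: the accuracy of always predicting the majority class, and the AUC of a classifier that gives every participant the same score. The function names `majority_baseline_accuracy` and `auc` are hypothetical helpers introduced only for illustration.

```python
def majority_baseline_accuracy(n_positive, n_negative):
    """Accuracy of a classifier that always predicts the larger class."""
    return max(n_positive, n_negative) / (n_positive + n_negative)


def auc(scores_positive, scores_negative):
    """AUC as the probability that a random positive case outscores a
    random negative case, counting ties as one half (Mann-Whitney form)."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Class counts reported in the abstract: (MCI, NC) per survey year.
samples = {"2011": (91, 942), "2019": (374, 1777)}

for year, (n_mci, n_nc) in samples.items():
    base = majority_baseline_accuracy(n_mci, n_nc)
    # A constant classifier assigns the same score to everyone,
    # so every positive/negative pair is a tie.
    constant_auc = auc([0.0] * n_mci, [0.0] * n_nc)
    print(f"{year}: baseline ACC = {base:.3f}, constant-score AUC = {constant_auc:.1f}")
```

With classes this imbalanced, a model's ACC is easier to interpret next to the majority-class baseline, and its AUC next to 0.5, which is consistent with the abstract's reading of an AUC of 0.5 as no predictive ability.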
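Keywords	mild cognitive impairment; early diagnosis; lifestyle habits; machine learning; CiteSpace
Degree type	Master's
Language	Chinese
Degree name	Master of Science
Degree discipline	Developmental and Educational Psychology
Degree-granting institution	Institute of Psychology, Chinese Academy of Sciences
Place of degree conferral	Institute of Psychology, Chinese Academy of Sciences
Document type	Thesis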
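Item identifier	https://ir.psych.ac.cn/handle/311026/33907
Collection	Health and Genetic Psychology Research Laboratory (健康与遗传心理学研究室)
Recommended citation (GB/T 7714)	Guo Shuhan. Identification of mild cognitive impairment based on machine learning algorithms [D]. Institute of Psychology, Chinese Academy of Sciences. Institute of Psychology, Chinese Academy of Sciences, 2020.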

Files in this item
File name/size	Document type	Version type	Access type	License
郭书含-硕士学位论文.pdf (2090KB)	Thesis		Open access	CC BY-NC-SA