Research on Key Techniques for Suicide Risk Prediction Based on Microblog Analysis (基于微博分析的自杀风险预测关键技术研究)
Other title: Key Methodology Research of Prediction of Suicide Risk Based on Analyses of Microblog Users
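Author: 管理 (Guan Li)
Degree type: Master's
Supervisor: 朱廷劭 (Zhu Tingshao)
Date: 2015-06
Degree-granting institution: Graduate University of Chinese Academy of Sciences
Place of conferral: Beijing
Major: Psychology
Keywords: suicide risk; Sina Weibo; text analysis; classification model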
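Abstract: Suicide is a serious social and public health problem that calls for in-depth research. Traditional community-level suicide assessment relies mainly on questionnaires and psychological scales, which are costly when applied at scale and lack timeliness. Moreover, research shows that many people at risk of suicide in China do not actively seek help, so existing assessment and screening methods that depend on self-report have difficulty finding some hidden at-risk individuals. As people increasingly share their feelings and opinions in online communities, platforms such as microblogs have become a channel for social media users' self-expression, including suicide-related expression. Researchers have begun to collect users' mental health information through social network platforms and have carried out some group-level analyses, but little research has addressed suicide risk at the individual level. By analyzing the behavioral and text data of Sina Weibo users, this project achieves two scientific objectives through three studies. First, it verifies that some microblog features discriminate suicide risk, so that analyzing suicide risk from individual-level microblog features is possible. Second, it verifies that identifying individuals with high suicide probability from microblog behavioral and textual features is feasible: using computer models for preliminary screening, as an aid to traditional methods, can to some extent improve the efficiency of large-scale, real-time assessment of individual suicide risk. The results are as follows: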
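(1) Among behavioral features, the suicide-death group had a lower microblog link rate (number of posts containing a link divided by the total number of public posts) and a lower interaction rate (average number of @-mentions of other users per post) than the control group without suicidal ideation [0.04 (0.04) vs. 0.06 (0.04), P=0.029; 0.60 (0.27) vs. 0.69 (0.18), P=0.028], and a higher degree of self-focus (average number of first-person singular pronouns per public post) [0.47 (0.25) vs. 0.30 (0.10), P=0.010]. Among linguistic features, the suicide-death group used quantity-unit words, work words, and ellipses less often than the control group (all P<0.05), and used pronouns, specific personal pronouns, third-person singular pronouns, non-specific personal pronouns, social-process words, anxiety words, exclusive words, sexual words, religion words, second-person singular pronouns, human words, negative-emotion words, anger words, sadness words, and death words more often (all P<0.05). Users identified online as having died by suicide interacted less on Weibo, focused more on themselves, used words expressing exclusion more frequently, expressed more negative emotion, and used more death- and religion-related and fewer work-related expressions.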
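As a rough illustration of the three behavioral ratios defined in result (1), the sketch below computes them for a single user from a list of public post texts. It is a minimal sketch under stated assumptions, not the thesis's extraction code: the abstract does not say how links, mentions, or pronouns were detected, so the function name `behavior_features`, the URL and @-mention regular expressions, and treating 我 (but not 我们) as the first-person singular pronoun are all hypothetical choices.

```python
import re

# Hypothetical detection rules (assumptions, not taken from the thesis):
# a link is an http(s) URL; a mention is "@" followed by a run of characters
# that are not whitespace, "@", or a colon; the first-person singular pronoun
# is 我 when it is not followed by 们 (which would make it plural).
LINK_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@[^\s@:：]+")
FIRST_SINGULAR_RE = re.compile(r"我(?!们)")


def behavior_features(posts):
    """Per-user link rate, interaction rate, and self-focus over public posts."""
    n = len(posts)
    if n == 0:
        raise ValueError("user has no public posts")
    # Link rate: share of posts that contain at least one link.
    link_rate = sum(1 for p in posts if LINK_RE.search(p)) / n
    # Interaction rate: average number of @-mentions per post.
    interaction_rate = sum(len(MENTION_RE.findall(p)) for p in posts) / n
    # Self-focus: average number of first-person singular pronouns per post.
    self_focus = sum(len(FIRST_SINGULAR_RE.findall(p)) for p in posts) / n
    return {
        "link_rate": link_rate,
        "interaction_rate": interaction_rate,
        "self_focus": self_focus,
    }


if __name__ == "__main__":
    posts = [
        "我今天很累 http://t.cn/abc",
        "@小明 我们去吃饭吧",
        "@小红 @小刚 我想一个人待着",
    ]
    # link_rate = 1/3, interaction_rate = 1.0, self_focus = 2/3
    print(behavior_features(posts))
```

Each ratio is normalized by the number of public posts, matching the per-post definitions in the abstract, so users with very different posting volumes can be compared.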
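(2) The suicide probability level of microblog users was negatively correlated with “social activeness” and the frequency of “future words” (r=-0.082, -0.073, P<0.05), and positively correlated with “nocturnal activeness” and the frequencies of “third-person singular pronouns” and “negation words” (r=0.081, 0.077, 0.066, P<0.05). The high-suicide-probability group scored lower than the low-suicide-probability group on “social activeness”, “collective focus”, and “future word” frequency (P<0.05), and higher on “nocturnal activeness” and “death word” frequency (P<0.05). Users with different suicide probability levels differ in microblog behavior and language: compared with other users, high-suicide-probability users are less socially active, more active at night, pay less attention to others, use more words expressing negation and death, and use fewer future-oriented words.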
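The associations in result (2) are reported as r values. Assuming these are Pearson product-moment correlations (the abstract does not name the coefficient), the sketch below shows how r is computed from two per-user sequences, for example suicide probability scores and a feature such as social activeness, together with the standard test statistic t = r * sqrt((n - 2) / (1 - r^2)) on n - 2 degrees of freedom for testing r against zero. The function names `pearson_r` and `t_statistic` are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if n <= 2 or abs(r) >= 1:
        raise ValueError("need n > 2 and |r| < 1")
    return r * math.sqrt((n - 2) / (1 - r * r))
```

Because t grows with sqrt(n - 2), correlations as small as |r| < 0.1, like those reported above, can reach P<0.05 when the sample of users is large; they indicate weak but detectable associations rather than strong individual-level predictors.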
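(3) For the total suicide probability score and its four dimensions (hostility, suicidal ideation, negative self-evaluation, and hopelessness), the decision tree, simple logistic regression, and random forest classifiers each recalled more than 70% of the users labeled as high risk. Compared with screening by questionnaire, preliminary screening with the classification models generally reduced the screening workload by 1/4 to 1/2. Machine learning algorithms applied to microblog behavioral and linguistic features can therefore find Weibo users with relatively high suicide risk, with a substantially smaller workload than traditional screening methods.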
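Result (3) rests on two quantities: recall on the labeled high-risk users and the reduction in screening workload. The sketch below computes both from 0/1 label vectors. The workload definition is an assumption, since the abstract does not give the thesis's formula: it treats model-assisted screening as administering the scale only to users the model flags, so the workload reduction is 1 - flagged / total. The function name `screening_metrics` is hypothetical.

```python
def screening_metrics(y_true, y_pred):
    """Recall on high-risk users and the share of users spared from scale screening.

    y_true, y_pred: equal-length sequences of 0/1 labels
    (1 = labeled high risk / flagged by the model).
    Assumes only model-flagged users go on to complete the scale.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    positives = sum(y_true)
    flagged = sum(y_pred)
    # Recall: fraction of labeled high-risk users the model flags.
    recall = true_positives / positives if positives else float("nan")
    # Workload reduction relative to screening every user with the scale.
    workload_reduction = 1 - flagged / len(y_true)
    return recall, workload_reduction


if __name__ == "__main__":
    labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    flags = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
    # recall = 3/4 = 0.75, workload_reduction = 1 - 5/10 = 0.5
    print(screening_metrics(labels, flags))
```

The two numbers trade off against each other: flagging more users raises recall but shrinks the workload reduction, which is why the abstract reports them together.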
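These results further indicate that, compared with traditional methods, suicide risk prediction based on microblog analysis has advantages in the timeliness and completeness of data collection, in identifying potentially at-risk individuals, and in reaching young users. Combining microblog analysis with expert assessment, that is, using microblog analysis to pre-screen and flag users with potential suicide risk at scale and then referring them to experts for in-depth assessment and intervention, can effectively broaden the reach and improve the efficiency of suicide prevention.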
Other abstract: Suicide is a severe social and public health issue in need of further research. Traditional community-level evaluation of suicide risk generally depends on surveys and psychological scales, which are costly and inefficient when applied to large populations. In addition, these methods cannot locate hidden individuals at suicide risk who are unwilling to actively seek professional help. Since microblogs have emerged as a representative social network platform on which users reveal their feelings, researchers have begun to search for potential suicide expressions on such platforms. Nevertheless, there has been little research in China on suicide risk evaluation at the individual level.
This project focuses on online evaluation of suicide risk based on individual microblog data. It achieves two scientific objectives through three studies. The first objective is to find effective microblog features for identifying potential suicide risk; the second is to demonstrate that individuals in China with high suicide probability can be recognized from the profile and text-based information they reveal on microblogs. Preliminary screening of at-risk individuals via machine learning algorithms may work side-by-side with expert scrutiny to increase the efficiency of large-scale surveillance of suicide probability on online social media.
The results of the three studies are as follows:
(1) As for behavioral features, the suicide group used hyperlinks and “@” mentions less frequently than the control group [0.04 (0.04) vs. 0.06 (0.04), P=0.029; 0.60 (0.27) vs. 0.69 (0.18), P=0.028] and was more self-focused [0.47 (0.25) vs. 0.30 (0.10), P=0.010]. As for linguistic features, the suicide group used quantity unit words, work words, and ellipses less frequently than the control group (Ps<0.05), and used pronouns, specific personal pronouns, third person singular pronouns, non-specific personal pronouns, social process words, anxiety words, exclusive words, sexual words, religion words, second person singular pronouns, human words, negative emotion words, anger words, sadness words, and death words more frequently (Ps<0.05). People who died by suicide seem to interact less with others, be more self-concerned, express more negative emotion, use more cognitively exclusive, death-related, and religion-related expressions, and use fewer work-related expressions.
(2) Suicide probability level was negatively correlated with social activeness and the frequency of future words (r=-0.082, -0.073, Ps<0.05), and positively correlated with nocturnal activeness and the frequencies of third person singular pronouns and negation words (r=0.081, 0.077, 0.066, Ps<0.05). Compared with the low-suicide-probability group, the high-suicide-probability group was less socially interactive, expressed itself less in the first person plural, used “future words” less frequently, was more active at night, and used “death words” more frequently (Ps<0.05).
(3) With the best-performing classification models (Decision Tree, Simple Logistic Regression, and Random Forest), we were able to retrieve over 70% of the labeled high-risk individuals, both for overall suicide probability and for each of its four dimensions (hostility, suicidal ideation, negative self-evaluation, and hopelessness). The screening efficiency of most models ranged from 0.25 to 0.5; that is, pre-screening with a model reduced the screening workload by one quarter to one half compared with administering the scale to everyone. The results suggest that predicting individual suicide risk from Weibo data has advantages in the timeliness and completeness of data collection, in the ability to identify potentially at-risk “hidden” individuals, and in its reach among younger users. Preliminary screening of at-risk individuals via machine learning algorithms can work side-by-side with expert scrutiny to improve the efficiency and scale of suicide-probability surveillance on online social media.
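Discipline: Applied Psychology
Language: Chinese
Document type: Thesis
Item identifier: http://ir.psych.ac.cn/handle/311026/19516
Collection: Department of Social and Engineering Psychology
Affiliation: Institute of Psychology, Chinese Academy of Sciences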
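Recommended citation
GB/T 7714
Guan Li (管理). Research on Key Techniques for Suicide Risk Prediction Based on Microblog Analysis (基于微博分析的自杀风险预测关键技术研究)[D]. Beijing: Graduate University of Chinese Academy of Sciences, 2015.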
Files in this item
管理-硕士学位论文1 毕业论文.pdf (2051KB); document type: thesis; access: open access; license: CC BY-NC-SA
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.