中文阅读中词切分与识别的竞争机制
Alternative title: The competition mechanism in word segmentation and recognition during Chinese reading
马国杰
2015-05
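Abstract: Compared with most alphabetic texts, one distinctive property of Chinese text is that no spaces mark the boundaries between words. How Chinese readers segment and recognize words under these conditions is an important research question in psycholinguistics. The main contribution of this thesis is a systematic investigation of this question which, building on previous research, further addresses theoretical issues concerning word segmentation and word recognition in Chinese reading.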
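This thesis comprises two studies. Study 1 contains three experiments that systematically examine the segmentation and recognition of overlapping ambiguous strings. An overlapping ambiguous string here is a three-character string, such as “学生活”, in which the middle character can form a word with the character on either side (we call these the left-hand word and the right-hand word). Experiment 1 used a partial-report procedure to test whether there is a left-side processing advantage. Participants named the middle character of an overlapping ambiguous string; this character was a polyphone, as in “卫校订”. We manipulated the frequencies of the left-hand and right-hand words so that one was a high-frequency word and the other a low-frequency word. The middle character tended to be named with the pronunciation it takes in the high-frequency word, regardless of whether that word was on the left or the right. Experiment 1 therefore rejected the left-priority hypothesis, because the right-hand word could also win. Experiment 2 embedded paired three-character ambiguous strings in various sentence frames and recorded readers' eye movements during sentence reading. The paired strings shared the same left-hand word, while the right-hand word was high-frequency in one string and low-frequency in the other. The frequency of the right-hand word affected processing of the left-hand word, indicating that the words on the two sides of an ambiguous string are not processed independently but influence each other. Experiment 3 again used a sentence-reading task. It manipulated not only the frequencies of the left-hand and right-hand words but also the sentence context, so that the same overlapping ambiguous string was segmented in one of two ways depending on context: AB-C or A-BC. Word competition in the early stage relied on local word-frequency cues, with high-frequency words more likely to win; when the frequency-based segmentation was inconsistent with the sentence context, second-pass reading time on the ambiguous string and the number of regressions were both significantly higher than in the consistent condition. The experiments in Study 1 support the competition hypothesis: all words that can be formed from the characters within the perceptual span are activated and compete, and the word with the highest activation wins and is recognized and segmented.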
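Study 2 generalized the competition mechanism for segmenting ambiguous strings and asked whether cross-character words are activated. Words formed across noncontiguous characters are common in Chinese reading: in the company name “北大方正”, for example, the first and third characters form the word “北方”. Study 2 examined whether such words can be recognized during reading, and comprised two experiments. Experiment 4 used a character-recognition task to test whether cross-character words can be recognized. In a four-character string ABCD, AB and CD were both two-character words; in one condition AC formed a word, as in “素食质点”, and in the other it did not, as in “素食助教”. Readers were significantly more likely to report characters A and C in the AC-word condition than in the AC-nonword condition. This experiment showed that word recognition can cross word boundaries: cross-character words can be activated and compete with the left-hand word. Experiment 5 examined the processing of cross-character words in sentence reading, with the four-character strings of the cross-character-word and control conditions embedded in the same sentence frames. Fixation times in the corresponding region were significantly longer in the cross-character-word condition. Study 2 extends the findings of Study 1, showing that cross-character words are activated and take part in the competition underlying word segmentation and recognition.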
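In the discussion, we consider the relationship between word segmentation and word recognition, identify problems in current models of word segmentation and recognition, and outline directions for future modeling.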
Alternative abstract: Compared with most alphabetic languages, one special property of Chinese is that there are no spaces between words. Nevertheless, previous studies have shown that words have psychological reality in Chinese reading. It is therefore important to investigate how Chinese readers group contiguous characters into separate words. In this dissertation, we explored the mechanism of word segmentation using novel paradigms.
This thesis includes two main studies. In the first study, we explored how Chinese readers segment a 3-character overlapping ambiguous string in which the middle character can form a word with both the first and the third character. The first study contained 3 experiments. In Experiment 1, participants named the middle character, which was a polyphone. They tended to pronounce it as it is pronounced in the higher-frequency word, regardless of that word's position (left or right). These results were inconsistent with the left-priority hypothesis, which assumes that only the left-hand word can win the competition. In Experiment 2, we embedded pairs of overlapping ambiguous strings with identical left-hand words (AB) but different right-hand words (BC or BD) in the same sentence frames. Fixation times on AB were longer when the right-hand word was of higher frequency. These results were inconsistent with an independent-processing hypothesis, which proposes that the words on the two sides do not influence each other. In Experiment 3, each 3-character string was embedded in two sentences (which differed only after the critical string) that constrained the overlapping ambiguous string to be segmented as either AB-C or A-BC. The frequencies of the two words in the string were also manipulated, so that frequency favored AB-C when AB was more frequent than BC and A-BC when BC was more frequent than AB. Second-pass reading time on the ambiguous region was shorter, and regression-in probability was lower, when the frequency-based segmentation fit the sentence context than when it did not. All these results support a competition mechanism in which the characters in the perceptual span activate all of the words they can form, and any word (left-hand or right-hand) can win the competition if its activation is high enough.
In the second study, we extended the competition mechanism in Chinese reading to another situation. The specific question was whether readers can recognize a word composed of noncontiguous characters (a cross-character word). In Experiment 4, participants were asked to report as many characters as possible after briefly viewing four Chinese characters ABCD, where both AB and CD were 2-character words. In the cross-character word condition, AC was a word; in the control condition, it was not. Readers were more likely to report the combination of characters A and C in the cross-character word condition than in the control condition. In Experiment 5, we embedded the two kinds of 4-character strings in the same sentence frames to explore whether cross-character words can be recognized in sentence reading. Readers spent more time on the corresponding region in the cross-character word condition than in the control condition. These results suggest that the cross-character word is activated and participates in the word competition process during Chinese reading.
Finally, we discussed the relationship between word segmentation and word recognition. We also raised some critical questions about previous models of word segmentation and recognition in Chinese reading and, based on the present study, suggested potential ways to improve them.
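The competition account summarized in the abstract (every word that the characters in view can form is activated, and the most active word wins and is segmented out) can be illustrated with a toy sketch. This is not the thesis's model: the lexicon, the frequency values, the use of log frequency as activation, and the function names `candidate_words`, `activation`, and `segment` are all assumptions made here for illustration, and sentence context and cross-character words are left out.

```python
import math

# Hypothetical lexicon: word -> frequency per million.
# The values are illustrative only and are not taken from the thesis or any corpus.
LEXICON = {
    "学生": 120.0,
    "生活": 300.0,
    "学": 50.0,
    "生": 40.0,
    "活": 30.0,
}


def candidate_words(chars, lexicon):
    """Return (start, end, word) for every lexicon word made of contiguous characters."""
    found = []
    for start in range(len(chars)):
        for end in range(start + 1, len(chars) + 1):
            word = chars[start:end]
            if word in lexicon:
                found.append((start, end, word))
    return found


def activation(word, lexicon):
    """Assumed activation function: log frequency (a simplification chosen here)."""
    return math.log(lexicon[word])


def segment(chars, lexicon):
    """Let the most active candidate win, then segment what is left on each side."""
    if not chars:
        return []
    candidates = candidate_words(chars, lexicon)
    if not candidates:
        # No known word: fall back to single characters.
        return list(chars)
    start, end, winner = max(candidates, key=lambda c: activation(c[2], lexicon))
    return segment(chars[:start], lexicon) + [winner] + segment(chars[end:], lexicon)


if __name__ == "__main__":
    print(segment("学生活", LEXICON))
```

With these made-up frequencies the right-hand word “生活” wins, giving the A-BC segmentation; raising the frequency of “学生” above that of “生活” flips the outcome to AB-C, which mirrors the frequency effect the abstract reports without any built-in left-side priority.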
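Subject area: Basic Psychology
Keywords: Chinese reading; word segmentation and recognition; overlapping ambiguous strings; word competition; character position encoding
Degree type: Doctoral
Language: Chinese
Degree discipline: Psychology
Degree-granting institution: Graduate School of the Chinese Academy of Sciences (中国科学院研究生院)
Degree-granting location: Beijing
Document type: Thesis
Item identifier: http://ir.psych.ac.cn/handle/311026/19637
Collection: Research Laboratory of Cognitive and Developmental Psychology (认知与发展心理学研究室)
Author affiliation: Institute of Psychology, Chinese Academy of Sciences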
Recommended citation
GB/T 7714
马国杰. 中文阅读中词切分与识别的竞争机制[D]. 北京. 中国科学院研究生院,2015.
Files in this item
File name/size: 马国杰-博士学位论文.pdf (2226KB)
Document type: Thesis
Access type: Restricted access
License: CC BY-NC-SA