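Construction of a Text-Speech Aligned Corpus of Chinese Discourse and Related Research (汉语语篇文本-语音对接语料库的构建及相关研究)
Alternative title: Constructure and Study of Corpus on Text-Speech in Chinese Discourse
Chen Yudong (陈玉东)
2009-05
Place of publication: Beijing
Report type: Final report
Ownership rank: 1
Abstract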

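The work reported here is an important foundation and a principal outcome of the National Natural Science Foundation of China project "Sentence Focus and Focus-Accent Projection in Chinese Discourse". The report describes how the corpus construction was planned and carried out, and presents preliminary studies based on the corpus.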

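Corpus construction began with drawing up an implementation plan, followed by the groundwork of collecting, recording, and organizing the material. The annotation scheme has three main parts: text annotation, speech annotation, and discourse rhetorical structure. Systematic training and trial annotation, repeated discussion, further unification of standards, and revision of the annotation rules ensured the quality of corpus processing. Expert guidance and continuous feedback from annotators during annotation both played an important role in keeping the processing internally consistent and the standards uniform across annotators. The problem of aligning annotation information across different tiers was solved, and a corpus platform was built that supports easy import, retrieval, and statistics and can be extended without limit. The result is a Chinese discourse text-speech aligned corpus of initial scale, in which the speech, text, and discourse structure components support one another.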

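The preliminary corpus-based research has two parts: a case study of discourse rhetorical structure and prosodic expression, and a study of accent features in read speech across different genres. The report also summarizes the problems in the work, analyzes the favorable conditions and practical difficulties for further work, and looks ahead to the applications of the corpus.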

Other abstract

This work is an important foundation and outcome of the research project "Sentence Focus and Focus-accent Projection in Chinese Discourse", funded by the National Natural Science Foundation of China. This report describes the planning and implementation of the corpus construction and the preliminary studies based on it.

An implementation plan was drawn up first, and the collection, recording, and organization of the data proceeded in accordance with it. The annotation system of the corpus consists of three parts: text, speech, and the rhetorical structure of discourse. Thorough training and trial annotation, discussion, and consolidated criteria ensured the quality of data processing. Expert guidance and annotators' feedback played important roles in internal consistency and in the uniformity of criteria. The different layers of annotation information were aligned and integrated into the corpus, which supports data import, query, statistics, and unlimited expansion. The resulting corpus comprises three mutually supporting parts: text, speech, and discourse structure.

Two preliminary studies based on the corpus are reported. One is a case study of rhetorical structure and prosodic expression in Chinese discourse; the other investigates the characteristics of accents in different styles of read Chinese discourse. The favorable conditions and practical difficulties for further work are analyzed, and the prospects for applying the corpus are discussed.

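Keywords: discourse structure, focus, accent, speech, text
Pages: 99
Language: Chinese
Document type: Scientific and technical report
Item identifier: http://ir.psych.ac.cn/handle/311026/29123
Collection: 认知与发展心理学研究室 (Cognitive and Developmental Psychology Laboratory)
Author affiliation: Institute of Psychology, Chinese Academy of Sciences (中国科学院心理研究所)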
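Recommended citation
GB/T 7714
陈玉东 (Chen Yudong). 汉语语篇文本-语音对接语料库的构建及相关研究 [Construction of a Text-Speech Aligned Corpus of Chinese Discourse and Related Research][R]. Beijing, 2009.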
Files in this item
File name/size: 陈玉东-博士后研究报告.pdf (7762KB)
Document type: Thesis
Access type: Restricted
License: CC BY-NC-SA