Computer Systems & Applications, 2023, 32(3): 300-308
Named Entity Recognition of Poetry by Integrating Multi-features in Digital Humanities
(1. School of Software, North University of China, Taiyuan 030051, China; 2. Institute of Language Intelligence, Beijing Language and Culture University, Beijing 100083, China)
Received:August 17, 2022    Revised:September 15, 2022
Abstract: In recent years, digital humanities has attracted wide attention, and research on named entity recognition for ci poetry in the digital humanities has been emerging. Few studies, however, have examined the representational power of character features, the accuracy of word segmentation, or the effectiveness of domain knowledge. In view of the pictographic nature of Chinese characters and the particularity of ci texts, this paper adds radical, metrical, and sound (tone and rhyme) features to character features and proposes a feature enhancement unit and a feature extraction unit. Knowledge triples about tune pattern titles (cipai) are embedded with the ANALOGY model to obtain tune pattern knowledge vectors. The radical, character, metrical, sound, and tune pattern knowledge vectors are then deeply fused through models such as a bidirectional long short-term memory network and an attention mechanism, yielding a named entity recognition method for ci poetry that integrates multiple features. Comparative and ablation experiments on a self-built corpus drawn from Translation of Among Flowers (Hua Jian Ji) (《花间集全译》) show that the proposed method effectively uses multiple features to improve recognition performance, reaching an F1 score of 85.63%.
Keywords: named entity recognition; multi-features; metrical rules; digital humanities; poetry
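The fusion step named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the architecture in detail, so everything below is an assumption made for illustration: each of the five features is taken to be a vector of the same dimension for a single character, the vectors are random placeholders, and the hypothetical `fuse` function applies a plain scaled dot-product attention over the features. It is not the authors' model, which also involves a bidirectional LSTM and the feature enhancement and extraction units.

```python
import math
import random

# Feature names follow the abstract; the vectors themselves are random placeholders.
FEATURES = ["radical", "character", "metrical", "sound", "tune_knowledge"]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse(feature_vectors, query):
    """Attention-weighted sum of same-dimension feature vectors (hypothetical helper).

    Each feature is scored against `query` with a scaled dot product,
    the scores are normalised with softmax, and the vectors are averaged
    using those weights.
    """
    names = list(feature_vectors)
    scale = math.sqrt(len(query))
    weights = softmax([dot(feature_vectors[n], query) / scale for n in names])
    fused = [
        sum(w * feature_vectors[n][i] for w, n in zip(weights, names))
        for i in range(len(query))
    ]
    return fused, dict(zip(names, weights))


random.seed(0)
DIM = 4  # assumed embedding size, chosen only to keep the example small
vectors = {name: [random.uniform(-1, 1) for _ in range(DIM)] for name in FEATURES}
query = [random.uniform(-1, 1) for _ in range(DIM)]  # stands in for a learned query

fused, weights = fuse(vectors, query)
print("fused vector:", fused)
print("attention weights:", weights)
```

In a trained model the query and the feature vectors would be learned, and the fused vector for each character would feed a sequence labeller that assigns entity tags.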
Funding: Late-stage Funding Project for Philosophy and Social Sciences Research, Ministry of Education (21JHQ081)
Cite this article:
ZHANG Meng, LIU Zhong-Bao. Named Entity Recognition of Poetry by Integrating Multi-features in Digital Humanities. Computer Systems & Applications, 2023, 32(3): 300-308