Computer Systems & Applications, 2022, 31(1): 47-54
Named Entity Recognition Technology for Brief Case
(1. School of Computer Science and School of Cyberspace Security, Xiangtan University, Xiangtan 411105, China; 2. Department of Information Technology, Hunan Police College, Changsha 410138, China)
Received: March 24, 2021; Revised: April 21, 2021
Abstract: A brief case description is a short summary of a case record that public security organs write to improve the quality of information entered into the Collaborative Case Handling System and to keep information retrieval and the linking of related cases efficient. The entities in these texts carry a large amount of case information about victims and perpetrators, so mining brief case texts in depth is an effective way to understand a case from start to finish and to analyze it. Densely distributed entities, nested entities, and abbreviated entity names make accurate entity recognition in these texts very challenging. To address the particularity and complexity of brief case texts, this study improves the method of character vector generation and proposes the RC-BiLSTM-CRF (Roberta-CNN-BiLSTM-CRF) network architecture. Compared with the mainstream Bert-BiLSTM-CRF architecture, it extracts features from the character vectors, which addresses the overly long character vectors produced by the pre-trained model; the resulting reduction in the number of model parameters speeds up the convergence of the model as a whole. In comparative experiments with five mainstream architectures on a brief case dataset provided by the public security organs of Hunan Province, the proposed method achieves the best precision, recall, and F1 value, with an F1 value of 88.02%.
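The abstract reports precision, recall, and F1 value without stating how they are computed. A common convention in named entity recognition is exact-match, entity-level scoring over BIO tags. The sketch below assumes that convention, which may differ from the evaluation used in the paper. The tag names (`PER`, `LOC`) and the function names are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type); illustrative type alias
Span = Tuple[int, int, str]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged character sequence."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel flushes an entity that ends at the last character.
    for i, tag in enumerate(tags + ["O"]):
        if tag.startswith("I-") and label == tag[2:]:
            continue  # the current entity keeps going
        if label is not None:
            spans.add((start, i, label))  # close the open entity
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]  # open a new entity
        # An "I-" tag with no matching open entity is treated like "O".
    return spans


def precision_recall_f1(gold: List[List[str]], pred: List[List[str]]):
    """Exact-match entity-level precision, recall, and F1 over a corpus."""
    gold_spans, pred_spans = set(), set()
    for idx, (g, p) in enumerate(zip(gold, pred)):
        # Prefix each span with the sentence index so spans stay distinct.
        gold_spans |= {(idx,) + s for s in extract_spans(g)}
        pred_spans |= {(idx,) + s for s in extract_spans(p)}
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: the second predicted entity has the wrong right boundary.
gold = [["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]]
pred = [["B-PER", "I-PER", "O", "B-LOC", "O"]]
print(precision_recall_f1(gold, pred))
```

Under exact-match scoring a prediction with a wrong boundary counts as both a false positive and a false negative, which is why dense and nested entities of the kind described in the abstract tend to lower all three scores at once.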
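Funding: Natural Science Foundation of Hunan Province (2018JJ2107); Major Science and Technology Special Project of Hunan Province (2017SK1040); Science and Technology Program of the Hunan Provincial Public Security Department (2018No.3)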
Cite this article:
CHEN Zhu-Hui, LIU Xin, ZHANG Ming-Jian, ZHANG Da-Wei. Named Entity Recognition Technology for Brief Case. Computer Systems & Applications, 2022, 31(1): 47-54