Discipline information stored in scattered form makes resources difficult to tally. To address this problem, a knowledge graph of the computer science discipline in universities is constructed by integrating multi-source, heterogeneous discipline data on the basis of a domain ontology model of the computer science discipline. First, domain knowledge is acquired from relevant websites and existing documents through Web crawlers and other techniques, and the data are cleaned using the BERT model. Then, Word2Vec is used to measure the similarity between researchers' research directions, which solves the entity alignment problem. Finally, the data are imported into the Neo4j graph database for knowledge storage. A visualization system for the computer science discipline is built on the constructed knowledge graph. It provides information retrieval, graph display, and other functions, and it supports quick querying and resource statistics of basic computer science discipline data. The system is expected to help subsequent discipline evaluation work be completed more efficiently.
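The entity alignment step can be illustrated with a minimal sketch. It assumes that two records with the same name are treated as the same researcher when the averaged word vectors of their research directions have a cosine similarity above a threshold. The toy vectors, the record fields `name` and `direction`, the function names, and the threshold of 0.9 are all hypothetical; in practice the vectors would come from a trained Word2Vec model, and the actual matching rule may differ.

```python
import math

# Hypothetical toy embeddings standing in for trained Word2Vec vectors.
TOY_VECTORS = {
    "machine": [0.9, 0.1, 0.0],
    "learning": [0.8, 0.2, 0.1],
    "deep": [0.85, 0.15, 0.05],
    "database": [0.1, 0.9, 0.2],
    "systems": [0.2, 0.8, 0.3],
}


def phrase_vector(phrase, vectors):
    """Average the vectors of the known words; return None if none is known."""
    known = [vectors[w] for w in phrase.lower().split() if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def same_person(rec_a, rec_b, vectors, threshold=0.9):
    """Assumed rule: same name and sufficiently similar research directions."""
    if rec_a["name"] != rec_b["name"]:
        return False
    vec_a = phrase_vector(rec_a["direction"], vectors)
    vec_b = phrase_vector(rec_b["direction"], vectors)
    if vec_a is None or vec_b is None:
        return False  # not enough evidence to merge the records
    return cosine(vec_a, vec_b) >= threshold


# Three records sharing a name but collected from different (hypothetical) sources.
rec_web = {"name": "Li Wei", "direction": "machine learning"}
rec_doc = {"name": "Li Wei", "direction": "deep learning"}
rec_other = {"name": "Li Wei", "direction": "database systems"}

print(same_person(rec_web, rec_doc, TOY_VECTORS))
print(same_person(rec_web, rec_other, TOY_VECTORS))
```

Requiring both a name match and a similarity threshold keeps same-named researchers in different fields from being merged, at the cost of missing matches when a research direction is described with words outside the vocabulary.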