Similar Text Matching Based on Siamese Network and Char-word Vector Combination
    Abstract:

    Text similarity matching underlies many natural language processing tasks. This study proposes a text similarity matching method based on a Siamese network and a combination of character-level and word-level vectors. Following the Siamese network design, the method models each text as a whole so that the similarity between two texts can be determined. First, BERT and WoBERT models extract character-level and word-level sentence vectors, respectively, which are then combined to capture richer semantic information. If the fused feature vector is too high-dimensional, principal component analysis (PCA) is applied to reduce its dimension and remove redundant information and noise. Finally, a Softmax classifier produces the similarity matching result. Experimental results on the LCQMC dataset show that the proposed model reaches an accuracy of 89.92% and an F1 score of 88.52%, indicating that it extracts text semantic information more effectively and is better suited to text similarity matching tasks.
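
The pipeline described in the abstract (combine character-level and word-level sentence vectors, optionally reduce them with PCA, then classify with Softmax) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the toy vectors stand in for BERT and WoBERT outputs, the pair features (absolute difference and element-wise product) and the linear layer weights are assumptions, PCA is approximated with power iteration, and the names `combine`, `pca_reduce`, `softmax`, and `match_probability` are hypothetical.

```python
import math
import random


def combine(char_vec, word_vec):
    """Concatenate a character-level and a word-level sentence vector."""
    return list(char_vec) + list(word_vec)


def pca_reduce(rows, k, iters=200, seed=0):
    """Project rows onto their top-k principal components.

    Uses power iteration with deflation on the sample covariance matrix,
    which is adequate for a small illustrative example.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in the deflated matrix
            v = [x / norm for x in w]
        cov_v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        eigval = sum(v[i] * cov_v[i] for i in range(d))
        components.append(v)
        # Deflate: remove the component just found from the covariance matrix.
        for i in range(d):
            for j in range(d):
                cov[i][j] -= eigval * v[i] * v[j]
    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
            for c in centered]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def match_probability(u, v, weights, bias):
    """Score a sentence pair: [|u - v|, u * v] -> linear layer -> softmax.

    Both inputs come from the same encoder (the Siamese idea); the feature
    construction and the linear layer are assumptions for this sketch.
    """
    feats = [abs(a - b) for a, b in zip(u, v)] + [a * b for a, b in zip(u, v)]
    logits = [sum(w * f for w, f in zip(row, feats)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


# Toy stand-ins for character-level (BERT) and word-level (WoBERT) vectors.
char_vecs = [[0.1, 0.3], [0.2, 0.1], [0.4, 0.5], [0.3, 0.2]]
word_vecs = [[0.7, 0.2], [0.6, 0.9], [0.1, 0.8], [0.5, 0.4]]

combined = [combine(c, w) for c, w in zip(char_vecs, word_vecs)]
reduced = pca_reduce(combined, k=2)

# Untrained, hand-picked weights: one row per class (not similar, similar).
weights = [[0.5, -0.2, 0.1, 0.3], [-0.5, 0.2, -0.1, -0.3]]
bias = [0.0, 0.0]
probs = match_probability(reduced[0], reduced[1], weights, bias)
```

In practice the sentence vectors would come from pretrained encoders and the classifier weights would be learned on labeled pairs such as those in LCQMC; the sketch only shows how the stages connect.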

Get Citation

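Li Yilin, Zhou Yanping. Similar Text Matching Based on Siamese Network and Char-word Vector Combination. Computer Systems & Applications, 2022, 31(10): 295-302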

History
  • Received: January 21, 2022
  • Revised: February 22, 2022
  • Online: June 24, 2022