Similar Text Matching Based on Siamese Network and Char-word Vector Combination

doi:10.15888/j.cnki.csa.008756

Volume 31, Issue 10, 2022, pp. 295-302

Abstract:

Text similarity matching underlies many natural language processing tasks. This study proposes a text similarity matching method based on a Siamese network and char-word vector combination. Following the Siamese network design, the method models each text as a whole so that the similarity between two texts can be determined. First, during feature extraction, the BERT and WoBERT models are used to extract character-level and word-level sentence vectors, respectively, which are then combined to capture richer semantic information. If the fused feature vector has too many dimensions, the principal component analysis (PCA) algorithm is applied to reduce the dimensionality and remove redundant information and noise. Finally, a Softmax classifier produces the similarity matching result. Experiments on the LCQMC dataset show that the proposed model reaches an accuracy of 89.92% and an F1 score of 88.52%, suggesting that the method extracts text semantic information more effectively and is better suited to text similarity matching tasks.
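The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: random arrays stand in for the BERT and WoBERT sentence vectors, the 768-dimensional size and the reduced dimension of 16 are assumed values, the pair feature `[u, v, |u - v|]` is a common Siamese-style choice that the abstract does not specify, and the classifier weights are untrained. All function names are hypothetical.

```python
import numpy as np


def combine_char_word(char_vecs, word_vecs):
    """Concatenate character-level and word-level sentence vectors row by row."""
    return np.concatenate([char_vecs, word_vecs], axis=1)


def fit_pca(x, n_components):
    """Fit PCA via SVD; return the feature mean and the top principal directions."""
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]


def apply_pca(x, mean, components):
    """Project centered data onto the fitted principal directions."""
    return (x - mean) @ components.T


def pair_features(u, v):
    """Assumed pair feature: both encodings plus their absolute difference."""
    return np.concatenate([u, v, np.abs(u - v)], axis=1)


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
n_pairs, char_dim, word_dim, reduced_dim = 32, 768, 768, 16

# Placeholder embeddings standing in for BERT (character-level) and
# WoBERT (word-level) outputs for sentence A and sentence B of each pair.
char_a = rng.normal(size=(n_pairs, char_dim))
word_a = rng.normal(size=(n_pairs, word_dim))
char_b = rng.normal(size=(n_pairs, char_dim))
word_b = rng.normal(size=(n_pairs, word_dim))

# Step 1: char-word vector combination, applied identically to both sentences.
combined_a = combine_char_word(char_a, word_a)
combined_b = combine_char_word(char_b, word_b)

# Step 2: PCA fitted once and shared by both branches, mirroring the
# weight sharing of a Siamese network.
mean, components = fit_pca(np.vstack([combined_a, combined_b]), reduced_dim)
reduced_a = apply_pca(combined_a, mean, components)
reduced_b = apply_pca(combined_b, mean, components)

# Step 3: Softmax classifier over the pair feature (weights are random here;
# in practice they would be learned from labeled sentence pairs).
features = pair_features(reduced_a, reduced_b)
weights = rng.normal(scale=0.1, size=(features.shape[1], 2))
bias = np.zeros(2)
probs = softmax(features @ weights + bias)

print(probs.shape)  # one (dissimilar, similar) probability pair per sentence pair
```

Sharing one fitted PCA projection across both sentences keeps the two branches in the same reduced space, which is what makes the element-wise difference between them meaningful.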

Citation:

Li Yilin, Zhou Yanping. Similar Text Matching Based on Siamese Network and Char-word Vector Combination. Computer Systems & Applications, 2022, 31(10): 295-302


History

Received: January 21, 2022
Revised: February 22, 2022
Online: June 24, 2022
