Computer Systems Applications (计算机系统应用), English version: 2015, 24(6): 183-187
Undesirable Text Recognition Based on SVM
(Institute of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, China)
Received: October 12, 2014    Revised: November 28, 2014
Abstract: In practical undesirable-text recognition, most texts overlap at class boundaries or are even intermixed, and this nonlinearly inseparable problem makes recognition difficult. An SVM applies a nonlinear transformation that turns the problem in the original space into a linear problem in a higher-dimensional space, and choosing an appropriate kernel function is the key to the SVM. A single kernel cannot recognize both independent undesirable words and word combinations, so recognition accuracy is low and recall cannot be maintained at the same time. For the specific application of undesirable-text recognition, this paper combines the linear kernel and the homogeneous polynomial kernel, in accordance with Mercer's theorem, to construct a new combination kernel function. The combination kernel keeps the advantages of both the linear and polynomial kernels and can recognize independent undesirable words as well as word combinations. Simulation experiments evaluated the linear kernel, the homogeneous polynomial kernel, and the combination kernel; the results show that the combination kernel achieves better recognition accuracy and recall than the other kernel functions.
Keywords: SVM; combination kernel function; undesirable text; information recognition; recall
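The abstract does not state the exact form of the combination kernel, so the following is only a minimal sketch of one common construction: a convex combination of the linear kernel and a homogeneous polynomial kernel. A sum of Mercer kernels with nonnegative weights is again a Mercer kernel, which is what makes this kind of combination admissible. The names `combined_kernel`, `weight`, and `degree`, the default values, and the toy three-term vocabulary are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def linear_kernel(x: Vector, y: Vector) -> float:
    """K_lin(x, y) = <x, y>; responds to individual terms shared by x and y."""
    return sum(a * b for a, b in zip(x, y))


def homogeneous_poly_kernel(x: Vector, y: Vector, degree: int = 2) -> float:
    """K_poly(x, y) = <x, y>^degree; its implicit features are products of terms."""
    return linear_kernel(x, y) ** degree


def combined_kernel(x: Vector, y: Vector, weight: float = 0.5, degree: int = 2) -> float:
    """Assumed form: weight * K_lin + (1 - weight) * K_poly, with 0 <= weight <= 1.

    Nonnegative weights keep the result a valid Mercer kernel.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (weight * linear_kernel(x, y)
            + (1.0 - weight) * homogeneous_poly_kernel(x, y, degree))


def gram_matrix(samples: Sequence[Vector],
                kernel: Callable[[Vector, Vector], float]) -> List[List[float]]:
    """Pairwise kernel values for every pair of samples."""
    return [[kernel(a, b) for b in samples] for a in samples]


# Toy bag-of-words vectors over a hypothetical 3-term vocabulary.
docs = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0]]
gram = gram_matrix(docs, combined_kernel)
```

A Gram matrix built this way could be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. The weight and degree would normally be chosen by validation rather than fixed as here.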
Citation:
LV Hong-Yan, DU Juan. Undesirable Text Recognition Based on SVM. COMPUTER SYSTEMS APPLICATIONS, 2015, 24(6): 183-187