Computer Systems & Applications, 2019, 28(6): 235-242
Cluster Quality Evaluation Index Based on K-medoids Algorithm
(1. Department of Electrical Engineering, Guangdong Songshan Polytechnic, Shaoguan 512126, China; 2. Department of Computer Science, Guangdong Songshan Polytechnic, Shaoguan 512126, China)
Received: December 19, 2018    Revised: January 10, 2019
Abstract: To better evaluate the clustering quality of unsupervised clustering algorithms, and to address the failure of clustering evaluation caused by overlapping cluster centers, this paper analyzes commonly used clustering evaluation indices and proposes a new internal evaluation index. The index takes the product of the sum of squared minimum distances between neighboring boundary points of different clusters and the number of samples in the cluster as the separation of the whole sample set, which balances inter-cluster separation against intra-cluster compactness. A new density calculation method is also proposed: objects for which the ratio of the sample set's average distance to the object's own average distance is large are treated as high-density points, and a maximum product method selects relatively dispersed, high-density objects as the initial cluster centers. This improves the representativeness of the initial K-medoids centers and the stability of the algorithm. On this basis, a clustering quality evaluation model is built around the new internal evaluation index. Experiments on UCI and KDD CUP 99 data sets show that the model clusters samples without prior knowledge effectively, evaluates the results reasonably, and gives either the optimal number of clusters or an optimal range for it.
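The initial-center selection described in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: Euclidean distance, density defined as the overall mean distance divided by a sample's own mean distance, a candidate threshold of 1.0, and the function names `density_ratio` and `select_initial_medoids`. It is not the authors' implementation.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix (assumed metric; the abstract does not name one)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def density_ratio(D):
    """Assumed density: overall mean distance / each sample's mean distance.

    Requires at least two samples that are not all identical,
    otherwise the division below is undefined.
    """
    n = D.shape[0]
    per_sample_mean = D.sum(axis=1) / (n - 1)  # mean distance to the other samples
    overall_mean = per_sample_mean.mean()
    return overall_mean / per_sample_mean

def select_initial_medoids(X, k):
    """Pick k initial medoid indices that are dense and mutually far apart."""
    D = pairwise_distances(X)
    rho = density_ratio(D)
    # Hypothetical threshold: samples at least as dense as average are candidates.
    candidates = np.flatnonzero(rho >= 1.0)
    # First center: the densest candidate.
    centers = [int(candidates[np.argmax(rho[candidates])])]
    while len(centers) < k:
        remaining = np.setdiff1d(candidates, centers)
        if remaining.size == 0:
            # Fall back to every unselected sample if candidates run out.
            remaining = np.setdiff1d(np.arange(len(X)), centers)
        # Maximum product: maximize the product of distances to chosen centers.
        products = D[np.ix_(remaining, centers)].prod(axis=1)
        centers.append(int(remaining[np.argmax(products)]))
    return centers
```

Calling `select_initial_medoids(X, k)` on an `(n, d)` array returns `k` row indices that could seed a K-medoids run. Multiplying the distances, rather than summing them, penalizes any candidate that lies close to an already chosen center, which is one plausible reading of "relatively dispersed" in the abstract.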
Funding: Shaoguan Science and Technology Plan Project (2017CX/K055); Key Science and Technology Project of Guangdong Songshan Polytechnic (2018KJZD001)
Cite as:
ZOU Chen-Song, DUAN Gui-Qin. Cluster Quality Evaluation Index Based on K-medoids Algorithm. Computer Systems & Applications, 2019, 28(6): 235-242