An Optimization K-means Clustering Algorithm of Initial Center Objects Based on Influence Space
ZHAO Wenchong (赵文冲), CAI Jianghui (蔡江辉), ZHANG Jifu (张继福)
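Abstract:
K-means clustering depends on its initial points, and the clustering result is strongly affected by how those points are chosen. To address this defect, a stable K-means clustering algorithm with initial-point optimization based on influence space is presented. Using the influence-space data structure and a defined weighted distance attraction factor, the algorithm merges special center points into K micro-clusters, takes the weighted average of the data points in each micro-cluster to obtain K initial center points, and then runs the K-means algorithm. Theoretical analysis and experimental results show that the proposed algorithm effectively reduces the influence of noisy data on the clustering result and has clear advantages in both clustering quality and the efficiency of the clustering process.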
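The last two steps of the pipeline described above (weighted averaging inside each micro-cluster, then ordinary K-means) can be illustrated with a minimal sketch. It does not reproduce the paper's method: the abstract does not define the influence space or the weighted distance attraction factor, so the micro-cluster assignment `micro_labels` and the per-point `weights` below are hypothetical inputs supplied by hand, and the function names are illustrative only.

```python
import numpy as np

def weighted_initial_centers(X, micro_labels, weights, k):
    """Weighted mean of the points in each of the k micro-clusters.

    micro_labels and weights are hypothetical inputs; in the paper they
    would come from the influence space and the attraction factor.
    """
    centers = np.empty((k, X.shape[1]))
    for j in range(k):
        mask = micro_labels == j
        centers[j] = np.average(X[mask], axis=0, weights=weights[mask])
    return centers

def kmeans(X, centers, n_iter=100):
    """Plain Lloyd iterations starting from the supplied centers."""
    centers = centers.astype(float).copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            if np.any(labels == j):  # keep the old center if a cluster is empty
                new_centers[j] = X[labels == j].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Toy data: two well-separated groups of three points each.
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]], dtype=float)
micro_labels = np.array([0, 0, 0, 1, 1, 1])       # assumed micro-clusters
weights = np.array([1.0, 2.0, 1.0, 1.0, 1.0, 2.0])  # assumed point weights

init_centers = weighted_initial_centers(X, micro_labels, weights, k=2)
final_centers, labels = kmeans(X, init_centers)
```

Because the initial centers are computed deterministically from the micro-clusters rather than drawn at random, repeated runs on the same data give the same result, which is the sense in which such an initialization is stable.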
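Keywords: K-means algorithm; influence space; weighted distance attraction factor; initial point optimization
Foundation: National Natural Science Foundation of China (41372349); Shanxi Province Social Development Key Research Project (20140313023-2); Shanxi Province Program for Outstanding Young Academic Leaders in Higher Education Institutions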