Systems Engineering and Electronics

• Software, Algorithm and Simulation •

Generative clustering analysis with constraints regularization

YU Yue-cheng¹,², SHENG Jia-gen³, ZOU Xiao-hua¹

  1. College of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang 212003, China; 2. Information Technology Research Base of Civil Aviation Administration of China, Civil Aviation University of China, Tianjin 300300, China; 3. College of Further Education, Jiangsu University of Science and Technology, Zhenjiang 212003, China
  • Online: 2014-04-24 Published: 2010-01-03

Abstract: Existing Gaussian mixture models (GMMs) with hard pairwise constraints cannot handle constraint violations, and GMMs with soft pairwise constraints lack closed-form expressions for parameter estimation. This paper proposes a generative clustering analysis algorithm with constraints regularization (GCACR), which introduces a constraint-consistency regularization operator into the GMM. Constraint violations are handled by penalizing the likelihood, so that the posterior probabilities of a pair of samples under a positive constraint are as similar as possible and those of a pair under a negative constraint are as dissimilar as possible. In addition, closed-form iterative update formulas reduce the computational complexity of parameter estimation. Experiments on a set of real-world datasets show that, compared with existing generative clustering algorithms, the proposed algorithm effectively improves clustering performance and adapts better to noisy pairwise constraints.
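The penalized-likelihood idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact form of the regularization operator, so everything specific here is an assumption made for illustration: the squared difference of posteriors for positive (must-link) pairs, the inner product of posteriors for negative (cannot-link) pairs, the weight `lam`, and all function names. The sketch only evaluates such an objective for fixed parameters; it does not reproduce the paper's closed-form update formulas.

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Multivariate normal density evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    inv = np.linalg.inv(cov)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    expo = -0.5 * np.einsum("ni,ij,nj->n", diff, inv, diff)
    return np.exp(expo) / norm

def posteriors(X, weights, means, covs):
    """Return (posterior responsibilities, weighted component densities)."""
    joint = np.stack(
        [w * gaussian_pdf(X, m, c) for w, m, c in zip(weights, means, covs)],
        axis=1,
    )
    return joint / joint.sum(axis=1, keepdims=True), joint

def penalized_log_likelihood(X, weights, means, covs,
                             must_link, cannot_link, lam=1.0):
    """GMM log-likelihood minus an ASSUMED pairwise-constraint penalty."""
    post, joint = posteriors(X, weights, means, covs)
    loglik = np.log(joint.sum(axis=1)).sum()
    penalty = 0.0
    for i, j in must_link:      # positive pairs: posteriors should agree
        penalty += np.sum((post[i] - post[j]) ** 2)
    for i, j in cannot_link:    # negative pairs: posteriors should differ
        penalty += np.dot(post[i], post[j])
    return loglik - lam * penalty

# Toy data: two well-separated groups of two points each.
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
weights = [0.5, 0.5]
means = np.array([[0.0, 0.0], [5.0, 5.0]])
covs = [np.eye(2), np.eye(2)]

consistent = penalized_log_likelihood(
    X, weights, means, covs, must_link=[(0, 1)], cannot_link=[(0, 2)])
violating = penalized_log_likelihood(
    X, weights, means, covs, must_link=[(0, 2)], cannot_link=[(0, 1)])
print(consistent > violating)
```

With these toy parameters, constraints that agree with the cluster structure leave the objective almost unchanged, while constraints that contradict it lower the objective without making it undefined. That soft behavior is what lets a penalty-based formulation tolerate noisy constraints.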