Deep embedding subspace clustering network

  • Abstract: Traditional subspace clustering algorithms capture the similarity between data points by learning a self-expression coefficient matrix. However, this strategy cannot effectively handle large-scale datasets or the out-of-sample problem. To address these limitations, this paper proposes a deep embedding subspace clustering network. The model first uses an autoencoder to obtain latent representations of the original data. It then computes the similarity between latent representations with a predefined function and constructs the self-expression coefficient matrix from those similarities. Finally, it applies spectral clustering to obtain the clustering results. Rather than directly learning the self-expression coefficients between data points, the model obtains the latent representations and their similarities through a mapping function, which makes it applicable to a wider range of settings. In experiments on four public large-scale datasets, the proposed model achieves the best results in both accuracy and adjusted Rand index, and an average rank of 1.25 in normalized mutual information. Parameter sensitivity and generalization experiments further confirm that the model is robust and can handle out-of-sample data.
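The idea of computing similarities from latent representations, instead of learning one coefficient per pair of samples, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not specify the encoder architecture or the predefined similarity function, so the code uses a fixed, made-up linear map with `tanh` as a stand-in for a trained encoder and clipped cosine similarity as the predefined function. The names `encode`, `similarity`, `affinity_matrix`, and `out_of_sample_row` are hypothetical, and the final spectral clustering step is omitted.

```python
import math

# Hypothetical stand-in for a trained encoder: a fixed 3-D -> 2-D linear map
# followed by tanh. A real autoencoder is learned; these weights are made up.
WEIGHTS = [[1.0, 0.0, 0.5],
           [0.0, 1.0, -0.5]]


def encode(x):
    """Map a raw sample to its latent representation."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in WEIGHTS]


def similarity(z_a, z_b):
    """Assumed predefined function: cosine similarity clipped to [0, 1]."""
    dot = sum(a * b for a, b in zip(z_a, z_b))
    norm_a = math.sqrt(sum(a * a for a in z_a))
    norm_b = math.sqrt(sum(b * b for b in z_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return max(0.0, dot / (norm_a * norm_b))


def affinity_matrix(samples):
    """Build the n x n similarity matrix from latent codes.

    No per-pair coefficient is learned, so the only parameters are the
    encoder weights, whose number does not depend on n.
    """
    latents = [encode(x) for x in samples]
    return [[similarity(zi, zj) for zj in latents] for zi in latents]


def out_of_sample_row(new_sample, samples):
    """Similarities of an unseen sample to the existing ones, without retraining."""
    z_new = encode(new_sample)
    return [similarity(z_new, encode(x)) for x in samples]


# Toy data: two samples dominated by the first feature, two by the second.
data = [[2.0, 0.1, 0.0], [1.5, 0.0, 0.0], [0.1, 2.0, 0.0], [0.0, 1.5, 0.0]]
C = affinity_matrix(data)
row = out_of_sample_row([1.8, 0.05, 0.0], data)
```

In this toy setup, `C` has large entries for pairs drawn from the same group and small entries across groups, and `row` shows the same pattern for the unseen sample. A matrix like `C` could then be handed to a spectral clustering routine that accepts a precomputed affinity, for example scikit-learn's `SpectralClustering(affinity="precomputed")`.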

     
