Importance-autonomy-based method for predicting the number of topics

Abstract: Existing methods for predicting the number of topics focus mainly on whether topic indicators can measure the generalization ability of topic models and the interpretability of topics, but they lack any analysis of the range in which the true number of topics lies and of the stability of the indicators. In complex scenarios, the predicted number of topics tends to get stuck in a local optimum, or the results differ considerably across repeated experiments. To address these issues, this paper starts from two concepts, topic importance and topic autonomy, and proposes the importance-autonomy (IA) indicator. A method for determining an upper bound on the number of topics is then designed based on the IA indicator. By incorporating this upper bound into the latent Dirichlet allocation (LDA) model, an efficient and effective method for predicting the number of topics is obtained. Experimental results on synthetic and public datasets show that the proposed method outperforms all compared methods in accuracy, stability, and efficiency.

     
