Abstract:
Existing methods for predicting the number of topics mainly focus on whether topic indicators can measure the generalization ability of topic models and the interpretability of topics, but they lack analysis of the range of the actual number of topics and of the stability of the indicators. In complex scenarios, the predicted number of topics may fall into a local optimum, or the results of repeated experiments may differ significantly. To address these issues, the Importance-Autonomy (IA) indicator is proposed based on the importance and autonomy of topics. The IA indicator is then used to design a method for determining an upper bound on the number of topics. By incorporating this upper bound into the Latent Dirichlet Allocation (LDA) model, an efficient and effective method for predicting the number of topics is proposed. Experimental results on synthetic and real datasets demonstrate that our proposed method outperforms all compared methods in accuracy, stability, and efficiency.