Weibo Forwarding Prediction Based on Hybrid Features and the XGBoost Algorithm

张林森; 包崇明; 周丽华; 孔兵

doi:10.7540/j.ynu.20190647

Prediction forwarding of Weibo based on hybrid features and XGBoost algorithm

Abstract

Forwarding is an important channel through which information spreads on Weibo. Forwarding behavior is influenced mainly by four factors: user attributes, Weibo content, users' social relationships, and user interests. Existing forwarding prediction models consider only some of these factors, although all four affect users' forwarding behavior; the computation time of the prediction model also deserves attention. On this basis, a Weibo forwarding prediction model based on hybrid features and the XGBoost algorithm is proposed. First, user features, Weibo features, social features, and interest features are extracted according to the four factors. Then, user influence is computed with the PageRank algorithm, interest similarity is computed with the Latent Dirichlet Allocation (LDA) model and KL distance, and formulas are defined for user forwarding activity and user interaction influence. Finally, the XGBoost algorithm is used to build the prediction model and to analyze forwarding prediction. Experimental results show that the proposed method performs well on evaluation metrics such as accuracy and running time, and they also confirm the importance and effectiveness of considering all four factors together.
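Two of the feature computations named in the abstract can be illustrated with a minimal standard-library sketch: PageRank over a follower graph as a proxy for user influence, and a KL-based similarity between topic distributions of the kind an LDA model would produce. The abstract does not give the paper's formulas, so everything specific here is an assumption made for illustration: the function names, the damping factor of 0.85, the symmetrised form of the KL distance, the mapping `1 / (1 + distance)` to a similarity score, and the toy data.

```python
import math


def pagerank(follows, damping=0.85, iterations=50):
    """Power-iteration PageRank on a follower graph.

    `follows` maps each user to the list of users they follow, so an edge
    u -> v means u follows v and passes part of its rank to v.
    """
    users = set(follows)
    for targets in follows.values():
        users.update(targets)
    users = sorted(users)
    n = len(users)
    rank = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        # Rank held by users who follow nobody is spread evenly over all users.
        dangling = sum(rank[u] for u in users if not follows.get(u))
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in users}
        for u in users:
            targets = follows.get(u, [])
            for v in targets:
                new_rank[v] += damping * rank[u] / len(targets)
        rank = new_rank
    return rank


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def interest_similarity(p, q):
    """Map the symmetrised KL distance into (0, 1]; 1 means identical interests.

    Both the symmetrisation and the 1 / (1 + d) mapping are assumptions.
    """
    distance = 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return 1.0 / (1.0 + distance)


# Toy data (hypothetical): a and b follow c, and c follows a.
graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
influence = pagerank(graph)

# Hypothetical topic distributions over three topics.
sim_close = interest_similarity([0.7, 0.2, 0.1], [0.6, 0.3, 0.1])
sim_far = interest_similarity([0.7, 0.2, 0.1], [0.1, 0.2, 0.7])
```

In this toy graph the user followed by both others receives the highest influence score, and the pair of users with similar topic distributions scores a higher interest similarity than the pair with opposite ones. In a pipeline like the one the abstract describes, scores of this kind would be columns in the feature matrix passed to the XGBoost classifier.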
