A feature-fusion text classification method based on word frequency and a window mechanism

张赜涛; 马迪南; 施睿; 杨杰; 唐菁敏

doi:10.7540/j.ynu.20250263


A frequency-aware window-enhanced feature fusion framework for text classification

Abstract

To address the inadequate semantic representation of low-frequency words and the interference caused by common words in text classification, this study proposes a frequency-based window-enhanced feature fusion method, the frequency-based window-enhanced convolutional neural network (F-WCNN). Words are first divided into high- and low-frequency groups using the percentile method, and a hierarchical modeling strategy is applied to account for the differences between the two groups: high-frequency words combine pre-trained word embeddings with TF-IDF weighting to reduce interference from common words, while low-frequency words draw in contextual information through a sliding-window mechanism to strengthen their semantic representation. The word features are then fused and passed to a multi-channel convolutional neural network built on an attention mechanism and focal loss, which improves the recognition of hard-to-classify samples and robustness to label imbalance. Experimental results show that F-WCNN outperforms the other baseline algorithms on news text classification and has strong generalization ability and practical value.
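The three ingredients named in the abstract (percentile-based frequency stratification, a sliding context window for low-frequency words, and focal loss) can be illustrated with a minimal sketch. This is not the authors' implementation. The 50th-percentile threshold, the window radius, the default `gamma` and `alpha` values, and all function names are assumptions chosen for illustration, since the abstract does not specify them. The focal loss follows the standard form $-\alpha (1 - p_t)^{\gamma} \log p_t$.

```python
import math
from collections import Counter


def percentile(values, q):
    """Linear-interpolation percentile of a list of numbers, q in [0, 100]."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return float(ordered[0])
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def split_by_frequency(docs, q=50.0):
    """Split the vocabulary into high- and low-frequency sets.

    Words whose corpus count exceeds the q-th percentile of all counts
    are treated as high-frequency (q is an assumed, tunable threshold).
    """
    counts = Counter(tok for doc in docs for tok in doc)
    threshold = percentile(list(counts.values()), q)
    high = {w for w, c in counts.items() if c > threshold}
    low = set(counts) - high
    return high, low


def window_context(tokens, index, radius=2):
    """Return up to `radius` neighbours on each side of tokens[index]."""
    start = max(0, index - radius)
    return tokens[start:index] + tokens[index + 1:index + 1 + radius]


def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss for one sample, given the predicted probability of its true class."""
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)


docs = [
    ["stock", "market", "rises"],
    ["stock", "market", "falls"],
    ["stock", "rally"],
]
high, low = split_by_frequency(docs)
```

In this sketch a low-frequency word such as `rises` would be represented together with its neighbours from `window_context`, while high-frequency words would go on to embedding lookup and TF-IDF weighting, which are omitted here. With `gamma=0` the focal loss reduces to ordinary cross-entropy; larger `gamma` values down-weight samples the model already classifies confidently.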
