A Frequency-Aware Window-Enhanced Feature Fusion Framework for Text Classification

Abstract: To address the weak semantic representation of low-frequency words and the interference from common words in text classification, this study proposes a frequency-based window-enhanced convolutional neural network (F-WCNN), a feature fusion method for text classification. Words in the text are first split into high-frequency and low-frequency groups using the percentile method, and each group is modeled separately to reflect its characteristics: high-frequency words combine pre-trained word embeddings with TF-IDF weighting to reduce interference from common words, while low-frequency words draw in contextual information through a sliding-window mechanism to strengthen their semantic representation. The two sets of word features are then fused and passed to a multi-channel convolutional neural network built with an attention mechanism and a focal loss function, which improves the recognition of hard-to-classify samples and robustness to label imbalance. Experiments on news text classification show that F-WCNN outperforms the baseline algorithms it was compared with and generalizes well, indicating practical value.
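The frequency split, the sliding-window context, and the focal loss named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state the percentile used, the window size, or whether frequencies are counted per document or per corpus, so the 75th percentile, the window radius, corpus-level counting, and the rule that ties at the threshold count as high-frequency are all assumptions made here for illustration. The function names are hypothetical. The loss is the commonly used focal loss form, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), which may differ from the exact variant in F-WCNN.

```python
import math
from collections import Counter


def percentile(values, q):
    """Percentile with linear interpolation; q is in [0, 100]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    rank = (len(ordered) - 1) * q / 100.0
    lower, upper = math.floor(rank), math.ceil(rank)
    frac = rank - lower
    return ordered[lower] * (1 - frac) + ordered[upper] * frac


def split_by_frequency(docs, q=75.0):
    """Split the vocabulary into (high, low) frequency sets.

    Assumption: counts are taken over the whole corpus, and a word whose
    count is at or above the q-th percentile of counts is high-frequency.
    """
    counts = Counter(token for doc in docs for token in doc)
    threshold = percentile(list(counts.values()), q)
    high = {word for word, count in counts.items() if count >= threshold}
    low = set(counts) - high
    return high, low


def window_context(doc, index, radius=2):
    """Return up to `radius` neighbours on each side of doc[index]."""
    start = max(0, index - radius)
    return doc[start:index] + doc[index + 1:index + 1 + radius]


def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss for one sample, given the probability of its true class."""
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)


# Toy corpus, invented for this example only.
docs = [
    ["stock", "market", "rises", "today"],
    ["stock", "market", "falls", "sharply"],
    ["stock", "prices", "rally"],
]
high, low = split_by_frequency(docs, q=75.0)
print(sorted(high))
print(window_context(docs[0], 2, radius=1))
print(focal_loss(0.9), focal_loss(0.1))
```

In this sketch a low-frequency word would be represented together with the neighbours returned by `window_context`, while a high-frequency word would keep its own embedding scaled by a TF-IDF weight. The focal term `(1 - p_true) ** gamma` shrinks the loss of confidently correct samples, so hard samples dominate training; with `gamma=0` the expression reduces to ordinary cross-entropy.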

     
