LIN Hai-xiang, LU Ren-jie, LU Ran, XU Li. Automatic classification method of railway signal fault based on text mining[J]. Journal of Yunnan University: Natural Sciences Edition, 2022, 44(2): 281-289. DOI: 10.7540/j.ynu.20210168

Automatic classification method of railway signal fault based on text mining

Abstract: Railway signal equipment accumulates a large volume of maintenance data recorded as text during operation and maintenance. To classify these records efficiently and accurately, this paper proposes an automatic classification method for railway signal equipment fault text that combines Word2vec, the SMOTE algorithm, and a Convolutional Neural Network (CNN). First, the fault text is preprocessed with natural language processing methods, and Word2vec is used to train word vectors. Second, the SMOTE algorithm automatically generates text vector data for the minority classes, and the resulting vectors are fed into the input layer of the CNN. Third, the convolutional and pooling layers of the CNN extract high-level features of the local context of the fault text. Finally, a softmax classifier assigns each fault text to a class. Experiments on signal equipment fault text recorded by a railway bureau, with comparisons against other methods, show that the proposed method clearly improves every evaluation metric, with classification precision and recall reaching 95.26% and 94.32%, respectively. The method can therefore serve as an effective approach to the automatic classification of railway signal equipment faults.
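The steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `smote`, `conv_max_pool`, and `softmax`, the toy vectors, and the filter weights are all assumptions made for illustration. Word2vec training is omitted (small hand-written vectors stand in for trained word vectors), and the fully connected layer that a real CNN would place before softmax is left out.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic vectors for an under-represented class.

    Each synthetic vector lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [v for j, v in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda v: math.dist(base, v))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def conv_max_pool(sentence, kernel, bias=0.0):
    """Slide one filter over windows of consecutive word vectors,
    apply ReLU, then keep the maximum response (max-over-time pooling)."""
    height = len(kernel)  # number of consecutive words the filter covers
    responses = []
    for start in range(len(sentence) - height + 1):
        window = sentence[start:start + height]
        s = bias + sum(
            w * x
            for krow, xrow in zip(kernel, window)
            for w, x in zip(krow, xrow)
        )
        responses.append(max(0.0, s))
    return max(responses)


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy data (assumed): four 3-dimensional vectors standing in for
# minority-class fault-text vectors.
minority_vectors = [[0.1, 0.2, 0.3], [0.2, 0.1, 0.4],
                    [0.0, 0.3, 0.2], [0.3, 0.2, 0.5]]
augmented = minority_vectors + smote(minority_vectors, n_new=6)

# Toy sentence (assumed): three words, each a 2-dimensional word vector.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filters = [
    [[1.0, 1.0], [1.0, 1.0]],    # filter spanning 2 words
    [[1.0, -1.0], [-1.0, 1.0]],  # second filter spanning 2 words
]
features = [conv_max_pool(sentence, f) for f in filters]
# Simplification: each pooled feature is used directly as a class score.
class_probs = softmax(features)
```

In this sketch SMOTE operates on flat vectors; a sentence represented as a matrix of word vectors would presumably be flattened before interpolation and reshaped afterwards, which is an assumption the abstract does not spell out.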

     
