Abstract:
Multilingual event discovery groups texts written in different languages that describe the same event into a single cluster, and it is the foundation of multilingual event analysis. Deep clustering methods achieve this by optimizing distances between text representations, so their performance depends heavily on the representation ability of the underlying model. In a multilingual setting, text representations across languages are poorly aligned, which makes multilingual event clustering difficult. This paper proposes a multilingual event discovery method based on augmentation-based contrastive learning. The method optimizes both the distance between event texts and cluster centroids and the distance between multilingual positive and negative samples. This draws multilingual texts describing the same event closer together in the representation space and improves the model's ability to represent multilingual texts. In addition, the method uses event features as the representations of the event cluster centers, which further improves multilingual event clustering. Experimental results on the Reuters dataset show that the proposed method improves the performance of multiple pre-trained models, achieving a best accuracy of 76.14% and a best normalized mutual information (NMI) of 91.09%.
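
To make the two distance objectives concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not give the loss functions, so several choices here are assumptions: squared Euclidean distance for the text-to-centroid term, an InfoNCE-style loss as a stand-in for the contrastive term (row i of one language batch is the positive for row i of the other, and all other rows are negatives), and a hypothetical weight `alpha` that balances the two. All function and parameter names are illustrative.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def centroid_loss(z, centroids, assignments):
    """Mean squared Euclidean distance from each embedding to its assigned centroid."""
    diffs = z - centroids[assignments]
    return float(np.mean(np.sum(diffs ** 2, axis=1)))


def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style loss (assumed stand-in for the contrastive term).

    Row i of `positives` is the positive sample for row i of `anchors`;
    every other row in `positives` serves as a negative.
    """
    a = l2_normalize(anchors)
    p = l2_normalize(positives)
    logits = a @ p.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def joint_loss(z_src, z_tgt, centroids, assignments, alpha=1.0, temperature=0.1):
    """Centroid term for both languages plus a weighted contrastive term.

    `alpha` is a hypothetical weighting, not a value from the paper.
    """
    cluster_term = (centroid_loss(z_src, centroids, assignments)
                    + centroid_loss(z_tgt, centroids, assignments))
    contrast_term = info_nce_loss(z_src, z_tgt, temperature)
    return cluster_term + alpha * contrast_term


if __name__ == "__main__":
    # Toy data: 4 events, one text per language per event, 4-dim embeddings.
    z_src = np.eye(4)
    centroids = np.eye(4)            # stand-in for event-feature cluster centers
    assignments = np.arange(4)
    z_tgt_aligned = np.eye(4)        # translations land on the same points
    z_tgt_shuffled = np.eye(4)[[1, 0, 3, 2]]  # translations land on wrong events
    print(joint_loss(z_src, z_tgt_aligned, centroids, assignments))
    print(joint_loss(z_src, z_tgt_shuffled, centroids, assignments))
```

In this toy setup the loss is lower when the two languages' embeddings of the same event coincide than when they are mismatched, which is the behavior the abstract describes: texts about the same event are pulled together across languages and toward their cluster center. In the sketch the centroids are a plain array. The paper's use of event features as cluster-center representations would replace that array with a learned or extracted quantity.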