Abstract:
To address the challenges that current millimeter-wave radar-based methods for aquatic human activity recognition face in complex environments, this paper proposes a network model based on a cross-self-attention mechanism. First, a multipath feature fusion module extracts behavior features at different scales in parallel, so that fine-grained and global features complement each other. Next, a multi-self-attention and multi-cross-attention module enables deep interaction both within and across feature domains, which helps distinguish highly similar behaviors. Finally, a Transformer module captures the global correlation of spatial features, further improving the model's representational capability. Experimental results on the publicly available AHAR (Aquatic Human Activity Recognition) dataset show that the proposed model achieves an accuracy, recall, and F-score of 0.9261, 0.9280, and 0.9275, respectively, exceeding the best comparison model by 0.1024, 0.1028, and 0.1017, respectively. The proposed model also generalizes better than the baseline models.
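
The distinction between interaction within a feature domain (self-attention) and across feature domains (cross-attention) can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes plain scaled dot-product attention without learned projections or multiple heads, and the names `attention`, `fine`, and `coarse`, as well as the toy values, are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    Returns (outputs, weights); weights[i][j] is how strongly
    query i attends to key j. No learned projections (assumption).
    """
    d = len(keys[0])
    weights = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * v[c] for w, v in zip(w_row, values))
         for c in range(len(values[0]))]
        for w_row in weights
    ]
    return outputs, weights

# Hypothetical toy features from two branches of different scale.
fine = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 tokens, dimension 2
coarse = [[0.5, 0.5], [2.0, 0.0]]             # 2 tokens, dimension 2

# Self-attention: queries, keys, and values come from the same domain.
self_out, self_w = attention(fine, fine, fine)

# Cross-attention: queries from one domain, keys and values from the other.
cross_out, cross_w = attention(fine, coarse, coarse)
```

In the cross-attention call, each fine-scale token is re-expressed as a weighted mix of coarse-scale tokens, so the output keeps the number of query tokens while drawing its content from the other domain.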