Abstract:
Accurate diagnostic prediction is crucial for effective clinical decision-making and requires efficient extraction and representation learning from electronic health records (EHR). However, representing and learning patient visit information remains a significant challenge in diagnostic prediction. To address this issue, we propose a novel approach called Heterogeneous Graph Contrastive Learning (HGCL). HGCL constructs a bipartite graph of diseases and medications and uses graph neural networks to propagate information through it. By treating each disease embedding and its corresponding output after an even number of propagation steps as a positive sample pair, a contrastive loss is defined on the bipartite graph to learn the complex relationships between diseases and medications. Furthermore, disease embeddings are used to represent patient visit information and learn intra-visit features. Experimental results on two real-world EHR datasets demonstrate that HGCL significantly outperforms existing methods on the weighted F1 score ($F1_w$) and top-$k$ recall ($R@k$) metrics. For instance, on the MIMIC-III dataset, HGCL achieves an $F1_w$ of 26.96%, an improvement of 1.75 percentage points over the best existing method; on the MIMIC-IV dataset, its top-10 recall ($R@10$) reaches 35.01%, surpassing the best existing method by 3.38 percentage points. Additionally, HGCL's representations of diseases and medications in two-dimensional space align closely with professional domain knowledge, demonstrating the interpretability and clinical application value of the model.
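
To make the positive-pair construction concrete, the following is a minimal sketch and not the authors' implementation. It assumes a LightGCN-style, parameter-free propagation with symmetric degree normalization and an InfoNCE-style loss with in-batch negatives; the function names `propagate` and `info_nce`, the temperature value, and the toy adjacency matrix are all hypothetical choices made for illustration. On a bipartite graph, one propagation step moves disease information to medication nodes, and a second step returns it to disease nodes, which is why even-step outputs are natural counterparts of the original disease embeddings.

```python
import numpy as np


def propagate(adj, dis, med, steps):
    """Run `steps` rounds of message passing on a disease-medication bipartite graph.

    adj: (n_dis, n_med) 0/1 matrix; adj[i, j] = 1 links disease i and medication j.
    Assumption: symmetric degree normalization with no learned weights.
    """
    deg_d = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)  # shape (n_dis, 1)
    deg_m = np.maximum(adj.sum(axis=0, keepdims=True), 1.0)  # shape (1, n_med)
    norm = adj / np.sqrt(deg_d) / np.sqrt(deg_m)
    for _ in range(steps):
        # Both sides are updated simultaneously from the previous step's values.
        dis, med = norm @ med, norm.T @ dis
    return dis, med


def info_nce(anchor, positive, temperature=0.2):
    """InfoNCE-style loss: row i of `anchor` is pulled toward row i of `positive`
    and pushed away from every other row of `positive`."""
    eps = 1e-12  # guards against division by zero for isolated nodes
    a = anchor / (np.linalg.norm(anchor, axis=1, keepdims=True) + eps)
    p = positive / (np.linalg.norm(positive, axis=1, keepdims=True) + eps)
    logits = a @ p.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


# Toy example (hypothetical data): 4 diseases, 3 medications, 8-dim embeddings.
rng = np.random.default_rng(0)
adj = np.array([[1, 1, 0],
                [0, 1, 1],
                [1, 0, 1],
                [0, 0, 1]], dtype=float)
dis0 = rng.normal(size=(4, 8))
med0 = rng.normal(size=(3, 8))

# Two steps: disease -> medication -> disease.
dis2, _ = propagate(adj, dis0, med0, steps=2)
loss = info_nce(dis0, dis2)
```

In this sketch the two-step disease output depends only on the initial disease embeddings and the graph structure, so the loss ties each disease to the diseases that share its medications. A trained model would typically add learnable parameters and combine this term with the diagnostic prediction loss.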