Heterogeneous graph contrastive learning for diagnosis prediction

Abstract: Accurate diagnosis prediction is crucial for effective clinical decision-making, and it depends on effectively extracting and learning representations from electronic health records (EHR). Representing and learning from patient visit information, however, remains a major challenge. To address it, we propose heterogeneous graph contrastive learning (HGCL), a new method for diagnosis prediction. HGCL constructs a bipartite graph of diseases and medications and uses a graph neural network to propagate information across it. Each disease embedding is paired with its own output after an even number of propagation steps to form a positive pair, and a contrastive loss defined on the bipartite graph learns the complex relationships between diseases and medications. The disease embeddings are also used to represent patient visits and to learn intra-visit features. Experiments on two real-world EHR datasets show that HGCL significantly outperforms existing methods on weighted F1 score (F1w) and top-k recall (R@k). On MIMIC-Ⅲ, HGCL achieves an F1w of 26.96%, 1.75 percentage points above the best existing method; on MIMIC-Ⅳ, its top-10 recall (R@10) reaches 35.01%, 3.38 percentage points above the best existing method. In addition, when the learned disease and medication representations are mapped to two-dimensional space, they align closely with professional domain knowledge, which demonstrates the model's interpretability and clinical application value.
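
The even-step positive pairing described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the propagation rule, the similarity function, or the form of the loss, so the code below assumes mean aggregation over neighbors, cosine similarity, and an InfoNCE-style loss with a temperature `tau`. The function names (`mean_aggregate`, `even_step_contrastive_loss`), the toy graph, and the random embeddings are all hypothetical.

```python
import math
import random


def mean_aggregate(neighbors, source_emb, dim):
    """For each target node, average the embeddings of its neighbors.

    Assumption: plain mean aggregation stands in for the GNN layer.
    """
    out = []
    for nbrs in neighbors:
        if not nbrs:
            out.append([0.0] * dim)  # isolated node keeps a zero vector
            continue
        out.append([sum(source_emb[j][d] for j in nbrs) / len(nbrs)
                    for d in range(dim)])
    return out


def cosine(u, v):
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def even_step_contrastive_loss(disease_emb, edges, n_drugs, tau=0.5):
    """InfoNCE-style loss (an assumption) over disease nodes.

    Positive pair: disease i's input embedding and its own output after
    two propagation steps (disease -> drug -> disease).
    Negatives: the two-step outputs of all other diseases.
    """
    n_dis = len(disease_emb)
    dim = len(disease_emb[0])
    drug_nbrs = [[] for _ in range(n_drugs)]
    dis_nbrs = [[] for _ in range(n_dis)]
    for d, m in edges:  # each edge links disease d to drug m
        drug_nbrs[m].append(d)
        dis_nbrs[d].append(m)

    drug_hidden = mean_aggregate(drug_nbrs, disease_emb, dim)  # step 1
    disease_out = mean_aggregate(dis_nbrs, drug_hidden, dim)   # step 2

    loss = 0.0
    for i in range(n_dis):
        logits = [cosine(disease_emb[i], disease_out[j]) / tau
                  for j in range(n_dis)]
        top = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = top + math.log(sum(math.exp(l - top) for l in logits))
        loss += log_denom - logits[i]
    return loss / n_dis


# Toy bipartite graph: 3 diseases, 2 drugs, edges given as (disease, drug).
rng = random.Random(0)
toy_disease_emb = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]
toy_edges = [(0, 0), (1, 0), (1, 1), (2, 1)]
toy_loss = even_step_contrastive_loss(toy_disease_emb, toy_edges, n_drugs=2)
```

An even number of steps is what makes the pairing well-typed in a bipartite graph: after one step a disease's signal sits on drug nodes, and only after two steps does it return to disease nodes, where it can be compared with the original disease embedding. In a trainable model the embeddings would be parameters updated by gradient descent on this loss together with the prediction objective; the sketch only evaluates the loss once.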

     
