Approaches for learning a Bayesian network with latent variables: A survey

Abstract: The Bayesian Network with Latent Variables (BNLV) is an important Probabilistic Graphical Model (PGM). By incorporating latent variables, it describes the implicit knowledge in data both qualitatively and quantitatively, and so provides a framework for representing and reasoning about uncertain knowledge. In recent years, BNLV learning has become an important research direction in uncertainty in artificial intelligence and knowledge discovery. This paper analyzes and summarizes the challenges currently facing BNLV learning and covers three aspects of the work involved: determining the cardinality and number of latent variables, parameter learning, and structure learning. It introduces the basic ideas for determining the cardinality and number of latent variables, and surveys representative results in parameter learning and structure learning, the two aspects that have attracted the most attention. For each method it gives the applicable scenarios, basic ideas, and principal steps, together with a comparative analysis. For determining the cardinality and number of latent variables, the paper describes cluster-based and clique-based methods. For parameter learning, it describes imputation, gradient ascent, and the Expectation-Maximization (EM) algorithm, as well as improved methods built on EM. For structure learning, it describes the basic ideas of score-and-search methods and conditional-independence-based methods, as well as improved methods built on score-and-search. Finally, based on this analysis of existing results, the paper identifies open problems and priorities for further research on BNLV learning.

     
