Citation: QIN Ya-qin, XIA Yu-lan, LU Meng-yuan, WANG Jin-rui, XIE Ji-ming. Predictive modeling of ADMET properties of anti-breast cancer active compounds[J]. Journal of Yunnan University: Natural Sciences Edition, 2022, 44(6): 1127-1134. DOI: 10.7540/j.ynu.20210642

Predictive modeling of ADMET properties of anti-breast cancer active compounds

    Abstract: To improve the prediction of absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties during the virtual screening of anti-breast cancer drugs, a quantitative structure-ADMET property prediction model is proposed. First, 319 feature variables that influence ADMET properties are selected from the molecular descriptor data of the compounds. Then, Logistic Regression (LR), Naïve Bayes (NB), and Gradient Boosting Decision Tree (GBDT) are evaluated as candidate models for ADMET classification, and GBDT is identified as the best of the three. Finally, to address the high training cost of the GBDT model, a probabilistic surrogate model is used to fit the relationship between the hyperparameters and the prediction accuracy (i.e., the black-box model), yielding the GBDT* model. The results show that the GBDT* ensemble learning model performs best overall: its accuracy, precision, sensitivity, and AUC exceed 90%, 88%, 89%, and 0.95, respectively, and its false alarm rate is below 15%, indicating that the GBDT* ensemble machine learning model performs well in predicting the ADMET properties of anti-breast cancer active compounds.
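    The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names and toy labels are hypothetical, and it assumes that the "false alarm rate" denotes the false positive rate FP / (FP + TN) and that AUC is the area under the ROC curve, computed here as a pairwise ranking statistic.

```python
# Illustrative sketch only; names and data are hypothetical, not from the paper.

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity (recall) and false positive rate."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        # Assumption: "false alarm rate" is read as FP / (FP + TN).
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: eight compounds, predicted probabilities thresholded at 0.5.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
metrics["auc"] = auc(y_true, scores)
print(metrics)
```

    On this toy data, accuracy, precision, and sensitivity are each 0.75, the false positive rate is 0.25, and the AUC is 0.9375. The pairwise AUC is quadratic in the number of samples, which is acceptable for a small illustration but not for large screening sets.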

     
