Abstract:
A quantitative structure-ADMET prediction model is proposed to improve the prediction of the absorption, distribution, metabolism, excretion and toxicity (ADMET) of anti-breast cancer drugs during virtual screening. First, 319 variables are selected from the molecular descriptors of the compounds. Then Logistic Regression (LR), Naïve Bayes (NB) and Gradient Boosting Decision Tree (GBDT) models are used to predict the ADMET properties. Finally, to address the high training cost of GBDT models, a GBDT* model is constructed by using a probabilistic surrogate model to fit the black-box relationship between hyperparameters and prediction accuracy. The results show that the GBDT* ensemble learning model performs best overall: its accuracy, precision, sensitivity and AUC exceed 90%, 88%, 89% and 0.95, respectively, and its false alarm rate is below 15%, indicating that it predicts the ADMET properties of anti-breast cancer active compounds well.
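
As an illustration of the threshold-based metrics quoted above, the following minimal Python sketch computes accuracy, precision, sensitivity and false alarm rate from binary labels. It assumes that each ADMET property is treated as a 0/1 classification target and that "false alarm rate" denotes the false positive rate FP / (FP + TN). Neither assumption is stated explicitly in the abstract, and the function name `binary_metrics` and the toy labels are hypothetical. AUC is omitted because it requires predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, sensitivity and false alarm rate for 0/1 labels.

    Assumption: false alarm rate is taken to mean the false positive rate.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Guard against empty classes; 0.0 is an arbitrary convention here.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "false_alarm_rate": ratio(fp, fp + tn),
    }


# Toy labels for illustration only; these are not data from the study.
example = binary_metrics([1, 1, 1, 0, 0, 0, 1, 0],
                         [1, 1, 0, 0, 0, 1, 1, 0])
print(example)
```

Under these definitions, sensitivity and false alarm rate are computed on disjoint subsets of the data (actual positives and actual negatives, respectively), so a model can score well on one while doing poorly on the other; reporting both, as the abstract does, gives a fuller picture than accuracy alone.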