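High-throughput screening of metal-organic framework materials for methane adsorption based on machine learning
Cite this article: YU TianXin, PENG Xuan. High-throughput screening of metal-organic framework materials for methane adsorption based on machine learning[J]. Journal of Beijing University of Chemical Technology (Natural Science Edition), 2021, 48(2): 100-107. DOI: 10.13543/j.bhxbzr.2021.02.013
Authors: YU TianXin, PENG Xuan
Affiliation: College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029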
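Abstract: A decision tree (DT) model and three models derived from it, random forest (RF), extremely randomized trees (ET) and gradient boosting decision tree (GBDT), were used for high-throughput computational screening of metal-organic frameworks (MOFs) for methane adsorption. Using feature-vector data for 1 800 materials, the correlations between the features were computed and their importance was analyzed. The structural features of the materials were found to be only weakly correlated with the chemical-information features, but the structural features were of higher importance. The four models were trained on 1 260 materials from the database, and the remaining 540 materials were used as the test set to compare and evaluate the models' screening results. Receiver operating characteristic (ROC) curves and precision-recall (PR) curves showed that the GBDT model is highly stable and gives high prediction accuracy, making it the best model for screening high-performance MOFs for methane adsorption. Parameter optimization of the RF and GBDT models showed that balancing the number of individual decision trees against the number of features considered at each node split effectively improves the RF model, while tuning the learning rate and the number of iterations of the regression trees effectively improves the GBDT model. Finally, the GBDT model was used to screen the top 20 high-performance adsorbents from the 540-material test set, and the relationship between their main features and methane uptake was analyzed.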
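As an illustration of the data-preparation steps described above (pairwise feature correlation, then splitting 1 800 materials into 1 260 for training and 540 for testing, i.e. 70 % / 30 %), the following Python sketch uses only the standard library. The abstract does not state which correlation coefficient or split procedure was used: Pearson's r and a seeded random shuffle are assumptions, and the feature names and values are hypothetical placeholders, not data from the paper.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant feature")
    return cov / (sx * sy)


def correlation_matrix(features):
    """Pairwise Pearson r for a dict {feature_name: list_of_values}."""
    names = list(features)
    return {(a, b): pearson(features[a], features[b]) for a in names for b in names}


def train_test_split(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of items with a fixed seed, then cut it into train/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical descriptors for five materials (illustrative only, not from the paper).
features = {
    "pore_volume": [0.5, 0.8, 1.1, 1.4, 1.7],               # assumed structural feature
    "void_fraction": [0.40, 0.55, 0.62, 0.71, 0.80],        # assumed structural feature
    "metal_electronegativity": [1.6, 1.9, 1.6, 1.9, 1.65],  # assumed chemical feature
}
r = correlation_matrix(features)
print(round(r[("pore_volume", "void_fraction")], 3))
print(round(r[("pore_volume", "metal_electronegativity")], 3))

# Split 1 800 material indices 70/30.
train, test = train_test_split(range(1800))
print(len(train), len(test))  # 1260 540
```

A fixed seed keeps the split reproducible, so every model is compared on the same held-out materials.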

Keywords: methane adsorption; metal-organic framework materials; machine learning; high-throughput screening
Received: 2020-08-25

High throughput screening of metal-organic framework materials based on machine learning
YU TianXin, PENG Xuan. High throughput screening of metal-organic framework materials based on machine learning[J]. Journal of Beijing University of Chemical Technology, 2021, 48(2): 100-107. DOI: 10.13543/j.bhxbzr.2021.02.013
Authors: YU TianXin, PENG Xuan
Affiliation: College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
Abstract: High-throughput screening of metal-organic frameworks (MOFs) for methane adsorption has been carried out using a decision tree (DT) model and three models derived from it: random forest (RF), extremely randomized trees (ET) and gradient boosting decision tree (GBDT). Using feature-vector data for 1 800 materials, the correlations between the features and their importance were calculated. The structural features of the materials were found to be only weakly correlated with the chemical-information features, but the structural features were of higher importance. The four machine learning models were trained on 1 260 materials from the database, and the remaining 540 materials were used as the test set to compare and evaluate the screening results of the models. The receiver operating characteristic (ROC) and precision-recall (PR) curves show that the GBDT model is highly stable and achieves high prediction accuracy, making it the best model for screening MOF materials for methane adsorption. Parameter optimization of the RF and GBDT models showed that balancing the number of individual decision trees against the number of features considered when splitting a node can effectively improve the performance of the RF model, while adjusting the learning rate and the number of iterations of the regression trees can effectively improve the performance of the GBDT model. Finally, the GBDT model was used to screen the top 20 high-performance adsorbents from the 540-material test set, and the relationship between their main features and methane adsorption capacity was analyzed.
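The evaluation and screening steps can likewise be sketched with the standard library. The snippet below is a minimal sketch, assuming that each test material carries a binary label (1 = high-performance methane adsorbent, by some uptake cut-off the abstract does not state) and a score from an already trained model such as GBDT. It computes ROC AUC through the pairwise (Mann-Whitney) formulation, precision and recall at a single threshold (one point on the PR curve), and the top-k ranking used for screening. It does not reimplement the tree models, and all material names, labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC: probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_recall(labels, scores, threshold):
    """Precision and recall when materials scoring >= threshold are predicted positive."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def top_k(names, scores, k):
    """Names of the k highest-scoring materials, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [names[i] for i in order[:k]]


# Hypothetical test-set materials, labels and model scores (not results from the paper).
names = ["MOF-A", "MOF-B", "MOF-C", "MOF-D", "MOF-E"]
labels = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]

print(roc_auc(labels, scores))                 # 5 of 6 pos/neg pairs ordered correctly
print(precision_recall(labels, scores, 0.65))  # 2 TP, 1 FP, 1 FN
print(top_k(names, scores, 2))                 # ['MOF-A', 'MOF-B']
```

In the setting the abstract describes, k would be 20 and the scores would come from the GBDT model; tracing the full PR curve would mean sweeping the threshold over every distinct score rather than fixing one value.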
Keywords: methane adsorption; metal-organic framework material; machine learning; high throughput screening
This article is indexed in CNKI, Wanfang Data and other databases.