Including crystal structure attributes in machine learning models of formation energies via Voronoi tessellations
While high-throughput density functional theory (DFT) has become a prevalent tool for materials discovery, it is limited by the relatively large computational cost. In this paper, we explore using DFT data from high-throughput calculations to create faster, surrogate models with machine learning (ML) that can be used to guide new searches. Our method works by using decision tree models to map DFT-calculated formation enthalpies to a set of attributes consisting of two distinct types: (i) composition-dependent attributes of elemental properties (as have been used in previous ML models of DFT formation energies), combined with (ii) attributes derived from the Voronoi tessellation of the compound's crystal structure. The ML models created using this method have half the cross-validation error and similar training and evaluation speeds to models created with the Coulomb matrix and partial radial distribution function methods. For a dataset of 435 000 formation energies taken from the Open Quantum Materials Database (OQMD), our model achieves a mean absolute error of 80 meV/atom in cross validation, which is lower than the approximate error between DFT-computed and experimentally measured formation enthalpies and below 15% of the mean absolute deviation of the training set. We also demonstrate that our method can accurately estimate the formation energy of materials outside of the training set and be used to identify materials with especially large formation enthalpies. We propose that our models can be used to accelerate the discovery of new materials by identifying the most promising materials to study with DFT at little additional computational cost.