Thermoelectric generators offer a promising route to harvesting waste heat, but their efficiency depends on raising the thermoelectric figure of merit. A key strategy for improving this figure is to reduce the thermal conductivity, in particular the lattice thermal conductivity (LTC). Evaluating LTC with an accuracy comparable to experimental data demands computational methods that go beyond standard density functional theory (DFT) calculations: capturing the interactions among phonons, known as anharmonic lattice dynamics, requires computational resources that are orders of magnitude greater than those of typical DFT calculations for primitive cells. Such computationally intensive analyses are practical only for a limited number of simple compounds. Consequently, high-throughput screening of extensive DFT databases for LTC is impractical unless the search space is drastically narrowed.
Fig. 1.7
Recently, Togo et al. reported a method for systematically obtaining theoretical LTC values through first-principles anharmonic lattice dynamics calculations [23]. Figure 1.7a shows the results of these first-principles LTC calculations for 101 compounds as a function of the crystal volume per atom, V. Notably, rocksalt-structure PbSe exhibits the lowest LTC, 0.9 W/mK (at 300 K), in line with recent findings of low LTC in lead and tin chalcogenides.
Figure 1.7b compares the computed values with the available experimental data. The good agreement between the two supports the usefulness of first-principles LTC data for further studies. A phenomenological relationship linking \(\log \kappa_L\) to \(\log V\) has been proposed [24]. Although a qualitative correlation between the computed LTC and V is observable, it is difficult to predict LTC quantitatively, or to discover new low-LTC compounds, from this phenomenological relationship alone. Note that the dependence on V differs markedly between rocksalt-type compounds and zincblende- or wurtzite-type compounds, whereas zincblende- and wurtzite-type compounds with the same chemical composition show similar LTC values.
This dataset of 101 first-principles LTC values was used to develop a model for predicting LTC across a much larger chemical library [5]. As a first step, Gaussian process (GP)-based Bayesian optimization [25] was applied with two physical quantities as descriptors: V and the density, \(\rho\). Both are readily available in most experimental or computational crystal structure databases. Despite the proposed phenomenological link between \(\log \kappa_L\) and V, the correlation between them is weak, and the correlation between \(\log \kappa_L\) and \(\rho\) is even weaker.
The procedure starts from an observed dataset of five compounds selected at random from the full dataset. Bayesian optimization then searches the remaining data for the compound with the maximum probability of improvement [26], that is, the compound with the highest Z-score derived from the GP model, and adds it to the observed dataset. These steps are repeated until the optimal compound is found. To estimate the average number of observed compounds needed to find the optimal compound, both the Bayesian optimization and the random search are repeated 200 times.
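The loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the RBF kernel, its length scale, the jitter term, the function names, and the synthetic data are all hypothetical, and the Z-score is assumed to be (best observed value minus predicted mean) divided by the predicted standard deviation, so that maximizing Z is equivalent to maximizing the probability of improvement for a minimization problem.

```python
import numpy as np

def gp_posterior(X_obs, y_obs, X_new, length_scale=1.0, jitter=1e-4):
    """Posterior mean and SD of a zero-mean GP with a unit-variance RBF kernel."""
    def rbf(A, B):
        sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-0.5 * sq_dist / length_scale**2)

    K = rbf(X_obs, X_obs) + jitter * np.eye(len(X_obs))
    K_star = rbf(X_obs, X_new)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mean = K_star.T @ alpha
    v = np.linalg.solve(L, K_star)
    var = np.clip(1.0 - (v**2).sum(axis=0), 1e-12, None)
    return mean, np.sqrt(var)

def observations_until_found(X, y, target, n_init=5, seed=None):
    """Count observed compounds (initial ones included) until `target` is observed."""
    rng = np.random.default_rng(seed)
    observed = [int(i) for i in rng.choice(len(y), size=n_init, replace=False)]
    while target not in observed:
        rest = [i for i in range(len(y)) if i not in observed]
        y_obs = y[observed]
        offset = y_obs.mean()  # centre the targets for the zero-mean GP
        mean, std = gp_posterior(X[observed], y_obs - offset, X[rest])
        # Assumed Z-score for minimisation: larger Z = more likely to beat the best value.
        z = (y_obs.min() - (mean + offset)) / std
        observed.append(rest[int(np.argmax(z))])
    return len(observed)

# Synthetic stand-in for (descriptor, log LTC) pairs; NOT the real dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = (X**2).sum(axis=1)
target = int(np.argmin(y))

# Repeat from different random initial sets and average (20 repeats here for brevity).
counts = [observations_until_found(X, y, target, seed=s) for s in range(20)]
n_ave = float(np.mean(counts))
print(f"average observations over {len(counts)} runs: {n_ave:.1f}")
```

The count includes the five initial compounds, so a run in which the target happens to be in the initial set returns 5.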
The average number of compounds required for the optimization, \(N_{\mathrm{ave}}\), is far lower for Bayesian optimization (11) than for random search (55). Bayesian optimization with only volume and density as descriptors therefore makes the search for the lowest-LTC compound, rocksalt PbSe, much more efficient. However, these two descriptors alone do not make the search robust. For instance, when the Bayesian optimization is repeated after intentionally excluding the compounds with the lowest and second-lowest LTC, the \(N_{\mathrm{ave}}\) needed to find LiI rises to 65, which is even higher than that of a random search (\(N_{\mathrm{ave}} = 50\)). This delay in the optimization suggests that LiI is an outlier when LTC is modeled using only V and \(\rho\); such low-LTC outliers are difficult to find with these two descriptors alone.
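As a reference point for the random-search baseline, the sketch below assumes that a random search draws candidates uniformly without replacement and counts the draws until the single target is hit. Under that assumption the target is equally likely to sit at any position in the random order, so the mean count is (n + 1) / 2; for the 99 candidates that remain after two compounds are excluded, this gives 50. The function names are hypothetical.

```python
import random

def expected_random_draws(n_candidates):
    """Mean number of draws to hit one target when sampling without replacement."""
    return (n_candidates + 1) / 2

def simulate_random_search(n_candidates, n_trials=5000, seed=0):
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    order = list(range(n_candidates))
    total = 0
    for _ in range(n_trials):
        rng.shuffle(order)
        total += order.index(0) + 1  # candidate 0 plays the role of the target
    return total / n_trials

print(expected_random_draws(99))
print(simulate_random_search(99))
```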
To address the outlier issue and make the prediction more robust, the descriptor set was expanded to include information about the constituent chemical elements. Among the many possible choices for such variables, binary elemental descriptors were adopted: a set of binary digits, each indicating the presence or absence of one chemical element. Because the dataset of 101 LTC values contains 34 distinct elements, 34 binary elemental descriptors were used. Adding these elemental descriptors to V and \(\rho\) significantly improved the search. In both the search for PbSe and the search for LiI, the lowest-LTC compound was found with an average of \(N_{\mathrm{ave}} = 19\) observations. Binary elemental descriptors thus make the efficient search more robust.
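A binary elemental descriptor of this kind can be built as in the following sketch. The six-element vocabulary, the function names, and the volume and density passed in the usage line are illustrative assumptions; the study uses a 34-element vocabulary, which is not listed here.

```python
import re

def binary_elemental_descriptor(formula, vocabulary):
    """Return 1 for each vocabulary element present in the formula, else 0."""
    present = set(re.findall(r"[A-Z][a-z]?", formula))
    unknown = present - set(vocabulary)
    if unknown:
        raise ValueError(f"elements outside the vocabulary: {sorted(unknown)}")
    return [1 if element in present else 0 for element in vocabulary]

def build_descriptor(formula, volume, density, vocabulary):
    """Concatenate V, rho, and the binary elemental digits into one feature vector."""
    return [volume, density] + binary_elemental_descriptor(formula, vocabulary)

# Toy vocabulary for illustration only.
VOCAB = ["Li", "Cl", "Br", "I", "Se", "Pb"]

# The volume and density below are placeholder numbers, not values from the dataset.
x = build_descriptor("PbSe", volume=29.0, density=8.1, vocabulary=VOCAB)
print(x)
```

Only the presence of an element is encoded, not its amount, so two compounds built from the same elements share the same binary digits and differ only in V and density.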
Fig. 1.8
Stronger correlations with LTC can be obtained from parameters derived from the phonon density of states. Figure 1.8 shows the relationships between LTC and various physical properties. In addition to volume and density, these include quantities obtained from phonon calculations: the mean phonon frequency, the maximum phonon frequency, the Debye frequency, and the Grüneisen parameter. The Debye frequency is determined by fitting the phonon density of states to a quadratic function over the range from 0 to 1/4 of the maximum phonon frequency. The thermodynamic Grüneisen parameter is obtained from the mode-Grüneisen parameters, calculated within the quasi-harmonic approximation, and the mode heat capacities. The correlation coefficient R between \(\log \kappa_L\) and each physical property is given in the corresponding panel. These phonon parameters are nevertheless not used as descriptors in this study, because no comprehensive data library of phonon parameters exists for a wide range of compounds. The results that follow are based solely on the descriptor set composed of the 34 binary elemental descriptors together with V and \(\rho\).
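The Debye-frequency fit can be illustrated with a short sketch. It assumes a pure quadratic form g(w) = a w^2 with no constant or linear term, a density of states normalized to 3N modes for N atoms (so that the Debye form is g(w) = 9 N w^2 / w_D^3), and a frequency grid whose last point is the maximum phonon frequency. These conventions and the function name are assumptions; the text specifies only the fitting range.

```python
def debye_frequency(freqs, dos, n_atoms):
    """Fit dos ~ a * w**2 on [0, w_max / 4] and convert a to a Debye frequency."""
    cutoff = max(freqs) / 4
    points = [(w, g) for w, g in zip(freqs, dos) if 0 <= w <= cutoff]
    # One-parameter least squares for g = a * w^2: a = sum(g w^2) / sum(w^4).
    a = sum(g * w**2 for w, g in points) / sum(w**4 for w, _ in points)
    # Assumed normalisation to 3N modes: g(w) = 9 N w^2 / w_D^3, so w_D = (9 N / a)^(1/3).
    return (9 * n_atoms / a) ** (1 / 3)

# Synthetic check: an ideal Debye DOS with w_D = 10 (arbitrary units), 2 atoms per cell.
freqs = [0.1 * i for i in range(101)]
dos = [9 * 2 * w**2 / 10**3 for w in freqs]
print(debye_frequency(freqs, dos, n_atoms=2))
```

Restricting the fit to the lowest quarter of the spectrum keeps it in the region where the acoustic branches dominate and the quadratic form is a reasonable approximation.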
A GP prediction model was then used to screen a large library for low-LTC compounds. In the biomedical field, such prediction-model-based screening is called “virtual screening” [27]. The virtual screening covered all 54,779 compounds in the Materials Project Database (MPD) library [28], which is predominantly sourced from the Inorganic Crystal Structure Database (ICSD) [29]. Most of these compounds have been synthesized experimentally at least once. The GP prediction model was trained on the 101 LTC data points with V, \(\rho\), and the 34 binary elemental descriptors, and the 54,779 compounds were then ranked by Z-score to identify low-LTC candidates.
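The ranking step can be sketched as below, given GP predictions that are already available for each library entry. The Z-score definition (best observed value minus predicted mean, divided by predicted standard deviation), the candidate names, and all numbers are hypothetical; with this convention a positive Z-score means the predicted value lies below the best value observed so far.

```python
import heapq

def z_score(mean, std, best_observed):
    """Assumed Z-score for minimisation: positive when the prediction beats the best value."""
    return (best_observed - mean) / std

def top_candidates(predictions, best_observed, k=3):
    """predictions maps name -> (predicted mean, predicted SD); return the k largest Z-scores."""
    scored = {
        name: z_score(mean, std, best_observed)
        for name, (mean, std) in predictions.items()
    }
    return heapq.nlargest(k, scored.items(), key=lambda item: item[1])

# Hypothetical predictions (e.g. of log LTC) for four unnamed library entries.
predictions = {
    "candidate_A": (0.5, 0.2),
    "candidate_B": (-0.3, 0.3),
    "candidate_C": (0.1, 0.05),
    "candidate_D": (-0.1, 0.4),
}
for name, z in top_candidates(predictions, best_observed=-0.05, k=2):
    print(name, round(z, 3))
```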
Fig. 1.9
Figure 1.9 shows the distribution of Z-scores for the 54,779 compounds as a function of V and \(\rho\). The magnitude of the Z-score is shown in separate panels for the constituent elements. The compounds are spread widely over the \(V\)-\(\rho\) space, which illustrates how difficult it would be to identify promising candidates without Bayesian optimization augmented with elemental descriptors. The broadly distributed Z-scores for light elements such as Li, N, O, and F suggest that the presence of these elements has little effect on lowering LTC. When such light elements are combined with heavy elements, however, the resulting compounds tend to have high Z-scores. Compounds composed of light elements such as Be and B, by contrast, tend to show high LTC. Pb, Cs, I, Br, and Cl stand out: many of their compounds have high Z-scores, and most compounds with positive Z-scores are in fact combinations of these five elements.
Interestingly, elements that are neighbors in the periodic table do not show similar trends. For example, compounds of Tl and Bi, the neighbors of Pb, rarely have high Z-scores, even though \(\text{Bi}_2\text{Te}_3\) is a well-known thermoelectric material and some Tl-containing compounds have low LTC. This discrepancy may stem from the “biased” training dataset, which is limited to AB compounds of 34 elements with simple crystal structures. The bias is currently unavoidable: the computational cost of first-principles LTC calculations makes it impractical to build an unbiased dataset large enough to cover the diversity of chemical compositions and crystal structures. Nevertheless, the usefulness of a biased training dataset for discovering low-LTC materials merits further investigation. The bias may prevent the discovery of every low-LTC material in the library, but it still allows some of them to be identified.
It is important to note that a Z-score ranking does not correspond exactly to the true first-principles LTC ranking. Validating the candidates from virtual screening with first-principles calculations is therefore an essential step in “discovering” low-LTC compounds. First-principles LTC calculations were carried out for the top eight compounds from the virtual screening. The LTC calculation for \(\text{Pb}_2\text{RbBr}_5\) failed because of imaginary phonon modes, but the top five compounds, \(\text{PbRbI}_3\), PbIBr, \(\text{PbRb}_4\text{Br}_6\), PbICl, and PbClBr, all showed LTC values below 0.2 W/mK (at 300 K), far lower than that of rocksalt PbSe (0.9 W/mK at 300 K). This result confirms that a GP prediction model with appropriate descriptors can discover low-LTC compounds efficiently. The approach is promising for materials discovery in the many applications where optimizing the chemistry of a material is essential.
Fig. 1.10
Finally, the performance of Bayesian optimization was evaluated with compound descriptors derived from elemental and structural representations, using an LTC dataset that includes the compounds identified by virtual screening. GP models were constructed with two descriptor sets: (1) means and standard deviations (SDs) of elemental representations and Gaussian radial distribution functions (GRDFs), and (2) means and SDs of elemental representations and bond-order potentials (BOPs). Figure 1.10 compares the lowest LTC found during Bayesian optimization with that found during a random search. The target of the optimization was PbClBr, known to have a low LTC. With the GP model using BOP descriptors, the average number of samples required for the optimization was only \(N_{\mathrm{ave}} = 5.0\), ten times smaller than for the random search (\(N_{\mathrm{ave}} = 50\)). Bayesian optimization combined with effective descriptors thus finds PbClBr far more efficiently than a random search.
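The elemental part of such a compound descriptor, the mean and SD of elemental representations over the atoms of a compound, can be sketched as follows. The choice of atomic number and atomic mass as the elemental representations, the per-atom weighting, the use of the population SD, and the function names are assumptions made for illustration; the structural GRDF and BOP parts are not reproduced.

```python
import re
from statistics import mean, pstdev

# Atomic number and standard atomic mass for the elements used in the example.
ELEMENT_DATA = {
    "Pb": {"Z": 82, "mass": 207.2},
    "Cl": {"Z": 17, "mass": 35.45},
    "Br": {"Z": 35, "mass": 79.904},
}

def parse_formula(formula):
    """Expand a formula such as 'PbRb4Br6' into a list with one symbol per atom."""
    atoms = []
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        atoms.extend([symbol] * int(count or 1))
    return atoms

def compound_descriptor(formula, properties=("Z", "mass")):
    """Return [mean, SD] of each elemental property over all atoms in the formula."""
    atoms = parse_formula(formula)
    features = []
    for prop in properties:
        values = [ELEMENT_DATA[symbol][prop] for symbol in atoms]
        features += [mean(values), pstdev(values)]  # population SD (assumption)
    return features

print(compound_descriptor("PbClBr"))
```

Because the mean and SD are taken over atoms, the descriptor has the same length for every compound regardless of how many elements it contains, which is what allows compounds beyond AB stoichiometry to share one model.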
To assess whether the method can find a variety of low-LTC compounds, two further datasets were prepared, each with certain low-LTC compounds intentionally excluded. In these tests the optimization targets were CuCl and LiI, which have the 11th- and 12th-lowest LTC, respectively. With the GP model using BOP descriptors, the average number of observations needed to find CuCl and LiI was \(N_{\mathrm{ave}} = 15.1\) and 9.1, respectively, considerably smaller than for random searches. With the GP model using GRDF descriptors, however, the corresponding numbers rose to \(N_{\mathrm{ave}} = 40.5\) and 48.6. This delayed optimization may arise because CuCl and LiI are outliers in the GRDF descriptor model, even though that model has a root mean square error (RMSE) similar to that of the BOP descriptor model. These results show that the descriptor set should be chosen by examining the performance of Bayesian optimization over a wide range of compounds, so that outlier compounds can still be found and materials discovery remains robust.