Bayesian techniques: an overview

Krappe, H. J.; Holub-Krappe, E.; Konishi, T.; Rossner, H. H.

doi:10.1107/S1574870722005511


International Tables for Crystallography
Volume I
X-ray absorption spectroscopy and related techniques
Edited by C. T. Chantler, F. Boscherini and B. Bunker

International Tables for Crystallography (2024). Vol. I. ch. 5.15, pp. 702-704
https://doi.org/10.1107/S1574870722005511

Chapter 5.15. Bayesian techniques: an overview

Hans J. Krappe,^a ^* Elizabeta Holub-Krappe,^a Takehisa Konishi^b and Hermann H. Rossner^a

^aRetired from Helmholtz-Zentrum Berlin für Materialien und Energie GmbH, Hahn-Meitner-Platz 1, 14109 Berlin, Germany, and ^bGraduate School of Advanced Integration Science, Chiba University, 1-33 Yayoi-cho, Inage, Chiba 263-8522, Japan
Correspondence e-mail: [email protected]

The standard single-scattering formula gives the experimental extended X-ray absorption fine-structure (EXAFS) cross section as a function of the wavenumber in terms of a set of model parameters, including the average distances of the atoms involved in producing the EXAFS signal. To solve the inverse problem of determining the model parameters from the cross section, measured over some range of wavenumbers, a Bayesian approach is used. It identifies the subspace of the model-parameter space in which the model parameters are essentially determined by the data.

Keywords: EXAFS; model parameters; Bayesian approach.

1. Introduction

The analysis of X-ray absorption fine-structure (XAFS) data is traditionally based on least-squares fitting methods. However, if there are more parameters than the data alone can determine, this leads to an ill-conditioned system of equations (Stern, 1988; Rehr & Albers, 2000). We instead propose a Bayesian approach (Krappe & Rossner, 1985a,b, 2004). This allows us to determine the subspace $[\cal R]$ of the total model-parameter space $[\cal Q]$ in which the data predominantly decide the outcome of the fit, whereas in the complementary space a priori assumptions essentially fix the result.

2. The direct problem and the inverse problem in X-ray data fitting

We start the discussion with the direct problem, in which the cross section for photon absorption μ_exp(k) on the K edge or an isolated L edge is calculated as a function of the wavenumber k of the photoelectron. The calculation is based on the single-scattering formula for monoatomic, unoriented samples or samples with cubic symmetry, valid for k > k_cut, where the energy k²ℏ²/(2m) is sufficiently larger than the threshold energy for the photoeffect (Stern, 1988; Zabinsky et al., 1995; Bunker, 1983; Tröger et al., 1994), $[\eqalignno {\chi(k) & = {{\mu_{\rm exp}(k)} \over {\mu_{0}(k)}}-1 \cr & = {{S_{0}^{2}} \over {k}} {\textstyle \sum\limits_{j}} N_{j}{{|f_{j}(k,R_{j})|} \over {R^{2}_{j}}} \exp[-2k^{2}\sigma^{2}_{j}-2R_{j}/\lambda(k)] \cr &\ \quad {\times}\ \sin\left[2k(R_{j}-\delta R_{j})+\varphi_{j}(k)-{{4} \over {3}}C_{3,j}k^{3}\right], & (1)}]$ with the length correction δR_j = δR_j∥ + δR_j⊥ (Fornasini et al., 2004; Rossner et al., 2006), where $[\eqalign {\delta R_{j\parallel} & = 2\sigma^{2}_{j}(R_{j}^{-1}+\lambda^{-1}), \cr \delta R_{j\perp} & = (1/2)\textstyle\sum\limits_{i}(\sigma^{\rm (therm)})_{i}^{2}/|R_{i-1}-R_{i}|,}]$ and with the sum truncated beyond the Jth term, a cutoff related to the parameter k_cut. The model parameters in this approach are the following: the correction factor $[S_{0}^{2}]$ for many-electron effects, the half-path distances R_j, j = 1, …, J, of the J scattering paths, and the projected Debye–Waller (DW) parameters and anharmonicity parameters, $[\sigma^{2}_{j}]$ and C_3,j, respectively. The correction factor does in fact depend slightly on the path j and on k. We define $[S_{0}^{2}]$ as the average of the actual $[S^{2}_{j}(k)]$ over j and over k in the relevant k-range, with the remaining k and j dependence being absorbed in the scattering amplitudes f_j(k, R_j).
The absorption coefficient for the embedded absorbing atom μ₀(k), the amplitudes f_j(k, R_j), the scattering phases φ_j(k) and the damping parameter λ(k) follow from an (approximate) solution of the n-electron scattering problem and are, for example, calculated by the FEFF code (Zabinsky et al., 1995; Ankudinov, 1996; Rehr & Albers, 2000; Kas et al., 2024).
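As a minimal sketch of how equation (1) can be evaluated numerically, the Python function below sums the single-scattering contributions at one wavenumber. The function name, the dictionary keys and the callables `amp`, `phase` and `mean_free_path` (standing in for |f_j(k, R_j)|, φ_j(k) and λ(k), which in practice would come from a code such as FEFF) are illustrative assumptions, and the toy numbers are made up.

```python
import math

def chi_single_scattering(k, paths, s02, mean_free_path):
    """Evaluate the sum in equation (1) at one wavenumber k > 0.

    paths: list of dicts with hypothetical keys
        N, R, sigma2, C3  -> N_j, R_j, sigma_j^2, C_{3,j}
        amp, phase        -> callables returning |f_j(k, R_j)| and phi_j(k)
        dR_perp           -> optional perpendicular length correction
    mean_free_path: callable returning lambda(k)
    """
    lam = mean_free_path(k)
    total = 0.0
    for p in paths:
        # Length correction: parallel part 2 sigma^2 (1/R + 1/lambda)
        # plus an optional, externally supplied perpendicular part.
        dr = 2.0 * p["sigma2"] * (1.0 / p["R"] + 1.0 / lam) + p.get("dR_perp", 0.0)
        damping = math.exp(-2.0 * k**2 * p["sigma2"] - 2.0 * p["R"] / lam)
        argument = (2.0 * k * (p["R"] - dr) + p["phase"](k)
                    - (4.0 / 3.0) * p["C3"] * k**3)
        total += p["N"] * p["amp"](k) / p["R"]**2 * damping * math.sin(argument)
    return s02 / k * total

# Toy input: one shell with constant amplitude and zero phase (made-up numbers).
toy_path = {"N": 12, "R": 2.5, "sigma2": 0.005, "C3": 0.0,
            "amp": lambda k: 1.0, "phase": lambda k: 0.0}
chi_value = chi_single_scattering(4.0, [toy_path], s02=0.9,
                                  mean_free_path=lambda k: 10.0)
```

Passing the amplitude, phase and mean free path as callables keeps the formula separate from whatever theoretical code supplies them.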

To solve the inverse problem, that is to determine the model parameters from a given set of measured values μ_exp(E_l) at energies E_l, we first obtain wavenumbers k_l = [2m(E_l − E₀)]^1/2/ℏ, where E₀ is the effective threshold energy. Since the latter is known only approximately, it is often treated as another of the model parameters to be determined by the fit. As usual, a smooth background contribution μ_back(k), obtained from a polynomial extrapolation of the pre-edge μ_exp to the post-edge region as described by Victoreen (1948) and Milledge (1962), is subtracted from μ_exp(k). Unfortunately, the precise extrapolation recipe has some influence on the final fit parameters. The μ_exp(k) are measured with an energy-dependent efficiency A(k). As in Krappe & Rossner (2004), we obtain $[\overline{\mu}_{\rm exp}(k)]$ from μ_exp(k) by a polynomial smoothing procedure, described in detail in the appendix to that paper. Similarly, $[\overline{\mu}_{0}(k)]$ is obtained from μ₀(k). The ratio A(k) = $[[\overline{\mu}_{\rm exp}(k)-\overline{\mu}_{\rm back}(k)]/\overline{\mu}_{0}(k)]$ for k > k_edge is then interpreted as the overall efficiency of the experimental setup in the EXAFS energy range k > k_cut.
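The conversion from photon energy to photoelectron wavenumber can be sketched as follows, working in eV and Å⁻¹. The function name and the threshold value are hypothetical; the constant is the rounded value of ℏ²/(2m) for the electron mass.

```python
import math

# hbar^2 / (2 m_e) in eV * Angstrom^2 (rounded physical constant)
HBAR2_OVER_2M = 3.809982

def energy_to_wavenumber(energy_ev, e0_ev):
    """Return k in 1/Angstrom for a photon energy above the threshold E0."""
    excess = energy_ev - e0_ev
    if excess <= 0.0:
        raise ValueError("energy must lie above the threshold E0")
    return math.sqrt(excess / HBAR2_OVER_2M)

# Hypothetical threshold and energy grid, for illustration only.
e0 = 8979.0
energies = [e0 + 50.0, e0 + 200.0, e0 + 500.0]
k_values = [energy_to_wavenumber(e, e0) for e in energies]
```

Because E₀ enters under the square root, treating it as a fit parameter shifts the whole k grid, most strongly at low k.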

The FEFF result for μ₀ needs corrections (Krappe & Rossner, 2004). We therefore write μ₀(k) = $[\mu_{0}^{(FEFF)}(k)]$ + $[\delta\mu_{0}(k)]$, where δμ₀(k) is represented by a cubic spline on an equally spaced grid of T support points. The number T is chosen to make the spline just flexible enough for this purpose (Krappe & Rossner, 2004). The ordinates δμ_t, t = 1, …, T, are also treated as model parameters to be determined in the fit together with all other model parameters.

We therefore have to fit the function $[\eqalignno {\mu_{\rm exp}(k_{l}) & = \mu_{\rm back} (k_{l})+A(k_{l})[\mu_{0}^{(FEFF)}(k_{l}) \cr &\ \quad +\ \delta\mu_{0}(k_{l}; \delta\mu_{1},\ldots,\delta\mu_{T})][\chi(k_{l})+1]\cr & \equiv g_{l}({\bf x}) & (2)}]$ for l = 1, …, L, where χ is given for k > k_cut by equation (1). The set of model parameters is $[{\bf x}^{\rm T} = (\delta\mu_{1},\ldots,\delta\mu_{T},S_{0}^{2},E_{0},R_{1},\ldots,R_{J},\sigma^{2}_{1},\ldots,\sigma^{2}_{J},C_{3,1},\ldots,C_{3,J}). \eqno (3)]$
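Equation (3) implies N = T + 2 + 3J parameters when E₀ is fitted. The following sketch shows one way of splitting such a flat vector into named blocks in the order of equation (3); the function name and dictionary keys are assumptions made for this illustration.

```python
def unpack_parameters(x, T, J):
    """Split the flat vector of equation (3) into named blocks."""
    expected = T + 2 + 3 * J
    if len(x) != expected:
        raise ValueError(f"expected {expected} parameters, got {len(x)}")
    i = T + 2  # index of the first path distance R_1
    return {
        "delta_mu": list(x[:T]),             # spline ordinates delta mu_t
        "S02": x[T],                         # S_0^2
        "E0": x[T + 1],                      # threshold energy
        "R": list(x[i:i + J]),               # half-path distances
        "sigma2": list(x[i + J:i + 2 * J]),  # Debye-Waller parameters
        "C3": list(x[i + 2 * J:i + 3 * J]),  # anharmonicity parameters
    }

# Dummy vector 0..9 for T = 2 spline ordinates and J = 2 paths.
params = unpack_parameters(list(range(10)), T=2, J=2)
```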

3. Bayesian approach to an ill-posed inversion problem

We give the experimental data $[z_{l} = \mu_{\rm exp}(k_{l})]$ a Gaussian probability distribution $[P_{\rm exp}({\bf z})\propto\exp(-\chi^{2}_{\rm exp}/2)]$, characterized by the quadratic form $[\chi^{2}_{\rm exp} = \textstyle \sum\limits_{ll^{\prime}}(z-\overline{z})_{l}F_{ll^{\prime}}(z-\overline{z})_{l^{\prime}}. \eqno (4)]$ It is usually assumed that the matrix F, which is the inverse of the variance matrix, is diagonal: $[F_{ll^{\prime}} = (\Delta z_{l})^{-2}\delta_{ll^{\prime}}]$. One also has to associate errors with the FEFF code because the electron multiple-scattering problem can only be treated approximately, for instance by including in the sum in equation (1) only terms which contribute more than 4% of the total, and integrals have to be approximated by finite sums. We again associate with these errors a Gaussian probability that z′ is true for a given x, i.e. $[P_{\rm model}[{\bf z}^{\prime}|{\bf g}({\bf x})]\propto\exp(-\chi^{2}_{\rm model}/2)]$, with $[\chi^{2}_{\rm model} = [{\bf g}({\bf x})-{\bf z}^{\prime}]^{\rm T}{\bf B}[{\bf g}({\bf x})-{\bf z}^{\prime}],]$ where the matrix B is the inverse of the corresponding variance matrix.

The conditional probability $[P_{\rm cond}]$ that the outcome of the observation is $[\overline{{\bf z}}]$ , once x is given, may be expressed in terms of $[P_{\rm exp}]$ and $[P_{\rm model}]$ by $[P_{\rm cond}(\overline{{\bf z}}|{\bf x}) \propto \textstyle \int P_{\rm exp}(\overline{{\bf z}}|{\bf z}^{\prime})P_{\rm model}[{\bf z}^{\prime}|{\bf g}({\bf x})]\,{\rm d}^{L}{\bf z}^{\prime}. \eqno (5)]$ The integral can be evaluated analytically and yields a Gaussian in g(x), $[P_{\rm cond}(\overline{{\bf z}}|{\bf x})\propto\exp(-\chi^{2}_{\rm cond}/2)]$ , with $[\chi^{2}_{\rm cond}[\overline{{\bf z}},{\bf g}({\bf x})] = [{\bf g}({\bf x})-{\overline{\bf z}}]^{\rm T}{\bf C}\,[{\bf g}({\bf x})-\overline{{\bf z}}] \eqno (6)]$ in terms of the L × L matrix $[{\bf C} = ({\bf F}^{-1}+{\bf B}^{-1})^{-1}. \eqno (7)]$
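As a worked special case of equation (7), suppose that B is also diagonal, with model uncertainties written here as Δg_l (a symbol introduced only for this illustration). Then C is diagonal and the experimental and model errors add in quadrature: $[C_{ll^{\prime}} = {{\delta_{ll^{\prime}}} \over {(\Delta z_{l})^{2}+(\Delta g_{l})^{2}}}.]$ A data point is therefore down-weighted in the fit whenever either its measurement error or the model error at that point is large.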

We expand g(x) around a first-guess value x⁽⁰⁾ for the solution of the inverse problem, $[g_{l}({\bf x}) = g_{l}({\bf x}^{(0)})+\textstyle\sum\limits_{n}G_{ln}(x_{n}-x_{n}^{(0)}), \eqno (8)]$ where the L × N matrix G is defined as $[G_{ln} = \partial_{x_{n}}g_{l}({\bf x})|_{{\bf x} = {\bf x}^{(0)}}. \eqno (9)]$ Inserting this into equation (6) and, to simplify the notation, writing x for x − x⁽⁰⁾ from here on, one obtains a second-order polynomial in x, $[\chi^{2}_{\rm cond}({\bf x},{\overline{\bf z}}) = {\bf x}^{\rm T}{\bf Q}\,{\bf x}-2{\bf b}^{\rm T}{\bf x}+[{\overline{\bf z}}-{\bf g}({\bf x}^{(0)})]^{\rm T}{\bf C}[{\overline{\bf z}}-{\bf g}({\bf x}^{(0)})] \eqno (10)]$ in terms of the N × N matrix Q = G^T C G and the vector $[{\bf b} = {\bf G}^{\rm T}{\bf C}[{\overline{\bf z}}-{\bf g}({\bf x}^{(0)})]]$. The matrix Q is the inverse of the variance matrix of the distribution P_cond in terms of the variable x.
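The linearization in equations (8) to (10) can be sketched in plain Python. The Jacobian is estimated here by central finite differences, which is an assumption of this example (analytic derivatives would serve equally well), and C is taken to be diagonal. The function names and the toy model are hypothetical.

```python
def jacobian(g, x0, rel_step=1e-6):
    """Central-difference estimate of G_ln = d g_l / d x_n at x0 (equation 9)."""
    g0 = g(x0)
    G = [[0.0] * len(x0) for _ in g0]
    for n, xn in enumerate(x0):
        h = rel_step * max(abs(xn), 1.0)
        up, down = list(x0), list(x0)
        up[n], down[n] = xn + h, xn - h
        g_up, g_down = g(up), g(down)
        for l in range(len(g0)):
            G[l][n] = (g_up[l] - g_down[l]) / (2.0 * h)
    return G

def normal_matrices(G, c_diag, residual):
    """Return Q = G^T C G and b = G^T C r for a diagonal C.

    residual is the vector z_bar - g(x0); c_diag holds the diagonal of C.
    """
    L, N = len(G), len(G[0])
    Q = [[sum(G[l][n] * c_diag[l] * G[l][m] for l in range(L))
          for m in range(N)] for n in range(N)]
    b = [sum(G[l][n] * c_diag[l] * residual[l] for l in range(L))
         for n in range(N)]
    return Q, b

# Toy linear model with L = 3 data points and N = 2 parameters.
def toy_model(x):
    return [x[0] + 2.0 * x[1], 3.0 * x[0] - x[1], x[1]]

G = jacobian(toy_model, [1.0, 2.0])
Q, b = normal_matrices(G, c_diag=[1.0, 1.0, 1.0], residual=[1.0, 1.0, 1.0])
```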

In order to find the probability distribution for the parameter values x, once the $[{\overline{\bf z}}]$ are given, $[P_{\rm post}({\bf x}|{\overline{\bf z}})]$ , we use Bayes' theorem $[P_{\rm post}({\bf x}|{\overline{\bf z}})\propto P_{\rm cond}({\overline{\bf z}}|{\bf x})P_{\rm prior}({\bf x}). \eqno(11)]$ Bayes' theorem therefore solves the inversion problem in probability theory, but at the price of introducing the prior probability P_prior(x), which expresses the knowledge that we have about the model parameters before the experiment is made. Let us assume for the moment that we have an average value x^(prior) and a variance matrix A⁻¹ so that $[P_{\rm prior}\propto\exp[-\chi^{2}_{\rm prior}({\bf x})/2]]$ with $[\chi^{2}_{\rm prior} = ({\bf x}-{\bf x}^{\rm (prior)})^{T}{\bf A}\,({\bf x}-{\bf x}^{\rm (prior)})\eqno(12)]$ and let us further restrict the matrix A to be diagonal, $[A_{nn^{\prime}} = \alpha_{n}\delta_{nn^{\prime}}]$ and choose x^(prior) = x⁽⁰⁾. Maximizing P_post yields the normal equations $[(1/2)\partial_{{\bf x}}\chi^{2}_{\rm post} = ({\bf Q}+{\bf A})\,{ \bf x}-{\bf b} = 0.\eqno(13)]$
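To illustrate equation (13), the sketch below solves (Q + A)x = b by Gaussian elimination with partial pivoting. The helper names and the numbers are made up; Q is deliberately chosen singular, so that the unregularized system cannot be solved, while adding a diagonal A makes it well posed.

```python
def solve_linear(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("singular matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def regularized_solution(Q, b, alpha):
    """Solve the normal equations (Q + A) x = b with A = diag(alpha)."""
    N = len(b)
    M = [[Q[n][m] + (alpha[n] if n == m else 0.0) for m in range(N)]
         for n in range(N)]
    return solve_linear(M, b)

# Rank-one (singular) Q: the data constrain only one parameter combination.
Q_toy = [[4.0, 2.0], [2.0, 1.0]]
b_toy = [2.0, 1.0]
x_reg = regularized_solution(Q_toy, b_toy, alpha=[1.0, 1.0])
```

The result is the shift relative to x⁽⁰⁾ in the linearized problem; in a nonlinear fit one would typically re-linearize around the updated point and repeat.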

4. Turchin's proposal to determine the matrix A

An optimal choice of the diagonal matrix A must obviously take the quality of the data into account. Turchin and Nozik (Turchin & Nozik, 1969; Turchin et al., 1970; Turchin, 1985) assume that there is a probability distribution of α_n which depends on $[{\overline{\bf z}}]$. They first define the conditional probability $[\eqalignno {P({\overline{\bf z}}|\boldsymbol{\alpha}) & = \textstyle \int P_{\rm cond}({\overline{\bf z}}|{\bf x})P_{\rm prior}({\bf x}; \boldsymbol{\alpha})\,{\rm d}^{N}{\bf x}\cr & = c\cdot[\det{\bf A}/\det({\bf Q}+{\bf A})]^{1/2}\cdot\exp[(1/2){\bf b}^{\rm T}({\bf Q}+{\bf A})^{-1}{\bf b}],\cr&&(14)}]$ where equations (10) and (12) have been used to obtain the last equation. The normalization parameter c of this equation only contains terms that do not depend on A, Q or b. The dependence on the matrices A and Q is shown explicitly, including contributions from the normalization factors of P_cond and P_prior.
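The step from the integral to the closed form in equation (14) is a Gaussian integral. As a sketch, with x^(prior) = x⁽⁰⁾ as chosen above, so that the shifted prior mean is zero, and with $[\hat{\bf x} = ({\bf Q}+{\bf A})^{-1}{\bf b}]$ (a symbol introduced here for convenience), completing the square in the exponent gives $[\chi^{2}_{\rm cond}+\chi^{2}_{\rm prior} = ({\bf x}-\hat{\bf x})^{\rm T}({\bf Q}+{\bf A})({\bf x}-\hat{\bf x})-{\bf b}^{\rm T}({\bf Q}+{\bf A})^{-1}{\bf b}+{\rm const},]$ where the constant is the x-independent last term of equation (10). The remaining integral over x contributes $[(2\pi)^{N/2}[\det({\bf Q}+{\bf A})]^{-1/2}]$, and the normalization of P_prior contributes $[(2\pi)^{-N/2}(\det{\bf A})^{1/2}]$, which together produce the determinant ratio in equation (14).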

However, instead of $[P({\overline{\bf z}}|\boldsymbol{\alpha})]$ the inverse conditional probability $[P(\boldsymbol{\alpha}|{\overline{\bf z}})]$ is needed. It is obtained by using Bayes' theorem once more: $[P(\boldsymbol{\alpha}|{\overline{\bf z}})\propto P({\overline{\bf z}}|\boldsymbol{\alpha})P^{\prime}_{\rm prior}(\boldsymbol{\alpha}).]$ Very often the function $[P({\overline{\bf z}}|\boldsymbol{\alpha})]$ defined in equation (14) is sharply peaked in α-space at a point $[\boldsymbol{\alpha}^*]$. One can then choose $[P^{\prime}_{\rm prior}]$ to be very broad without affecting the α dependence of $[P(\boldsymbol{\alpha}|{\overline{\bf z}})]$ around the peak. Therefore, close to $[\boldsymbol{\alpha}^*]$ one has $[P(\boldsymbol{\alpha}|{\overline{\bf z}}) \propto P({\overline{\bf z}}|\boldsymbol{\alpha})]$. One may use this peak value $[\boldsymbol{\alpha}^*]$ as the regularization vector in equation (13). With the condition $[\partial_{\boldsymbol{\alpha}}P({\overline{\bf z}}|\boldsymbol{\alpha}) = 0]$, equation (14) yields N nonlinear equations for the vector $[\boldsymbol{\alpha}^*]$ of eigenvalues of A: $[(A^{-1})_{nn}-([Q+A]^{-1})_{nn}-\left[\textstyle\sum\limits_{n^{\prime}}(Q+A)^{-1}_{nn^{\prime}}b_{n^{\prime}}\right]^{2} = 0 \eqno (15)]$ for n = 1, …, N.
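Equation (15) can be rearranged as 1/α_n = ([Q + A]⁻¹)_nn + x_n², with x = (Q + A)⁻¹b, which suggests a simple fixed-point iteration. The sketch below uses that iteration purely for illustration; the source does not prescribe a solver, convergence is not guaranteed in general, and the function names are hypothetical. For a single parameter with Q = q, equation (15) reduces to α* = q²/(b² − q), which is finite and positive only if b² > q; otherwise α grows without bound, corresponding to a parameter fixed by the prior.

```python
def invert(M):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(M)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("singular matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def turchin_alpha(Q, b, alpha0, iterations=200, tol=1e-12):
    """Fixed-point iteration for equation (15); returns the last iterate."""
    N = len(b)
    alpha = list(alpha0)
    for _ in range(iterations):
        M = [[Q[n][m] + (alpha[n] if n == m else 0.0) for m in range(N)]
             for n in range(N)]
        Minv = invert(M)
        x = [sum(Minv[n][m] * b[m] for m in range(N)) for n in range(N)]
        new = [1.0 / (Minv[n][n] + x[n] ** 2) for n in range(N)]
        if max(abs(a - c) for a, c in zip(new, alpha)) < tol:
            return new
        alpha = new
    return alpha

# One-parameter check against the closed form q^2 / (b^2 - q) = 16 / 32.
alpha_1d = turchin_alpha([[4.0]], [6.0], [1.0])
# Two decoupled parameters (diagonal Q), each following the same closed form.
alpha_2d = turchin_alpha([[4.0, 0.0], [0.0, 1.0]], [6.0, 3.0], [1.0, 1.0])
```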

Note that the regularization method sketched above does not require an a priori restriction of the number of model parameters. Instead, it automatically determines that subspace $[{\cal R}]$ of the whole model-parameter space $[{\cal Q}]$ in which the data determine the outcome of the fit. In the complementary space the a priori values $[x_{n}^{(0)}]$ determine the fit. Strong error correlations between two model parameters indicate that the data do not determine them independently.

More extended versions of this article, which include applications to some typical EXAFS and magnetic EXAFS examples, can be found in Krappe & Rossner (2004) and Krappe et al. (2014).

References

Ankudinov, A. L. (1996). PhD thesis. University of Washington, USA.

Bunker, G. (1983). Nucl. Instrum. Methods Phys. Res. 207, 437–444.

Fornasini, P., a Beccara, S., Dalba, G., Grisenti, R., Sanson, A., Vaccari, M. & Rocca, F. (2004). Phys. Rev. B, 70, 174301.

Kas, J. J., Vila, F. D. & Rehr, J. J. (2024). Int. Tables Crystallogr. I, ch. 6.8, 764–769.

Krappe, H. J., Holub-Krappe, E., Konishi, T. & Rossner, H. H. (2014). XAS Research Review, Vol. 13. The International X-ray Absorption Society.

Krappe, H. J. & Rossner, H. H. (1985a). Advanced Methods in the Evaluation of Nuclear Scattering Data, edited by H. J. Krappe & R. Lipperheide, pp. 215–222. Berlin, Heidelberg: Springer-Verlag.

Krappe, H. J. & Rossner, H. H. (1985b). Advanced Methods in the Evaluation of Nuclear Scattering Data, edited by H. J. Krappe & R. Lipperheide, pp. 242–248. Berlin, Heidelberg: Springer-Verlag.

Krappe, H. J. & Rossner, H. H. (2004). Phys. Rev. B, 70, 104102.

Milledge, H. J. (1962). International Tables for X-ray Crystallography, Volume III, edited by C. H. MacGillavry & G. D. Rieck, pp. 171–173. Birmingham: The Kynoch Press.

Rehr, J. J. & Albers, R. C. (2000). Rev. Mod. Phys. 72, 621–654.

Rossner, H. H., Schmitz, D., Imperia, P., Krappe, H. J. & Rehr, J. J. (2006). Phys. Rev. B, 74, 134107.

Stern, E. A. (1988). X-ray Absorption: Principles, Applications, Techniques of EXAFS, SEXAFS and XANES, edited by D. C. Koningsberger & R. Prins, pp. 3–52. New York: John Wiley & Sons.

Tröger, L., Yokoyama, T., Arvanitis, D., Lederer, T., Tischer, M. & Baberschke, K. (1994). Phys. Rev. B, 49, 888–903.

Turchin, V. F. (1985). Advanced Methods in the Evaluation of Nuclear Scattering Data, edited by H. J. Krappe & R. Lipperheide, pp. 33–49. Berlin, Heidelberg: Springer-Verlag.

Turchin, V. F., Kozlov, V. P. & Malkevich, M. S. (1970). Usp. Fiz. Nauk, 102, 345–386.

Turchin, V. F. & Nozik, V. Z. (1969). Izv. Akad. Nauk. SSSR Ser. Fiz. Atm. Okeana, 5, 29.

Victoreen, J. A. (1948). J. Appl. Phys. 19, 855–860.

Zabinsky, S. I., Rehr, J. J., Ankudinov, A., Albers, R. C. & Eller, M. J. (1995). Phys. Rev. B, 52, 2995–3009.
