Statistical measures of confidence

Booth, C. H.

doi:10.1107/S1574870720007296

International
Tables for
Crystallography
Volume I
X-ray absorption spectroscopy and related techniques
Edited by C. T. Chantler, F. Boscherini and B. Bunker

International Tables for Crystallography (2024). Vol. I. ch. 5.8, pp. 672-675
https://doi.org/10.1107/S1574870720007296

Chapter 5.8. Statistical measures of confidence

Corwin H. Booth^a ^*

^aLawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
Correspondence e-mail: [email protected]

The effects of random errors in EXAFS data are considered when estimating fit-parameter uncertainties. Reviewed methods include conventional χ² minimization and the F-test.

Keywords: accuracy; errors.

1. Introduction

The EXAFS technique is not a count-rate-limited technique; that is, even if one could collect an infinite number of photons, there would still be limitations on the accuracy of the measurement. Although this is ultimately the case owing to limitations of sample preparation (Bridges, 2024), beam issues (for example harmonic contamination; Chantler, 2024) and even errors in the lineshapes used for fitting (Booth, 2024), there are many cases where a given measurement (as opposed to the technique) is limited by the number of photons collected simply because the absorbing species in a sample is dilute. There are other cases where a source of error is partially random, such as time-dependent beam-stability issues coupled with sample non-uniformity. In cases such as these, a proper treatment of the random errors in EXAFS (assumed to be normally distributed) can, in fact, give an accurate estimate of the uncertainties (standard deviations) s_j in the parameters p_j used to determine the fit function f_i at each data point i. The discussion below focuses on conventional error-analysis techniques as applied to EXAFS, much of which is discussed more completely in the literature (see, for example, Filipponi, 1995). Note that other methods are in use, most notably those utilizing reverse Monte Carlo techniques (Gurman & McGreevy, 1990; Curis & Bénazeth, 2005).

Assuming that an estimate of the variance $[e_{i}^{2}]$ on a given measurement x_i is representative of the actual variance of the distribution of x_i, one may calculate the statistical χ², $[\chi^{2} = {{n_{\rm ind}} \over {n_{\rm tot}}}{\textstyle \sum \limits_{i=1}^{n_{\rm tot}}} {{(x_{i}-f_{i})^{2}} \over {e_{i}^{2}}}, \eqno (1)]$ where the usual formula for χ² has been scaled to account for the typical issue in spectroscopy that the total number of data points n_tot is not necessarily the same as the number of independent data points n_ind, as will be shown below. With this definition, one may use standard techniques for error analysis, where χ²/ν → 1 for large degrees of freedom, ν = n_ind − n_par, and n_par is the number of parameters used in the fit. In this section, ways of estimating the uncertainties from random errors will be presented. Since determining errors requires an understanding of the information content in EXAFS spectra to a certain degree, some Fourier transform concepts will also be presented.
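As a minimal sketch of equation (1), the scaled χ² and the reduced value χ²/ν might be computed as follows. The function names and the numbers are hypothetical and are used for illustration only; they are not taken from any fitting package.

```python
def scaled_chi2(data, fit, err, n_ind):
    """Equation (1): sum of squared normalized residuals, scaled by n_ind / n_tot."""
    n_tot = len(data)
    if not (len(fit) == len(err) == n_tot):
        raise ValueError("data, fit and err must have the same length")
    total = sum(((x - f) / e) ** 2 for x, f, e in zip(data, fit, err))
    return (n_ind / n_tot) * total


def reduced_chi2(chi2, n_ind, n_par):
    """chi^2 / nu, with nu = n_ind - n_par degrees of freedom."""
    nu = n_ind - n_par
    if nu <= 0:
        raise ValueError("need more independent points than fit parameters")
    return chi2 / nu


# Hypothetical numbers, for illustration only.
data = [1.0, 2.0, 3.0, 4.0]   # x_i
fit = [1.1, 1.9, 3.2, 3.9]    # f_i
err = [0.1, 0.1, 0.1, 0.1]    # e_i
chi2 = scaled_chi2(data, fit, err, n_ind=3)
print(chi2, reduced_chi2(chi2, n_ind=3, n_par=1))
```

With a realistic estimate of e_i and an adequate model, the second printed value would be expected to approach 1 as ν becomes large.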

2. Information content

Typical EXAFS fitting techniques are limited in their information content both by the range in k of the data themselves and by the range in r of any model fitted to these data. This limitation exists whether a spectrum is fitted in k-space or r-space, since data are effectively limited in their r range by the number of scattering shells used in a given model. When fitting in r-space with defined ranges in k and r, the number of independent data points n_ind is $[n_{\rm ind} = 2{{\Delta_{k}\Delta _{r}} \over {\pi}}+2, \eqno (2)]$ where Δ_k and Δ_r are the widths of the fit ranges in k and r, and the fit ranges are inclusive; that is, the additional 2 accounts for the data points at the beginning and the end of the fit range. This formula is a variant of the Shannon–Nyquist theorem, but is generally referred to in EXAFS work as 'Stern's rule' (Stern, 1993), and has also been verified in simulations (Booth & Hu, 2009).
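A short sketch of equation (2) is given here. The function name is hypothetical, and the ranges are illustrative values borrowed from the copper-foil example in this chapter (a transform range of 2.5 to 15.8 Å⁻¹ and a fit range of about 1 to 5 Å).

```python
import math


def n_independent(k_min, k_max, r_min, r_max):
    """Equation (2): n_ind = 2 * delta_k * delta_r / pi + 2."""
    delta_k = k_max - k_min   # width of the k range in A^-1
    delta_r = r_max - r_min   # width of the r range in A
    return 2.0 * delta_k * delta_r / math.pi + 2.0


n_ind = n_independent(2.5, 15.8, 1.0, 5.0)
print(round(n_ind, 2))  # about 35.87 independent points
```

A fit to these ranges should therefore use well under 36 free parameters if ν is to remain comfortably positive.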

The limitation on n_ind is in fact a mathematical identity, a fact that is sometimes misunderstood due to the practice of interpolating data onto a fixed grid in k-space with a constant separation between data points δ_k, which is a requirement of the fast Fourier transform (FFT) method (Cooley & Tukey, 1965; Gauss, 1866). When applying a particular model to the fit of an EXAFS spectrum, any statistical estimate of the fit-parameter uncertainties in that model depends on not trying to fit the data with more fitting parameters than independent data points. In other words, the degrees of freedom ν of a fit should not be negative if the fit is to be unique.

A corollary to the number of independent data points is the limitation on the number of fit parameters within any particular r range within the total fit range. This limitation is more difficult to quantify owing to the extended range in r of any specific scattering peak, so the usual rule of thumb is not to attempt to fit two scattering shells with equal coordination and pair-distribution widths if they are close enough together that the first beat frequency is outside the fit range; otherwise, the difference in pair distance, δ_r = r₂ − r₁, will be too strongly correlated with the Debye–Waller factors. This limit occurs when $[\delta_{r} = {{\pi} \over {2k_{\max}}}, \eqno(3)]$ where k_max is the upper limit of the fit range. In practice, it is important to realize that this limit is a best-case scenario and that, for instance, if the coordination numbers of the two proposed shells are not equal, the correlation between N, r and σ will be stronger. These correlations should be observable in a conventional error analysis.
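Equation (3) can be illustrated with a simplified picture in which phase shifts and amplitude envelopes are neglected (an assumption made here only for illustration). Two equal shells then sum as sin(2kr₁) + sin(2kr₂) = 2 sin(2kr̄) cos(kδ_r), where r̄ is the mean distance, and the first node of the beat envelope falls at kδ_r = π/2. The function names below are hypothetical, and k_max = 15.8 Å⁻¹ is taken from the copper-foil example.

```python
import math


def min_shell_separation(k_max):
    """Equation (3): delta_r (A) at which the first beat node sits at k_max (A^-1)."""
    return math.pi / (2.0 * k_max)


def first_beat_node(delta_r):
    """k (A^-1) of the first node of cos(k * delta_r) for two equal shells."""
    return math.pi / (2.0 * delta_r)


k_max = 15.8
delta_r = min_shell_separation(k_max)
print(round(delta_r, 4))                    # about 0.0994 A
print(round(first_beat_node(delta_r), 1))   # recovers k_max = 15.8 A^-1
```

Under these assumptions, two equal shells separated by less than roughly 0.1 Å would not show their first beat node within this k range.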

3. Estimates of parameter errors from multiple spectra

Bearing in mind the principles in Section 2 when constructing a fit model, the best and easiest method for determining truly random uncertainties of the final fit parameters is to make multiple fits to many spectra and to average the parameter p_j results to determine s_j. This method has real advantages over other methods relying on a single spectrum because the uncertainties in the background functions, especially the so-called 'post-edge' atomic background absorption function μ₀(E), can be better included. For instance, one might mistakenly assume that random noise is strictly proportional to the square root of the collected number of photons and that, therefore, relative uncertainties in χ increase with k; however, errors in μ₀(E) are generally largest at low k due to background-subtraction issues. Such errors also translate to larger errors at low r, as depicted in Fig. 1. It is important to understand that errors determined in this way still do not account for systematic errors or for errors that are not normally distributed.
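The averaging described above can be sketched as follows, assuming that each spectrum has already been fitted separately and that the resulting values of one parameter have been collected in a list. The pair distances below are hypothetical. Both the scatter between fits and the standard deviation of the mean are reported, since which one is quoted depends on whether the uncertainty of a single scan or of the averaged result is wanted.

```python
import math
import statistics

# Hypothetical nearest-neighbour distances (A) from five independent fits.
r_fits = [2.551, 2.549, 2.553, 2.550, 2.547]

r_mean = statistics.mean(r_fits)        # averaged parameter value
r_sd = statistics.stdev(r_fits)         # sample standard deviation between fits
r_sdom = r_sd / math.sqrt(len(r_fits))  # standard deviation of the mean

print(f"r = {r_mean:.4f} A, sd = {r_sd:.4f} A, sdom = {r_sdom:.4f} A")
```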

Figure 1

(a) Fit results for sample copper-foil data. Data were transformed between 2.5 and 15.8 Å⁻¹ with a 0.3 Å⁻¹ wide Gaussian window. Data are from the average of eight scans. Error bars [difficult to discern in this plot, see (b)] were determined from the standard deviation of the mean (sdom) of these scans. (b) Estimated error (sdom) of the modulus in (a).

4. Estimates of parameter errors utilizing χ² methods

While sophisticated approaches to error estimates in EXAFS data analysis based on a Bayesian formalism that generalizes the least-squares method in multi-parameter space have been illustrated (Krappe & Rossner, 2000), conventional χ² methods remain the dominant tool for determining the fit-parameter errors s_j.

In order to perform a standard statistical χ² analysis to obtain s_j, one first needs to have some estimate of e_i. A good method is to average many spectra and determine the standard deviation of the mean as an estimate of e_i at each value of k or, alternatively, to do the same on the real and imaginary parts of the Fourier transform in r-space, as depicted in Fig. 1. One may also propagate the k-space-measured errors into the Fourier transform (Curis & Bénazeth, 2000). Unfortunately, most fitting routines do not allow an estimated error as a function of k or r. This limitation is not considered to be very significant, since systematic sources of error generally make estimates of parameter uncertainties somewhat unreliable. An alternative method is to estimate data uncertainties from the average magnitude over some range in r at high r of a Fourier transform, where the oscillations in χ(k) have presumably been overwhelmed by multiple-scattering interference and the factor of 1/r² in the EXAFS equation. This estimate does not account for the larger errors due to background corrections, but can still be an overestimate in some spectra because EXAFS oscillations from the structure may still contribute. A copper-foil spectrum is an extreme example: for a single spectrum from the data in Fig. 1 one would estimate an uncertainty of about e ≃ 1 in these k³-weighted units from the high-r data, whereas the standard deviation (sd) between multiple spectra in the typical fit range between about 1 and 5 Å is about sd ≳ (8)^1/2 × 0.02 ≃ 0.06, given the approximate average in the fit range of 0.02 from the figure and the eight measurements that were averaged.
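The first approach described above, estimating e_i at each point from the standard deviation of the mean of repeated scans, might look like the following sketch. The scan values are hypothetical and the function name is not from any fitting package. The last lines reproduce the arithmetic that converts the sdom of about 0.02 for eight scans into a scan-to-scan standard deviation.

```python
import math
import statistics


def mean_and_sdom(scans):
    """Per-point mean and standard deviation of the mean over repeated scans.

    `scans` is a list of scans, each a list of values on the same k (or r) grid.
    """
    n_scans = len(scans)
    means, sdoms = [], []
    for values in zip(*scans):            # iterate point by point
        means.append(statistics.mean(values))
        sdoms.append(statistics.stdev(values) / math.sqrt(n_scans))
    return means, sdoms


# Three hypothetical scans with two data points each.
scans = [[1.0, 2.0], [1.2, 2.0], [1.1, 2.3]]
means, e_i = mean_and_sdom(scans)
print(means, e_i)

# Converting an sdom of 0.02 from eight scans into a scan-to-scan sd.
sd_between_scans = math.sqrt(8) * 0.02
print(round(sd_between_scans, 3))
```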

Once the uncertainty per data point has been determined, one may perform a standard error estimate using statistical χ² minimization methods. Once one has obtained the best fit, for instance by using the Levenberg–Marquardt method, the parameter errors s_j are typically obtained from the diagonal elements of the inverse curvature (Hessian or approximate Hessian) matrix, which gives the covariance matrix (Press et al., 1992). This method works well for high-quality fits and variations of χ²(p_j) around its minimum where the parameters p_j = p_j0 are well behaved, i.e. χ²(p_j) is at a global minimum and is a quadratic function in the vicinity of p_j0. However, this method can underestimate parameter uncertainties in many EXAFS models because the models and parameter correlations are often such that χ²(p_j) is not well approximated around its minimum by the low-order Taylor expansion $[\chi^{2}(p_{j}^{\prime})\simeq\chi^{2}(p_{j0})+{{1} \over {2}}\left.{{\partial^{2}\chi^{2}} \over {\partial p_{j}^{2}}}\right|_{p_{j0}}(p_{j}^{\prime}-p_{j0})^{2}. ]$ A clear example occurs for moderately large values of a Debye–Waller factor parameter σ_j, which necessarily has asymmetric uncertainties around σ_j0. In these cases, a profiling method may be used that is less sensitive to the expansion approximation above, where one first determines the best-fit parameter p_j0 and then determines the error around it by freezing all p_k≠j = p_k0 and varying p_j around p_j0 until $[\chi^{2} = \chi^{2}_{0}+1]$ (Arndt & MacGregor, 1966; Bevington & Robinson, 1992), at which point the standard deviation for p_j0 is $[s_{j0} = p_{j}|_{\chi^{2} = \chi^{2}_{0}+1}-p_{j0}. \eqno (4)]$ This method naturally allows the determination of both the positive and negative uncertainties around a given p_j0 and accounts for parameter correlations.
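The profiling step of equation (4) can be sketched for a toy one-parameter problem. Everything in this example is a hypothetical illustration: the model is a bare Debye–Waller damping factor exp(−2σ²k²), the 'data' are generated noise-free from σ₀ = 0.1 Å so that χ²₀ = 0, the uncertainty e = 0.05 is arbitrary, and bisection is used as one simple way to locate the crossing. In a multi-parameter fit the other parameters would be held at their best-fit values, as described above.

```python
import math

K_GRID = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]  # hypothetical k points (A^-1)
SIGMA_0 = 0.1                                    # best-fit sigma (A), by construction
ERR = 0.05                                       # assumed uncertainty per point


def model(sigma, k):
    """Toy model: Debye-Waller damping only."""
    return math.exp(-2.0 * sigma ** 2 * k ** 2)


DATA = [model(SIGMA_0, k) for k in K_GRID]       # noise-free toy data


def chi2(sigma):
    return sum(((d - model(sigma, k)) / ERR) ** 2 for d, k in zip(DATA, K_GRID))


def find_crossing(func, target, inside, outside, tol=1e-12):
    """Bisect between `inside` (func < target) and `outside` (func > target).

    Returns None if func(outside) never reaches the target.
    """
    if func(outside) < target:
        return None
    for _ in range(200):
        mid = 0.5 * (inside + outside)
        if func(mid) < target:
            inside = mid
        else:
            outside = mid
        if abs(outside - inside) < tol:
            break
    return 0.5 * (inside + outside)


target = chi2(SIGMA_0) + 1.0                       # chi^2_0 + 1
upper = find_crossing(chi2, target, SIGMA_0, 0.3)  # search above sigma_0
lower = find_crossing(chi2, target, SIGMA_0, 0.0)  # search below sigma_0
s_plus = upper - SIGMA_0                           # positive uncertainty
s_minus = SIGMA_0 - lower                          # negative uncertainty
print(s_plus, s_minus)
```

The two printed values need not be equal, which is how an asymmetric uncertainty of the kind mentioned for σ_j would show up. If χ² never rises by 1 on one side within the physically allowed range, the sketch returns None for that side rather than a number.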

5. Using the F-test for model evaluation

As described above, the estimation of uncertainties in the data is critical to determining parameter uncertainties in experiments limited by random variations in the data. If one is trying to determine whether one fitting model is statistically more significant than another, the χ² test is best set aside in favour of the F-test (Bevington & Robinson, 1992) in EXAFS methodologies (Joyner et al., 1987; Freund, 1991; Michalowicz et al., 1999; Klementev, 2001; Piazza, 2002; Downward et al., 2007), which is not as dependent on explicitly determining the data errors, although it still assumes they are normally distributed. F is defined as $[F = {{\chi_{1}^{2}/\nu_{1}} \over {\chi_{0}^{2}/\nu _{0}}},\eqno (5)]$ where the subscript 0 denotes the better of the two fits. The assumption in equation (5) is that the fit parameters are independent between ν₀ and ν₁. A formulation that is commonly applied in crystallography (Hamilton, 1965) considers that only some of the parameters are actually different; that is, model 1 is nested within model 0 (for instance, model 0 includes an additional scattering shell): $[F = {{(\chi_{1}^{2}-\chi_{0}^{2})/(\nu_{1}-\nu_{0})} \over {\chi_{0}^{2}/\nu_{0}}}. \eqno (6)]$ Since the estimated data errors are the same for $[\chi _{0}^{2}]$ and $[\chi _{1}^{2}]$ and they approximately cancel in F, the fit residual $[\cal R]$ can be used instead as long as $[{\cal R}^{2}]$ is defined such that $[{\cal R}^{2}\propto\chi^{2}]$ in equations (5) and (6). Note that some fitting routines, in particular IFEFFIT, define $[\cal R]$ as the square of the definition used here, that is, $[{\cal R}_{{\rm ifeffit}}\propto\chi^{2}]$.

An alternative formulation (Hamilton, 1965) that makes explicit the number of degrees of freedom that have changed between the two fit models utilizes n_ind, the number of fit parameters in the better fit m, and b = ν₁ − ν₀ or the number of fit parameters that have changed, depending on the situation: $[F = \left[\left({{{\cal R}_{1}} \over {{\cal R}_{0}}}\right)^{2}-1\right]{{(n_{\rm ind}-m)} \over {b}}. \eqno (7)]$ Once F has been defined, one can calculate the probability α that a value drawn from the F distribution with b and n_ind − m degrees of freedom is smaller than the experimentally determined $[F_{b,n_{\rm ind}-m}]$, which gives the degree of confidence of the fit (Bacchi et al., 1996), $[\alpha = 1-I_{x}\left({{n_{\rm ind}-m} \over {2}},{{b} \over {2}}\right), \eqno (8)]$ where $[I_{x}\left({{n_{\rm ind}-m} \over {2}},{{b} \over {2}}\right)]$ is the incomplete beta function and $[x = {{n_{\rm ind}-m} \over {n_{\rm ind}-m+bF}} = \left({{{\cal R}_{0}}\over{{\cal R}_{1}}}\right)^{2}.]$ For $[{\cal R}_{0}]$ to represent a significantly better fit than $[{\cal R}_{1}]$, α needs to be greater than 67%, and the fit is generally not said to have passed the F-test until α ≥ 95%.

A simple example of applying the F-test is in testing whether adding a scattering shell has a significant effect on the fit. A fit with m = 7 (a two-shell fit with a single ΔE₀ and each shell having individual N, r and σ parameters), b = 3 (testing whether the second shell is necessary), n_ind = 20, $[{\cal R}_{0} = 5\%]$ and $[{\cal R}_{1} = 10\%]$ gives α = 99.9%, which passes the F-test. Other examples of applying these equations to EXAFS are given by Downward et al. (2007).
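The example above can be checked numerically with the sketch below, which evaluates equations (7) and (8) using only the Python standard library. The regularized incomplete beta function is computed with a continued-fraction expansion of the kind described in Numerical Recipes (Press et al., 1992). This implementation and all function names are illustrative assumptions rather than the routine used by any particular EXAFS package; a library routine for the regularized incomplete beta function could be substituted.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step, of the recurrence.
        for aa in (
            m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
            -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2)),
        ):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for a, b > 0."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(
        math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
        + a * math.log(x) + b * math.log(1.0 - x)
    )
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_test(r0, r1, n_ind, m, b):
    """Equations (7) and (8): returns (F, alpha) for residuals r0 (better fit) and r1."""
    f_value = ((r1 / r0) ** 2 - 1.0) * (n_ind - m) / b
    x = (n_ind - m) / (n_ind - m + b * f_value)   # equals (r0 / r1)**2
    alpha = 1.0 - reg_inc_beta((n_ind - m) / 2.0, b / 2.0, x)
    return f_value, alpha


f_value, alpha = f_test(r0=0.05, r1=0.10, n_ind=20, m=7, b=3)
print(f"F = {f_value:.1f}, alpha = {100 * alpha:.2f}%")
```

For these inputs F = 13 and α comes out above 99.9%, in agreement with the value quoted in the text.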

Funding information

This work was supported by the US Department of Energy (DOE), Office of Science (OS), Office of Basic Energy Sciences (OBES) under Contract No. DE-AC02-05CH1123.

References

Arndt, R. A. & MacGregor, M. H. (1966). Methods Comput. Phys. 6, 253.

Bacchi, A., Lamzin, V. S. & Wilson, K. S. (1996). Acta Cryst. D52, 641–646.

Bevington, P. R. & Robinson, D. K. (1992). Data Reduction and Error Analysis for the Physical Sciences, 2nd ed., ch. 11. Boston: WBC/McGraw-Hill.

Booth, C. H. (2024). Int. Tables Crystallogr. I, ch. 5.9, 676–677.

Booth, C. H. & Hu, Y.-J. (2009). J. Phys. Conf. Ser. 190, 012028.

Bridges, F. (2024). Int. Tables Crystallogr. I, ch. 3.13, 370–374.

Chantler, C. T. (2024). Int. Tables Crystallogr. I, ch. 3.38, 537–538.

Cooley, J. W. & Tukey, J. W. (1965). Math. Comput. 19, 297.

Curis, E. & Bénazeth, S. (2000). J. Synchrotron Rad. 7, 262–266.

Curis, E. & Bénazeth, S. (2005). J. Synchrotron Rad. 12, 361–373.

Downward, L., Booth, C. H., Lukens, W. W. & Bridges, F. (2007). AIP Conf. Proc. 882, 129–131.

Filipponi, A. (1995). J. Phys. Condens. Matter, 7, 9343–9356.

Freund, J. (1991). Phys. Lett. A, 157, 256–260.

Gauss, C. F. (1866). Werke, Vol. 3, pp. 265–320. Göttingen: Akademie der Wissenschaften.

Gurman, S. J. & McGreevy, R. L. (1990). J. Phys. Condens. Matter, 2, 9463–9473.

Hamilton, W. C. (1965). Acta Cryst. 18, 502–510.

Joyner, R. W., Martin, K. J. & Meehan, P. (1987). J. Phys. C Solid State Phys. 20, 4005–4012.

Klementev, K. V. (2001). Nucl. Instrum. Methods Phys. Res. A, 470, 310–314.

Krappe, H. J. & Rossner, H. H. (2000). Phys. Rev. B, 61, 6596–6610.

Michalowicz, A., Provost, K., Laruelle, S., Mimouni, A. & Vlaic, G. (1999). J. Synchrotron Rad. 6, 233–235.

Piazza, F. (2002). J. Phys. Condens. Matter, 14, 11623–11634.

Press, W. H., Teukolsky, S. A., Vetterling, W. T. & Flannery, B. P. (1992). Numerical Recipes in Fortran 77: The Art of Scientific Computing, 2nd ed., ch. 15. Cambridge University Press.

Stern, E. A. (1993). Phys. Rev. B, 48, 9825–9827.