International Tables for Crystallography
Volume I
X-ray absorption spectroscopy and related techniques
Edited by C. T. Chantler, F. Boscherini and B. Bunker

International Tables for Crystallography (2024). Vol. I. ch. 6.18, pp. 816-821
https://doi.org/10.1107/S1574870720003432

Chapter 6.18. PrestoPronto: a software package for large EXAFS data sets

Carmelo Prestipino(a)*

(a) University of Rennes 1, CNRS, 263 Avenue du Général Leclerc, 35042 Rennes, France
Correspondence e-mail: [email protected]

PrestoPronto is a free and open-source collection of programs aimed at performing X-ray absorption spectroscopy analysis of large data sets. The software is composed of three programs: (i) PrestoPronto_GUI, which imports spectra in various formats, pre-processes data and performs classical data analysis, (ii) LinearCom_GUI, which performs linear combination analysis, and (iii) PCA_GUI, which performs principal component analysis. All parts of the software package include practical and intuitive graphical user interfaces (GUIs) with plotting capabilities. The main objective of these programs is to quickly and interactively monitor time-resolved experiments from Quick-EXAFS (QEXAFS) and dispersive EXAFS beamlines.

Keywords: Quick-EXAFS; PrestoPronto; dispersive EXAFS.

1. Introduction

Since the early development of rapid-acquisition X-ray absorption spectroscopy, such as dispersive EXAFS (Kaminaga et al., 1981; Matsushita & Phizackerley, 1981) and Quick-EXAFS (QEXAFS; Frahm, 1988; Stötzel et al., 2010), time-resolved experiments have become increasingly widespread. Modern EXAFS beamlines now implement acquisition methods with a time resolution ranging from minutes to seconds, or even down to a few milliseconds on beamlines equipped with dedicated optics (Müller et al., 2016).

With the availability of such facilities and the high photon flux delivered by third-generation synchrotron sources, it is now usual to acquire hundreds or thousands of spectra in a single experiment. A tool that efficiently follows the modification of the spectra in real time is therefore needed; without one, the analysis can become tedious and time-consuming.

These reasons motivated the development of PrestoPronto (from the Italian for `soon ready'), interactive software with a graphical user interface (GUI) for the analysis of large data sets (hundreds or thousands of spectra), as described in the present contribution. Several codes for analysis have already been developed with the same objective (San-Miguel, 1995; Ressler, 1998; Sanchez del Rio & Dejus, 2004; Stötzel et al., 2012), and the use of macro languages such as IFEFFIT or LARCH (Newville, 2001b, 2012; Newville & Ravel, 2024) can help in the analysis of large data sets. However, no open-source code in a high-level programming language is available, while the learning curve for analysis packages based on macro languages is rather steep and relatively time-consuming for users without coding experience.

The main goal of the PrestoPronto set of programs is to cover the requirement to screen and analyze data in real time, dealing with numerous sets of spectra in a quick, intuitive and easy way, using the algorithms most commonly used by the XAFS community, while maintaining the possibility of customizing the code for specific requirements. The interfaces of the codes have been designed ad hoc to deal with large data sets, allowing the user to rapidly evaluate the parameters related to the implemented algorithms and to efficiently plot data-analysis results.

The PrestoPronto package has already been used in several studies (Bahout et al., 2012; Achilli et al., 2014; Moog et al., 2014; Zuvich et al., 2014; Nilsson et al., 2015). It is free and open source and can be adapted to specific requirements such as different data formats or experimental setups or different types of analysis.

2. Structure of the code

The code is written in Python 2.7 using several common libraries: the themed Tk/tkinter graphical toolkit for user interfaces, Matplotlib (Hunter, 2007) for plotting and NumPy (Walt et al., 2011) for the computation of matrices and arrays. For more XAFS-specific algorithms, the software is mainly built on the methods and functions coded in LARCH, the successor to the IFEFFIT macro language (Newville, 2001b, 2012; Newville & Ravel, 2024). The sequence of spectra is stored in a modified Python list, in which each element is a class containing a spectrum with associated methods and properties. The codes are not fully parallelized: the sequential fit approach (see below) implies that parallelization must be implemented at the array-computation level, with a consequent complexity of optimization. At the moment, only the few parts of the codes that use NumPy operations based on Basic Linear Algebra Subroutines (BLAS) can run in parallel; further parallelization is dependent on future development of NumPy.
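The storage scheme described above, a list subclass whose elements are spectrum objects, can be illustrated with a minimal sketch. The class names `Spectrum` and `SpectraSequence`, their methods and the attribute layout are hypothetical and chosen only for illustration; they are not taken from the PrestoPronto source.

```python
import numpy as np


class Spectrum:
    """One spectrum plus its per-spectrum attributes (hypothetical layout)."""

    def __init__(self, energy, mu, **attributes):
        self.energy = np.asarray(energy, dtype=float)
        self.mu = np.asarray(mu, dtype=float)
        self.attributes = dict(attributes)   # e.g. temperature, valve status

    def edge_position(self):
        """Energy at the maximum of the first derivative of mu(E)."""
        return self.energy[np.argmax(np.gradient(self.mu, self.energy))]


class SpectraSequence(list):
    """A Python list of Spectrum objects with sequence-level helpers."""

    def attribute(self, name):
        """Collect one attribute along the sequence as an array."""
        return np.array([s.attributes[name] for s in self])

    def as_matrix(self):
        """Stack mu(E) as rows; assumes all spectra share one energy grid."""
        return np.vstack([s.mu for s in self])
```

Subclassing `list` keeps slicing, iteration and `append` available without extra code, which is one plausible reason for such a design.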

In order to improve the user-interaction and data-processing experiences, the codes load the entire sequence of spectra into random access memory (RAM) and create new arrays at each analysis step. This approach implies that the code is limited to handling data sets smaller than the available memory. In conventional QEXAFS, with a time resolution of seconds or minutes, this is not a practical limitation (the program currently works with a 1300 × 1050 data matrix). However, for a time resolution shorter than 100 ms, as attainable on the beamlines most optimized for time-resolved measurements (Müller et al., 2016; Pascarelli et al., 2016), this limitation could be reached in experiments lasting longer than an hour. The software package is composed of three programs: PrestoPronto_GUI, LinearCom_GUI and PCA_GUI.
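A rough order-of-magnitude estimate of the size of a single data matrix helps to put these figures in context. The sketch below assumes 8-byte floating-point values (the NumPy default) and, for the fast case, ten spectra per second for one hour with the same 1050 points per spectrum; both are assumptions made here for illustration. Because a new array is created at each analysis step, the total footprint is a multiple of the value for one array.

```python
def matrix_size_mb(n_spectra, n_points, bytes_per_value=8):
    """Size of one dense data matrix in megabytes (float64 assumed)."""
    return n_spectra * n_points * bytes_per_value / 1e6


# Matrix size quoted in the text: 1300 x 1050
current = matrix_size_mb(1300, 1050)

# Assumed fast experiment: 10 spectra per second (100 ms each) for one hour
spectra_per_second = 10
n_fast = 3600 * spectra_per_second
fast = matrix_size_mb(n_fast, 1050)

print(f"{current:.2f} MB for 1300 spectra, {fast:.1f} MB for {n_fast} spectra")
```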

3. PrestoPronto_GUI

This module is the core of the package. PrestoPronto_GUI uploads a sequence of spectra into memory, performs the calibration, averages and resamples the data and finally performs an XAS analysis.

At the start of the program, a GUI appears as a single notebook window (Fig. 1) composed of six panes corresponding to the main steps of data analysis (Data input, Averages, XANES, EXAFS-FT, Fit and Attributes) and a terminal window used to communicate error detection. Each pane allows a set of parameters for the corresponding analysis step to be defined, the data to be processed and the results to be saved as image or ASCII files. In order to avoid multiple data processing, in each pane the parameters can be optimized in dedicated interfaces, in which the parameter values can be selected directly on the plot and applied interactively to the different spectra just by moving a slider, as shown in Fig. 2 for EXAFS extraction.

Figure 1. Appearance of PrestoPronto_GUI. (a) Data input tab. (b) Plot windows with shift slider.

Figure 2. EXAFS extraction parameter window. A parameter can be changed by dragging the vertical lines to directly evaluate the effect on the background, the χ(k) function and its Fourier transform. The result of signal extraction with the selected set of parameters on the different spectra composing the series can be evaluated by moving the spectra slider.

In the `Data input' tab, QEXAFS or dispersive EXAFS data are imported as a sequence of multicolumn ASCII files, each containing the energy and detector readings for one spectrum. Possible energy shifts can be corrected if a reference spectrum has been collected simultaneously. In the interface, it is possible to associate each spectrum with several supplementary attributes, for example temperature, valve status or current intensity, if the corresponding numerical values are present in the original data files. Alternatively, spectra can also be read from a multicolumn ASCII file containing a common energy column and a series of spectra.
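One common way to correct an energy shift from a simultaneously measured reference is to align the first-derivative maximum of the reference spectrum with its tabulated edge energy. The sketch below assumes this criterion; the text does not specify how PrestoPronto_GUI determines the shift, and the function names are hypothetical.

```python
import numpy as np


def derivative_maximum(energy, mu):
    """Energy of the maximum of d(mu)/dE."""
    return energy[np.argmax(np.gradient(mu, energy))]


def correct_energy(energy, mu_reference, e_reference):
    """Shift the energy axis so the reference edge sits at e_reference.

    Assumption: sample and reference share the same (miscalibrated) axis.
    """
    shift = e_reference - derivative_maximum(energy, mu_reference)
    return energy + shift, shift
```

The same shift is applied to the sample spectrum recorded with that energy axis, so the correction can be repeated spectrum by spectrum along the sequence.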

The second tab, `Averages', is dedicated to the binning and averaging of the spectra. This operation represents one of the most important steps in QEXAFS data analysis. In practice, the energy sampling is typically defined before the experiment by the optimal energy resolution in the XANES region, i.e. below the core-hole width, resulting in an oversampling of the EXAFS region (Bunker, 2010). Moreover, if the evolution kinetics of the studied process is unknown a priori, the acquisition time adopted is generally also shorter than the optimal time. As a consequence, the data resulting from QEXAFS experiments are very often oversampled. The role of the algorithms implemented in this tab is to reduce the size of the data matrix without a significant loss of information, decreasing computation times and improving the quality of the results. This task is performed by interpolating each spectrum in photoelectron wavevector units and by reducing the number of spectra through averaging of the data along the time coordinate. During this step, supplementary attributes are also averaged in order to maintain the same time resolution as the data set. The attribute arrays, averaged consistently with the data, are available in the `Attributes' tab, where they can be visualized or saved as ASCII files.
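The two reduction steps can be sketched with NumPy as follows. The function names, the k step and the use of linear interpolation are assumptions made for illustration; the conversion uses k = [0.2625 (E − E0)]^(1/2), with E in eV and k in Å^-1.

```python
import numpy as np

KTOE = 0.2624682  # 2 m_e / hbar^2 in eV^-1 Angstrom^-2


def energy_to_k(energy, e0):
    """Photoelectron wavevector (Angstrom^-1) for energies above e0 (eV)."""
    return np.sqrt(KTOE * (energy - e0))


def rebin_exafs(energy, mu, e0, k_step=0.05):
    """Interpolate the post-edge part of mu(E) on a uniform k grid."""
    post = energy > e0
    k = energy_to_k(energy[post], e0)
    k_grid = np.arange(k[0], k[-1], k_step)
    return k_grid, np.interp(k_grid, k, mu[post])


def average_time(matrix, attribute, n_avg):
    """Average groups of n_avg consecutive spectra (rows) and one attribute."""
    n_groups = matrix.shape[0] // n_avg          # incomplete last group dropped
    used = n_groups * n_avg
    data = matrix[:used].reshape(n_groups, n_avg, -1).mean(axis=1)
    attr = np.asarray(attribute)[:used].reshape(n_groups, n_avg).mean(axis=1)
    return data, attr
```

Averaging the attribute with the same grouping as the spectra keeps each averaged spectrum paired with a consistent attribute value, as described in the text.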

As previously mentioned, one of the main goals of PrestoPronto_GUI is to rapidly monitor the evolution of spectra during time-resolved XAFS measurements. This task is achieved by using the `XANES' and `EXAFS-FT' tabs to perform a qualitative data analysis. In the `XANES' tab, the spectra are background-subtracted and normalized. It is possible to plot and save the evolution of the edge jump, the position of the first-derivative maximum and the normalized XANES integral in a defined energy range along the set. These features are quite effective during experiments, showing the investigated process kinetics or allowing systematic errors such as sample movements to be recognized. The `EXAFS-FT' tab is devoted to EXAFS signal extraction and calculation of the Fourier transform using the XAFS-specialized functions AUTOBK (Newville et al., 1993) and XFTF imported from LARCH (Newville, 2012). The choice of function parameters is performed graphically and interactively, as shown in Fig. 2 for EXAFS extraction; the interface maintains the same names for parameters as the widely known ATHENA software (Ravel & Newville, 2005, 2024).
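The three quantities monitored in the `XANES' tab can be approximated with a few lines of NumPy. This is a simplified sketch that assumes linear pre-edge and post-edge fits; the function name, the fitting ranges and the integration window are hypothetical defaults and do not reproduce the LARCH normalization used by the program.

```python
import numpy as np


def xanes_descriptors(energy, mu, e0, pre=(-150.0, -30.0), post=(50.0, 300.0),
                      window=(-10.0, 30.0)):
    """Return edge jump, derivative-maximum energy and normalized integral.

    All ranges are given in eV relative to e0.
    """
    def line(lo, hi):
        sel = (energy >= e0 + lo) & (energy <= e0 + hi)
        return np.polyfit(energy[sel], mu[sel], 1)

    pre_fit, post_fit = line(*pre), line(*post)
    # Edge jump: difference of the two fitted lines extrapolated to e0
    edge_jump = np.polyval(post_fit, e0) - np.polyval(pre_fit, e0)
    norm = (mu - np.polyval(pre_fit, energy)) / edge_jump
    e_deriv = energy[np.argmax(np.gradient(mu, energy))]
    sel = (energy >= e0 + window[0]) & (energy <= e0 + window[1])
    x, y = energy[sel], norm[sel]
    integral = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))   # trapezoid rule
    return edge_jump, e_deriv, integral
```

Applying the function to every spectrum in the sequence gives three curves versus spectrum index; a sudden change in the edge jump, for instance, may point to sample movement rather than chemistry.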

Finally, the `Fit' tab is a GUI for the FEFFIT function (Newville, 2001a) of the LARCH engine that models the experimental k^w χ(k) signal as a sum of paths. The phase and amplitude for each path can be calculated by means of an internal interface to the FEFF6l code (Rehr et al., 1992) for the first coordination shell in the more common geometries, or can be imported by reading an feffn.dat file calculated by more recent versions of the FEFF code (Rehr & Albers, 2000; Kas et al., 2024). For each path, it is possible to fit the coordination number (n), path length (r), mean-square displacement (ss) and energy shift (e0). In order to improve the rate of convergence, the spectra are fitted sequentially, i.e. the refined values from one spectrum are used as starting-parameter values for the fit of the next spectrum in the series. At the end of the sequential fit, the evolution of the agreement factors, the refined parameter values and their estimated errors can be plotted and saved.
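The sequential strategy itself is independent of the EXAFS model and can be sketched with a toy signal. The example below does not use FEFFIT or FEFF phases and amplitudes: it assumes a simplified damped sine as the model and `scipy.optimize.least_squares` as the minimizer, purely to show how the refined values are passed from one spectrum to the next.

```python
import numpy as np
from scipy.optimize import least_squares


def toy_model(params, k):
    """Toy single-shell signal: amp * sin(2 k r) * exp(-2 ss k^2)."""
    amp, r, ss = params
    return amp * np.sin(2.0 * k * r) * np.exp(-2.0 * ss * k ** 2)


def sequential_fit(k, chi_series, start):
    """Fit each spectrum in turn, seeding it with the previous result."""
    guess = np.asarray(start, dtype=float)
    refined = []
    for chi in chi_series:
        fit = least_squares(lambda p: toy_model(p, k) - chi, guess)
        refined.append(fit.x)
        guess = fit.x            # starting values for the next spectrum
    return np.array(refined)
```

As long as the parameters change slowly between consecutive spectra, each fit starts close to its minimum, which is the reason given in the text for the improved rate of convergence.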

4. PCA_GUI

This module is devoted to principal component analysis (PCA) and accepts as input a multicolumn ASCII file such as those saved by PrestoPronto_GUI. PCA is a robust linear algebra method that is able to determine the number of independent components in a series of spectra, assuming that each data point can be expressed as a linear sum of product functions (Gemperline, 2006), as described in equation (1), in which A is an n × m data matrix (n rows of spectra recorded at m energy points), T_k is the n × k matrix of principal component concentrations and V_k is the m × k matrix of principal components:

\[ A = T_k V_k^{\mathrm{T}} \tag{1} \]

The method is based on the diagonalization of the m × m variance–covariance matrix Z = A^T A of the data set, as shown in equation (2). If the data set and the calculation are free of errors, then the number of principal components k, i.e. the primary rank of the matrix, is equal to the number of eigenvalues that differ from zero in the diagonal matrix D (Harman, 1976), and the corresponding eigenvectors are contained in the V matrix. The T_k matrix is calculated in accordance with equation (3).

\[ Z = A^{\mathrm{T}} A = V D V^{\mathrm{T}} \tag{2} \]

\[ T_k = A V_k \tag{3} \]
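The decomposition described above can be written in a few lines of NumPy. The sketch follows the notation of the text (A, Z, D, V, T_k, V_k) and, like the text, builds Z from the data matrix without mean-centring; the function name `pca` is hypothetical and the code is not taken from PCA_GUI.

```python
import numpy as np


def pca(A, k):
    """Eigen-decomposition of Z = A^T A, keeping the k largest components."""
    Z = A.T @ A                              # m x m matrix
    eigval, V = np.linalg.eigh(Z)            # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1]         # sort descending
    eigval, V = eigval[order], V[:, order]
    V_k = V[:, :k]                           # m x k principal components
    T_k = A @ V_k                            # n x k concentrations
    return eigval, T_k, V_k
```

For an error-free data matrix of rank k, the product of T_k and the transpose of V_k reproduces A to within numerical round-off, and the remaining eigenvalues are zero to the same precision.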

However, experimental data are affected by statistical and systematic errors and the calculations are subject to numerical round-off; therefore, diagonalization of the m × m matrix Z always produces m nonzero eigenvalues. Consequently, determining the number of principal components, i.e. discriminating between significant and negligible eigenvalues, is no longer a trivial task, but is the first and most important step in PCA and derived methods.

Numerous methods devoted to this task have been developed, as described in Malinowski (2002); however, it is worth noting that the use of different methods provides different results for the same data matrix and it is not apparent which one is the best (Gemperline, 2006). The PCA_GUI code has implemented the IND function (Malinowski, 1977), the significance level F-test (%SL) of the reduced eigenvalue function (REV; Malinowski, 1989), the residual standard deviation (RSD) and the method of determination of rank by median absolute deviation (DRMAD; Malinowski, 2009), as visible in the interface table in Fig. 3(a).
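Three of these indicators have simple closed forms in terms of the eigenvalues. The sketch below uses the definitions commonly quoted from Malinowski's work, stated here as assumptions: for a matrix with l the larger and s the smaller dimension and eigenvalues λ_j sorted in descending order, RSD_k = [Σ_{j>k} λ_j / (l (s − k))]^(1/2), IND_k = RSD_k / (s − k)^2 and REV_j = λ_j / [(l − j + 1)(s − j + 1)]. The function name is hypothetical, and the %SL and DRMAD tests are not shown.

```python
import numpy as np


def rank_indicators(eigval, larger, smaller):
    """RSD and IND for k = 1..smaller-1, and REV for j = 1..smaller.

    eigval: the `smaller` largest eigenvalues of A^T A, sorted descending.
    """
    eigval = np.asarray(eigval, dtype=float)
    k = np.arange(1, smaller)
    # Sum of the eigenvalues discarded when k components are kept
    tail = np.array([eigval[i:].sum() for i in k])
    rsd = np.sqrt(tail / (larger * (smaller - k)))
    ind = rsd / (smaller - k) ** 2
    j = np.arange(1, smaller + 1)
    rev = eigval / ((larger - j + 1) * (smaller - j + 1))
    return rsd, ind, rev


rsd, ind, rev = rank_indicators([1000, 500, 1.0, 0.9, 0.8, 0.7], 20, 6)
suggested_rank = int(np.argmin(ind)) + 1     # minimum of the IND function
```

In this artificial example two eigenvalues dominate and the IND function reaches its minimum at k = 2; with real data the minimum is often less clear, which is why several indicators are reported together.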

Figure 3. PCA_GUI interface. (a) PCA result tab. The table at the top shows the values of the eigenvalues, REV, IND, %SL, RSD and DRMAD for the first ten eigenvectors. The buttons at the bottom are related to the plot of the eigenvectors, the reconstructed spectra and their corresponding misfit along the data set. (b) Example of a plot of the evolution of the residuals along the data set.

However, in the opinion of the author, determination of the rank of the data set should not be limited to only numerical analysis. All a priori knowledge of the studied system and a careful visual inspection of the data should be taken into account together with numerical analysis. The plotting capabilities of the program allow three rapid tests to be performed to help in evaluation of the primary rank.

(i) The evolution of the concentration of principal components (the T_k columns) should be in reasonable agreement with the expected kinetics of the investigated process. Spurious components tend to vary very rapidly.

(ii) Only improvement of the description of a meaningful spectroscopic feature along the data set can justify an increase in the rank. Components related only to small intensity changes of previously described features can be caused by imperfect normalization or background subtraction.

(iii) The residual between the experimental and reconstructed data should be constant or vary only as a function of the parameter investigated during the experiment. In the case shown in Fig. 3, it is clearly visible that the residual increases between the 60th and 90th spectra, and consequently an increase in the rank should be considered.
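Test (iii) amounts to computing one residual value per spectrum for a given rank. A minimal sketch is given below; it uses the singular value decomposition of A, which spans the same principal components as the diagonalization of A^T A, and a root-mean-square residual per row. The function name and the choice of residual measure are assumptions made for illustration.

```python
import numpy as np


def residual_per_spectrum(A, k):
    """RMS difference between each spectrum and its rank-k reconstruction."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_k = (U[:, :k] * s[:k]) @ Vt[:k]        # rank-k reconstruction
    return np.sqrt(np.mean((A - A_k) ** 2, axis=1))
```

Plotting the returned array against the spectrum index for k and k + 1 shows whether an additional component removes a localized increase of the residual, as in the situation described for Fig. 3.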

Generally, the components found by PCA are not physically meaningful spectra, but a linear combination thereof. To obtain the real spectra, it is necessary to perform a rotation in the vector space defined by the eigenvectors. To perform this task, PCA_GUI implements iterative transformation factor analysis (ITFA), a technique of self-modelling curve resolution (Fernandez-Garcia et al., 1995; Rossberg et al., 2003).

The first step of the method is to define an approximate concentration profile matrix T_ks. A widespread approach makes use of the vectors obtained by a varimax rotation of the concentration matrix (Vandeginste et al., 1985). Such a rotation maximizes the sum of the squared factor loadings, i.e. it confines each component to only a few ranges at high concentration, reducing their concentration to zero or low values in other ranges. However, as suggested by Fernandez-Garcia et al. (1995), in order to reduce the weight of the arbitrary choice of the type of starting rotation, PCA_GUI uses as an approximate concentration profile a zero matrix in which the elements corresponding to the maximum concentration of the profile matrix obtained by varimax rotation have been replaced with 1. Such a matrix is generally called a `needle matrix'.
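Given a rotated concentration matrix, the needle matrix is straightforward to build. The sketch below assumes that the varimax-rotated profiles are already available as an n × k array with positively oriented columns; the rotation itself is not shown and the function name is hypothetical.

```python
import numpy as np


def needle_matrix(rotated):
    """Zero matrix with a single 1 per column, at the row of its maximum."""
    rotated = np.asarray(rotated, dtype=float)
    needles = np.zeros_like(rotated)
    rows = np.argmax(rotated, axis=0)                  # one row index per column
    needles[rows, np.arange(rotated.shape[1])] = 1.0
    return needles
```

For example, a three-spectrum, two-component profile [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]] gives needles at the first spectrum for the first component and at the last spectrum for the second.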

The second step of the method consists of performing an iterative projection of the approximate concentration profiles in the space of the concentrations in accordance with

\[ \hat{T}_{ks} = T_k \left( T_k^{\mathrm{T}} T_k \right)^{-1} T_k^{\mathrm{T}} \, T_{ks} \tag{4} \]

From a purely mathematical point of view, the projected target always gives an acceptable description of the data set; however, it still might not be physically acceptable. For this reason, appropriate constraints should be applied to the concentration profiles during the iteration. In PCA_GUI, by default the concentrations must be positive and the sum of the concentrations for each spectrum should be 1. Other implemented constraints are optional; for example, the presence of only one maximum or concentration limits in a range of the data set.

When the iterative projection reaches convergence, the obtained concentration profiles satisfy equation (1) and the constraints, and the corresponding spectra can be calculated by least squares from A and T (Gemperline, 2006). Nevertheless, it is important to underline that the results obtained from ITFA are not unique, but rather one solution within the restricted space of rotations defined by the applied constraints.
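The iteration can be sketched as an alternation of projection and constraints. The code below is a simplified illustration under stated assumptions, not the PCA_GUI implementation: the projection is taken as the orthogonal projection onto the space spanned by the k retained concentration vectors, only the two default constraints are applied (negative values clipped to zero, rows rescaled to sum to 1), and the function name, iteration limit and tolerance are hypothetical.

```python
import numpy as np


def itfa(A, k, needles, n_iter=500, tol=1e-10):
    """Iterative target projection with two constraints (simplified sketch)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    T_k = U[:, :k] * s[:k]                      # abstract concentration profiles
    projector = T_k @ np.linalg.pinv(T_k)       # orthogonal projector on span(T_k)
    C = np.asarray(needles, dtype=float)
    for _ in range(n_iter):
        C_new = projector @ C                   # projection step
        C_new = np.clip(C_new, 0.0, None)       # constraint: no negative values
        total = C_new.sum(axis=1, keepdims=True)
        C_new = C_new / np.where(total > 0, total, 1.0)   # rows sum to 1
        converged = np.max(np.abs(C_new - C)) < tol
        C = C_new
        if converged:
            break
    # Spectra by least squares from A and the final profiles, A ~ C S
    S, *_ = np.linalg.lstsq(C, A, rcond=None)
    return C, S
```

Different starting needles or additional constraints generally lead to different final profiles, which illustrates the non-uniqueness noted in the text.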

5. LinearCom_GUI

The last program in the package is devoted to fitting a sequence of spectra as a linear combination of standards. Such an analysis can be applied to the normalized XANES, the derivative of the normalized XANES or the k^w χ(k) spectra. In analogy with PrestoPronto_GUI, the fit is sequential, i.e. the starting values for the fitting parameters are set equal to the refined values obtained from the previous spectrum in the set. The code is built around the LMFIT Python library developed by Newville et al. (2014) and uses the Levenberg–Marquardt algorithm, with parameters and standard errors calculated at a 1σ confidence interval. The sequence of spectra and the reference-compound spectra are imported as a multicolumn ASCII file. If necessary, the code automatically performs a cubic spline interpolation to define a common abscissa axis for experimental and standard spectra. Users can choose to analyse only a subset of the sequence and can graphically define the range of the fit.
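The interpolation and fitting steps can be illustrated with SciPy and NumPy. Note the substitution: instead of LMFIT and the Levenberg–Marquardt algorithm used by the program, this sketch solves the unconstrained linear problem directly with `numpy.linalg.lstsq`, so it provides neither parameter bounds nor standard errors. The function name is hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def linear_combination_fit(x, y, standards):
    """Fit y(x) as a weighted sum of standards given on their own grids.

    standards: list of (x_std, y_std) pairs; each x_std must be strictly
    increasing and should cover the range of x.
    """
    # Cubic-spline interpolation of every standard onto the common axis x
    basis = np.column_stack([CubicSpline(xs, ys)(x) for xs, ys in standards])
    weights, *_ = np.linalg.lstsq(basis, y, rcond=None)
    return weights, basis @ weights
```

Because this simplified problem is linear, no starting values are needed; the sequential seeding described in the text matters for the iterative minimizer used by the program.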

6. Future developments

At present, PrestoPronto_GUI fits the EXAFS with a basic sequential algorithm in which all parameters are refined independently. Creating a new module with a dedicated interface will allow constraints to be introduced between different path contributions and also between different spectra in the sequence. Another possible improvement would be better management of spectra import and of the data matrix, in order to avoid the RAM size limit encountered with very large data matrices.

7. Resources

A project page for PrestoPronto is accessible at http://soonready.github.io/PrestoPronto/ . A self-installer and the source code are available. The screen shots in this article were made on a Windows 7 computer. The programs may differ in appearance on other systems.

Acknowledgements

The author thanks Santiago Figueroa, Sakura Pascarelli and Mark A. Newton for help in developing the programs and Matt Newville and Marcos Fernández-García for kindly opening their codes.

References

Achilli, E., Minguzzi, A., Lugaresi, O., Locatelli, C., Rondinini, S., Spinolo, G. & Ghigna, P. (2014). J. Spectrosc. 2014, 1–7.
Bahout, M., Tonus, F., Prestipino, C., Pelloquin, D., Hansen, T., Fonda, E. & Battle, P. D. (2012). J. Mater. Chem. 22, 10560.
Bunker, G. (2010). Introduction to XAFS: A Practical Guide to X-ray Absorption Fine Structure Spectroscopy. Cambridge University Press.
Fernandez-Garcia, M., Marquez Alvarez, C. & Haller, G. L. (1995). J. Phys. Chem. 99, 12565–12569.
Frahm, R. (1988). Nucl. Instrum. Methods Phys. Res. A, 270, 578–581.
Gemperline, P. (2006). Practical Guide to Chemometrics, 2nd ed. Boca Raton: CRC/Taylor & Francis.
Harman, H. H. (1976). Modern Factor Analysis. University of Chicago Press.
Hunter, J. D. (2007). Comput. Sci. Eng. 9, 90–95.
Kaminaga, U., Matsushita, T. & Kohra, K. (1981). Jpn. J. Appl. Phys. 20, L355–L358.
Kas, J. J., Vila, F. D. & Rehr, J. J. (2024). Int. Tables Crystallogr. I, ch. 6.8, 764–769.
Malinowski, E. R. (1977). Anal. Chem. 49, 612–617.
Malinowski, E. R. (1989). J. Chemometr. 3, 49–60.
Malinowski, E. R. (2002). Factor Analysis in Chemistry. New York: Wiley.
Malinowski, E. R. (2009). J. Chemometr. 23, 1–6.
Matsushita, T. & Phizackerley, R. P. (1981). Jpn. J. Appl. Phys. 20, 2223–2228.
Moog, I., Feral-Martin, C., Duttine, M., Wattiaux, A., Prestipino, C., Figueroa, S., Majimel, J. & Demourgues, A. (2014). J. Mater. Chem. A, 2, 20402–20414.
Müller, O., Nachtegaal, M., Just, J., Lützenkirchen-Hecht, D. & Frahm, R. (2016). J. Synchrotron Rad. 23, 260–266.
Newville, M. (2001a). J. Synchrotron Rad. 8, 96–100.
Newville, M. (2001b). J. Synchrotron Rad. 8, 322–324.
Newville, M. (2012). J. Phys. Conf. Ser. 430, 012007.
Newville, M., Līviņš, P., Yacoby, Y., Rehr, J. J. & Stern, E. A. (1993). Phys. Rev. B, 47, 14126–14131.
Newville, M. & Ravel, B. (2024). Int. Tables Crystallogr. I, ch. 6.13, 791–795.
Newville, M., Stensitzki, T., Allen, D. B. & Ingargiola, A. (2014). LMFIT: Non-Linear Least-Square Minimization and Curve-Fitting for Python. https://lmfit.github.io/lmfit-py/.
Nilsson, J., Carlsson, P.-A., Fouladvand, S., Martin, N. M., Gustafson, J., Newton, M. A., Lundgren, E., Grönbeck, H. & Skoglundh, M. (2015). ACS Catal. 5, 2481–2489.
Pascarelli, S., Mathon, O., Mairs, T., Kantor, I., Agostini, G., Strohm, C., Pasternak, S., Perrin, F., Berruyer, G., Chappelet, P., Clavel, C. & Dominguez, M. C. (2016). J. Synchrotron Rad. 23, 353–368.
Ravel, B. & Newville, M. (2005). J. Synchrotron Rad. 12, 537–541.
Ravel, B. & Newville, M. (2024). Int. Tables Crystallogr. I, ch. 6.1, 723–727.
Rehr, J. J., Albers, R. C. & Zabinsky, S. I. (1992). Phys. Rev. Lett. 69, 3397–3400.
Rehr, J. J. & Albers, R. C. (2000). Rev. Mod. Phys. 72, 621–654.
Ressler, T. (1998). J. Synchrotron Rad. 5, 118–122.
Rossberg, A., Reich, T. & Bernhard, G. (2003). Anal. Bioanal. Chem. 376, 631–638.
Sanchez del Rio, M. & Dejus, J. R. (2004). Proc. SPIE, 5536, 171–174.
San-Miguel, A. (1995). Physica B, 208–209, 177–179.
Stötzel, J., Lützenkirchen-Hecht, D. & Frahm, R. (2010). Rev. Sci. Instrum. 81, 073109.
Stötzel, J., Lützenkirchen-Hecht, D., Grunwaldt, J.-D. & Frahm, R. (2012). J. Synchrotron Rad. 19, 920–929.
Vandeginste, B. G. M., Derks, W. & Kateman, G. (1985). Anal. Chim. Acta, 173, 253–264.
Walt, S. van der, Colbert, S. C. & Varoquaux, G. (2011). Comput. Sci. Eng. 13, 22–30.
Zuvich, A. F., Soldati, A., Larrondo, S., Saleta, M., Lamas, D. G., Baque, L. C., Caneiro, A. & Serquis, A. (2014). ECS Trans. 64, 233–240.







































