International Tables for Crystallography, Volume C: Mathematical, physical and chemical tables. Edited by E. Prince. © International Union of Crystallography 2006
International Tables for Crystallography (2006). Vol. C. ch. 8.4, pp. 703-704
Consider an unconstrained model with p parameters and a constrained one with q parameters, where q < p. We wish to decide whether the constrained model represents an adequate fit to the data, or whether the additional parameters in the unconstrained model provide, in some important sense, a better fit to the data. Provided the (p − q) additional columns of the design matrix, A, are linearly independent of the previous q columns, the sum of squared residuals must be reduced by some finite amount by adjusting the additional parameters, but we must decide whether this improved fit would have occurred purely by chance, or whether it represents additional information.
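Let $S_c^2$ and $S_u^2$ be the weighted sums of squared residuals for the constrained and unconstrained models, respectively. If the constrained and unconstrained models are equally good representations of the data, and the weights have been assigned by $w_i = 1/\sigma_i^2$, the expected values of the sums of squares are $\langle S_c^2 \rangle = n - q$ and $\langle S_u^2 \rangle = n - p$, and, further, they should be distributed as $\chi^2$ with $(n - q)$ and $(n - p)$ degrees of freedom, respectively. Also, $\langle S_c^2 - S_u^2 \rangle = p - q$, and $S_c^2 - S_u^2$ is distributed as $\chi^2$ with $(p - q)$ degrees of freedom. $S_c^2$ and $S_u^2$ are not independent, but $S_c^2 - S_u^2$ is the squared magnitude of a vector in a $(p - q)$-dimensional subspace that is orthogonal to the $(n - p)$-dimensional space of $S_u^2$. Therefore, $S_c^2 - S_u^2$ and $S_u^2$ are independent random variables, each with a $\chi^2$ distribution.

Let $\chi_1^2 = S_c^2 - S_u^2$, $\chi_2^2 = S_u^2$, $\nu_1 = p - q$, and $\nu_2 = n - p$. The ratio

$$F = \frac{\chi_1^2/\nu_1}{\chi_2^2/\nu_2}$$

should have a value close to one, even if the weights have relative rather than absolute values, but we need a measure of how far from one this ratio can be before we must reject the hypothesis that the two models are equally good representations of the data. The conditional p.d.f. for F, given a value of $\chi_2^2$, is

$$\Phi_C(F \mid \chi_2^2) = \frac{(\nu_1 \chi_2^2/\nu_2)^{\nu_1/2}\, F^{\nu_1/2 - 1}}{2^{\nu_1/2}\, \Gamma(\nu_1/2)} \exp\left(-\frac{\nu_1 \chi_2^2 F}{2 \nu_2}\right),$$

and the marginal p.d.f. for $\chi_2^2$ is

$$\Phi_M(\chi_2^2) = \frac{(\chi_2^2)^{\nu_2/2 - 1}}{2^{\nu_2/2}\, \Gamma(\nu_2/2)} \exp\left(-\frac{\chi_2^2}{2}\right).$$

The marginal p.d.f. for F is obtained by integration of the joint p.d.f., $\Phi_C(F \mid \chi_2^2)\, \Phi_M(\chi_2^2)$, over $\chi_2^2$ from 0 to $\infty$, yielding the result

$$\Phi(F, \nu_1, \nu_2) = \frac{\Gamma[(\nu_1 + \nu_2)/2]}{\Gamma(\nu_1/2)\, \Gamma(\nu_2/2)} \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} F^{\nu_1/2 - 1} \left(1 + \frac{\nu_1 F}{\nu_2}\right)^{-(\nu_1 + \nu_2)/2}.$$

This p.d.f. is known as the F distribution with $\nu_1$ and $\nu_2$ degrees of freedom. Table 8.4.2.1 gives the values of F for which the c.d.f. is equal to 0.95 for various choices of $\nu_1$ and $\nu_2$. Fortran code for the program from which the table was generated appears in Prince (1994).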
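As an illustration of the quantities defined above, the following is a minimal Python sketch (standard library only) that forms the F ratio from two weighted sums of squared residuals and evaluates the F-distribution density. The function names `f_ratio` and `f_pdf`, and the numbers in the usage lines, are hypothetical and chosen only for illustration; this is not the Fortran program of Prince (1994).

```python
import math


def f_ratio(s2_constrained, s2_unconstrained, n, p, q):
    """F ratio from the weighted sums of squared residuals of two nested models.

    n: number of observations; p, q: numbers of parameters in the
    unconstrained and constrained models (q < p < n).
    """
    nu1 = p - q  # degrees of freedom of the difference S_c^2 - S_u^2
    nu2 = n - p  # degrees of freedom of S_u^2
    if nu1 <= 0 or nu2 <= 0:
        raise ValueError("require q < p < n")
    chi1_sq = s2_constrained - s2_unconstrained
    return (chi1_sq / nu1) / (s2_unconstrained / nu2)


def f_pdf(f, nu1, nu2):
    """Density of the F distribution with nu1 and nu2 degrees of freedom, for f > 0."""
    if f <= 0.0:
        # Simplification: treat the boundary f = 0 as outside the support.
        return 0.0
    # Work with logarithms so that the gamma functions do not overflow.
    log_pdf = (
        math.lgamma((nu1 + nu2) / 2.0)
        - math.lgamma(nu1 / 2.0)
        - math.lgamma(nu2 / 2.0)
        + (nu1 / 2.0) * math.log(nu1 / nu2)
        + (nu1 / 2.0 - 1.0) * math.log(f)
        - ((nu1 + nu2) / 2.0) * math.log1p(nu1 * f / nu2)
    )
    return math.exp(log_pdf)


# Made-up example: 110 observations, 10 versus 8 parameters.
example_f = f_ratio(120.0, 100.0, n=110, p=10, q=8)
example_density = f_pdf(example_f, 2, 100)
```

For $\nu_1 = \nu_2 = 2$ the density reduces to $1/(1 + F)^2$, which gives a quick check on the implementation.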
The cumulative distribution function $\Psi(F, \nu_1, \nu_2)$ gives the probability that the F ratio will be less than some value by chance if the models are equally consistent with the data. For concluding that the unconstrained model gives a significantly better fit to the data, it is therefore a necessary, but not sufficient, condition that $\Psi(F, \nu_1, \nu_2)$ be greater than $1 - \alpha$, where $\alpha$ is the desired level of significance. For example, if $\Psi(F, \nu_1, \nu_2) = 0.95$, the probability is only 0.05 that a value of F this large or greater would have been observed if the two models were equally good representations of the data.
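The c.d.f. of the F distribution can be written as a regularized incomplete beta function, $I_x(\nu_1/2, \nu_2/2)$ with $x = \nu_1 F/(\nu_1 F + \nu_2)$. The sketch below, in standard-library Python, evaluates it with a textbook continued-fraction expansion and applies the threshold $1 - \alpha$. The names `f_cdf` and `passes_f_test` are hypothetical, and the continued-fraction routine is a generic numerical method assumed here for illustration, not the method used to generate Table 8.4.2.1.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for a, b > 0 and 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_cdf(f, nu1, nu2):
    """Probability that an F ratio with (nu1, nu2) degrees of freedom is below f."""
    if f <= 0.0:
        return 0.0
    x = nu1 * f / (nu1 * f + nu2)
    return regularized_incomplete_beta(nu1 / 2.0, nu2 / 2.0, x)


def passes_f_test(f, nu1, nu2, alpha=0.05):
    """True if the c.d.f. exceeds 1 - alpha (a necessary condition only)."""
    return f_cdf(f, nu1, nu2) > 1.0 - alpha
```

A call such as `passes_f_test(10.0, 2, 100)` asks whether an observed ratio of 10 with 2 and 100 degrees of freedom exceeds the 0.95 point of the distribution. A `True` result satisfies only the necessary condition described above.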
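Hamilton (1964) observed that the F ratio could be expressed in terms of the crystallographic weighted R index, which is defined, for refinement on $|F|$ (and similarly for refinement on $|F|^2$), by

$$R_w = \left[ \frac{\sum_i w_i \left( |F_{\mathrm{obs}}|_i - |F_{\mathrm{calc}}|_i \right)^2}{\sum_i w_i\, |F_{\mathrm{obs}}|_i^2} \right]^{1/2},$$

where $|F_{\mathrm{obs}}|_i$ and $|F_{\mathrm{calc}}|_i$ are the observed and calculated structure-factor amplitudes (not to be confused with the F ratio).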
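The weighted R index for refinement on $|F|$ is the square root of the weighted sum of squared amplitude residuals divided by the weighted sum of squared observed amplitudes. A minimal Python sketch of that computation follows; the function name `weighted_r_index` and the sample amplitudes are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def weighted_r_index(f_obs, f_calc, weights):
    """Weighted R index for refinement on |F|.

    f_obs, f_calc: observed and calculated structure-factor amplitudes;
    weights: least-squares weights, one per reflection.
    """
    if not (len(f_obs) == len(f_calc) == len(weights)):
        raise ValueError("all three sequences must have the same length")
    numerator = sum(w * (abs(fo) - abs(fc)) ** 2
                    for fo, fc, w in zip(f_obs, f_calc, weights))
    denominator = sum(w * abs(fo) ** 2 for fo, w in zip(f_obs, weights))
    return math.sqrt(numerator / denominator)


# Made-up amplitudes with unit weights: residuals of 1 and 2 against 10 and 20.
r_example = weighted_r_index([10.0, 20.0], [9.0, 22.0], [1.0, 1.0])
```

With these illustrative numbers the numerator is 5 and the denominator is 500, so the index is $\sqrt{0.01} = 0.1$.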
Denoting by $R_c$ and $R_u$ the weighted R indices for the constrained and unconstrained models, respectively,

$$F = \frac{\nu_2}{\nu_1} \left[ \left( \frac{R_c}{R_u} \right)^2 - 1 \right],$$

and a c.d.f. for $R_c/R_u$ can be readily derived from this relation. A significance test based on $R_c/R_u$ is known as Hamilton's R-ratio test; it is entirely equivalent to a test on the F ratio.
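Because both R indices share the same denominator when they are computed from the same data and weights, the squared R ratio equals the ratio of the two weighted sums of squared residuals, and the conversion to and from F is a one-line formula. The following Python sketch shows both directions; the function names and the sample R values are hypothetical, and the critical value of F passed to the second function must come from a table or c.d.f. routine supplied by the user.

```python
import math


def f_from_r_ratio(r_constrained, r_unconstrained, nu1, nu2):
    """F ratio implied by two weighted R indices (same data, same weights).

    nu1 = p - q and nu2 = n - p, as for the F test.
    """
    ratio = r_constrained / r_unconstrained
    return (nu2 / nu1) * (ratio ** 2 - 1.0)


def r_ratio_from_f(f, nu1, nu2):
    """R ratio corresponding to a given F value (inverse of f_from_r_ratio).

    Passing a critical F value yields the corresponding critical R ratio.
    """
    return math.sqrt(1.0 + nu1 * f / nu2)


# Made-up example: R_c = 0.055, R_u = 0.050 with nu1 = 2, nu2 = 100.
f_example = f_from_r_ratio(0.055, 0.050, 2, 100)
```

Comparing an observed R ratio with `r_ratio_from_f` evaluated at a critical F value is the same decision as comparing the observed F ratio with that critical value, which is the sense in which the two tests are equivalent.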
References
Hamilton, W. C. (1964). Statistics in physical science: estimation, hypothesis testing and least squares. New York: Ronald Press.
Prince, E. (1994). Mathematical techniques in crystallography and materials science, 2nd ed. Berlin/Heidelberg/New York/London/Paris/Tokyo/Hong Kong/Barcelona/Budapest: Springer-Verlag.