The F distribution

Prince, E.; Spiegelman, C. H.

doi:10.1107/97809553602060000612

International
Tables for
Crystallography
Volume C
Mathematical, physical and chemical tables
Edited by E. Prince


International Tables for Crystallography (2006). Vol. C. ch. 8.4, pp. 703-704

Section 8.4.2. The F distribution

E. Prince^a and C. H. Spiegelman^b

^a NIST Center for Neutron Research, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA, and ^b Department of Statistics, Texas A&M University, College Station, TX 77843, USA


Consider an unconstrained model with p parameters and a constrained one with q parameters, where $q < p$. We wish to decide whether the constrained model represents an adequate fit to the data, or whether the additional parameters in the unconstrained model provide, in some important sense, a better fit to the data. Provided the (p − q) additional columns of the design matrix, A, are linearly independent of the previous q columns, the sum of squared residuals must be reduced by some finite amount by adjusting the additional parameters, but we must decide whether this improved fit would have occurred purely by chance, or whether it represents additional information.

Let $s_c^2$ and $s_u^2$ be the weighted sums of squared residuals for the constrained and unconstrained models, respectively. If the constrained and unconstrained models are equally good representations of the data, and the weights have been assigned by $w_i = 1/\sigma_i^2$, the expected values of the sums of squares are $\langle s_c^2\rangle = (n-q)$ and $\langle s_u^2\rangle = (n-p)$, and, further, they should be distributed as χ² with (n − q) and (n − p) degrees of freedom, respectively. Also, $\langle s_c^2 - s_u^2\rangle = (p-q)$, and $(s_c^2 - s_u^2)$ is distributed as χ² with (p − q) degrees of freedom. $s_c^2$ and $s_u^2$ are not independent, but $(s_c^2 - s_u^2)$ is the squared magnitude of a vector in a (p − q)-dimensional subspace that is orthogonal to the (n − p)-dimensional space of $s_u^2$. Therefore, $s_u^2$ and $(s_c^2 - s_u^2)$ are independent random variables, each with a χ² distribution.

Let $\chi_1^2 = (s_c^2 - s_u^2)$, $\chi_2^2 = s_u^2$, ν₁ = p − q, and ν₂ = n − p. The ratio $F = (\chi_1^2/\nu_1)/(\chi_2^2/\nu_2)$ should have a value close to one, even if the weights have relative rather than absolute values, but we need a measure of how far away from one this ratio can be before we must reject the hypothesis that the two models are equally good representations of the data. The conditional p.d.f. for F, given a value of $\chi_2^2$, is

$$\Phi_C\left(F|\chi_2^2\right) = {\left[\left(\nu_1/\nu_2\right)\chi_2^2\right]^{\nu_1/2} F^{\nu_1/2-1} \over 2^{\nu_1/2}\,\Gamma(\nu_1/2)} \exp\left[-(\nu_1/\nu_2)\chi_2^2 F/2\right], \eqno(8.4.2.1)$$

and the marginal p.d.f. for $\chi_2^2$ is

$$\Phi_M\left(\chi_2^2\right) = {\left(\chi_2^2\right)^{\nu_2/2-1} \over 2^{\nu_2/2}\,\Gamma(\nu_2/2)} \exp\left(-\chi_2^2/2\right). \eqno(8.4.2.2)$$

The marginal p.d.f. for F is obtained by integration of the joint p.d.f.,

$$\Phi(F) = \int_0^\infty \Phi_C\left(F|\chi_2^2\right) \Phi_M\left(\chi_2^2\right)\,{\rm d}\chi_2^2, \eqno(8.4.2.3)$$

yielding the result

$$\Phi(F,\nu_1,\nu_2) = {\left(\nu_1/\nu_2\right)^{\nu_1/2} F^{\nu_1/2-1} \over B\left(\nu_1/2,\nu_2/2\right) \left[1+\left(\nu_1/\nu_2\right)F\right]^{(\nu_1+\nu_2)/2}}. \eqno(8.4.2.4)$$

This p.d.f. is known as the F distribution with $\nu_1$ and $\nu_2$ degrees of freedom. Table 8.4.2.1 gives the values of F for which the c.d.f. $\Psi(F,\nu_1,\nu_2)$ is equal to 0.95 for various choices of ν₁ and ν₂. Fortran code for the program from which the table was generated appears in Prince (1994).
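The integration over $\chi_2^2$ that leads to the F density can be written out explicitly. The following is a worked sketch; the shorthand $a = \nu_1/2$, $b = \nu_2/2$, $r = \nu_1/\nu_2$ and $x = \chi_2^2$ is introduced here for brevity and is not notation from the original text.

```latex
\begin{align*}
\Phi_C(F|x)\,\Phi_M(x)
  &= \frac{r^{a}\,F^{a-1}}{2^{a+b}\,\Gamma(a)\,\Gamma(b)}\;
     x^{a+b-1}\exp\!\left[-\tfrac{1}{2}\,x\,(1+rF)\right], \\
\int_0^\infty x^{a+b-1}\exp\!\left[-\tfrac{1}{2}\,x\,(1+rF)\right]{\rm d}x
  &= \Gamma(a+b)\left(\frac{2}{1+rF}\right)^{a+b}, \\
\Phi(F)
  &= \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\;
     \frac{r^{a}\,F^{a-1}}{(1+rF)^{a+b}}
   = \frac{r^{a}\,F^{a-1}}{B(a,b)\,(1+rF)^{a+b}}.
\end{align*}
```

The factor $r^{a} = (\nu_1/\nu_2)^{\nu_1/2}$ comes directly from the conditional p.d.f., and the powers of 2 cancel between the normalizing constants and the gamma-function integral.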

Table 8.4.2.1
Values of the F ratio for which the c.d.f. $\Psi(F,\nu_1,\nu_2)$ has the value 0.95, for various choices of ν₁ and ν₂

ν₂ \ ν₁	1	2	4	8	15
10	4.9646	4.1028	3.4781	3.0717	2.8450
20	4.3512	3.4928	2.8661	2.4471	2.2033
30	4.1709	3.3158	2.6896	2.2662	2.0148
40	4.0847	3.2317	2.6060	2.1802	1.9245
50	4.0343	3.1826	2.5572	2.1299	1.8714
60	4.0012	3.1504	2.5252	2.0970	1.8364
80	3.9604	3.1108	2.4859	2.0564	1.7932
100	3.9361	3.0873	2.4626	2.0323	1.7675
120	3.9201	3.0718	2.4472	2.0164	1.7505
150	3.9042	3.0564	2.4320	2.0006	1.7335
200	3.8884	3.0411	2.4168	1.9849	1.7167
300	3.8726	3.0259	2.4017	1.9693	1.6998
400	3.8648	3.0183	2.3943	1.9616	1.6914
600	3.8570	3.0107	2.3868	1.9538	1.6831
1000	3.8508	3.0047	2.3808	1.9477	1.6764
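
Entries of this kind can be checked numerically. The sketch below is not the Fortran program of Prince (1994); it is a minimal standard-library illustration, and the function names (`f_pdf`, `f_cdf`, `f_quantile`) are hypothetical. It evaluates the standard F density, with prefactor $(\nu_1/\nu_2)^{\nu_1/2}$, obtains the c.d.f. as one minus the upper tail by Simpson's rule, and finds the 0.95 point by bisection. It assumes ν₂ ≥ 2 and moderate degrees of freedom; accuracy degrades very close to F = 0 when ν₁ = 1.

```python
import math


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def f_pdf(F, nu1, nu2):
    """F density for F > 0, evaluated in logs for numerical stability."""
    a, b = nu1 / 2.0, nu2 / 2.0
    r = nu1 / nu2
    log_phi = (a * math.log(r) + (a - 1.0) * math.log(F)
               - log_beta(a, b) - (a + b) * math.log1p(r * F))
    return math.exp(log_phi)


def f_cdf(F, nu1, nu2, steps=2000):
    """Psi(F, nu1, nu2), computed as 1 minus the upper tail.

    With t = nu1*F / (nu2 + nu1*F), the upper tail is the integral of
    t**(a-1) * (1-t)**(b-1) / B(a, b) from t to 1. Simpson's rule is
    applied on that interval (steps must be even; assumes nu2 >= 2).
    """
    if F <= 0.0:
        return 0.0
    a, b = nu1 / 2.0, nu2 / 2.0
    x = nu1 * F / (nu2 + nu1 * F)
    h = (1.0 - x) / steps

    def g(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    total = g(x) + g(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(x + i * h)
    tail = total * h / 3.0 / math.exp(log_beta(a, b))
    return 1.0 - tail


def f_quantile(prob, nu1, nu2, hi=1000.0, tol=1e-7):
    """Value of F at which the c.d.f. equals prob, found by bisection."""
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, nu1, nu2) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for nu1, nu2 in [(1, 10), (2, 10), (15, 1000)]:
        print(nu1, nu2, round(f_quantile(0.95, nu1, nu2), 4))
```

For ν₁ = 2 the c.d.f. has the closed form $1 - (1 + 2F/\nu_2)^{-\nu_2/2}$, which gives an independent check on the ν₁ = 2 column.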

The cumulative distribution function $\Psi(F,\nu_1,\nu_2)$ gives the probability that the F ratio will be less than a given value by chance if the models are equally consistent with the data. To conclude that the unconstrained model gives a significantly better fit to the data, it is therefore a necessary, but not sufficient, condition that $\Psi(F,\nu_1,\nu_2)$ be greater than 1 − α, where α is the desired level of significance. For example, if $\Psi(F,\nu_1,\nu_2) = 0.95$, the probability is only 0.05 that a value of F this large or greater would have been observed if the two models were equally good representations of the data.
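
In practice the test reduces to forming the F ratio from the two weighted sums of squared residuals and comparing it with a tabulated critical value. The following minimal sketch illustrates this; the function names and the numerical inputs are invented for illustration, and the critical value 2.4626 is the Table 8.4.2.1 entry for ν₁ = 4, ν₂ = 100.

```python
def f_ratio(s_c2, s_u2, n, p, q):
    """Return (F, nu1, nu2) from weighted sums of squared residuals.

    s_c2, s_u2: sums for the constrained and unconstrained models;
    n: number of observations; p, q: numbers of parameters (q < p < n).
    """
    if not (q < p < n):
        raise ValueError("require q < p < n")
    if s_u2 <= 0.0 or s_c2 < s_u2:
        raise ValueError("require 0 < s_u2 <= s_c2")
    nu1, nu2 = p - q, n - p
    F = ((s_c2 - s_u2) / nu1) / (s_u2 / nu2)
    return F, nu1, nu2


def exceeds_critical_value(F, F_crit):
    """True if F exceeds the critical value at the chosen significance level.

    This is the necessary (not sufficient) condition described in the text.
    """
    return F > F_crit


if __name__ == "__main__":
    # Hypothetical refinement: 110 observations, 10 vs 6 parameters.
    F, nu1, nu2 = f_ratio(s_c2=125.0, s_u2=105.0, n=110, p=10, q=6)
    F_crit = 2.4626  # Table 8.4.2.1, nu1 = 4, nu2 = 100, alpha = 0.05
    print(F, nu1, nu2, exceeds_critical_value(F, F_crit))
```

Because F depends only on the ratio of the two sums, the same value is obtained if the weights are known only up to a common scale factor.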

Hamilton (1964) observed that the F ratio could be expressed in terms of the crystallographic weighted R index, which is defined, for refinement on |F| (and similarly for refinement on |F|²), by

$$R_w = \left[\sum w_i\left(|{\rm F}_o|_i - |{\rm F}_c|_i\right)^2 \Big/ \sum w_i|{\rm F}_o|_i^2\right]^{1/2}. \eqno(8.4.2.5)$$

Denoting by $R_c$ and $R_u$ the weighted R indices for the constrained and unconstrained models, respectively,

$$F = (\nu_2/\nu_1)\left[(R_c/R_u)^2 - 1\right], \eqno(8.4.2.6)$$

and a c.d.f. for $R_c/R_u$ can be readily derived from this relation. A significance test based on $R_c/R_u$ is known as Hamilton's R-ratio test; it is entirely equivalent to a test on the F ratio.
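
The relation between F and the R ratio follows because, for the same data and weights, both R indices share the denominator $\sum w_i|{\rm F}_o|_i^2$, so that $(R_c/R_u)^2 = s_c^2/s_u^2$. A minimal sketch of the conversion in both directions is given below; the function names are hypothetical, and the critical F value is the Table 8.4.2.1 entry for ν₁ = 4, ν₂ = 100.

```python
import math


def f_from_r_ratio(R_c, R_u, nu1, nu2):
    """F ratio from the weighted R indices, F = (nu2/nu1)[(R_c/R_u)^2 - 1].

    Assumes both indices were computed from the same data and weights.
    """
    return (nu2 / nu1) * ((R_c / R_u) ** 2 - 1.0)


def critical_r_ratio(F_crit, nu1, nu2):
    """R_c/R_u value corresponding to a critical F value (inverse relation)."""
    return math.sqrt(1.0 + (nu1 / nu2) * F_crit)


if __name__ == "__main__":
    nu1, nu2 = 4, 100
    R_crit = critical_r_ratio(2.4626, nu1, nu2)
    # An observed R_c/R_u above R_crit corresponds to F above 2.4626.
    print(R_crit, f_from_r_ratio(R_crit, 1.0, nu1, nu2))
```

Because the mapping between F and $R_c/R_u$ is monotonic, comparing the observed R ratio with `critical_r_ratio(...)` gives the same decision as comparing F with its critical value.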

References

Hamilton, W. C. (1964). Statistics in physical science: estimation, hypothesis testing and least squares. New York: Ronald Press.

Prince, E. (1994). Mathematical techniques in crystallography and materials science, 2nd ed. Berlin/Heidelberg/New York/London/Paris/Tokyo/Hong Kong/Barcelona/Budapest: Springer-Verlag.
