International Tables for Crystallography (2006). Vol. C, Chapter 8.2, pp. 691–692.
Entropy maximization, like least squares, is of interest primarily as a framework within which to find or adjust parameters of a model. Rationalization of the name `entropy maximization' by analogy to thermodynamics is controversial, but there is formal proof (Shore & Johnson, 1980; Johnson & Shore, 1983) supporting entropy maximization as the unique method of inference that satisfies basic consistency requirements (Livesey & Skilling, 1985). The proof consists of discovering the consequences of four consistency axioms, which may be stated informally as follows:
(1) Uniqueness: if the same problem is solved twice on the basis of the same information, the same solution should be found both times.
(2) Invariance: the solution should not depend on the coordinate system in which the problem is posed.
(3) System independence: it should not matter whether independent information about independent systems is taken into account separately or jointly.
(4) Subset independence: it should not matter whether an independent subset of the states of a system is treated separately or as part of the whole system.
The term `entropy' is used in this chapter as a name only, the name for variation functions that include the form $p\ln p$, where $p$ may represent probability or, more generally, a positive proportion. Any positive measure, either observed or derived, of the relative apportionment of a characteristic quantity among observations can serve as the proportion.
The method of entropy maximization may be formulated as follows: given a set of $n$ observations, $y_i$, that are measurements of quantities that can be described by model functions, $M_i(\mathbf{x})$, where $\mathbf{x}$ is a vector of parameters, find the prior, positive proportions, $\mu_i$, and the values of the parameters for which the positive proportions $\mu_i'$ make the sum
$$S = -\sum_{i=1}^{n}\mu_i'\ln(\mu_i'/\mu_i) \eqno(8.2.3.1)$$
a maximum, where $\sum_{i=1}^{n}\mu_i = 1$ and $\mu_i' = M_i(\mathbf{x})\Big/\sum_{j=1}^{n}M_j(\mathbf{x})$. $S$ is called the Shannon–Jaynes entropy. For some applications (Collins, 1982), it is desirable to include in the variation function additional terms or restraints that give $S$ the form
$$S = -\sum_{i=1}^{n}\mu_i'\ln(\mu_i'/\mu_i) + \sum_{i}\lambda_i C_i(\mathbf{x}), \eqno(8.2.3.2)$$
where the $C_i(\mathbf{x})$ are restraint functions and the $\lambda$s are undetermined multipliers, but we shall discuss here only applications where $\lambda_i = 0$ for all $i$, and an unrestrained entropy is maximized. A necessary condition for $S$ to be a maximum is for the gradient to vanish. Using
$$\partial S/\partial x_k = -\sum_{i=1}^{n}\left[\ln(\mu_i'/\mu_i) + 1\right]\partial\mu_i'/\partial x_k \eqno(8.2.3.3)$$
and
$$\partial\mu_i'/\partial x_k = \left[\partial M_i/\partial x_k - \mu_i'\sum_{j=1}^{n}\partial M_j/\partial x_k\right]\Big/\sum_{j=1}^{n}M_j, \eqno(8.2.3.4)$$
straightforward algebraic manipulation gives equations of the form
$$\sum_{i=1}^{n}\left[\ln(\mu_i'/\mu_i) + S\right]\partial M_i/\partial x_k = 0, \eqno(8.2.3.5)$$
one such equation for each parameter $x_k$.
It should be noted that, although the entropy function should, in principle, have a unique stationary point corresponding to the global maximum, there are occasional circumstances, particularly with restrained problems where the undetermined multipliers are not all zero, where it may be necessary to verify that a stationary solution actually maximizes entropy.
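The following short Python sketch illustrates the formulation numerically: it evaluates the Shannon–Jaynes entropy (8.2.3.1) for a toy one-parameter model, locates its maximum by a simple scan, and checks that the stationarity condition (8.2.3.5) holds there. The model functions, data values and prior proportions are invented for illustration only and are not taken from the examples below.

```python
import numpy as np

def shannon_jaynes_entropy(model_values, prior_proportions):
    """S = -sum_i mu'_i ln(mu'_i / mu_i), with mu'_i = M_i / sum_j M_j."""
    mu_prime = model_values / model_values.sum()
    return -np.sum(mu_prime * np.log(mu_prime / prior_proportions))

# Toy model: M_i(x) = exp(-(d_i - x)^2), a single parameter x.
d = np.array([0.3, 1.1, 2.0, 2.7])
prior = np.array([0.1, 0.3, 0.4, 0.2])      # positive proportions, summing to 1

def model(x):
    return np.exp(-(d - x) ** 2)

# Maximize S by a crude scan (a library optimizer would normally be used).
xs = np.linspace(0.0, 3.0, 3001)
S = np.array([shannon_jaynes_entropy(model(x), prior) for x in xs])
x_hat = xs[S.argmax()]

# Check the stationarity condition (8.2.3.5):
# sum_i [ln(mu'_i / mu_i) + S] dM_i/dx = 0 at the maximum.
M = model(x_hat)
mu_prime = M / M.sum()
S_hat = shannon_jaynes_entropy(M, prior)
dM_dx = 2.0 * (d - x_hat) * M
residual = np.sum((np.log(mu_prime / prior) + S_hat) * dM_dx)
print(x_hat, S_hat, residual)   # residual should be near zero, up to the scan resolution
```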
For an example of the application of the maximum-entropy method, consider (Collins, 1984) a collection of diffraction intensities in which various subsets have been measured under different conditions, such as on different films or with different crystals. All systematic corrections have been made, but it is necessary to put the different subsets onto a common scale. Assume that every subset has measurements in common with some other subset, and that no collection of subsets is isolated from the others. Let the measurement of intensity $I_j$ in subset $i$ be $y_{ij}$, and let the scale factor that puts intensity $I_j$ on the scale of subset $i$ be $k_i$, so that the model functions are $M_{ij} = k_i I_j$. Equation (8.2.3.1) becomes
$$S = -\sum_i\sum_j \mu_{ij}'\ln(\mu_{ij}'/\mu_{ij}), \eqno(8.2.3.6)$$
where the term is zero if $I_j$ does not appear in subset $i$, and where $\mu_{ij} = y_{ij}\big/\sum_i\sum_j y_{ij}$ and $\mu_{ij}' = k_i I_j\big/\sum_i\sum_j k_i I_j$. Because $k_i$ and $I_j$ are parameters of the model, equations (8.2.3.5) become
$$\sum_j\left[\ln(\mu_{ij}'/\mu_{ij}) + S\right]I_j = 0 \eqno(8.2.3.7a)$$
and
$$\sum_i\left[\ln(\mu_{ij}'/\mu_{ij}) + S\right]k_i = 0. \eqno(8.2.3.7b)$$
These simplify to
$$\ln k_i = \sum_j I_j\ln(y_{ij}/I_j)\Big/\sum_j I_j - Q \eqno(8.2.3.8a)$$
and
$$\ln I_j = \sum_i k_i\ln(y_{ij}/k_i)\Big/\sum_i k_i - Q, \eqno(8.2.3.8b)$$
where
$$Q = S + \ln\left(\sum_i\sum_j y_{ij}\Big/\sum_i\sum_j k_i I_j\right). \eqno(8.2.3.9)$$
Equations (8.2.3.8) may be solved iteratively, starting with the approximations $k_i = 1$ for all $i$ and $Q = 0$.
The standard uncertainties of scale factors and intensities are not used in the solution of equations (8.2.3.8), and must be computed separately. They may be estimated on a fractional basis from the variances of the estimated population means of the ratios $y_{ij}/(k_i I_j)$, the mean being taken over $j$ for a scale factor $k_i$ and over $i$ for an intensity $I_j$, respectively. The maximum-entropy scale factors and scaled intensities are relative, and either set may be multiplied by an arbitrary, positive constant without affecting the solution.
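A minimal Python sketch of the iteration of equations (8.2.3.8) is given below. The array layout, the starting values, the fixed number of iterations and the synthetic data are illustrative choices only and are not prescribed by the text.

```python
import numpy as np

def maxent_scale(y, mask, n_iter=200):
    """Maximum-entropy scaling of intensity subsets (sketch of Collins, 1984).

    y[i, j]    -- measurement of intensity j in subset i; entries where
                  mask[i, j] is False are ignored (use any positive placeholder).
    mask[i, j] -- True where intensity j was measured in subset i.
    Returns relative scale factors k_i and scaled intensities I_j.
    """
    w = mask.astype(float)
    k = np.ones(y.shape[0])                        # starting approximation k_i = 1
    I = (w * y).sum(0) / w.sum(0)                  # per-intensity means as a start
    Q = 0.0                                        # starting approximation Q = 0
    for _ in range(n_iter):
        # ln k_i = [sum_j I_j ln(y_ij/I_j)] / [sum_j I_j] - Q      (8.2.3.8a)
        k = np.exp((w * I * np.log(y / I)).sum(1) / (w * I).sum(1) - Q)
        # ln I_j = [sum_i k_i ln(y_ij/k_i)] / [sum_i k_i] - Q      (8.2.3.8b)
        k_col = k[:, None]
        I = np.exp((w * k_col * np.log(y / k_col)).sum(0) / (w * k_col).sum(0) - Q)
        # Q = S + ln(sum y / sum kI), with S evaluated at the current proportions
        kI = k[:, None] * I
        T, Tp = (w * y).sum(), (w * kI).sum()
        mu, mup = y / T, kI / Tp
        S = -(w * mup * np.log(mup / mu)).sum()
        Q = S + np.log(T / Tp)
    return k, I

# Synthetic example: three subsets on different scales, with some missing entries.
rng = np.random.default_rng(0)
true_I = np.array([10.0, 4.0, 7.0, 1.5, 12.0])
true_k = np.array([1.0, 2.5, 0.4])
mask = np.array([[1, 1, 1, 0, 1],
                 [1, 1, 0, 1, 1],
                 [0, 1, 1, 1, 1]], dtype=bool)
y = np.where(mask, true_k[:, None] * true_I * rng.normal(1.0, 0.03, (3, 5)), 1.0)
k, I = maxent_scale(y, mask)
print(k / k[0])    # recovered scale factors, arbitrary constant fixed so that k_1 = 1
print(I * k[0])    # scaled intensities on the corresponding scale
```

Because the scale factors and intensities are only relative, the printed values fix the arbitrary positive constant by setting the first scale factor to unity.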
For another example, consider the maximum-entropy fit of a linear function to a set of independently distributed variables. Let $y_i$ represent an observation drawn from a population with mean $a + bx_i$ and finite variance $\sigma_i^2$; we wish to find the maximum-entropy estimates of $a$ and $b$. Assume that the mismatch between the observation and the model is normally distributed, so that its probability density is the positive proportion
$$M_i(a, b) = \left(\sigma_i\sqrt{2\pi}\right)^{-1}\exp(-t_i^2/2), \eqno(8.2.3.10)$$
where $t_i = (y_i - a - bx_i)/\sigma_i$. The prior proportion is given by
$$\mu_i = \sigma_i^{-1}\Big/\sum_{j=1}^{n}\sigma_j^{-1}, \eqno(8.2.3.11)$$
the value the model proportion takes when every mismatch vanishes. Letting $\mu_i' = M_i(a, b)\big/\sum_{j=1}^{n}M_j(a, b)$, equations (8.2.3.5) become
$$\sum_{i=1}^{n}\left[\sum_{j=1}^{n}\mu_j' t_j^2 - t_i^2\right]M_i t_i/\sigma_i = 0$$
and
$$\sum_{i=1}^{n}\left[\sum_{j=1}^{n}\mu_j' t_j^2 - t_i^2\right]M_i t_i x_i/\sigma_i = 0,$$
which simplify to
$$\sum_{i=1}^{n}w_i(y_i - a - bx_i) = \left(\sum_{j=1}^{n}\mu_j' t_j^2\right)\sum_{i=1}^{n}(M_i/\sigma_i^2)(y_i - a - bx_i) \eqno(8.2.3.12a)$$
and
$$\sum_{i=1}^{n}w_i(y_i - a - bx_i)x_i = \left(\sum_{j=1}^{n}\mu_j' t_j^2\right)\sum_{i=1}^{n}(M_i/\sigma_i^2)(y_i - a - bx_i)x_i, \eqno(8.2.3.12b)$$
where $w_i$ may be interpreted as a weight and is given by $w_i = t_i^2 M_i/\sigma_i^2$. Equations (8.2.3.12)
may be solved iteratively, starting with the approximations that the sums over $j$ on the right-hand side are zero and $w_i = 1$ for all $i$, that is, using the solutions to the corresponding, unweighted least-squares problem. Resetting $a$ and $b$ after each iteration by only half the indicated amount defeats a tendency towards oscillation. Approximate standard uncertainties for the parameters, $a$ and $b$, may be computed by conventional means after setting to zero the sums over $j$ on the right-hand side of equations (8.2.3.12). (See, however, a discussion of computing variance–covariance matrices in Section 8.1.2.) Note that $w_i$ is small for both small and large values of $|t_i|$. Thus, in contrast to the robust/resistant methods (Section 8.2.2), which de-emphasize only the large differences, this method down-weights both the small and the large differences and adjusts the parameters on the basis of the moderate-size mismatches between model and data. The procedure used in this two-dimensional, linear model can be extended to linear models, and linear approximations to nonlinear models, in any number of dimensions using methods discussed in Chapter 8.1.
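The following Python sketch implements the iteration just described, assuming the weight $w_i$ and the right-hand-side terms in the form given in equations (8.2.3.12); the right-hand sides are evaluated at the current parameter values, and the update is damped by half as described above. The synthetic data, the damping factor and the fixed iteration count are illustrative only.

```python
import numpy as np

def maxent_line_fit(x, y, sigma, n_iter=200, step=0.5):
    """Sketch of the maximum-entropy fit of y ~ a + b*x using equations (8.2.3.12)."""
    # Starting approximation: w_i = 1 and right-hand sides zero,
    # i.e. the unweighted least-squares solution.
    A = np.vstack([np.ones_like(x), x]).T
    a, b = np.linalg.lstsq(A, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - a - b * x                            # current mismatches
        t = r / sigma                                # standardized mismatches
        M = np.exp(-0.5 * t**2) / (sigma * np.sqrt(2.0 * np.pi))
        w = t**2 * M / sigma**2                      # small for small and large |t|
        C = (M / M.sum() * t**2).sum()               # the sum over j on the RHS
        rhs_a = C * (M / sigma**2 * r).sum()         # right-hand sides evaluated at
        rhs_b = C * (M / sigma**2 * r * x).sum()     # the current a and b
        # Weighted normal equations for the updated parameters.
        lhs = np.array([[w.sum(),       (w * x).sum()],
                        [(w * x).sum(), (w * x**2).sum()]])
        rhs = np.array([(w * y).sum() - rhs_a,
                        (w * x * y).sum() - rhs_b])
        a_new, b_new = np.linalg.solve(lhs, rhs)
        # Reset a and b by only half the indicated amount to damp oscillation.
        a += step * (a_new - a)
        b += step * (b_new - b)
    return a, b

# Synthetic example: a straight line with one gross outlier.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 12)
sigma = np.full_like(x, 0.5)
y = 2.0 + 0.7 * x + rng.normal(0.0, sigma)
y[3] += 4.0                                          # large mismatch: weight near zero
print(maxent_line_fit(x, y, sigma))                  # much less affected by the outlier
print(np.linalg.lstsq(np.vstack([np.ones_like(x), x]).T, y, rcond=None)[0])
```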
The maximum-entropy method has been described (Jaynes, 1979) as being `maximally noncommittal with respect to all other matters; it is as uniform (by the criterion of the Shannon information measure) as it can be without violating the given constraint[s]'. Least squares, because it gives minimum variance estimates of the parameters of a model, and therefore of all functions of the model including the predicted values of any additional data points, might be similarly described as `maximally committal' with regard to the collection of more data. Least squares and maximum entropy can therefore be viewed as the extremes of a range of methods, classified according to the degree of a priori confidence in the correctness of the model, with the robust/resistant methods lying somewhere in between (although generally closer to least squares). Maximum-entropy methods can be used when it is desirable to avoid prejudice in favour of a model because of doubt as to the model's correctness.
References

Collins, D. M. (1982). Nature (London), 298, 49–51.
Collins, D. M. (1984). Acta Cryst. A40, 705–708.
Jaynes, E. T. (1979). The Maximum Entropy Formalism, edited by R. D. Levine & M. Tribus. Cambridge, MA: MIT Press.
Johnson, R. W. & Shore, J. E. (1983). IEEE Trans. Inf. Theory, IT-29, 942–943.
Livesey, A. K. & Skilling, J. (1985). Acta Cryst. A41, 113–122.
Shore, J. E. & Johnson, R. W. (1980). IEEE Trans. Inf. Theory, IT-26, 26–37.