International Tables for Crystallography, Volume C: Mathematical, physical and chemical tables. Edited by E. Prince. © International Union of Crystallography 2006
International Tables for Crystallography (2006). Vol. C, ch. 8.2, pp. 691–692

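Entropy maximization, like least squares, is of interest primarily as a framework within which to find or adjust parameters of a model. Rationalization of the name `entropy maximization' by analogy to thermodynamics is controversial, but there is formal proof (Shore & Johnson, 1980; Johnson & Shore, 1983) supporting entropy maximization as the unique method of inference that satisfies basic consistency requirements (Livesey & Skilling, 1985). The proof consists of discovering the consequences of four consistency axioms, which may be stated informally as follows:

(1) uniqueness: the result should be unique;
(2) invariance: the result should not depend on the choice of coordinate system;
(3) system independence: it should not matter whether independent information about independent systems is accounted for separately, in terms of different distributions, or together, in terms of a joint distribution;
(4) subset independence: it should not matter whether an independent subset of system states is treated in terms of a separate conditional distribution or in terms of the full distribution.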

The term `entropy' is used in this chapter as a name only, the name for variation functions that include the form $\varphi\ln\varphi$, where $\varphi$ may represent probability or, more generally, a positive proportion. Any positive measure, either observed or derived, of the relative apportionment of a characteristic quantity among observations can serve as the proportion.
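The method of entropy maximization may be formulated as follows: given a set of n observations, $y_i$, that are measurements of quantities that can be described by model functions, $M_i(\mathbf{x})$, where $\mathbf{x}$ is a vector of parameters, find the prior, positive proportions, $\mu_i = f(y_i)$, and the values of the parameters for which the positive proportions $\varphi_i = f[M_i(\mathbf{x})]$ make the sum

$$S = -\sum_{i=1}^{n} \varphi_i' \ln(\varphi_i'/\mu_i'), \qquad (8.2.3.1)$$

where $\varphi_i' = \varphi_i/\sum_j \varphi_j$ and $\mu_i' = \mu_i/\sum_j \mu_j$, a maximum. S is called the Shannon–Jaynes entropy. For some applications (Collins, 1982), it is desirable to include in the variation function additional terms or restraints that give S the form

$$S = -\sum_{i=1}^{n} \varphi_i' \ln(\varphi_i'/\mu_i') + \lambda_1 \xi_1(\mathbf{x}, \mathbf{y}) + \lambda_2 \xi_2(\mathbf{x}, \mathbf{y}) + \ldots, \qquad (8.2.3.2)$$

where the $\lambda$s are undetermined multipliers, but we shall discuss here only applications where $\lambda_i = 0$ for all i, and an unrestrained entropy is maximized. A necessary condition for S to be a maximum is for the gradient to vanish. Using

$$\frac{\partial S}{\partial x_j} = \sum_{i=1}^{n} \frac{\partial S}{\partial \varphi_i}\,\frac{\partial \varphi_i}{\partial x_j} \qquad (8.2.3.3)$$

and

$$\frac{\partial S}{\partial \varphi_i} = \sum_{k=1}^{n} \frac{\partial S}{\partial \varphi_k'}\,\frac{\partial \varphi_k'}{\partial \varphi_i}, \qquad (8.2.3.4)$$

straightforward algebraic manipulation gives equations of the form

$$\sum_{i=1}^{n} \frac{\partial \varphi_i}{\partial x_j}\left[\ln(\varphi_i'/\mu_i') - \sum_{k=1}^{n} \varphi_k' \ln(\varphi_k'/\mu_k')\right] = 0. \qquad (8.2.3.5)$$

It should be noted that, although the entropy function should, in principle, have a unique stationary point corresponding to the global maximum, there are occasional circumstances, particularly with restrained problems where the undetermined multipliers are not all zero, where it may be necessary to verify that a stationary solution actually maximizes entropy.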
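The definition of the unrestrained entropy and its gradient can be checked numerically. The following is a minimal sketch, assuming the normalized form S = -Σ φ'_i ln(φ'_i/μ'_i) with φ'_i = φ_i/Σφ_j and μ'_i = μ_i/Σμ_j; the function names `shannon_jaynes_entropy` and `entropy_gradient` are hypothetical and chosen only for illustration.

```python
import numpy as np

def shannon_jaynes_entropy(phi, mu):
    """Return S = -sum(phi' * ln(phi'/mu')) for positive proportions.

    phi and mu are normalized internally, so only their relative
    apportionment matters.
    """
    phi = np.asarray(phi, dtype=float)
    mu = np.asarray(mu, dtype=float)
    if np.any(phi <= 0) or np.any(mu <= 0):
        raise ValueError("proportions must be strictly positive")
    p = phi / phi.sum()
    m = mu / mu.sum()
    return float(-np.sum(p * np.log(p / m)))

def entropy_gradient(phi, mu):
    """Return dS/dphi_i = -(ln(phi'_i/mu'_i) + S) / sum(phi).

    This follows from the chain rule applied to the normalized
    proportions; it vanishes for all i when phi' equals mu'.
    """
    phi = np.asarray(phi, dtype=float)
    mu = np.asarray(mu, dtype=float)
    p = phi / phi.sum()
    m = mu / mu.sum()
    log_ratio = np.log(p / m)
    s = -np.sum(p * log_ratio)
    return -(log_ratio + s) / phi.sum()

if __name__ == "__main__":
    mu = np.array([2.0, 1.0, 1.0, 3.0])
    phi = np.array([1.0, 2.0, 3.0, 4.0])
    print("S(phi, mu)   =", shannon_jaynes_entropy(phi, mu))
    print("S(3*mu, mu)  =", shannon_jaynes_entropy(3.0 * mu, mu))
    print("gradient     =", entropy_gradient(phi, mu))
```

With this form S is never positive, and it reaches its maximum of zero when the normalized proportions coincide with the normalized prior, which is also where the gradient vanishes.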
For an example of the application of the maximum-entropy method, consider (Collins, 1984) a collection of diffraction intensities in which various subsets have been measured under different conditions, such as on different films or with different crystals. All systematic corrections have been made, but it is necessary to put the different subsets onto a common scale. Assume that every subset has measurements in common with some other subset, and that no collection of subsets is isolated from the others. Let the measurement of intensity in subset i be , and let the scale factor that puts intensity on the scale of subset i be . Equation (8.2.3.1) becomes where the term is zero if does not appear in subset i. Because and are parameters of the model, equations (8.2.3.5) become and These simplify to and where Equations (8.2.3.8) may be solved iteratively, starting with the approximations and Q = 0.
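The scaling problem can be sketched as a fixed-point iteration. The sketch below is an illustration under stated assumptions, not the published algorithm: the symbols I_h (intensity of reflection h), k_i (scale factor of subset i) and J_hi (measurement of h in subset i) are assumed names, the model proportion is taken as k_i I_h with the measurement J_hi as prior, and the update formulas are derived here by setting the entropy gradient to zero, so their form may differ from equations (8.2.3.8). The starting value k_i = 1 is also an assumption; only Q = 0 is stated in the text. The function name `scale_by_entropy` and the `present` mask are hypothetical.

```python
import numpy as np

def scale_by_entropy(J, present=None, n_iter=50):
    """Estimate intensities I[h] and scale factors k[i] from J[h, i].

    Assumed stationarity conditions (sums run over measured pairs only):
        ln I_h = Q - sum_i k_i ln(k_i / J_hi) / sum_i k_i
        ln k_i = Q - sum_h I_h ln(I_h / J_hi) / sum_h I_h
        Q      = sum_hi (k_i I_h)' ln(k_i I_h / J_hi)
    where the prime denotes normalization to unit sum.
    Every reflection must be present in at least one subset.
    """
    J = np.asarray(J, dtype=float)
    if present is None:
        present = np.ones(J.shape, dtype=bool)
    present = np.asarray(present, dtype=bool)
    # log J where measured; absent entries are masked out of every sum
    log_j = np.where(present, np.log(np.where(present, J, 1.0)), 0.0)

    k = np.ones(J.shape[1])   # assumed starting scale factors
    q = 0.0                   # starting value given in the text
    for _ in range(n_iter):
        wk = present * k                                  # shape (n_h, n_i)
        ln_i = q - (wk * (np.log(k) - log_j)).sum(axis=1) / wk.sum(axis=1)
        inten = np.exp(ln_i)
        wi = present * inten[:, None]
        ln_k = q - (wi * (ln_i[:, None] - log_j)).sum(axis=0) / wi.sum(axis=0)
        k = np.exp(ln_k)
        phi = np.where(present, np.outer(inten, k), 0.0)
        p = phi / phi.sum()
        q = float(np.sum(p * (ln_i[:, None] + ln_k[None, :] - log_j)))
    return inten, k

if __name__ == "__main__":
    true_i = np.array([10.0, 40.0, 25.0, 5.0])
    true_k = np.array([1.0, 2.5, 0.4])
    J = np.outer(true_i, true_k)          # noise-free synthetic measurements
    inten, k = scale_by_entropy(J)
    print("relative scale factors:", k / k[0])
```

For noise-free data the products k_i I_h reproduce the measurements up to a common factor, consistent with the remark that the scale factors and scaled intensities are only relative. Convergence for noisy or sparsely overlapping data is not established by this sketch.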
The standard uncertainties of scale factors and intensities are not used in the solution of equations (8.2.3.8), and must be computed separately. They may be estimated on a fractional basis from the variances of estimated population means for a scale factor and for an intensity, respectively. The maximum-entropy scale factors and scaled intensities are relative, and either set may be multiplied by an arbitrary, positive constant without affecting the solution.
For another example, consider the maximum-entropy fit of a linear function to a set of independently distributed variables. Let represent an observation drawn from a population with mean and finite variance ; we wish to find the maximum-entropy estimate of and . Assume that the mismatch between the observation and the model is normally distributed, so that its probability density is the positive proportion where . The prior proportion is given by Letting , equations (8.2.3.5) become and which simplifies to where may be interpreted as a weight and is given by . Equations (8.2.3.12) may be solved iteratively, starting with the approximations that the sums over j on the right-hand side are zero and for all i, that is, using the solutions to the corresponding, unweighted least-squares problem. Resetting after each iteration by only half the indicated amount defeats a tendency towards oscillation. Approximate standard uncertainties for the parameters, and , may be computed by conventional means after setting to zero the sums over j on the right-hand side of equations (8.2.3.12). (See, however, a discussion of computing variance–covariance matrices in Section 8.1.2.) Note that is small for both small and large values of . Thus, in contrast to the robust/resistant methods (Section 8.2.2), which de-emphasize only the large differences, this method downweights both the small and the large differences and adjusts the parameters on the basis of the moderate-size mismatches between model and data. The procedure used in this two-dimensional, linear model can be extended to linear models, and linear approximations to nonlinear models, in any number of dimensions using methods discussed in Chapter 8.1.
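The weighting behaviour described above can be illustrated numerically. The sketch below assumes a weight of the form w ∝ Δ² exp(-Δ²/2), where Δ is the mismatch between observation and model in units of its standard uncertainty. That shape is what results when a Gaussian proportion exp(-Δ²/2) with a constant prior is differentiated in the gradient condition; the normalization used in the chapter's own equations may differ. The Tukey biweight is included only as a familiar example of a robust/resistant weight for contrast and is not claimed to be the scheme of Section 8.2.2. The function names and the tuning constant `c` are hypothetical.

```python
import numpy as np

def entropy_weight(delta):
    """Assumed maximum-entropy weight shape: delta^2 * exp(-delta^2 / 2)."""
    d = np.asarray(delta, dtype=float)
    return d**2 * np.exp(-0.5 * d**2)

def biweight(delta, c=4.685):
    """Tukey biweight, shown for contrast: (1 - (delta/c)^2)^2 inside |delta| < c, else 0."""
    d = np.asarray(delta, dtype=float)
    u = d / c
    return np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)

if __name__ == "__main__":
    for d in (0.0, 0.5, np.sqrt(2.0), 3.0, 5.0):
        print(f"delta={d:5.3f}  entropy weight={float(entropy_weight(d)):.4f}  "
              f"biweight={float(biweight(d)):.4f}")
```

Under this assumption the entropy weight is zero at Δ = 0, peaks at |Δ| = √2 and decays for large |Δ|, whereas the biweight is largest at Δ = 0 and suppresses only the large mismatches.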
The maximum-entropy method has been described (Jaynes, 1979) as being `maximally noncommittal with respect to all other matters; it is as uniform (by the criterion of the Shannon information measure) as it can be without violating the given constraint[s]'. Least squares, because it gives minimum variance estimates of the parameters of a model, and therefore of all functions of the model including the predicted values of any additional data points, might be similarly described as `maximally committal' with regard to the collection of more data. Least squares and maximum entropy can therefore be viewed as the extremes of a range of methods, classified according to the degree of a priori confidence in the correctness of the model, with the robust/resistant methods lying somewhere in between (although generally closer to least squares). Maximum-entropy methods can be used when it is desirable to avoid prejudice in favour of a model because of doubt as to the model's correctness.
References
Collins, D. M. (1982). Electron density images from imperfect data by iterative entropy maximization. Nature (London), 298, 49–51.
Collins, D. M. (1984). Scaling by entropy maximization. Acta Cryst. A40, 705–708.
Jaynes, E. T. (1979). Where do we stand on maximum entropy? The maximum entropy formalism, edited by R. D. Levine & M. Tribus, pp. 44–49. Cambridge, MA: Massachusetts Institute of Technology.
Johnson, R. W. & Shore, J. E. (1983). Comments on and correction to `Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy'. IEEE Trans. Inf. Theory, IT-29, 942–943.
Livesey, A. K. & Skilling, J. (1985). Maximum entropy theory. Acta Cryst. A41, 113–122.
Shore, J. E. & Johnson, R. W. (1980). Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Trans. Inf. Theory, IT-26, 26–37.