International
Tables for Crystallography Volume B Reciprocal space Edited by U. Shmueli © International Union of Crystallography 2006 |
International Tables for Crystallography (2006). Vol. B. ch. 1.3, pp. 50-55
|
Throughout this section we will denote by the expression , . The mapping has the following properties: Thus e defines an isomorphism between the additive group (the reals modulo the integers) and the multiplicative group of complex numbers of modulus 1. It follows that the mapping , where and N is a positive integer, defines an isomorphism between the one-dimensional residual lattice and the multiplicative group of Nth roots of unity.
The DFT on N points then relates vectors X and in W and through the linear transformations:
The presentation of Gentleman & Sande (1966) will be followed first [see also Cochran et al. (1967)]. It will then be reinterpreted in geometric terms which will prepare the way for the treatment of multidimensional transforms in Section 1.3.3.3.
Suppose that the number of sample points N is composite, say . We may write k to the base and to the base as follows: The defining relation for may then be written: The argument of may be expanded as and the last summand, being an integer, may be dropped: This computation may be decomposed into five stages, as follows:
If the intermediate transforms in stages (ii) and (iv) are performed in place, i.e. with the results overwriting the data, then at stage (v) the result will be found at address . This phenomenon is called scrambling by `digit reversal', and stage (v) is accordingly known as unscrambling.
The initial N-point transform has thus been performed as transforms on points, followed by transforms on points, thereby reducing the arithmetic cost from to . The phase shifts applied at stage (iii) are traditionally called `twiddle factors', and the transposition between and can be performed by the fast recursive technique of Eklundh (1972). Clearly, this procedure can be applied recursively if and are themselves composite, leading to an overall arithmetic cost of order N log N if N has no large prime factors.
The Cooley–Tukey factorization may also be derived from a geometric rather than arithmetic argument. The decomposition is associated to a geometric partition of the residual lattice into copies of , each translated by and `blown up' by a factor . This partition in turn induces a (direct sum) decomposition of X as where
According to (i), is related to by decimation by and offset by . By Section 1.3.2.7.2, is related to by periodization by and phase shift by , so that the periodization by being reflected by the fact that does not depend on . Writing and expanding shows that the phase shift contains both the twiddle factor and the kernel of . The Cooley–Tukey algorithm is thus naturally associated to the coset decomposition of a lattice modulo a sublattice (Section 1.3.2.7.2).
It is readily seen that essentially the same factorization can be obtained for , up to the complex conjugation of the twiddle factors. The normalizing constant arises from the normalizing constants and in and , respectively.
Factors of 2 are particularly simple to deal with and give rise to a characteristic computational structure called a `butterfly loop'. If , then two options exist:
By repeated factoring of the number N of sample points, the calculation of and can be reduced to a succession of stages, the smallest of which operate on single prime factors of N. The reader is referred to Gentleman & Sande (1966) for a particularly lucid analysis of the programming considerations which help implement this factorization efficiently; see also Singleton (1969). Powers of two are often grouped together into factors of 4 or 8, which are advantageous in that they require fewer complex multiplications than the repeated use of factors of 2. In this approach, large prime factors P are detrimental, since they require a full -size computation according to the defining formula.
The set of congruence classes of integers modulo an integer N [see e.g. Apostol (1976), Chapter 5] inherits from not only the additive structure used in deriving the Cooley–Tukey factorization, but also a multiplicative structure in which the product of two congruence classes mod N is uniquely defined as the class of the ordinary product (in ) of representatives of each class. The multiplication can be distributed over addition in the usual way, endowing with the structure of a commutative ring.
If N is composite, the ring has zero divisors. For example, let , let mod N, and let mod N: then mod N. In the general case, a product of non-zero elements will be zero whenever these elements collect together all the factors of N. These circumstances give rise to a fundamental theorem in the theory of commutative rings, the Chinese Remainder Theorem (CRT), which will now be stated and proved [see Apostol (1976), Chapter 5; Schroeder (1986), Chapter 16].
Let be factored into a product of pairwise coprime integers, so that g.c.d. for . Then the system of congruence equations has a unique solution mod N. In other words, each is associated in a one-to-one fashion to the d-tuple of its residue classes in .
The proof of the CRT goes as follows. Let Since g.c.d. there exist integers and such that then the integer is the solution. Indeed, because all terms with contain as a factor; and by the defining relation for .
It may be noted that so that the are mutually orthogonal idempotents in the ring , with properties formally similar to those of mutually orthogonal projectors onto subspaces in linear algebra. The analogy is exact, since by virtue of the CRT the ring may be considered as the direct product via the two mutually inverse mappings:
The mapping defined by (ii) is sometimes called the `CRT reconstruction' of from the .
These two mappings have the property of sending sums to sums and products to products, i.e: (the last proof requires using the properties of the idempotents ). This may be described formally by stating that the CRT establishes a ring isomorphism:
The CRT will now be used to factor the N-point DFT into a tensor product of d transforms, the jth of length .
Let the indices k and be subjected to the following mappings:
Then Cross terms with vanish since they contain all the factors of N, hence Dividing by N, which may be written as for each j, yields and hence Therefore, by the multiplicative property of ,
Let be described by a one-dimensional array indexed by k. The index mapping (i) turns X into an element of described by a d-dimensional array ; the latter may be transformed by into a new array . Finally, the one-dimensional array of results will be obtained by reconstructing according to (ii).
The prime factor algorithm, like the Cooley–Tukey algorithm, reindexes a 1D transform to turn it into d separate transforms, but the use of coprime factors and CRT index mapping leads to the further gain that no twiddle factors need to be applied between the successive transforms (see Good, 1971). This makes up for the cost of the added complexity of the CRT index mapping.
The natural factorization of N for the prime factor algorithm is thus its factorization into prime powers: is then the tensor product of separate transforms (one for each prime power factor ) whose results can be reassembled without twiddle factors. The separate factors within each must then be dealt with by another algorithm (e.g. Cooley–Tukey, which does require twiddle factors). Thus, the DFT on a prime number of points remains undecomposable.
The previous two algorithms essentially reduce the calculation of the DFT on N points for N composite to the calculation of smaller DFTs on prime numbers of points, the latter remaining irreducible. However, Rader (1968) showed that the p-point DFT for p an odd prime can itself be factored by invoking some extra arithmetic structure present in .
The ring has the property that its non-zero elements, called units, form a multiplicative group . In particular, all units have a unique multiplicative inverse in , i.e. a unit such that . This endows with the structure of a finite field.
Furthermore, is a cyclic group, i.e. consists of the successive powers of a generator g called a primitive root mod p (such a g may not be unique, but it always exists). For instance, for , is generated by , whose successive powers mod 7 are: [see Apostol (1976), Chapter 10].
The basis of Rader's algorithm is to bring to light a hidden regularity in the matrix by permuting the basis vectors and of as follows: where g is a primitive root mod p.
With respect to these new bases, the matrix representing will have the following elements: Thus the `core' of matrix , of size , formed by the elements with two non-zero indices, has a so-called skew-circulant structure because element depends only on . Simplification may now occur because multiplication by is closely related to a cyclic convolution. Introducing the notation we may write the relation in the permuted bases as where Z is defined by , .
Thus may be obtained by cyclic convolution of C and Z, which may for instance be calculated by where × denotes the component-wise multiplication of vectors. Since p is odd, is always divisible by 2 and may even be highly composite. In that case, factoring by means of the Cooley–Tukey or Good methods leads to an algorithm of complexity p log p rather than for . An added bonus is that, because , the elements of can be shown to be either purely real or purely imaginary, which halves the number of real multiplications involved.
This idea was extended by Winograd (1976, 1978) to the treatment of prime powers , using the cyclic structure of the multiplicative group of units . The latter consists of all those elements of which are not divisible by p, and thus has elements. It is cyclic, and there exist primitive roots g modulo such that The elements divisible by p, which are divisors of zero, have to be treated separately just as 0 had to be treated separately for .
When , then with . The results are p-decimated, hence can be obtained via the -point DFT of the -periodized data Y: with
When , then we may write where contains the contributions from and those from . By a converse of the previous calculation, arises from p-decimated data Z, hence is the -periodization of the -point DFT of these data: with (the -periodicity follows implicity from the fact that the transform on the right-hand side is independent of ).
Finally, the contribution from all may be calculated by reindexing by the powers of a primitive root g modulo , i.e. by writing then carrying out the multiplication by the skew-circulant matrix core as a convolution.
Thus the DFT of size may be reduced to two DFTs of size (dealing, respectively, with p-decimated results and p-decimated data) and a convolution of size . The latter may be `diagonalized' into a multiplication by purely real or purely imaginary numbers (because ) by two DFTs, whose factoring in turn leads to DFTs of size and . This method, applied recursively, allows the complete decomposition of the DFT on points into arbitrarily small DFTs.
When , the same method can be applied, except for a slight modification in the calculation of . There is no primitive root modulo for : the group is the direct product of two cyclic groups, the first (of order 2) generated by −1, the second (of order ) generated by 3 or 5. One then uses a representation and the reindexed core matrix gives rise to a two-dimensional convolution. The latter may be carried out by means of two 2D DFTs on points.
The cyclic convolutions generated by Rader's multiplicative reindexing may be evaluated more economically than through DFTs if they are re-examined within a new algebraic setting, namely the theory of congruence classes of polynomials [see, for instance, Blahut (1985), Chapter 2; Schroeder (1986), Chapter 24].
The set, denoted , of polynomials in one variable with coefficients in a given field has many of the formal properties of the set of rational integers: it is a ring with no zero divisors and has a Euclidean algorithm on which a theory of divisibility can be built.
Given a polynomial , then for every there exist unique polynomials and such that and is called the residue of modulo . Two polynomials and having the same residue modulo are said to be congruent modulo , which is denoted by
If is said to be divisible by . If only has divisors of degree zero in , it is said to be irreducible over (this notion depends on ). Irreducible polynomials play in a role analogous to that of prime numbers in , and any polynomial over has an essentially unique factorization as a product of irreducible polynomials.
There exists a Chinese remainder theorem (CRT) for polynomials. Let be factored into a product of pairwise coprime polynomials [i.e. and have no common factor for ]. Then the system of congruence equations has a unique solution modulo . This solution may be constructed by a procedure similar to that used for integers. Let Then and are coprime, and the Euclidean algorithm may be used to obtain polynomials and such that With , the polynomial is easily shown to be the desired solution.
As with integers, it can be shown that the 1:1 correspondence between and sends sums to sums and products to products, i.e. establishes a ring isomorphism:
These results will now be applied to the efficient calculation of cyclic convolutions. Let and be two vectors of length N, and let be obtained by cyclic convolution of U and V: The very simple but crucial result is that this cyclic convolution may be carried out by polynomial multiplication modulo : if then the above relation is equivalent to Now the polynomial can be factored over the field of rational numbers into irreducible factors called cyclotomic polynomials: if d is the number of divisors of N, including 1 and N, then where the cyclotomics are well known (Nussbaumer, 1981; Schroeder, 1986, Chapter 22). We may now invoke the CRT, and exploit the ring isomorphism it establishes to simplify the calculation of from and as follows:
When N is not too large, i.e. for `short cyclic convolutions', the are very simple, with coefficients 0 or ±1, so that (i) only involves a small number of additions. Furthermore, special techniques have been developed to multiply general polynomials modulo cyclotomic polynomials, thus helping keep the number of multiplications in (ii) and (iii) to a minimum. As a result, cyclic convolutions can be calculated rapidly when N is sufficiently composite.
It will be recalled that Rader's multiplicative indexing often gives rise to cyclic convolutions of length for p an odd prime. Since is highly composite for all other than 23 and 47, these cyclic convolutions can be performed more efficiently by the above procedure than by DFT.
These combined algorithms are due to Winograd (1977, 1978, 1980), and are known collectively as `Winograd small FFT algorithms'. Winograd also showed that they can be thought of as bringing the DFT matrix F to the following `normal form': where
The elements on the diagonal of B can be shown to be either real or pure imaginary, by the same argument as in Section 1.3.3.2.3.1. Matrices A and C may be rectangular rather than square, so that intermediate results may require extra storage space.
References
Apostol, T. M. (1976). Introduction to analytic number theory. New York: Springer-Verlag.Google ScholarBlahut, R. E. (1985). Fast algorithms for digital signal processing. Reading: Addison-Wesley.Google Scholar
Cochran, W. T., Cooley, J. W., Favin, D. L., Helms, H. D., Kaenel, R. A., Lang, W. W., Maling, G. C., Nelson, D. E., Rader, C. M. & Welch, P. D. (1967). What is the fast Fourier transform? IEEE Trans. Audio, 15, 45–55.Google Scholar
Eklundh, J. O. (1972). A fast computer method for matrix transposing. IEEE Trans. C-21, 801–803.Google Scholar
Gentleman, W. M. & Sande, G. (1966). Fast Fourier transforms – for fun and profit. In AFIPS Proc. 1966 Fall Joint Computer Conference, pp. 563–578. Washington, DC: Spartan Books.Google Scholar
Good, I. J. (1971). The relationship between two fast Fourier transforms. IEEE Trans. C-20, 310–317.Google Scholar
Nussbaumer, H. J. (1981). Fast Fourier transform and convolution algorithms. Berlin: Springer-Verlag.Google Scholar
Rader, C. M. (1968). Discrete Fourier transforms when the number of data samples is prime. Proc. IEEE, 56, 1107–1108.Google Scholar
Schroeder, M. R. (1986). Number theory in science and communication, 2nd ed. Berlin: Springer-Verlag.Google Scholar
Singleton, R. C. (1969). An algorithm for computing the mixed radix fast Fourier transform. IEEE Trans. AU, 17, 93–103.Google Scholar
Winograd, S. (1976). On computing the discrete Fourier transform. Proc. Natl Acad. Sci. USA, 73, 1005–1006.Google Scholar
Winograd, S. (1977). Some bilinear forms whose multiplicative complexity depends on the field of constants. Math. Syst. Theor. 10, 169–180.Google Scholar
Winograd, S. (1978). On computing the discrete Fourier transform. Math. Comput. 32, 175–199.Google Scholar
Winograd, S. (1980). Arithmetic complexity of computations. CBMS-NST Regional Conf. Series Appl. Math, Publ. No. 33. Philadelphia: SIAM Publications.Google Scholar