International
Tables for
Crystallography
Volume B
Reciprocal space
Edited by U. Shmueli

International Tables for Crystallography (2006). Vol. B. ch. 1.3, pp. 50-55   | 1 | 2 |

Section 1.3.3.2. One-dimensional algorithms

G. Bricognea

a MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, England, and LURE, Bâtiment 209D, Université Paris-Sud, 91405 Orsay, France

1.3.3.2. One-dimensional algorithms

| top | pdf |

Throughout this section we will denote by [e(t)] the expression [\exp (2 \pi it)], [t \in {\bb R}]. The mapping [t \;\longmapsto\; e(t)] has the following properties: [\eqalign{e(t_{1} + t_{2}) &= e(t_{1}) e(t_{2}) \cr e(-t) &= \overline{e(t)} = [e(t)]^{-1} \cr e(t) &= 1 \Leftrightarrow t \in {\bb Z}.}] Thus e defines an isomorphism between the additive group [{\bb R} /{\bb Z}] (the reals modulo the integers) and the multiplicative group of complex numbers of modulus 1. It follows that the mapping [\ell \;\longmapsto\; e(\ell/N)], where [\ell \in {\bb Z}] and N is a positive integer, defines an isomorphism between the one-dimensional residual lattice [{\bb Z}/N {\bb Z}] and the multiplicative group of Nth roots of unity.

The DFT on N points then relates vectors X and [{\bf X}^{*}] in W and [W^{*}] through the linear transformations: [\eqalign{&F(N): \quad X(k) = {1 \over N} {\sum\limits_{k^{*} \in {\bb Z}/N {\bb Z}}} X^{*} (k^{*}) e(-k^{*} k/N) \cr &\bar{F}(N): \quad X^{*} (k^{*}) = {\sum\limits_{k \in {\bb Z}/N {\bb Z}}} X(k) e(k^{*} k/N).}]

1.3.3.2.1. The Cooley–Tukey algorithm

| top | pdf |

The presentation of Gentleman & Sande (1966)[link] will be followed first [see also Cochran et al. (1967)[link]]. It will then be reinterpreted in geometric terms which will prepare the way for the treatment of multidimensional transforms in Section 1.3.3.3.[link]

Suppose that the number of sample points N is composite, say [N = N_{1} N_{2}]. We may write k to the base [N_{1}] and [k^{*}] to the base [N_{2}] as follows: [\eqalign{k &= k_{1} + N_{1} k_{2} \quad\; k_{1} \in {\bb Z}/N_{1} {\bb Z}, \quad k_{2} \in {\bb Z}/N_{2} {\bb Z} \cr k^{*} &= k_{2}^{*} + k_{1}^{*} N_{2} \quad\;k_{1}^{*} \in {\bb Z}/N_{1} {\bb Z}, \quad k_{2}^{*} \in {\bb Z}/N_{2} {\bb Z}.}] The defining relation for [\bar{F}(N)] may then be written: [\eqalign{X^{*} (k_{2}^{*} + k_{1}^{*} N_{2}) &= {\sum\limits_{k_{1} \in {\bb Z}/N_{1} {\bb Z}}}\; {\sum\limits_{k_{2} \in {\bb Z}/N_{2} {\bb Z}}} X (k_{1} + N_{1} k_{2}) \cr &\quad \times e \left[{(k_{2}^{*} + k_{1}^{*} N_{2}) (k_{1} + N_{1} k_{2}) \over N_{1} N_{2}}\right].}] The argument of [e[.]] may be expanded as [{k_{2}^{*} k_{1} \over N} + {k_{1}^{*} k_{1} \over N_{1}} + {k_{2}^{*} k_{2} \over N_{2}} + k_{1}^{*} k_{2},] and the last summand, being an integer, may be dropped: [\eqalign{&X^{*} (k_{2}^{*} + k_{1}^{*} N_{2})\cr &\quad = {\sum\limits_{k_{1}}} \left\{e \left({k_{2}^{*} k_{1} \over N}\right) \left[{\sum\limits_{k_{2}}}\; X (k_{1} + N_{1} k_{2}) e \left({k_{2}^{*} k_{2} \over N_{2}}\right)\right]\right\}\cr &\qquad \times e \left({k_{1}^{*} k_{1} \over N_{1}}\right).}] This computation may be decomposed into five stages, as follows:

  • (i) form the [N_{1}] vectors [{\bf Y}_{k_{1}}] of length [N_{2}] by the prescription [Y_{k_{1}} (k_{2}) = X (k_{1} + N_{1} k_{2}),\quad k_{1} \in {\bb Z}/N_{1} {\bb Z},\quad k_{2} \in {\bb Z}/N_{2} {\bb Z}\hbox{;}]

  • (ii) calculate the [N_{1}] transforms [{\bf Y}_{k_{1}}^{*}] on [N_{2}] points: [{\bf Y}_{k_{1}}^{*} = \bar{F} (N_{2}) [{\bf Y}_{k_{1}}],\quad k_{1} \in {\bb Z}/N_{1} {\bb Z}\hbox{;}]

  • (iii) form the [N_{2}] vectors [{\bf Z}_{k_{2}^{*}}] of length [N_{1}] by the prescription [{\bf Z}_{k_{2}^{*}} (k_{1}) = e \left({k_{2}^{*} k_{1} \over N}\right) Y_{k_{1}}^{*} (k_{2}^{*}),\quad k_{1} \in {\bb Z}/N_{1} {\bb Z},\quad k_{2}^{*} \in {\bb Z}/N_{2} {\bb Z}\hbox{;}]

  • (iv) calculate the [N_{2}] transforms [{\bf Z}_{k_{2}^{*}}^{*}] on [N_{1}] points: [{\bf Z}_{k_{2}^{*}}^{*} = \bar{F} (N_{1}) [{\bf Z}_{k_{2}^{*}}],\quad k_{2}^{*} \in {\bb Z}/N_{2} {\bb Z}\hbox{;}]

  • (v) collect [X^{*} (k_{2}^{*} + k_{1}^{*} N_{2})] as [Z_{k_{2}^{*}}^{*} (k_{1}^{*})].

If the intermediate transforms in stages (ii)[link] and (iv)[link] are performed in place, i.e. with the results overwriting the data, then at stage (v)[link] the result [X^{*} (k_{2}^{*} + k_{1}^{*} N_{2})] will be found at address [k_{1}^{*} + N_{1} k_{2}^{*}]. This phenomenon is called scrambling by `digit reversal', and stage (v)[link] is accordingly known as unscrambling.

The initial N-point transform [\bar{F} (N)] has thus been performed as [N_{1}] transforms [\bar{F} (N_{2})] on [N_{2}] points, followed by [N_{2}] transforms [\bar{F} (N_{1})] on [N_{1}] points, thereby reducing the arithmetic cost from [(N_{1} N_{2})^{2}] to [N_{1} N_{2} (N_{1} + N_{2})]. The phase shifts applied at stage (iii)[link] are traditionally called `twiddle factors', and the transposition between [k_{1}] and [k_{2}^{*}] can be performed by the fast recursive technique of Eklundh (1972)[link]. Clearly, this procedure can be applied recursively if [N_{1}] and [N_{2}] are themselves composite, leading to an overall arithmetic cost of order N log N if N has no large prime factors.

The Cooley–Tukey factorization may also be derived from a geometric rather than arithmetic argument. The decomposition [k = k_{1} + N_{1} k_{2}] is associated to a geometric partition of the residual lattice [{\bb Z}/N {\bb Z}] into [N_{1}] copies of [{\bb Z}/N_{2} {\bb Z}], each translated by [k_{1} \in {\bb Z}/N_{1} {\bb Z}] and `blown up' by a factor [N_{1}]. This partition in turn induces a (direct sum) decomposition of X as [{\bf X} = {\textstyle\sum\limits_{k_{1}}}\; {\bf X}_{k_{1}},] where [\eqalign{X_{k_{1}} (k) &= X (k)\quad \hbox{if } k \equiv k_{1} \hbox{ mod } N_{1},\cr &= 0{\hbox to 23pt{}} \hbox{otherwise}.}]

According to (i)[link], [{\bf X}_{k_{1}}] is related to [{\bf Y}_{k_{1}}] by decimation by [N_{1}] and offset by [k_{1}]. By Section 1.3.2.7.2[link], [\bar{F} (N) [{\bf X}_{k_{1}}]] is related to [\bar{F} (N_{2}) [{\bf Y}_{k_{1}}]] by periodization by [N_{2}] and phase shift by [e (k^{*} k_{1}/N)], so that [X^{*} (k^{*}) = {\sum\limits_{k_{1}}} e \left({k^{*} k_{1} \over N}\right) Y_{k_{1}}^{*} (k_{2}^{*}),] the periodization by [N_{2}] being reflected by the fact that [Y_{k_{1}}^{*}] does not depend on [k_{1}^{*}]. Writing [k^{*} = k_{2}^{*} + k_{1}^{*} N_{2}] and expanding [k^{*} k_{1}] shows that the phase shift contains both the twiddle factor [e (k_{2}^{*} k_{1}/N)] and the kernel [e (k_{1}^{*} k_{1}/N_{1})] of [\bar{F} (N_{1})]. The Cooley–Tukey algorithm is thus naturally associated to the coset decomposition of a lattice modulo a sublattice (Section 1.3.2.7.2[link]).

It is readily seen that essentially the same factorization can be obtained for [F(N)], up to the complex conjugation of the twiddle factors. The normalizing constant [1/N] arises from the normalizing constants [1/N_{1}] and [1/N_{2}] in [F (N_{1})] and [F (N_{2})], respectively.

Factors of 2 are particularly simple to deal with and give rise to a characteristic computational structure called a `butterfly loop'. If [N = 2M], then two options exist:

  • (a) using [N_{1} = 2] and [N_{2} = M] leads to collecting the even-numbered coordinates of X into [{\bf Y}_{0}] and the odd-numbered coordinates into [{\bf Y}_{1}] [\eqalign{Y_{0} (k_{2}) &= X (2k_{2}),\quad \qquad k_{2} = 0, \ldots, M - 1,\cr Y_{1} (k_{2}) &= X (2k_{2} + 1),\quad \;k_{2} = 0, \ldots, M - 1,}] and writing: [\eqalign{X^{*} (k_{2}^{*}) = \;&Y_{0}^{*} (k_{2}^{*}) + e (k_{2}^{*}/N) Y_{1}^{*} (k_{2}^{*}),\cr \quad &k_{2}^{*} = 0, \ldots, M - 1\hbox{;}\cr X^{*} (k_{2}^{*} + M) =\; &Y_{0}^{*} (k_{2}^{*}) - e (k_{2}^{*}/N) Y_{1}^{*} (k_{2}^{*}),\cr &k_{2}^{*} = 0, \ldots, M - 1.}] This is the original version of Cooley & Tukey, and the process of formation of [{\bf Y}_{0}] and [{\bf Y}_{1}] is referred to as `decimation in time' (i.e. decimation along the data index k).

  • (b) using [N_{1} = M] and [N_{2} = 2] leads to forming [\eqalign{Z_{0} (k_{1}) &= X (k_{1}) + X (k_{1} + M),\;\;\quad \quad \quad \qquad k_{1} = 0, \ldots, M - 1,\cr Z_{1} (k_{1}) &= [X (k_{1}) - X (k_{1} + M)] e \left({k_{1} \over N}\right),{\hbox to 20pt{}} k_{1} = 0, \ldots, M - 1,}] then obtaining separately the even-numbered and odd-numbered components of [{\bf X}^{*}] by transforming [{\bf Z}_{0}] and [{\bf Z}_{1}]: [\eqalign{X^{*} (2k_{1}^{*}) &= Z_{0}^{*} (k_{1}^{*}),\quad k_{1}^{*} = 0, \ldots, M - 1\hbox{;}\cr X^{*} (2k_{1}^{*} + 1) &= Z_{1}^{*} (k_{1}^{*}),\quad k_{1}^{*} = 0, \ldots, M - 1.}] This version is due to Sande (Gentleman & Sande, 1966[link]), and the process of separately obtaining even-numbered and odd-numbered results has led to its being referred to as `decimation in frequency' (i.e. decimation along the result index [k^{*}]).

By repeated factoring of the number N of sample points, the calculation of [F(N)] and [\bar{F} (N)] can be reduced to a succession of stages, the smallest of which operate on single prime factors of N. The reader is referred to Gentleman & Sande (1966)[link] for a particularly lucid analysis of the programming considerations which help implement this factorization efficiently; see also Singleton (1969)[link]. Powers of two are often grouped together into factors of 4 or 8, which are advantageous in that they require fewer complex multiplications than the repeated use of factors of 2. In this approach, large prime factors P are detrimental, since they require a full [P^{2}]-size computation according to the defining formula.

1.3.3.2.2. The Good (or prime factor) algorithm

| top | pdf |

1.3.3.2.2.1. Ring structure on [{\bb Z}/N{\bb Z}]

| top | pdf |

The set [{\bb Z}/N {\bb Z}] of congruence classes of integers modulo an integer N [see e.g. Apostol (1976)[link], Chapter 5] inherits from [{\bb Z}] not only the additive structure used in deriving the Cooley–Tukey factorization, but also a multiplicative structure in which the product of two congruence classes mod N is uniquely defined as the class of the ordinary product (in [{\bb Z}]) of representatives of each class. The multiplication can be distributed over addition in the usual way, endowing [{\bb Z}/N {\bb Z}] with the structure of a commutative ring.

If N is composite, the ring [{\bb Z}/N {\bb Z}] has zero divisors. For example, let [N = N_{1} N_{2}], let [n_{1} \equiv N_{1}] mod N, and let [n_{2} \equiv N_{2}] mod N: then [n_{1} n_{2} \equiv 0] mod N. In the general case, a product of non-zero elements will be zero whenever these elements collect together all the factors of N. These circumstances give rise to a fundamental theorem in the theory of commutative rings, the Chinese Remainder Theorem (CRT), which will now be stated and proved [see Apostol (1976[link]), Chapter 5; Schroeder (1986[link]), Chapter 16].

1.3.3.2.2.2. The Chinese remainder theorem

| top | pdf |

Let [N = N_{1} N_{2} \ldots N_{d}] be factored into a product of pairwise coprime integers, so that g.c.d. [(N_{i}, N_{j}) = 1] for [i \neq j]. Then the system of congruence equations [{\ell} \equiv {\ell}_{j} \hbox{ mod } N_{j},\qquad j = 1, \ldots, d,] has a unique solution [\ell] mod N. In other words, each [\ell \in {\bb Z}/N {\bb Z}] is associated in a one-to-one fashion to the d-tuple [(\ell_{1}, \ell_{2}, \ldots, \ell_{d})] of its residue classes in [{\bb Z}/N_{1} {\bb Z}, {\bb Z}/N_{2} {\bb Z}, \ldots, {\bb Z}/N_{d} {\bb Z}].

The proof of the CRT goes as follows. Let [Q_{j} = {N \over N_{j}} = {\prod\limits_{i \neq j}}\; N_{i}.] Since g.c.d. [(N_{j}, Q_{j}) = 1] there exist integers [n_{j}] and [q_{j}] such that [n_{j} N_{j} + q_{j} Q_{j} = 1,\qquad j = 1, \ldots, d,] then the integer [{\ell} = {\textstyle\sum\limits_{i = 1}^{d}}\; {\ell}_{i} q_{i} Q_{i} \hbox{ mod } N] is the solution. Indeed, [{\ell} \equiv {\ell}_{j} q_{j} Q_{j} \hbox{ mod } N_{j}] because all terms with [i \neq j] contain [N_{j}] as a factor; and [q_{j} Q_{j} \equiv 1 \hbox{ mod } N_{j}] by the defining relation for [q_{j}].

It may be noted that [\eqalign{(q_{i} Q_{i}) (q_{j} Q_{j}) &\equiv 0{\phantom{Q_j}}\quad\quad\hbox{ mod } N \hbox{ for } i \neq j,\cr (q_{j} Q_{j})^{2} &\equiv q_{j} Q_{j}\quad\quad\hbox{mod } N, \;\;j = 1, \ldots, d,}] so that the [q_{j} Q_{j}] are mutually orthogonal idempotents in the ring [{\bb Z}/N {\bb Z}], with properties formally similar to those of mutually orthogonal projectors onto subspaces in linear algebra. The analogy is exact, since by virtue of the CRT the ring [{\bb Z}/N {\bb Z}] may be considered as the direct product [{\bb Z}/N_{1} {\bb Z} \times {\bb Z}/N_{2} {\bb Z} \times \ldots \times {\bb Z}/N_{d} {\bb Z}] via the two mutually inverse mappings:

  • (i) [{\ell} \;\longmapsto\; (\ell_{1}, \ell_{2}, \ldots, \ell_{d})] by [\ell \equiv \ell_{j}] mod [N_{j}] for each j;

  • (ii) [(\ell_{1}, \ell_{2}, \ldots, \ell_{d}) \;\longmapsto\; \ell \hbox { by } \ell = {\textstyle\sum_{i = 1}^{d}} \ell_{i} q_{i} Q_{i}\hbox{ mod } N].

The mapping defined by (ii)[link] is sometimes called the `CRT reconstruction' of [\ell] from the [\ell_{j}].

These two mappings have the property of sending sums to sums and products to products, i.e: [\displaylines{\quad (\hbox{i})\hfill {\ell} + {\ell}' \;\longmapsto\; ({\ell}_{1} + {\ell}'_{1}, {\ell}_{2} + {\ell}'_{2}, \ldots, {\ell}_{d} + {\ell}'_{d}) \hfill\cr \hfill {\ell} {\ell}' \;\longmapsto\; ({\ell}_{1} {\ell}'_{1}, {\ell}_{2} {\ell}'_{2}, \ldots, {\ell}_{d} {\ell}'_{d}) \quad\;\;\; \phantom{(\hbox{i})} \hfill\cr \quad (\hbox{ii}) \hfill ({\ell}_{1} + {\ell}'_{1}, {\ell}_{2} + {\ell}'_{2}, \ldots, {\ell}_{d} + {\ell}'_{d}) \;\longmapsto\; {\ell} + {\ell}' \;\;\hfill\cr \hfill ({\ell}_{1} {\ell}'_{1}, {\ell}_{2} {\ell}'_{2}, \ldots, {\ell}_{d} {\ell}'_{d}) \;\longmapsto\; {\ell} {\ell}' \quad \phantom{(\hbox{i})} \;\;\;\;\hfill}] (the last proof requires using the properties of the idempotents [q_{j} Q_{j}]). This may be described formally by stating that the CRT establishes a ring isomorphism: [{\bb Z}/N {\bb Z} \cong ({\bb Z}/N_{1} {\bb Z}) \times \ldots \times ({\bb Z}/N_{d} {\bb Z}).]

1.3.3.2.2.3. The prime factor algorithm

| top | pdf |

The CRT will now be used to factor the N-point DFT into a tensor product of d transforms, the jth of length [N_{j}].

Let the indices k and [k^{*}] be subjected to the following mappings:

  • (i) [k \;\longmapsto\; (k_{1}, k_{2}, \ldots, k_{d}), k_{j} \in {\bb Z}/N_{j} {\bb Z}], by [k_{j} \equiv k] mod [N_{j}] for each j, with reconstruction formula [k = {\textstyle\sum\limits_{i = 1}^{d}} \;k_{i} q_{i} Q_{i} \hbox{ mod } N\hbox{;}]

  • (ii) [k^{*} \;\longmapsto\; (k_{1}^{*}, k_{2}^{*}, \ldots, k_{d}^{*}), k_{j}^{*} \in {\bb Z}/N_{j} {\bb Z}], by [k_{j}^{*} \equiv q_{j} k^{*}] mod [N_{j}] for each j, with reconstruction formula [k^{*} = {\textstyle\sum\limits_{i = 1}^{d}} \;k_{i}^{*} Q_{i} \hbox{ mod } N.]

Then [\eqalign{k^{*} k &= \left({\textstyle\sum\limits_{i = 1}^{d}}\; k_{i}^{*} Q_{i}\right) \left({\textstyle\sum\limits_{j = 1}^{d}} \;k_{j} q_{j} Q_{j}\right) \hbox{ mod } N\cr &= {\textstyle\sum\limits_{i, \, j = 1}^{d}} k_{i}^{*} k_{j} Q_{i} q_{j} Q_{j} \hbox{ mod } N.}] Cross terms with [i \neq j] vanish since they contain all the factors of N, hence [\eqalign{k^{*} k &= {\textstyle\sum\limits_{j = 1}^{d}}\; q_{j} Q_{j}^{2} k_{j}^{*} k_{j} \hbox{ mod } N\cr &= {\textstyle\sum\limits_{j = 1}^{d}} (1 - n_{j} N_{j}) Q_{j} k_{j}^{*} k_{j} \hbox{ mod } N.}] Dividing by N, which may be written as [N_{j} Q_{j}] for each j, yields [\eqalign{{k^{*} k \over N} &= {\sum\limits_{j = 1}^{d}} (1 - n_{j} N_{j}) {Q_{j} \over N_{j} Q_{j}} k_{j}^{*} k_{j} \hbox{ mod } 1\cr &= {\sum\limits_{j = 1}^{d}} \left({1 \over N_{j}} - n_{j}\right) k_{j}^{*} k_{j} \hbox{ mod } 1,}] and hence [{k^{*} k \over N} \equiv {\sum\limits_{j = 1}^{d}} {k_{j}^{*} k_{j} \over N_{j}} \hbox{ mod } 1.] Therefore, by the multiplicative property of [e(.)], [e \left({k^{*} k \over N}\right) \equiv \bigotimes\limits_{j = 1}^{d} e \left({k_{j}^{*} k_{j} \over N_{j}}\right).]

Let [{\bf X} \in L ({\bb Z}/N {\bb Z})] be described by a one-dimensional array [X(k)] indexed by k. The index mapping (i)[link] turns X into an element of [L ({\bb Z}/N_{1} {\bb Z} \times \ldots \times {\bb Z}/N_{d} {\bb Z})] described by a d-dimensional array [X (k_{1}, \ldots, k_{d})]; the latter may be transformed by [\bar{F} (N_{1}) \bigotimes \ldots \bigotimes \bar{F} (N_{d})] into a new array [X^{*} (k_{1}^{*}, k_{2}^{*}, \ldots, k_{d}^{*})]. Finally, the one-dimensional array of results [X^{*} (k^{*})] will be obtained by reconstructing [k^{*}] according to (ii)[link].

The prime factor algorithm, like the Cooley–Tukey algorithm, reindexes a 1D transform to turn it into d separate transforms, but the use of coprime factors and CRT index mapping leads to the further gain that no twiddle factors need to be applied between the successive transforms (see Good, 1971[link]). This makes up for the cost of the added complexity of the CRT index mapping.

The natural factorization of N for the prime factor algorithm is thus its factorization into prime powers: [\bar{F}(N)] is then the tensor product of separate transforms (one for each prime power factor [N_{j} = p_{j}^{\nu_{j}}]) whose results can be reassembled without twiddle factors. The separate factors [p_{j}] within each [N_{j}] must then be dealt with by another algorithm (e.g. Cooley–Tukey, which does require twiddle factors). Thus, the DFT on a prime number of points remains undecomposable.

1.3.3.2.3. The Rader algorithm

| top | pdf |

The previous two algorithms essentially reduce the calculation of the DFT on N points for N composite to the calculation of smaller DFTs on prime numbers of points, the latter remaining irreducible. However, Rader (1968)[link] showed that the p-point DFT for p an odd prime can itself be factored by invoking some extra arithmetic structure present in [{\bb Z} / p {\bb Z}].

1.3.3.2.3.1. N an odd prime

| top | pdf |

The ring [{\bb Z} / p {\bb Z} = \{0,1,2,\ldots,p - 1\}] has the property that its [p - 1] non-zero elements, called units, form a multiplicative group [U(p)]. In particular, all units [r \in U(p)] have a unique multiplicative inverse in [{\bb Z} / p {\bb Z}], i.e. a unit [s \in U(p)] such that [rs \equiv 1\hbox { mod } p]. This endows [{\bb Z} / p {\bb Z}] with the structure of a finite field.

Furthermore, [U(p)] is a cyclic group, i.e. consists of the successive powers [g^{m}\hbox{ mod } p] of a generator g called a primitive root mod p (such a g may not be unique, but it always exists). For instance, for [p = 7], [U(7) = \{1,2,3,4,5,6\}] is generated by [g = 3], whose successive powers mod 7 are: [g^{0} = 1, \quad g^{1} = 3, \quad g^{2} = 2, \quad g^{3} = 6, \quad g^{4} = 4, \quad g^{5} = 5] [see Apostol (1976[link]), Chapter 10].

The basis of Rader's algorithm is to bring to light a hidden regularity in the matrix [F(p)] by permuting the basis vectors [{\bf u}_{k}] and [{\bf v}_{k^{*}}] of [L({\bb Z} / p {\bb Z})] as follows: [\eqalign{{\bf u}'_{0} &= {\bf u}_{0} \cr {\bf u}'_{m} &= {\bf u}_{k} {\hbox to 12pt{}}\hbox{with } k = g^{m}, {\hbox to 15pt{}} m = 1, \ldots, p - 1\hbox{;} \cr {\bf v}'_{0} &= {\bf v}_{0} \cr {\bf v}'_{m^{*}} &= {\bf v}_{k^{*}} \quad \hbox{with } k^{*} = g^{m^{*}}, \quad m^{*} = 1, \ldots, p - 1\hbox{;}}] where g is a primitive root mod p.

With respect to these new bases, the matrix representing [\bar{F}(p)] will have the following elements: [\eqalign{\hbox{element } (0,0) &= 1 \cr \hbox{element } (0, m + 1) &= 1 \quad \hbox{for all } m = 0, \ldots p - 2, \cr \hbox{element } (m^{*} + 1,0) &= 1 \quad \hbox{for all } m^{*} = 0, \ldots, p - 2, \cr \hbox{element } (m^{*} + 1, m + 1) &= e \left({k^{*}k \over p}\right) \cr &= e(g^{(m^{*} + m)/p}) \cr &\qquad \quad \hbox{for all } m^{*} = 0, \ldots, p - 2.}] Thus the `core' [\bar{C}(p)] of matrix [\bar{F}(p)], of size [(p - 1) \times (p - 1)], formed by the elements with two non-zero indices, has a so-called skew-circulant structure because element [(m^{*}, m)] depends only on [m^{*} + m]. Simplification may now occur because multiplication by [\bar{C}(p)] is closely related to a cyclic convolution. Introducing the notation [C(m) = e(g^{m/p})] we may write the relation [{\bf Y}^{*} = \bar{F}(p){\bf Y}] in the permuted bases as [\eqalign{Y^{*} (0) &= {\textstyle\sum\limits_{k}} Y(k) \cr Y^{*} (m^{*} + 1) &= Y(0) + {\textstyle\sum\limits_{m = 0}^{p - 2}} C(m^{*} + m) Y(m + 1) \cr &= Y(0) + {\textstyle\sum\limits_{m = 0}^{p - 2}} C(m^{*} - m) Z(m) \cr &= Y(0) + ({\bf C} * {\bf Z}) (m^{*}), \quad m^{*} = 0, \ldots, p - 2,}] where Z is defined by [Z(m) = Y(p - m - 2)], [m = 0, \ldots, p - 2].

Thus [{\bf Y}^{*}] may be obtained by cyclic convolution of C and Z, which may for instance be calculated by [{\bf C} * {\bf Z} = F(p - 1) [\bar{F}(p - 1) [{\bf C}] \times \bar{F} (p - 1) [{\bf Z}]],] where × denotes the component-wise multiplication of vectors. Since p is odd, [p - 1] is always divisible by 2 and may even be highly composite. In that case, factoring [\bar{F} (p - 1)] by means of the Cooley–Tukey or Good methods leads to an algorithm of complexity p log p rather than [p^{2}] for [\bar{F}(p)]. An added bonus is that, because [g^{(p-1) / 2} = -1], the elements of [\bar{F} (p - 1) [{\bf C}]] can be shown to be either purely real or purely imaginary, which halves the number of real multiplications involved.

1.3.3.2.3.2. N a power of an odd prime

| top | pdf |

This idea was extended by Winograd (1976[link], 1978[link]) to the treatment of prime powers [N = p^{\nu}], using the cyclic structure of the multiplicative group of units [U(p^{\nu})]. The latter consists of all those elements of [{\bb Z} / p^{\nu} {\bb Z}] which are not divisible by p, and thus has [q_{\nu} = p^{\nu - 1} (p - 1)] elements. It is cyclic, and there exist primitive roots g modulo [p^{\nu}] such that [U(p^{\nu}) = \{1, g, g^{2}, g^{3}, \ldots, g^{q_{\nu} - 1}\}.] The [p^{\nu - 1}] elements divisible by p, which are divisors of zero, have to be treated separately just as 0 had to be treated separately for [N = p].

When [k^{*} \not\in U(p^{\nu})], then [k^{*} = pk_{1}^{*}] with [k_{1}^{*} \in {\bb Z} / p^{\nu - 1} {\bb Z}]. The results [X^{*} (pk_{1}^{*})] are p-decimated, hence can be obtained via the [p^{\nu - 1}]-point DFT of the [p^{\nu - 1}]-periodized data Y: [X^{*} (pk_{1}^{*}) = \bar{F} (p^{\nu - 1}) [{\bf Y}] (k_{1}^{*})] with [Y(k_{1}) = {\textstyle\sum\limits_{k_{2} \in {\bb Z} / p {\bb Z}}} X(k_{1} + p^{\nu - 1} k_{2}).]

When [k^{*} \in U(p^{\nu})], then we may write [X^{*} (k^{*}) = X_{0}^{*} (k^{*}) + X_{1}^{*} (k^{*}),] where [{\bf X}_{0}^{*}] contains the contributions from [k\; \notin\; U(p^{\nu})] and [{\bf X}_{1}^{*}] those from [k \in U(p^{\nu})]. By a converse of the previous calculation, [{\bf X}_{0}^{*}] arises from p-decimated data Z, hence is the [p^{\nu - 1}]-periodization of the [p^{\nu - 1}]-point DFT of these data: [X_{0}^{*} (p^{\nu - 1} k_{1}^{*} + k_{2}^{*}) = \bar{F} (p^{\nu - 1}) [{\bf Z}] (k_{2}^{*})] with [Z(k_{2}) = X(pk_{2}), \qquad k_{2} \in {\bb Z} / p^{\nu - 1} {\bb Z}] (the [p^{\nu - 1}]-periodicity follows implicity from the fact that the transform on the right-hand side is independent of [k_{1}^{*} \in {\bb Z} / p {\bb Z}]).

Finally, the contribution [X_{1}^{*}] from all [k \in U(p^{\nu})] may be calculated by reindexing by the powers of a primitive root g modulo [p^{\nu}], i.e. by writing [X_{1}^{*} (g^{m^{*}}) = {\textstyle\sum\limits_{m = 0}^{q_{\nu} - 1}} X(g^{m}) e(g^{(m^{*} + m) / p^{\nu}})] then carrying out the multiplication by the skew-circulant matrix core as a convolution.

Thus the DFT of size [p^{\nu}] may be reduced to two DFTs of size [p^{\nu - 1}] (dealing, respectively, with p-decimated results and p-decimated data) and a convolution of size [q_{\nu} = p^{\nu - 1} (p - 1)]. The latter may be `diagonalized' into a multiplication by purely real or purely imaginary numbers (because [g^{(q_{\nu} / 2)} = -1]) by two DFTs, whose factoring in turn leads to DFTs of size [p^{\nu - 1}] and [p - 1]. This method, applied recursively, allows the complete decomposition of the DFT on [p^{\nu}] points into arbitrarily small DFTs.

1.3.3.2.3.3. N a power of 2

| top | pdf |

When [N = 2^{\nu}], the same method can be applied, except for a slight modification in the calculation of [{\bf X}_{1}^{*}]. There is no primitive root modulo [2^{\nu}] for [\nu \gt 2]: the group [U(2^{\nu})] is the direct product of two cyclic groups, the first (of order 2) generated by −1, the second (of order [N/4]) generated by 3 or 5. One then uses a representation [\eqalign{k &= (-1)^{m_{1}} 5^{m_{2}} \cr k^{*} &= (-1)^{m_{1}^{*}} 5^{m_{2}^{*}}}] and the reindexed core matrix gives rise to a two-dimensional convolution. The latter may be carried out by means of two 2D DFTs on [2 \times (N/4)] points.

1.3.3.2.4. The Winograd algorithms

| top | pdf |

The cyclic convolutions generated by Rader's multiplicative reindexing may be evaluated more economically than through DFTs if they are re-examined within a new algebraic setting, namely the theory of congruence classes of polynomials [see, for instance, Blahut (1985[link]), Chapter 2; Schroeder (1986[link]), Chapter 24].

The set, denoted [{\bb K}[X]], of polynomials in one variable with coefficients in a given field [{\bb K}] has many of the formal properties of the set [{\bb Z}] of rational integers: it is a ring with no zero divisors and has a Euclidean algorithm on which a theory of divisibility can be built.

Given a polynomial [P(z)], then for every [W(z)] there exist unique polynomials [Q(z)] and [R(z)] such that [W(z) = P(z) Q(z) + R(z)] and [\hbox{degree } (R) \;\lt\; \hbox{degree } (P).] [R(z)] is called the residue of [H(z)] modulo [P(z)]. Two polynomials [H_{1}(z)] and [H_{2}(z)] having the same residue modulo [P(z)] are said to be congruent modulo [P(z)], which is denoted by [H_{1}(z) \equiv H_{2}(z) \hbox{ mod } P(z).]

If [H(z) \equiv 0\hbox{ mod } P(z),\; H(z)] is said to be divisible by [P(z)]. If [H(z)] only has divisors of degree zero in [{\bb K}[X]], it is said to be irreducible over [{\bb K}] (this notion depends on [{\bb K}]). Irreducible polynomials play in [{\bb K}[X]] a role analogous to that of prime numbers in [{\bb Z}], and any polynomial over [{\bb K}] has an essentially unique factorization as a product of irreducible polynomials.

There exists a Chinese remainder theorem (CRT) for polynomials. Let [P(z) = P_{1}(z) \ldots P_{d}(z)] be factored into a product of pairwise coprime polynomials [i.e. [P_{i}(z)] and [P_{j}(z)] have no common factor for [i \neq j]]. Then the system of congruence equations [H(z) \equiv H_{j}(z) \hbox{ mod } P_{j}(z), \quad j = 1, \ldots, d,] has a unique solution [H(z)] modulo [P(z)]. This solution may be constructed by a procedure similar to that used for integers. Let [Q_{j}(z) = P(z) / P_{j}(z) = {\textstyle\prod\limits_{i \neq j}} \;P_{i}(z).] Then [P_{j}] and [Q_{j}] are coprime, and the Euclidean algorithm may be used to obtain polynomials [p_{j}(z)] and [q_{j}(z)] such that [p_{j}(z) P_{j}(z) + q_{j}(z) Q_{j}(z) = 1.] With [S_{i}(z) = q_{i}(z) Q_{i}(z)], the polynomial [H(z) = {\textstyle\sum\limits_{i = 1}^{d}} \;S_{i}(z) H_{i}(z) \hbox{ mod } P(z)] is easily shown to be the desired solution.

As with integers, it can be shown that the 1:1 correspondence between [H(z)] and [H_{j}(z)] sends sums to sums and products to products, i.e. establishes a ring isomorphism: [{\bb K}[X] \hbox{ mod } P \cong ({\bb K}[X] \hbox{ mod } P_{1}) \times \ldots \times ({\bb K}[X] \hbox{ mod } P_{d}).]

These results will now be applied to the efficient calculation of cyclic convolutions. Let [{\bf U} = (u_{0}, u_{1}, \ldots, u_{N - 1})] and [{\bf V} = (v_{0}, v_{1}, \ldots, v_{N - 1})] be two vectors of length N, and let [{\bf W} = (w_{0}, w_{1}, \ldots, w_{N - 1})] be obtained by cyclic convolution of U and V: [w_{n} = {\textstyle\sum\limits_{m = 0}^{N - 1}} u_{m} v_{n - m}, \quad n = 0, \ldots, N - 1.] The very simple but crucial result is that this cyclic convolution may be carried out by polynomial multiplication modulo [(z^{N} - 1)]: if [\eqalign{U(z) &= {\textstyle\sum\limits_{l = 0}^{N - 1}} u_{l} z^{l} \cr V(z) &= {\textstyle\sum\limits_{m = 0}^{N - 1}} v_{m} z^{m} \cr W(z) &= {\textstyle\sum\limits_{n = 0}^{N - 1}} w_{n} z^{n}}] then the above relation is equivalent to [W(z) \equiv U(z) V(z) \hbox{ mod } (z^{N} - 1).] Now the polynomial [z^{N} - 1] can be factored over the field of rational numbers into irreducible factors called cyclotomic polynomials: if d is the number of divisors of N, including 1 and N, then [z^{N} - 1 = {\textstyle\prod\limits_{i = 1}^{d}} P_{i}(z),] where the cyclotomics [P_{i}(z)] are well known (Nussbaumer, 1981[link]; Schroeder, 1986[link], Chapter 22). We may now invoke the CRT, and exploit the ring isomorphism it establishes to simplify the calculation of [W(z)] from [U(z)] and [V(z)] as follows:

  • (i) compute the d residual polynomials [\eqalign{U_{i}(z) &\equiv U(z) \hbox{ mod } P_{i}(z), \quad i = 1, \ldots, d,\cr V_{i}(z) &\equiv V(z) \hbox{ mod } P_{i}(z), \quad i = 1, \ldots, d\hbox{;}}]

  • (ii) compute the d polynomial products [W_{i}(z) \equiv U_{i}(z) V_{i}(z) \hbox{ mod } P_{i}(z), \quad i = 1, \ldots, d\hbox{;}]

  • (iii) use the CRT reconstruction formula just proved to recover [W(z)] from the [W_{i} (z)]: [W (z) \equiv {\textstyle\sum\limits_{i = 1}^{d}} S_{i} (z) W_{i} (z) \hbox{ mod } (z^{N} - 1).]

When N is not too large, i.e. for `short cyclic convolutions', the [P_{i} (z)] are very simple, with coefficients 0 or ±1, so that (i)[link] only involves a small number of additions. Furthermore, special techniques have been developed to multiply general polynomials modulo cyclotomic polynomials, thus helping keep the number of multiplications in (ii)[link] and (iii)[link] to a minimum. As a result, cyclic convolutions can be calculated rapidly when N is sufficiently composite.

It will be recalled that Rader's multiplicative indexing often gives rise to cyclic convolutions of length [p - 1] for p an odd prime. Since [p - 1] is highly composite for all [p \leq 50] other than 23 and 47, these cyclic convolutions can be performed more efficiently by the above procedure than by DFT.

These combined algorithms are due to Winograd (1977[link], 1978[link], 1980[link]), and are known collectively as `Winograd small FFT algorithms'. Winograd also showed that they can be thought of as bringing the DFT matrix F to the following `normal form': [{\bf F} = {\bf CBA},] where

  • A is an integer matrix with entries 0, [\pm 1], defining the `pre-additions',

  • B is a diagonal matrix of multiplications,

  • C is a matrix with entries 0, [\pm 1], [\pm i], defining the `post-additions'.

The elements on the diagonal of B can be shown to be either real or pure imaginary, by the same argument as in Section 1.3.3.2.3.1.[link] Matrices A and C may be rectangular rather than square, so that intermediate results may require extra storage space.

References

First citation Apostol, T. M. (1976). Introduction to analytic number theory. New York: Springer-Verlag.Google Scholar
First citation Blahut, R. E. (1985). Fast algorithms for digital signal processing. Reading: Addison-Wesley.Google Scholar
First citation Cochran, W. T., Cooley, J. W., Favin, D. L., Helms, H. D., Kaenel, R. A., Lang, W. W., Maling, G. C., Nelson, D. E., Rader, C. M. & Welch, P. D. (1967). What is the fast Fourier transform? IEEE Trans. Audio, 15, 45–55.Google Scholar
First citation Eklundh, J. O. (1972). A fast computer method for matrix transposing. IEEE Trans. C-21, 801–803.Google Scholar
First citation Gentleman, W. M. & Sande, G. (1966). Fast Fourier transforms – for fun and profit. In AFIPS Proc. 1966 Fall Joint Computer Conference, pp. 563–578. Washington, DC: Spartan Books.Google Scholar
First citation Good, I. J. (1971). The relationship between two fast Fourier transforms. IEEE Trans. C-20, 310–317.Google Scholar
First citation Nussbaumer, H. J. (1981). Fast Fourier transform and convolution algorithms. Berlin: Springer-Verlag.Google Scholar
First citation Rader, C. M. (1968). Discrete Fourier transforms when the number of data samples is prime. Proc. IEEE, 56, 1107–1108.Google Scholar
First citation Schroeder, M. R. (1986). Number theory in science and communication, 2nd ed. Berlin: Springer-Verlag.Google Scholar
First citation Singleton, R. C. (1969). An algorithm for computing the mixed radix fast Fourier transform. IEEE Trans. AU, 17, 93–103.Google Scholar
First citation Winograd, S. (1976). On computing the discrete Fourier transform. Proc. Natl Acad. Sci. USA, 73, 1005–1006.Google Scholar
First citation Winograd, S. (1977). Some bilinear forms whose multiplicative complexity depends on the field of constants. Math. Syst. Theor. 10, 169–180.Google Scholar
First citation Winograd, S. (1978). On computing the discrete Fourier transform. Math. Comput. 32, 175–199.Google Scholar
First citation Winograd, S. (1980). Arithmetic complexity of computations. CBMS-NST Regional Conf. Series Appl. Math, Publ. No. 33. Philadelphia: SIAM Publications.Google Scholar








































to end of page
to top of page