Multidimensional factorization

Bricogne, G.

doi:10.1107/97809553602060000551

International Tables for Crystallography
Volume B: Reciprocal space
Edited by U. Shmueli

International Tables for Crystallography (2006). Vol. B. ch. 1.3, pp. 55–57

G. Bricogne^a

^a MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, England, and LURE, Bâtiment 209D, Université Paris-Sud, 91405 Orsay, France

1.3.3.3.2. Multidimensional factorization

Substantial reductions in the arithmetic cost, as well as gains in flexibility, can be obtained if the factoring of the DFT is carried out in several dimensions simultaneously. The presentation given here is a generalization of that of Mersereau & Speake (1981), using the abstract setting established independently by Auslander, Tolimieri & Winograd (1982).

Let us return to the general n-dimensional setting of Section 1.3.2.7.4, where the DFT was defined for an arbitrary decimation matrix N by the formulae (where $[|{\bf N}|]$ denotes $[| \hbox{det }{\bf N}|]$ ): $[\eqalign{F ({\bf N})&:\quad X ({\bf k}) \;\;\;= {1 \over |{\bf N}|} {\sum\limits_{{\bf k}^{*}}} \;X^{*} ({\bf k}^{*}) e[-{\bf k}^{*} \cdot ({\bf N}^{-1} {\bf k})] \cr \bar{F} ({\bf N})&:\quad X^{*} ({\bf k}^{*}) = \phantom{{1 \over |{\bf N}|}} {\sum\limits_{{\bf k}}} \;X ({\bf k}) e[{\bf k}^{*} \cdot ({\bf N}^{-1} {\bf k})]}]$ with $[{\bf k} \in {\bb Z}^{n} / {\bf N} {\bb Z}^{n},\quad {\bf k}^{*} \in {\bb Z}^{n} / {\bf N}^{T} {\bb Z}^{n}.]$
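The pair of transforms just defined can be tried out numerically. The following is a minimal sketch, restricted to n = 2 and written in plain Python; the helper names (`adjugate_det`, `reduce_mod`, `cosets`, `e`, `dft_bar`, `dft`) and the example matrix are illustrative assumptions, not part of the original text. It enumerates the cosets of $[{\bb Z}^{2} / {\bf N} {\bb Z}^{2}]$ and $[{\bb Z}^{2} / {\bf N}^{T} {\bb Z}^{2}]$ for a non-diagonal N and applies $[\bar{F} ({\bf N})]$ followed by $[F ({\bf N})]$.

```python
import cmath
from itertools import product

def adjugate_det(N):
    """Adjugate and determinant of a 2x2 integer matrix, so that N^{-1} = adj / det."""
    (a, b), (c, d) = N
    return ((d, -b), (-c, a)), a * d - b * c

def reduce_mod(N, k):
    """Canonical representative of the coset k + N Z^2, namely k - N floor(N^{-1} k)."""
    adj, det = adjugate_det(N)
    u = (adj[0][0] * k[0] + adj[0][1] * k[1]) // det
    v = (adj[1][0] * k[0] + adj[1][1] * k[1]) // det
    return (k[0] - N[0][0] * u - N[0][1] * v, k[1] - N[1][0] * u - N[1][1] * v)

def cosets(N):
    """All |det N| elements of Z^2 / N Z^2; the box [0, |det N|)^2 meets every coset."""
    size = abs(adjugate_det(N)[1])
    return sorted({reduce_mod(N, k) for k in product(range(size), repeat=2)})

def transpose(N):
    return tuple(zip(*N))

def e(k_star, N, k):
    """e[k* . (N^{-1} k)] with e(x) = exp(2 pi i x); x is reduced mod 1 in exact integers."""
    adj, det = adjugate_det(N)
    num = sum(k_star[i] * adj[i][j] * k[j] for i in range(2) for j in range(2))
    return cmath.exp(2j * cmath.pi * (num % det) / det)

def dft_bar(X, N):
    """F-bar(N): X is a dict over cosets(N); the result is a dict over cosets(N^T)."""
    return {ks: sum(X[k] * e(ks, N, k) for k in cosets(N))
            for ks in cosets(transpose(N))}

def dft(X_star, N):
    """F(N): conjugate kernel and normalization by 1/|det N|."""
    size = abs(adjugate_det(N)[1])
    return {k: sum(X_star[ks] * e(ks, N, k).conjugate()
                   for ks in cosets(transpose(N))) / size
            for k in cosets(N)}

N = ((2, 1), (0, 3))                      # example decimation matrix, |det N| = 6
X = {k: complex(i + 1, -i) for i, k in enumerate(cosets(N))}
round_trip = dft(dft_bar(X, N), N)        # should reproduce X up to rounding
```

The kernel is evaluated through the adjugate so that the phase is reduced modulo 1 in integer arithmetic before the exponential is taken; this is a design choice of the sketch rather than anything prescribed by the text.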

1.3.3.3.2.1. Multidimensional Cooley–Tukey factorization

Let us now assume that this decimation can be factored into d successive decimations, i.e. that $[{\bf N} = {\bf N}_{1} {\bf N}_{2} \ldots {\bf N}_{d-1} {\bf N}_{d}]$ and hence $[{\bf N}^{T} = {\bf N}_{d}^{T} {\bf N}_{d - 1}^{T} \ldots {\bf N}_{2}^{T} {\bf N}_{1}^{T}.]$ Then the coset decomposition formulae corresponding to these successive decimations (Section 1.3.2.7.1) can be combined as follows: $[\eqalign{{\bb Z}^{n} &= \bigcup_{{\bf k}_{1}}\; ({\bf k}_{1} + {\bf N}_{1} {\bb Z}^{n}) \cr &= \bigcup_{{\bf k}_{1}}\; \left\{{\bf k}_{1} + {\bf N}_{1} \left[\bigcup_{{\bf k}_{2}}\; ({\bf k}_{2} + {\bf N}_{2} {\bb Z}^{n})\right]\right\} \cr &= \ldots \cr &= \bigcup_{{\bf k}_{1}} \ldots \bigcup_{{\bf k}_{d}}\; ({\bf k}_{1} + {\bf N}_{1} {\bf k}_{2} + \ldots + {\bf N}_{1} {\bf N}_{2} \times \ldots \times {\bf N}_{d - 1} {\bf k}_{d} + {\bf N} {\bb Z}^{n})}]$ with $[{\bf k}_{j} \in {\bb Z}^{n} / {\bf N}_{j} {\bb Z}^{n}]$. Therefore, any $[{\bf k} \in {\bb Z}^{n} / {\bf N} {\bb Z}^{n}]$ may be written uniquely as $[{\bf k} = {\bf k}_{1} + {\bf N}_{1} {\bf k}_{2} + \ldots + {\bf N}_{1} {\bf N}_{2} \times \ldots \times {\bf N}_{d - 1} {\bf k}_{d}.]$ Similarly: $[\eqalign{{\bb Z}^{n} &= \bigcup_{{\bf k}_{d}^{*}}\; ({\bf k}_{d}^{*} + {\bf N}_{d}^{T} {\bb Z}^{n}) \cr &= \ldots \cr &= \bigcup_{{\bf k}_{d}^{*}} \ldots \bigcup_{{\bf k}_{1}^{*}} \;({\bf k}_{d}^{*} + {\bf N}_{d}^{T} {\bf k}_{d - 1}^{*} + \ldots + {\bf N}_{d}^{T} \times \ldots \times {\bf N}_{2}^{T} {\bf k}_{1}^{*} \cr &\quad + {\bf N}^{T} {\bb Z}^{n})}]$ so that any $[{\bf k}^{*} \in {\bb Z}^{n} / {\bf N}^{T} {\bb Z}^{n}]$ may be written uniquely as $[{\bf k}^{*} = {\bf k}_{d}^{*} + {\bf N}_{d}^{T} {\bf k}_{d - 1}^{*} + \ldots + {\bf N}_{d}^{T} \times \ldots \times {\bf N}_{2}^{T} {\bf k}_{1}^{*}]$ with $[{\bf k}_{j}^{*} \in {\bb Z}^{n} / {\bf N}_{j}^{T} {\bb Z}^{n}]$. These decompositions are the vector analogues of the multi-radix number representation systems used in the Cooley–Tukey factorization.
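The uniqueness of the decomposition $[{\bf k} = {\bf k}_{1} + {\bf N}_{1} {\bf k}_{2}]$ can be checked by brute force for d = 2. The sketch below assumes n = 2 and two example factors chosen purely for illustration (a quincunx matrix and 2I); the helper names are hypothetical. It confirms that the $[|{\bf N}_{1}| \, |{\bf N}_{2}|]$ combinations land in distinct cosets modulo N and exhaust them.

```python
from itertools import product

def adjugate_det(N):
    (a, b), (c, d) = N
    return ((d, -b), (-c, a)), a * d - b * c

def matmul(A, B):
    return tuple(tuple(sum(A[i][t] * B[t][j] for t in range(2)) for j in range(2))
                 for i in range(2))

def matvec(N, k):
    return (N[0][0] * k[0] + N[0][1] * k[1], N[1][0] * k[0] + N[1][1] * k[1])

def reduce_mod(N, k):
    """Canonical representative of k + N Z^2, i.e. k - N floor(N^{-1} k)."""
    adj, det = adjugate_det(N)
    u, v = (c // det for c in matvec(adj, k))
    Nu = matvec(N, (u, v))
    return (k[0] - Nu[0], k[1] - Nu[1])

def cosets(N):
    """The |det N| elements of Z^2 / N Z^2, found by scanning the box [0, |det N|)^2."""
    size = abs(adjugate_det(N)[1])
    return sorted({reduce_mod(N, k) for k in product(range(size), repeat=2)})

N1 = ((1, 1), (-1, 1))        # example first decimation, |det| = 2
N2 = ((2, 0), (0, 2))         # example second decimation, |det| = 4
N = matmul(N1, N2)            # combined decimation, |det| = 8

# Every pair (k1, k2) gives the vector k1 + N1 k2, reduced modulo N.
combined = [reduce_mod(N, tuple(a + b for a, b in zip(k1, matvec(N1, k2))))
            for k1 in cosets(N1) for k2 in cosets(N2)]
```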

We may then write the definition of $[\bar{F} ({\bf N})]$ with $[d = 2]$ factors as $[\eqalign{X^{*} ({\bf k}_{2}^{*} + {\bf N}_{2}^{T} {\bf k}_{1}^{*}) &= {\textstyle\sum\limits_{{\bf k}_{1}}} {\textstyle\sum\limits_{{\bf k}_{2}}}\; X ({\bf k}_{1} + {\bf N}_{1} {\bf k}_{2}) \cr &\quad \times e[({\bf k}_{2}^{*T} + {\bf k}_{1}^{*T}{\bf N}_2) {\bf N}_{2}^{-1} {\bf N}_{1}^{-1} ({\bf k}_{1} + {\bf N}_{1} {\bf k}_{2})].}]$ The argument of e(–) may be expanded as $[{\bf k}_{2}^{*} \cdot ({\bf N}^{-1} {\bf k}_{1}) + {\bf k}_{1}^{*} \cdot ({\bf N}_{1}^{-1} {\bf k}_{1}) + {\bf k}_{2}^{*} \cdot ({\bf N}_{2}^{-1} {\bf k}_{2}) + {\bf k}_{1}^{*} \cdot {\bf k}_{2}.]$ The first summand may be recognized as a twiddle factor, the second and third as the kernels of $[\bar{F} ({\bf N}_{1})]$ and $[\bar{F} ({\bf N}_{2})]$, respectively, while the fourth is an integer which may be dropped. We are thus led to a `vector-radix' version of the Cooley–Tukey algorithm, in which the successive decimations may be introduced in all n dimensions simultaneously by general integer matrices. The computation may be decomposed into five stages analogous to those of the one-dimensional algorithm of Section 1.3.3.2.1:

(i) form the $[|{\bf N}_{1}|]$ vectors $[{\bf Y}_{{\bf k}_{1}}]$ of shape $[{\bf N}_{2}]$ by $[Y_{{\bf k}_{1}} ({\bf k}_{2}) = X ({\bf k}_{1} + {\bf N}_{1} {\bf k}_{2}),\quad {\bf k}_{1} \in {\bb Z}^{n} / {\bf N}_{1} {\bb Z}^{n},\quad {\bf k}_{2} \in {\bb Z}^{n} / {\bf N}_{2} {\bb Z}^{n}\hbox{;}]$
(ii) calculate the $[|{\bf N}_{1}|]$ transforms $[{\bf Y}_{{\bf k}_{1}}^{*}]$ on $[|{\bf N}_{2}|]$ points: $[Y_{{\bf k}_{1}}^{*} ({\bf k}_{2}^{*}) = {\textstyle\sum\limits_{{\bf k}_{2}}}\; e[{\bf k}_{2}^{*} \cdot ({\bf N}_{2}^{-1} {\bf k}_{2})] Y_{{\bf k}_{1}} ({\bf k}_{2}),\quad {\bf k}_{1} \in {\bb Z}^{n} / {\bf N}_{1} {\bb Z}^{n}\hbox{;}]$
(iii) form the $[|{\bf N}_{2}|]$ vectors $[{\bf Z}_{{\bf k}_{2}^{*}}]$ of shape $[{\bf N}_{1}]$ by $[\displaylines{Z_{{\bf k}_{2}^{*}} ({\bf k}_{1}) = e[{\bf k}_{2}^{*} \cdot ({\bf N}^{-1} {\bf k}_{1})] Y_{{\bf k}_{1}}^{*} ({\bf k}_{2}^{*}),\quad {\bf k}_{1} \in {\bb Z}^{n} / {\bf N}_{1} {\bb Z}^{n},\cr {\bf k}_{2}^{*} \in {\bb Z}^{n} / {\bf N}_{2}^{T} {\bb Z}^{n}\hbox{;}}]$
(iv) calculate the $[|{\bf N}_{2}|]$ transforms $[{\bf Z}_{{\bf k}_{2}^{*}}^{*}]$ on $[|{\bf N}_{1}|]$ points: $[Z_{{\bf k}_{2}^{*}}^{*} ({\bf k}_{1}^{*}) = {\textstyle\sum\limits_{{\bf k}_{1}}}\; e[{\bf k}_{1}^{*} \cdot ({\bf N}_{1}^{-1} {\bf k}_{1})] Z_{{\bf k}_{2}^{*}} ({\bf k}_{1}),\quad {\bf k}_{2}^{*} \in {\bb Z}^{n} / {\bf N}_{2}^{T} {\bb Z}^{n}\hbox{;}]$
(v) collect $[X^{*} ({\bf k}_{2}^{*} + {\bf N}_{2}^{T} {\bf k}_{1}^{*})]$ as $[Z_{{\bf k}_{2}^{*}}^{*} ({\bf k}_{1}^{*})]$ .
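The five stages above can be sketched in code for n = 2 and d = 2. This is an illustrative, unoptimized sketch in plain Python: the helper names, the dictionary-based storage and the example factors are all assumptions made here, and the partial transforms are evaluated by direct summation rather than by a fast method. Its only purpose is to show that stages (i) to (v) reproduce the direct evaluation of $[\bar{F} ({\bf N})]$.

```python
import cmath
from itertools import product

def adjugate_det(N):
    (a, b), (c, d) = N
    return ((d, -b), (-c, a)), a * d - b * c

def matmul(A, B):
    return tuple(tuple(sum(A[i][t] * B[t][j] for t in range(2)) for j in range(2))
                 for i in range(2))

def matvec(N, k):
    return (N[0][0] * k[0] + N[0][1] * k[1], N[1][0] * k[0] + N[1][1] * k[1])

def transpose(N):
    return tuple(zip(*N))

def add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def reduce_mod(N, k):
    """Canonical representative of k + N Z^2."""
    adj, det = adjugate_det(N)
    u, v = (c // det for c in matvec(adj, k))
    return add(k, tuple(-c for c in matvec(N, (u, v))))

def cosets(N):
    size = abs(adjugate_det(N)[1])
    return sorted({reduce_mod(N, k) for k in product(range(size), repeat=2)})

def e(k_star, N, k):
    """e[k* . (N^{-1} k)], with e(x) = exp(2 pi i x)."""
    adj, det = adjugate_det(N)
    w = matvec(adj, k)
    num = k_star[0] * w[0] + k_star[1] * w[1]
    return cmath.exp(2j * cmath.pi * (num % det) / det)

def direct(X, N):
    """F-bar(N) by direct summation; X is keyed by cosets(N)."""
    return {ks: sum(X[k] * e(ks, N, k) for k in cosets(N))
            for ks in cosets(transpose(N))}

def vector_radix(X, N1, N2):
    N = matmul(N1, N2)
    K1, K2 = cosets(N1), cosets(N2)
    K1s, K2s = cosets(transpose(N1)), cosets(transpose(N2))
    # (i) Y_{k1}(k2) = X(k1 + N1 k2)
    Y = {k1: {k2: X[reduce_mod(N, add(k1, matvec(N1, k2)))] for k2 in K2} for k1 in K1}
    # (ii) |N1| transforms F-bar(N2)
    Ys = {k1: {k2s: sum(e(k2s, N2, k2) * Y[k1][k2] for k2 in K2) for k2s in K2s}
          for k1 in K1}
    # (iii) twiddle factors e[k2* . (N^{-1} k1)]
    Z = {k2s: {k1: e(k2s, N, k1) * Ys[k1][k2s] for k1 in K1} for k2s in K2s}
    # (iv) |N2| transforms F-bar(N1)
    Zs = {k2s: {k1s: sum(e(k1s, N1, k1) * Z[k2s][k1] for k1 in K1) for k1s in K1s}
          for k2s in K2s}
    # (v) collect X*(k2* + N2^T k1*)
    NT, N2T = transpose(N), transpose(N2)
    return {reduce_mod(NT, add(k2s, matvec(N2T, k1s))): Zs[k2s][k1s]
            for k2s in K2s for k1s in K1s}

N1, N2 = ((1, 1), (-1, 1)), ((2, 0), (0, 2))     # example factors, N = N1 N2
N = matmul(N1, N2)
X = {k: complex(i * i + 1, 2 - i) for i, k in enumerate(cosets(N))}
staged, reference = vector_radix(X, N1, N2), direct(X, N)
```

Note that the twiddle factor in stage (iii) is evaluated with the same representatives $[{\bf k}_{1}]$ and $[{\bf k}_{2}^{*}]$ that are used to form the indices in stages (i) and (v); changing representatives in only one of these places would change the result.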

The initial $[|{\bf N}|]$-point transform $[\bar{F} ({\bf N})]$ can thus be performed as $[|{\bf N}_{1}|]$ transforms $[\bar{F} ({\bf N}_{2})]$ on $[|{\bf N}_{2}|]$ points, followed by $[|{\bf N}_{2}|]$ transforms $[\bar{F} ({\bf N}_{1})]$ on $[|{\bf N}_{1}|]$ points. This process can be applied successively to all d factors. The same decomposition applies to $[F ({\bf N})]$, up to the complex conjugation of twiddle factors, the normalization factor $[1 / |{\bf N}|]$ being obtained as the product of the factors $[1 / |{\bf N}_{j}|]$ in the successive partial transforms $[F ({\bf N}_{j})]$.

The geometric interpretation of this factorization in terms of partial transforms on translates of sublattices applies in full to this n-dimensional setting; in particular, the twiddle factors are seen to be related to the residual translations which place the sublattices in register within the big lattice. If the intermediate transforms are performed in place, then the quantity $[X^{*} ({\bf k}_{d}^{*} + {\bf N}_{d}^{T} {\bf k}_{d - 1}^{*} + \ldots + {\bf N}_{d}^{T} {\bf N}_{d - 1}^{T} \times \ldots \times {\bf N}_{2}^{T} {\bf k}_{1}^{*})]$ will eventually be found at location $[{\bf k}_{1}^{*} + {\bf N}_{1} {\bf k}_{2}^{*} + \ldots + {\bf N}_{1} {\bf N}_{2} \times \ldots \times {\bf N}_{d - 1} {\bf k}_{d}^{*},]$ so that the final results will have to be unscrambled by a process which may be called `coset reversal', the vector equivalent of digit reversal.
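In one dimension, where the matrices $[{\bf N}_{j}]$ reduce to integers, coset reversal is ordinary mixed-radix digit reversal. The sketch below covers only that scalar special case; the function name `coset_reversal` is a hypothetical label. For each storage location $[k_{1}^{*} + N_{1} k_{2}^{*} + \ldots]$ it returns the index $[k_{d}^{*} + N_{d} k_{d - 1}^{*} + \ldots]$ of the result found there.

```python
from itertools import product

def coset_reversal(factors):
    """perm[location] = index k* of the result stored at that location (1D case).

    factors = [N1, ..., Nd]; digits = (k1*, ..., kd*) with 0 <= kj* < Nj.
    """
    perm = {}
    for digits in product(*(range(f) for f in factors)):
        # location = k1* + N1 k2* + N1 N2 k3* + ...
        location, weight = 0, 1
        for kj, Nj in zip(digits, factors):
            location += weight * kj
            weight *= Nj
        # index = kd* + Nd k(d-1)* + Nd N(d-1) k(d-2)* + ...
        index, weight = 0, 1
        for kj, Nj in zip(reversed(digits), reversed(factors)):
            index += weight * kj
            weight *= Nj
        perm[location] = index
    return [perm[i] for i in range(len(perm))]

bit_reversal = coset_reversal([2, 2, 2])   # three radix-2 factors
mixed = coset_reversal([2, 3])             # N1 = 2, N2 = 3
```

With three factors of 2 this is the familiar bit-reversal permutation; with unequal factors the permutation is no longer an involution in general, which is why the order of the factors has to be tracked.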

Factoring by 2 in all n dimensions simultaneously, i.e. taking $[{\bf N} = 2{\bf M}]$ , leads to `n-dimensional butterflies'. Decimation in time corresponds to the choice $[{\bf N}_{1} = 2{\bf I}, {\bf N}_{2} = {\bf M}]$ , so that $[{\bf k}_{1} \in {\bb Z}^{n} / 2{\bb Z}^{n}]$ is an n-dimensional parity class; the calculation then proceeds by $[\displaylines{Y_{{\bf k}_{1}} ({\bf k}_{2}) = X ({\bf k}_{1} + 2{\bf k}_{2}),\quad{\bf k}_{1} \in {\bb Z}^{n} / 2{\bb Z}^{n},\quad {\bf k}_{2} \in {\bb Z}^{n} / {\bf M}{\bb Z}^{n}, \cr Y_{{\bf k}_{1}}^{*} = \bar{F} ({\bf M}) [{\bf Y}_{{\bf k}_{1}}],\quad{\bf k}_{1} \in {\bb Z}^{n} / 2{\bb Z}^{n}\hbox{;} \cr \eqalign{X^{*} ({\bf k}_{2}^{*} + {\bf M}^{T} {\bf k}_{1}^{*}) &= {\textstyle\sum\limits_{{\bf k}_{1} \in {\bb Z}^{n} / 2{\bb Z}^{n}}} (-1)^{{\bf k}_{1}^{*} \cdot {\bf k}_{1}} \cr &\quad \times e[{\bf k}_{2}^{*} \cdot ({\bf N}^{-1} {\bf k}_{1})] Y_{{\bf k}_{1}}^{*} ({\bf k}_{2}^{*}).}\cr}]$ Decimation in frequency corresponds to the choice $[{\bf N}_{1} = {\bf M}]$ , $[{\bf N}_{2} = 2{\bf I}]$ , so that $[{\bf k}_{2} \in {\bb Z}^{n} / 2{\bb Z}^{n}]$ labels `octant' blocks of shape M; the calculation then proceeds through the following steps: $[\eqalign{Z_{{\bf k}_{2}^{*}} ({\bf k}_{1}) &= \left[{\textstyle\sum\limits_{{\bf k}_{2} \in {\bb Z}^{n} / 2{\bb Z}^{n}}} (-1)^{{\bf k}_{2}^{*} \cdot {\bf k}_{2}} X ({\bf k}_{1} + {\bf M}{\bf k}_{2})\right] \cr &\quad \times e[{\bf k}_{2}^{*} \cdot ({\bf N}^{-1} {\bf k}_{1})], \cr {\bf Z}_{{\bf k}_{2}^{*}}^{*} &= \bar{F} ({\bf M}) [{\bf Z}_{{\bf k}_{2}^{*}}], \cr X^{*} ({\bf k}_{2}^{*} + 2{\bf k}_{1}^{*}) &= Z_{{\bf k}_{2}^{*}}^{*} ({\bf k}_{1}^{*}),}]$ i.e. the $[2^{n}]$ parity classes of results, corresponding to the different $[{\bf k}_{2}^{*} \in {\bb Z}^{n} / 2{\bb Z}^{n}]$ , are obtained separately. When the dimension n is 2 and the decimating matrix is diagonal, this analysis reduces to the `vector radix FFT' algorithms proposed by Rivard (1977) and Harris et al. (1977). 
These lead to substantial reductions in the number M of multiplications compared to the row–column method: M is reduced to $[3M/4]$ by simultaneous $[2 \times 2]$ factoring, and to $[15M/32]$ by simultaneous $[4 \times 4]$ factoring.
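As a worked instance of the decimation-in-time formula, take n = 2 and abbreviate $[W_{ab} ({\bf k}_{2}^{*}) = e[{\bf k}_{2}^{*} \cdot ({\bf N}^{-1} {\bf k}_{1})] Y_{{\bf k}_{1}}^{*} ({\bf k}_{2}^{*})]$ for $[{\bf k}_{1} = (a, b)^{T}]$ (the symbol $[W_{ab}]$ is introduced here only for illustration). Writing out the signs $[(-1)^{{\bf k}_{1}^{*} \cdot {\bf k}_{1}}]$ for the four parity classes of $[{\bf k}_{1}^{*}]$ gives the two-dimensional butterfly $[\eqalign{X^{*} [{\bf k}_{2}^{*} + {\bf M}^{T} (0, 0)^{T}] &= W_{00} + W_{10} + W_{01} + W_{11} \cr X^{*} [{\bf k}_{2}^{*} + {\bf M}^{T} (1, 0)^{T}] &= W_{00} - W_{10} + W_{01} - W_{11} \cr X^{*} [{\bf k}_{2}^{*} + {\bf M}^{T} (0, 1)^{T}] &= W_{00} + W_{10} - W_{01} - W_{11} \cr X^{*} [{\bf k}_{2}^{*} + {\bf M}^{T} (1, 1)^{T}] &= W_{00} - W_{10} - W_{01} + W_{11}.}]$ Since the twiddle factor in $[W_{00}]$ is unity, each butterfly involves three twiddle multiplications followed by additions and subtractions only. Two successive one-dimensional radix-2 passes over the same four points would use four twiddle multiplications, which is one way of understanding the ratio 3/4 quoted above.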

The use of a non-diagonal decimating matrix may bring savings in computing time if the spectrum of the band-limited function under study is of such a shape as to pack more compactly in a non-rectangular than in a rectangular lattice (Mersereau, 1979). If, for instance, the support K of the spectrum Φ is contained in a sphere, then a decimation matrix producing a close packing of these spheres will yield an aliasing-free DFT algorithm with fewer sample points than the standard algorithm using a rectangular lattice.
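The size of this saving can be estimated in two dimensions, where the support is a disc of radius R and the closest packing is hexagonal. The number of sample points needed over a fixed region is proportional to the area of the cell on which the spectrum is periodically repeated, so it suffices to compare cell areas. The sketch below is an illustration under these assumptions; the function name is hypothetical.

```python
import math

def replication_cell_area(R, lattice):
    """Area of the periodicity cell when discs of radius R just touch."""
    if lattice == "square":
        return (2 * R) ** 2                 # centres 2R apart along both axes
    if lattice == "hexagonal":
        return 2 * math.sqrt(3) * R ** 2    # (2R)^2 * sin(60 degrees)
    raise ValueError(f"unknown lattice: {lattice}")

ratio = replication_cell_area(1.0, "hexagonal") / replication_cell_area(1.0, "square")
saving_percent = 100 * (1 - ratio)          # about 13.4 % fewer sample points
```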

1.3.3.3.2.2. Multidimensional prime factor algorithm

Suppose that the decimation matrix N is diagonal $[{\bf N} = \hbox{diag } (N^{(1)}, N^{(2)}, \ldots, N^{(n)})]$ and let each diagonal element be written in terms of its prime factors: $[N^{(i)} = {\textstyle\prod\limits_{j = 1}^{m}} \;p_{j}^{\nu (i, \, \;j)},]$ where m is the total number of distinct prime factors present in the $[N^{(i)}]$ .

The CRT may be used to turn each 1D transform along dimension i $[(i = 1, \ldots, n)]$ into a multidimensional transform with a separate `pseudo-dimension' for each distinct prime factor of $[N^{(i)}]$; the number $[\mu_{i}]$ of these pseudo-dimensions is equal to the cardinality of the set $[\{\;j \in \{1, \ldots, m\} | \nu (i,j) \gt 0\}]$ for the given i. The full n-dimensional transform thus becomes μ-dimensional, with $[\mu = {\textstyle\sum_{i = 1}^{n}} \mu_{i}]$.

We may now permute the μ pseudo-dimensions so as to bring into contiguous position those corresponding to the same prime factor $[p_{j}]$; the m resulting groups of pseudo-dimensions are said to define `p-primary' blocks. The initial transform is now written as a tensor product of m p-primary transforms, where transform j is on $[p_{j}^{\nu (1, \, \;j)} \times p_{j}^{\nu (2, \, j)} \times \ldots \times p_{j}^{\nu (n, \, j)}]$ points [by convention, dimension i is not transformed if $[\nu (i,j) = 0]$]. These p-primary transforms may be computed, for instance, by multidimensional Cooley–Tukey factorization (Section 1.3.3.3.2.1), which is faster than the straightforward row–column method. The final results may then be obtained by reversing all the permutations used.

The extra gain with respect to the multidimensional Cooley–Tukey method is that there are no twiddle factors between p-primary pieces corresponding to different primes p.
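The absence of twiddle factors can be seen on a single dimension with $[N^{(i)} = 12 = 4 \times 3]$. The sketch below is a minimal illustration, not taken from the original text: it uses one common choice of index maps (the input index is $[k = k_{1} N_{2} + k_{2} N_{1}]$ modulo N, the output index is fixed by its residues modulo $[N_{1}]$ and $[N_{2}]$), and the function names are hypothetical. The sign convention is that of $[\bar{F}]$.

```python
import cmath

def dft1(x):
    """Direct 1D transform with kernel e(k* k / N), as in F-bar."""
    N = len(x)
    return [sum(x[k] * cmath.exp(2j * cmath.pi * k * ks / N) for k in range(N))
            for ks in range(N)]

def prime_factor_dft(x, N1, N2):
    """Length N1*N2 transform as an N1 x N2 transform; N1 and N2 must be coprime."""
    N = N1 * N2
    # Input map: k = k1*N2 + k2*N1 (mod N) spreads the data over two pseudo-dimensions.
    grid = [[x[(k1 * N2 + k2 * N1) % N] for k2 in range(N2)] for k1 in range(N1)]
    # Transform along the second pseudo-dimension, then along the first.
    rows = [dft1(row) for row in grid]
    cols = [dft1([rows[k1][k2s] for k1 in range(N1)]) for k2s in range(N2)]
    # Output map (CRT): k* is the unique residue with k* = k1* mod N1 and k* = k2* mod N2.
    out = [0j] * N
    for k1s in range(N1):
        for k2s in range(N2):
            ks = next(k for k in range(N) if k % N1 == k1s and k % N2 == k2s)
            out[ks] = cols[k2s][k1s]      # no twiddle factor between the two stages
    return out

x = [complex(k * k % 7, k % 5) for k in range(12)]
via_crt, reference = prime_factor_dft(x, 4, 3), dft1(x)
```

The CRT residue is found here by exhaustive search for brevity; a practical implementation would precompute the index tables.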

The case where N is not diagonal has been examined by Guessoum & Mersereau (1986).

1.3.3.3.2.3. Nesting of Winograd small FFTs

Suppose that the CRT has been used as above to map an n-dimensional DFT to a μ-dimensional DFT. For each $[\kappa = 1, \ldots, \mu]$ [κ runs over those pairs (i, j) such that $[\nu (i,j) \gt 0]$ ], the Rader/Winograd procedure may be applied to put the matrix of the κth 1D DFT in the CBA normal form of a Winograd small FFT. The full DFT matrix may then be written, up to permutation of data and results, as $[\bigotimes_{\kappa = 1}^{\mu}({\bf C}_{\kappa} {\bf B}_{\kappa} {\bf A}_{\kappa}).]$

A well known property of the tensor product of matrices allows this to be rewritten as $[\left(\bigotimes_{\gamma = 1}^{\mu} {\bf C}_{\gamma}\right) \times \left(\bigotimes_{\beta = 1}^{\mu} {\bf B}_{\beta}\right) \times \left(\bigotimes_{\alpha = 1}^{\mu} {\bf A}_{\alpha}\right)]$ and thus to form a matrix in which the combined pre-addition, multiplication and post-addition matrices have been precomputed. This procedure, called nesting, can be shown to afford a reduction of the arithmetic operation count compared to the row–column method (Morris, 1978).
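The tensor-product property used here is the mixed-product rule $[({\bf C}_{1} {\bf B}_{1} {\bf A}_{1}) \otimes ({\bf C}_{2} {\bf B}_{2} {\bf A}_{2}) = ({\bf C}_{1} \otimes {\bf C}_{2}) ({\bf B}_{1} \otimes {\bf B}_{2}) ({\bf A}_{1} \otimes {\bf A}_{2})]$. It can be checked exactly on small integer matrices. The matrices below are arbitrary examples with the rectangular shapes typical of a CBA factorization (B diagonal); they are not actual Winograd matrices, and the helper names are illustrative.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def kron(A, B):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

# Arbitrary example factors: A is 3x2, B is 3x3 diagonal, C is 2x3.
A1 = [[1, 1], [1, -1], [0, 1]]
B1 = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]
C1 = [[1, 0, 1], [0, 1, -1]]
# Second set of example factors, all 2x2.
A2 = [[1, 0], [1, 1]]
B2 = [[2, 0], [0, 5]]
C2 = [[1, 1], [0, 1]]

row_column_form = kron(matmul(C1, matmul(B1, A1)), matmul(C2, matmul(B2, A2)))
nested_form = matmul(kron(C1, C2), matmul(kron(B1, B2), kron(A1, A2)))
```

In the nested form the product of diagonal matrices $[{\bf B}_{1} \otimes {\bf B}_{2}]$ is again diagonal, so all multiplications are gathered into a single central stage.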

Clearly, the nesting rearrangement need not be applied to all μ dimensions, but can be restricted to any desired subset of them.

1.3.3.3.2.4. The Nussbaumer–Quandalle algorithm

Nussbaumer's approach views the DFT as the evaluation of certain polynomials constructed from the data (as in Section 1.3.3.2.4). For instance, putting $[\omega = e(1/N)]$ , the 1D N-point DFT $[X^{*}(k^{*}) = {\textstyle\sum\limits_{k = 0}^{N - 1}} X(k) \omega^{k^{*}k}]$ may be written $[X^{*}(k^{*}) = Q(\omega^{k^{*}}),]$ where the polynomial Q is defined by $[Q(z) = {\textstyle\sum\limits_{k = 0}^{N - 1}} X(k)z^{k}.]$
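The polynomial point of view can be made concrete with a few lines of code. In this minimal sketch (function names are illustrative), Q is evaluated by Horner's rule at the N points $[\omega^{k^{*}}]$ and compared with the defining sum.

```python
import cmath

def horner(coeffs, z):
    """Evaluate Q(z) = sum_k coeffs[k] z^k by Horner's rule."""
    acc = 0j
    for c in reversed(coeffs):
        acc = acc * z + c
    return acc

def dft_by_evaluation(X):
    """X*(k*) = Q(omega^{k*}) with omega = e(1/N)."""
    N = len(X)
    omega = cmath.exp(2j * cmath.pi / N)
    return [horner(X, omega ** ks) for ks in range(N)]

def dft_by_sum(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * ks * k / N) for k in range(N))
            for ks in range(N)]

X = [1, 2 - 1j, 0, 3j, -1, 4, 2]
by_evaluation, by_sum = dft_by_evaluation(X), dft_by_sum(X)
```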

Let us consider (Nussbaumer & Quandalle, 1979) a 2D transform of size $[N \times N]$ : $[X^{*}(k_{1}^{*}, k_{2}^{*}) = {\textstyle\sum\limits_{k_{1} = 0}^{N - 1}}\; {\textstyle\sum\limits_{k_{2} = 0}^{N - 1}} X(k_{1}, k_{2}) \omega^{k_{1}^{*} k_{1} + k_{2}^{*} k_{2}}.]$ By introduction of the polynomials $[\eqalign{Q_{k_{2}}(z) &= {\textstyle\sum\limits_{k_{1}}}\; X (k_{1}, k_{2})z^{k_{1}} \cr R_{k_{2}^{*}}(z) &= {\textstyle\sum\limits_{k_{2}}}\; \omega^{k_{2}^{*} k_{2}} Q_{k_{2}}(z),}]$ this may be rewritten: $[X^{*}(k_{1}^{*}, k_{2}^{*}) = R_{k_{2}^{*}} (\omega^{k_{1}^{*}}) = {\textstyle\sum\limits_{k_{2}}}\; \omega^{k_{2}^{*} k_{2}} Q_{k_{2}} (\omega^{k_{1}^{*}}).]$

Let us now suppose that $[k_{1}^{*}]$ is coprime to N. Then $[k_{1}^{*}]$ has a unique inverse modulo N (denoted by $[1/k_{1}^{*}]$), so that multiplication by $[k_{1}^{*}]$ simply permutes the elements of $[{\bb Z}/N {\bb Z}]$: as $[k_{2}^{*}]$ runs over $[{\bb Z}/N {\bb Z}]$, so does $[k_{1}^{*} k_{2}^{*}]$. The results for a given $[k_{1}^{*}]$ may therefore be indexed by $[k_{1}^{*} k_{2}^{*}]$ instead of $[k_{2}^{*}]$, and we may write: $[\eqalign{X^{*}(k_{1}^{*}, k_{1}^{*} k_{2}^{*}) &= {\textstyle\sum\limits_{k_{2}}} \;\omega^{k_{1}^{*} k_{2}^{*} k_{2}} Q_{k_{2}} (\omega^{k_{1}^{*}}) \cr &= S_{k_{2}^{*}} (\omega^{k_{1}^{*}})}]$ where $[S_{k^{*}}(z) = {\textstyle\sum\limits_{k_{2}}}\; z^{k^{*} k_{2}} Q_{k_{2}}(z).]$ Since only the value of polynomial $[S_{k^{*}}(z)]$ at $[z = \omega^{k_{1}^{*}}]$ is involved in the result, the computation of $[S_{k^{*}}]$ may be carried out modulo the unique cyclotomic polynomial $[P(z)]$ such that $[P(\omega^{k_{1}^{*}}) = 0]$. Thus, if we define: $[T_{k^{*}}(z) = {\textstyle\sum\limits_{k_{2}}} \;z^{k^{*} k_{2}} Q_{k_{2}}(z) \hbox{ mod } P(z)]$ we may write: $[X^{*}(k_{1}^{*}, k_{1}^{*} k_{2}^{*}) = T_{k_{2}^{*}} (\omega^{k_{1}^{*}})]$ or equivalently $[X^{*}(k_{1}^{*}, k_{2}^{*}) = T_{k_{2}^{*}/k_{1}^{*}} (\omega^{k_{1}^{*}}).]$

For N an odd prime p, all non-zero values of $[k_{1}^{*}]$ are coprime with p so that the $[p \times p]$-point DFT may be calculated as follows:

(1) form the polynomials $[T_{k_{2}^{*}}(z) = {\textstyle\sum\limits_{k_{1}}} {\textstyle\sum\limits_{k_{2}}} \;X(k_{1}, k_{2})z^{k_{1} + k_{2}^{*} k_{2}} \hbox{ mod } P(z)]$ for $[k_{2}^{*} = 0, \ldots, p - 1]$;
(2) evaluate $[T_{k_{2}^{*}} (\omega^{k_{1}^{*}})]$ for $[k_{1}^{*} = 0, \ldots, p - 1]$;
(3) put $[X^{*}(k_{1}^{*}, k_{1}^{*} k_{2}^{*}) = T_{k_{2}^{*}} (\omega^{k_{1}^{*}})]$ for $[k_{1}^{*} \neq 0]$, the product $[k_{1}^{*} k_{2}^{*}]$ being taken modulo p;
(4) calculate the terms for $[k_{1}^{*} = 0]$ separately by $[X^{*}(0, k_{2}^{*}) = {\textstyle\sum\limits_{k_{2}}} \left[{\textstyle\sum\limits_{k_{1}}} \;X(k_{1}, k_{2})\right] \omega^{k_{2}^{*} k_{2}}.]$

Step (1) is a set of p `polynomial transforms' involving no multiplications; step (2) consists of p DFTs on p points each since if $[T_{k_{2}^{*}}(z) = {\textstyle\sum\limits_{k_{1}}}\; Y_{k_{2}^{*}}(k_{1})z^{k_{1}}]$ then $[T_{k_{2}^{*}} (\omega^{k_{1}^{*}}) = {\textstyle\sum\limits_{k_{1}}} \;Y_{k_{2}^{*}}(k_{1}) \omega^{k_{1}^{*} k_{1}} = Y_{k_{2}^{*}}^{*}(k_{1}^{*})\hbox{;}]$ step (3) is a permutation; and step (4) is a p-point DFT. Thus the 2D DFT on $[p \times p]$ points, which takes 2p p-point DFTs by the row–column method, involves only $[(p + 1)]$ p-point DFTs; the other DFTs have been replaced by polynomial transforms involving only additions.
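The four steps can be sketched for a small prime. The code below is an illustrative, unoptimized sketch in plain Python (the function names are hypothetical, and the evaluations in step (2) are done by direct summation rather than by a fast p-point DFT). For N = p prime the cyclotomic polynomial is $[P(z) = 1 + z + \ldots + z^{p - 1}]$, so reduction modulo P(z) amounts to reducing exponents modulo p and then subtracting the coefficient of $[z^{p - 1}]$ from all the others. Each value $[T_{k_{2}^{*}} (\omega^{k_{1}^{*}})]$ equals, by direct substitution into the definition of T, the Fourier coefficient whose second index is $[k_{1}^{*} k_{2}^{*}]$ modulo p, and the sketch stores it there.

```python
import cmath

def direct_dft2(X, p):
    w = cmath.exp(2j * cmath.pi / p)
    return [[sum(X[k1][k2] * w ** (k1s * k1 + k2s * k2)
                 for k1 in range(p) for k2 in range(p))
             for k2s in range(p)] for k1s in range(p)]

def polynomial_transform_dft2(X, p):
    """p x p transform (p prime) via polynomial transforms; returns out[k1*][k2*]."""
    w = cmath.exp(2j * cmath.pi / p)
    out = [[0j] * p for _ in range(p)]
    for t in range(p):                      # t labels the polynomial T_t
        # Step (1): accumulate coefficients of z^(k1 + t*k2 mod p); additions only.
        c = [0j] * p
        for k1 in range(p):
            for k2 in range(p):
                c[(k1 + t * k2) % p] += X[k1][k2]
        # Reduce mod P(z): z^(p-1) = -(1 + z + ... + z^(p-2)).
        c = [ci - c[p - 1] for ci in c[:p - 1]]
        # Steps (2) and (3): evaluate at omega^k1* and store at second index k1* t mod p.
        for k1s in range(1, p):
            out[k1s][(k1s * t) % p] = sum(ci * w ** (k1s * i) for i, ci in enumerate(c))
    # Step (4): the k1* = 0 terms, a single p-point transform of the summed data.
    col = [sum(X[k1][k2] for k1 in range(p)) for k2 in range(p)]
    for k2s in range(p):
        out[0][k2s] = sum(col[k2] * w ** (k2s * k2) for k2 in range(p))
    return out

p = 5
X = [[complex(k1 + 2 * k2, (k1 * k2) % 3) for k2 in range(p)] for k1 in range(p)]
fast, reference = polynomial_transform_dft2(X, p), direct_dft2(X, p)
```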

This procedure can be extended to n dimensions, and reduces the number of 1D p-point DFTs from $[np^{n - 1}]$ for the row–column method to $[(p^{n} -1)/(p - 1)]$ , at the cost of introducing extra additions in the polynomial transforms.
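The two counts are easily tabulated. The short sketch below (function names are illustrative) evaluates both expressions and checks that for n = 2 they reduce to the values 2p and p + 1 given above.

```python
def row_column_count(p, n):
    """Number of 1D p-point DFTs in the row-column method: n * p^(n-1)."""
    return n * p ** (n - 1)

def polynomial_transform_count(p, n):
    """Number of 1D p-point DFTs after polynomial transforms: (p^n - 1)/(p - 1)."""
    return (p ** n - 1) // (p - 1)

counts = {(p, n): (row_column_count(p, n), polynomial_transform_count(p, n))
          for p in (3, 5, 7) for n in (2, 3)}
```

For p = 7 and n = 3, for example, the count drops from 147 to 57.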

A similar algorithm has been formulated by Auslander et al. (1983) in terms of Galois theory.

References

Auslander, L., Feig, E. & Winograd, S. (1983). New algorithms for the multidimensional discrete Fourier transform. IEEE Trans. Acoust. Speech Signal Process. 31, 388–403.

Auslander, L., Tolimieri, R. & Winograd, S. (1982). Hecke's theorem in quadratic reciprocity, finite nilpotent groups and the Cooley–Tukey algorithm. Adv. Math. 43, 122–172.

Guessoum, A. & Mersereau, R. M. (1986). Fast algorithms for the multidimensional discrete Fourier transform. IEEE Trans. Acoust. Speech Signal Process. 34, 937–943.

Harris, D. B., McClellan, J. H., Chan, D. S. K. & Schuessler, H. W. (1977). Vector radix fast Fourier transform. Rec. 1977 IEEE Internat. Conf. Acoust. Speech Signal Process. pp. 548–551.

Mersereau, R. & Speake, T. C. (1981). A unified treatment of Cooley–Tukey algorithms for the evaluation of the multidimensional DFT. IEEE Trans. Acoust. Speech Signal Process. 29, 1011–1018.

Mersereau, R. M. (1979). The processing of hexagonally sampled two-dimensional signals. Proc. IEEE, 67, 930–949.

Morris, R. L. (1978). A comparative study of time efficient FFT and WFTA programs for general purpose computers. IEEE Trans. Acoust. Speech Signal Process. 26, 141–150.

Nussbaumer, H. J. & Quandalle, P. (1979). Fast computation of discrete Fourier transforms using polynomial transforms. IEEE Trans. Acoust. Speech Signal Process. 27, 169–181.

Rivard, G. E. (1977). Direct fast Fourier transform of bivariate functions. IEEE Trans. Acoust. Speech Signal Process. 25, 250–252.
