This uses the Cooley-Tukey radix-2 decimation-in-time algorithm.
Let's start with the definition of a DFT:

X_k = Σ_{n=0}^{N-1} x_n · e^{-2πi·k·n/N}

where k = 0, 1, 2, ..., N-1.
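As a baseline, here's a minimal sketch of the definition applied directly (Python; naive_dft is my own name, not anything from the original):

import cmath

def naive_dft(x):
    # O(N^2): evaluate the definition term by term
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]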
So imagine an N×N square matrix with the k's as the rows and the n's as the columns.
By applying the Danielson-Lanczos lemma, we can divide the problem into 2 of size N/2:

X_k = E_k + e^{-2πi·k/N} · O_k
This is essentially the DFT of the even indexed terms (E_k) plus the product of the DFT of the odd indexed terms (O_k) and a “twiddle factor”, e^{-2πi·k/N}.
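As a sketch of the lemma applied recursively (Python; this version assumes N is a power of two so every split stays even, unlike the size-11 example below, and the name fft is mine):

import cmath

def fft(x):
    N = len(x)
    if N == 1:
        return x
    E = fft(x[0::2])  # DFT of the even-indexed terms
    O = fft(x[1::2])  # DFT of the odd-indexed terms
    # E_k and O_k are periodic with period N/2, hence the k % (N // 2)
    return [E[k % (N // 2)] + cmath.exp(-2j * cmath.pi * k / N) * O[k % (N // 2)]
            for k in range(N)]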
We can recursively apply the lemma, eventually getting a balanced binary tree of terms. I'm using a DFT of size 11 as an example here:
You can see that the twiddle factor builds up (denoted by the T's).
We can already achieve data parallelism at the k granularity. However, we want to achieve data parallelism at the finer k/partition granularity. In order to achieve this, we need to be able to calculate the effective twiddle factor of each partition and assign each partition a unique id.
Well first, you can see that for a DFT of size N, you will be able to get partitions of sizes 1 and 2 with exactly floor(log2(N)) splits. We'll call this the recursion depth (R). Since this is a balanced binary tree, the number of partitions (i.e. the number of leaves) will be 2^R. Recursively splitting by even and odd indices is the same as splitting by residue modulo the number of partitions. Therefore, we can calculate a unique partition id: n mod 2^R, where R = floor(log2(N)).
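As a quick sketch of that calculation (Python, names mine):

from math import floor, log2

def partition(n, N):
    # Unique partition id: n mod 2^R, where R = floor(log2(N))
    R = floor(log2(N))
    return n % (1 << R)

For the size-11 example, R = 3, so there are 8 partitions and the id is just n mod 8.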
The partition id is actually not arbitrary, since we will use it to calculate the effective twiddle factor. From the terrible drawing above you can see that the number of twiddle factor applications is the number of right traversals in the path from the root of the tree to your node/partition. Since each level of the tree splits indices by one more bit, a right traversal at a given level corresponds to that bit being set. Therefore, we can calculate the number of twiddle factor applications by counting the number of set bits in the partition id.
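In other words, the count of twiddle factor applications is just a population count of the partition id (sketch, name mine):

def twiddle_applications(pid):
    # Right traversals in the tree = set bits in the partition id
    return bin(pid).count("1")

For example, partition 5 (binary 101) takes the odd branch twice on the way down, so the twiddle factor is applied twice.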
Now, since all the calculations necessary depend only on the x_n's in the partition, n itself, and k, we have data parallelism at the k/partition granularity. This is more than enough granularity to apply map reduce:
For the first map step, we map each input point to every (k, partition) pair:

f(n, x_n):
    for k in range(0, N):
        emit([k, partition(n)], [x_n, n])

where partition(n) is n mod 2^R and R = floor(log2(N)).
In the first reduce step, we solve the size 1 and size 2 DFTs using the naive algorithm (just apply the definition):
f([k, partition], [vals]):
    emit([k, partition], DFT(k, [vals]))
For the second map step, we then apply the effective twiddle factor to the results of the DFT:
f([k, partition], dft):
    emit(k, twiddle(partition, k) * dft)
where twiddle(partition, k) is the effective twiddle factor, e^{-2πi·k·partition/N}.
In the final reduce step, we add the adjusted DFT results together:

f(k, [vals]):
    emit(k, sum([vals]))
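Putting the four stages together, here is a toy in-memory simulation (Python; the dicts stand in for the shuffle, all names are mine rather than a real MapReduce API, and the sub-DFT exponent in the first reduce is my reading of “apply the definition”, obtained by factoring the full sum by n mod 2^R):

import cmath
from collections import defaultdict
from math import floor, log2

def mapreduce_dft(x):
    N = len(x)
    R = floor(log2(N))
    P = 1 << R  # number of partitions, 2^R

    # First map: key every sample by (k, partition), carrying (x_n, n)
    groups = defaultdict(list)
    for n, xn in enumerate(x):
        for k in range(N):
            groups[(k, n % P)].append((xn, n))

    # First reduce: solve the size-1/2 DFTs by the definition; within
    # partition p the sub-DFT sums x_n * e^{-2πi·k·(n - p)/N}
    small = {}
    for (k, p), vals in groups.items():
        small[(k, p)] = sum(xn * cmath.exp(-2j * cmath.pi * k * (n - p) / N)
                            for xn, n in vals)

    # Second map: apply the effective twiddle factor e^{-2πi·k·p/N}
    adjusted = defaultdict(list)
    for (k, p), dft in small.items():
        adjusted[k].append(cmath.exp(-2j * cmath.pi * k * p / N) * dft)

    # Final reduce: sum the adjusted sub-DFTs for each k
    return [sum(adjusted[k]) for k in range(N)]

Since the stages just factor the original sum, this matches the naive definition up to floating-point error, even for non-power-of-two sizes like the N = 11 example.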