Fully Parallel Systolic Architecture
The Fully Parallel Systolic Architecture is a highly optimized filter structure that takes advantage of symmetric, antisymmetric, and zero-value coefficients. The filter delay depends on the symmetry of the filter coefficients.
When two symmetric coefficients have equal absolute values, they share a single DSP block, which makes it possible to use the pre-adders in Xilinx and Altera DSP blocks. Pairing coefficients in this way reduces the number of multipliers the filter needs.
The following diagram shows a symmetric filter before and after optimization. The upper half is the unoptimized structure, and the lower half is the optimized structure.
[Insert diagram]
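The coefficient pairing described above can be sketched in Python. This is a minimal behavioral model, not the hardware implementation: the function names fir_direct and fir_folded are hypothetical, integer coefficients are assumed, and pipeline delay is not modeled. It only shows that adding (or subtracting) the two samples that share a coefficient before multiplying gives the same output as the unoptimized filter.

```python
def fir_direct(h, x):
    """Reference FIR: one multiplication per coefficient."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_folded(h, x):
    """FIR for symmetric or antisymmetric h, one multiply per coefficient pair."""
    L = len(h)
    if all(h[k] == h[L - 1 - k] for k in range(L)):
        sign = 1       # symmetric: pre-add the paired samples
    elif all(h[k] == -h[L - 1 - k] for k in range(L)):
        sign = -1      # antisymmetric: pre-subtract the paired samples
    else:
        raise ValueError("coefficients are neither symmetric nor antisymmetric")

    def sample(i):
        return x[i] if i >= 0 else 0   # samples before the start are zero

    out = []
    for n in range(len(x)):
        acc = 0
        for k in range(L // 2):
            pre = sample(n - k) + sign * sample(n - (L - 1 - k))
            acc += h[k] * pre          # one multiply covers two taps
        if L % 2:
            acc += h[L // 2] * sample(n - L // 2)   # unpaired middle tap
        out.append(acc)
    return out
```

For a filter with five symmetric coefficients, the folded form in this sketch performs three multiplications per output sample instead of five.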
Fully Parallel Transposed Architecture
The Fully Parallel Transposed Architecture optimizes the filter by sharing one multiplication unit between any pair of coefficients that have equal absolute values, and by removing the multiplication units for zero-value coefficients. This structure has a fixed delay of 6 clock cycles.
The following diagram shows a symmetric filter before and after optimization. The upper half is the unoptimized structure, and the lower half is the optimized structure.
[Insert diagram]
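The multiplier sharing described above can be illustrated with a small Python sketch. It is a behavioral model under stated assumptions: the names transposed_multiplier_count and fir_transposed_shared are hypothetical, coefficients are integers, and all coefficients of equal magnitude are assumed to reuse one product. The fixed 6-cycle delay of the hardware is not modeled.

```python
def transposed_multiplier_count(h):
    """Multipliers needed if equal-magnitude coefficients share one and zeros need none."""
    return len({abs(c) for c in h if c != 0})


def fir_transposed_shared(h, x):
    """Transposed-form FIR that computes each distinct |coefficient| product once."""
    L = len(h)
    magnitudes = {abs(c) for c in h if c != 0}
    regs = [0] * L                     # regs[L-1] is a constant-zero sentinel
    out = []
    for s in x:
        # one shared product per distinct magnitude
        products = {m: m * s for m in magnitudes}
        # each tap reuses a shared product, negated if needed; zero taps add nothing
        taps = [0 if c == 0 else (products[abs(c)] if c > 0 else -products[abs(c)])
                for c in h]
        out.append(taps[0] + regs[0])
        # shift the partial sums one stage toward the output
        regs = [taps[k + 1] + regs[k + 1] for k in range(L - 1)] + [0]
    return out
```

With the hypothetical coefficient set [2, 0, -3, 3, 0, -2], this sketch uses two shared products per input sample rather than six separate multiplications.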
Partly Serial Systolic Architecture (1 < N < L)
Here, N is the delay length and L is the filter order.
A partly serial filter requires M = ceil(L/N) systolic units.
[Insert diagram]
The filter delay is M + ceil(L/M) + 5 clock cycles. During implementation, if the lookup-table coefficient for a multiplication unit is 0 or a power of 2, that multiplication unit is not required; multiplication by a power of 2 is implemented as a bit shift.
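As a rough illustration of the unit count and the coefficient rule above, the following Python sketch computes M = ceil(L/N) and classifies a coefficient. The function names are hypothetical, and the sketch assumes coefficients are stored as integers; only positive powers of 2 are treated as shifts, which is a simplification made here rather than something the text specifies.

```python
def systolic_unit_count(L, N):
    """M = ceil(L / N) for the partly serial case 1 < N < L."""
    if not 1 < N < L:
        raise ValueError("partly serial requires 1 < N < L")
    return -(-L // N)                  # integer ceiling division


def coefficient_operation(c):
    """Return ('none' | 'shift' | 'multiply', shift_amount_or_None) for integer c."""
    if c == 0:
        return ("none", None)          # zero coefficient: no multiplication unit
    if c > 0 and c & (c - 1) == 0:
        # exactly one bit set: c == 2**shift, so c * x == x << shift
        return ("shift", c.bit_length() - 1)
    return ("multiply", None)          # anything else keeps its multiplication unit
```

For example, a coefficient of 8 maps to a left shift by 3, so no multiplication unit is needed for it in this model.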
Fully Serial Systolic Architecture (N ≥ L)
When the delay length is greater than or equal to the filter order (N ≥ L), the filter is implemented as a fully serial structure. In this case, the filter delay is L + 5 clock cycles.
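The choice between the two serial structures, and the resulting delay, can be summarized in a short Python sketch. The function name serial_filter_delay is hypothetical, and the sketch simply encodes the conditions and delay formulas stated in this document for N > 1.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def serial_filter_delay(L, N):
    """Return (structure name, delay in clock cycles) for delay length N and order L."""
    if N >= L:
        return ("fully serial", L + 5)
    if N > 1:
        M = ceil_div(L, N)             # number of systolic units
        return ("partly serial", M + ceil_div(L, M) + 5)
    raise ValueError("N must be greater than 1 for a serial structure")
```

Calling serial_filter_delay(11, 4) selects the partly serial case with M = 3, while serial_filter_delay(11, 11) selects the fully serial case.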