HDL Coder sharing factor and AXI-Stream valid signal

Dear HDL coder experts,
please help me understand how to combine the AXI-Stream interface with resource sharing.
Imagine a hypothetical 8-tap symmetric FIR filter with 4 multipliers. The FIR has 2 inputs, data_in and valid_in. The data rate is 24 kHz and the FPGA clock rate is 122.88 MHz, 5120 times higher than the data rate. I can use an enabled subsystem and connect the valid_in signal to its enable port. In this case, the enabled subsystem runs only when the external valid signal is high. So far so good, except that using 4 multipliers wastes resources.
When I enable resource sharing, HDL Coder shares one multiplier 4 times and raises the DUT base rate to 24*4=96 kHz. To do that, the optimizer creates a timing controller that divides the FPGA clock by 4.
There are two problems here. First, my actual FPGA clock rate is 122.88 MHz, not 96 kHz. The solution is to specify the oversampling factor explicitly in the Workflow Advisor. Second, the phase of the timing controller's signal is unrelated to the external valid_in signal. In other words, the FIR accepts new data only when the internal state machine dictates. If the state machine and the external valid signal are not in sync, nothing appears at the output.
Is it possible to synchronize the external valid signal and the resource sharing counter through HDL Coder settings? My idea: the rising edge of the valid_in signal should trigger a counter that counts 4 fast clocks and then stops, providing the 4 high-speed cycles needed for multiplier sharing.
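The behaviour asked for above can be sketched as a small cycle-level model. This is only an illustration in Python, not an HDL Coder feature or setting: the function name burst_enable and the parameter burst_len are made up for this sketch, and it assumes the enable starts in the same fast-clock cycle in which the rising edge of valid_in is seen.

```python
def burst_enable(valid_in, burst_len=4):
    """Cycle-by-cycle model (hypothetical): a rising edge on valid_in starts
    a counter that asserts the enable for burst_len fast clocks, then idles."""
    enable = []
    remaining = 0    # fast-clock cycles left in the current burst
    prev_valid = 0   # previous valid_in sample, used for edge detection
    for v in valid_in:
        if v and not prev_valid:   # rising edge (re)starts the burst
            remaining = burst_len
        enable.append(1 if remaining > 0 else 0)
        if remaining > 0:
            remaining -= 1
        prev_valid = v
    return enable


# Two valid pulses, 7 fast clocks apart (far fewer than 5120, to keep it short).
valid = [0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
print(burst_enable(valid))
# [0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0]
```

Each valid pulse produces exactly 4 enabled cycles regardless of where it falls, which is the phase relationship the free-running timing controller does not provide.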

Answers (1)

I will likely need a little more information on exactly which FIR block you are using, but you can set a sharing factor on the subsystem containing the FIR equal to the ratio 122.88 MHz / 24 kHz = 5120. The subsystem will end up using only 4 clocks out of the possible 5120.
You can also use the FIR HDL Optimized block where you can put in this sharing factor of 5120. It has the added benefit of having a valid input port as well.
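As a quick check of the arithmetic behind this suggestion, the clock budget per sample can be worked out as follows. The helper name clock_budget is hypothetical, and the count of 4 multiplies per sample assumes the 8-tap symmetric filter from the question.

```python
def clock_budget(fpga_clk_hz, sample_rate_hz, mults_per_sample):
    """Return (clocks available per sample, clocks left idle) when a single
    multiplier is time-shared. Hypothetical helper, for illustration only."""
    if fpga_clk_hz % sample_rate_hz != 0:
        raise ValueError("clock rate is not an integer multiple of the sample rate")
    clocks_per_sample = fpga_clk_hz // sample_rate_hz
    if mults_per_sample > clocks_per_sample:
        raise ValueError("not enough clocks per sample to share one multiplier")
    return clocks_per_sample, clocks_per_sample - mults_per_sample


clocks, idle = clock_budget(122_880_000, 24_000, 4)
print(clocks, idle)  # 5120 5116
```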

3 Comments

Hello Bharath,
thank you for the answer. Yes, I know about the FIR HDL Optimized block, and it works great. It has the unique capability of letting the user specify the minimum number of clock cycles between two consecutive valid signals. Unfortunately, the FIR Rate Conversion HDL Optimized block does not have such an option. Since I work with audio signals, the ratio between the FPGA master clock and the audio sample rate is quite high, 640-5120. It would be great to use these extra clocks to save multipliers. The FIR Rate Conversion HDL Optimized block saves multipliers at the polyphase algorithm level, but that is not enough because it does not use sharing. The Xilinx FIR Compiler (the IP core generator from Vivado) is more efficient here, so at the moment I divide the model into several parts, generate HDL code from them, and interleave it with Xilinx IP cores, the FIR interpolator and decimator in this case.
Besides, the FIR HDL Optimized block forces the system model to run at the FPGA clock rate, which slows down the simulation. If the resource sharing option worked with valid-handshaking blocks, it would let the system model stay at a low clock rate.
So I decided to play around with a hand-made FIR filter to investigate the resource sharing algorithms. In the end, I would like to get a polyphase FIR structure and apply resource sharing to it. And I need valid in/valid out signals.
I understand that I can model resource sharing at the system level using one MAC-core and add some multiplexers around it explicitly using the FPGA clock rate, but I would like to know if there is some easier way to do that.
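A behavioural sketch of that explicit approach, for the 8-tap symmetric case, might look like this. The names fir_direct and fir_shared_mac and the coefficient values are invented for illustration. Each iteration of the inner loop stands for one fast-clock cycle, with the index k playing the role of the multiplexer select. It says nothing about how HDL Coder itself schedules shared resources.

```python
def fir_direct(x, h):
    """Reference FIR: one multiply per tap for every input sample."""
    hist = [0] * len(h)            # delay line, newest sample first
    out = []
    for s in x:
        hist = [s] + hist[:-1]
        out.append(sum(c * d for c, d in zip(h, hist)))
    return out


def fir_shared_mac(x, half_coeffs):
    """Symmetric FIR computed with a single multiplier (illustrative model).
    Returns (outputs, total multiplier cycles used)."""
    n_half = len(half_coeffs)
    hist = [0] * (2 * n_half)
    out = []
    cycles = 0
    for s in x:                    # one pass per valid input sample
        hist = [s] + hist[:-1]
        acc = 0
        for k in range(n_half):    # one fast-clock cycle per iteration
            pre_add = hist[k] + hist[2 * n_half - 1 - k]  # mux picks the tap pair
            acc += half_coeffs[k] * pre_add               # the one shared multiplier
            cycles += 1
        out.append(acc)
    return out, cycles


half = [1, 2, 3, 4]                # made-up coefficients
h = half + half[::-1]              # full symmetric response, 8 taps
x = [1, 0, 0, 0, 0, 0, 0, 0, 5, -2]
y_shared, cycles = fir_shared_mac(x, half)
print(y_shared == fir_direct(x, h), cycles)  # True 40
```

The shared version spends 4 multiplier cycles per input sample, so it only needs the fast clock for 4 cycles after each valid sample and can idle for the rest.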
Maybe MathWorks plans to add resource sharing to the FIR Rate Conversion HDL Optimized block? The best solution would be to let the user specify the minimum number of clock cycles between two valid signals.
My simple FIR looks like this:

Understood, and I will note down the request for resource sharing for the FIR Rate Conversion HDL Optimized block.
With the above structure, you can now ask for sharing at the subsystem level and set the sharing factor to 5120, which results in just one multiplier being used.
I also suggest that you model valid instead of using enable. Use the valid as an input port. The "enable" port of the delay block works as a valid in this case, and you can hook valid up to it.

Thank you Bharath,
using the valid input port explicitly and connecting it only to the delay blocks' enable pins will not help much, because there is still no relation between this valid signal and the generated timing controller logic. Without this synchronization, the output valid is always zero in the HDL testbench.
The reason is visible in the top level of the generated HDL:
// Delay-matching register for valid_in
always @(posedge clk) begin : delayMatch1_process
    if (resetn == 1'b0) begin
        valid_in_2 <= 1'b0;
    end
    else begin
        // valid_in is captured only when the timing controller enable is high
        if (enb_1_2560_0) begin
            valid_in_2 <= valid_in;
        end
    end
end
assign valid_out = valid_in_2;
valid_out follows valid_in only on clock edges where enb_1_2560_0 is high; at all other times it holds its previous value. The enb_1_2560_0 signal is in turn generated by the timing controller:
fir_filter_tc u_fir_filter_tc (
    .clk(clk),
    .resetn(resetn),
    .clk_en(clk_en),
    .enb(enb),
    .enb_1_2560_0(enb_1_2560_0),
    .enb_1_2560_1(enb_1_2560_1)
);
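The effect can be reproduced with a small cycle-level model of the register above. This is a sketch under assumptions, not the generated timing controller, whose internals are not shown here: the enable is modelled as one pulse every period clocks at a fixed phase, a period of 8 stands in for 2560 to keep the run short, and the helper names are hypothetical.

```python
def timing_controller_enable(n_cycles, period, phase=0):
    """Assumed model of a free-running enable: high for one clock
    out of every `period`, at offset `phase`."""
    return [1 if t % period == phase else 0 for t in range(n_cycles)]


def valid_pulses(n_cycles, period, offset):
    """One-clock-wide valid_in pulses every `period` clocks, starting at `offset`."""
    return [1 if t % period == offset else 0 for t in range(n_cycles)]


def delay_match(valid_in, enb):
    """Model of delayMatch1_process: the register loads valid_in only when
    enb is high; valid_out is the registered value."""
    q = 0                          # register contents after reset
    valid_out = []
    for v, e in zip(valid_in, enb):
        valid_out.append(q)        # output seen during this clock cycle
        if e:                      # update takes effect on the next cycle
            q = v
    return valid_out


N = 32
enb = timing_controller_enable(N, period=8)
misaligned = delay_match(valid_pulses(N, 16, 3), enb)  # pulses miss the enable
aligned = delay_match(valid_pulses(N, 16, 0), enb)     # pulses coincide with it
print(sum(misaligned))  # 0
print(sum(aligned))     # 16
```

With a one-clock valid pulse that does not coincide with the enable, the register never captures a 1, which matches the always-zero valid_out seen in the HDL testbench.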
I came to the conclusion that the only way to get maximum efficiency for the rate converter FIR filters with the axi-stream interface is to implement them manually, in a manner described in the example for regular FIR here: https://ww2.mathworks.cn/help/hdlcoder/ug/running-audio-filter-with-multiple-axi4-stream-channels.html

Release: R2021a

Asked: 13 Oct 2021

Commented: 19 Oct 2021
