How to decide window size for a moving average filter?

Hello all, I have some noisy data in the form of x and y variables. I plan to use a moving average filter to get satisfactory results that stay as close as possible to the real data. I understand that a larger window size means smoother data, and hence data that is less faithful to the original. Is that correct? Is a window size of 5 generally considered adequate for establishing the relationship between the variables? Any leads are highly appreciated. Thanks and regards, Swanand.

 Accepted Answer

It could be. Who's to say? It's more or less a judgment call as to what amount of smoothing is best, isn't it? You could determine the sum of absolute differences (SAD) between the smoothed and the noisy signal for different window sizes and plot it. Maybe some pattern will jump out at you, like a knee in the curve.
clc; % Clear the command window.
close all; % Close all figures (except those of imtool.)
clear; % Erase all existing variables. Or clearvars if you want.
workspace; % Make sure the workspace panel is showing.
format long g;
format compact;
fontSize = 20;
numPoints = 5000;
noiseSignal = rand(1, numPoints);
x = linspace(0, 300, numPoints);
period = 100;
cleanSignal = cos(2*pi*x / period);
noisySignal = cleanSignal + noiseSignal;
subplot(2, 1, 1);
plot(x, noisySignal, 'b-', 'LineWidth', 2);
grid on;
xlabel('x', 'FontSize', fontSize);
ylabel('Noisy Signal', 'FontSize', fontSize);
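windowSizes = 3 : 3 : 51;
sad = zeros(1, length(windowSizes)); % Preallocate one SAD value per window size.
for k = 1 : length(windowSizes)
    % Smooth with the k-th window size, then measure how far the result moved from the noisy input.
    smoothedSignal = movmean(noisySignal, windowSizes(k));
    sad(k) = sum(abs(smoothedSignal - noisySignal));
end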
subplot(2, 1, 2);
plot(windowSizes, sad, 'b*-', 'LineWidth', 2);
grid on;
xlabel('Window Size', 'FontSize', fontSize);
ylabel('SAD', 'FontSize', fontSize);
Pick the smallest window size where the SAD seems to start to flatten out. Going beyond that (to larger window sizes) really doesn't produce much more benefit (smoothing) and will take longer.

11 Comments

Thanks a lot for your answer. However, can you please explain what exactly this part is doing?
windowSizes = 3 : 3 : 51
for k = 1 : length(windowSizes)
smoothedSignal = movmean(noisySignal, windowSizes(k));
sad(k) = sum(abs(smoothedSignal - noisySignal))
end
Thanks again.
It tries different window sizes and computes and saves the sum of absolute differences. After that loop, it then plots them so we can see if something significant happens for a particular window size. Actually it should have said
windowSizes = 3 : 2 : 51
because we want the window size to take on only odd sizes so the filtered signal won't have a half element shift in the x direction.
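The half-element shift can be seen by smoothing a single impulse. This is only a minimal sketch with illustrative variable names; it relies on the documented movmean behavior that an even window is centered between the current and previous elements.

```matlab
% Smooth a unit impulse with an even and an odd window and compare
% where the center of mass of each result lands.
impulse = zeros(1, 21);
impulse(11) = 1;                    % Impulse located at index 11.
idx = 1 : numel(impulse);

evenOut = movmean(impulse, 4);      % Even window size.
oddOut  = movmean(impulse, 5);      % Odd window size.

% Centroid of each smoothed response (both outputs are non-negative).
evenCenter = sum(idx .* evenOut) / sum(evenOut);
oddCenter  = sum(idx .* oddOut)  / sum(oddOut);

fprintf('Even window centroid: %.1f\n', evenCenter);
fprintf('Odd window centroid:  %.1f\n', oddCenter);
```

With the odd window the centroid stays on the impulse; with the even window it lands half an element away.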
Hi Image Analyst, this answer was helpful to me and I learned a lot from it. How can I cite your answer in a scientific article? Where does the method come from: an article, a book, or something else?
It's nothing I invented - it's just common sense. You can just say something like "Using a technique I learned on the MATLAB Central web site" if you want.
Thanks for your answer. Should the window size we pick be between 10 and 20?
Hi,
Thanks for your answer. However, I want to ask how we can pick the smallest window size value dynamically (using code) where the curve starts to flatten out?
Just pick something, like 10 or 20 or whatever you want. It's a judgment call. There is no "right" answer.
Or you could pick something where the SAD value is more than 90% of the final value, like
index = find(sad > 0.9 * sad(end), 1, 'first'); % Adjust 0.9 to whatever you want.
windowSize = windowSizes(index);
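Another way to automate the choice, offered only as a sketch and not as the method from the answer above, is to take the point of the SAD curve that lies farthest from the straight line joining its first and last points. The SAD values below are a made-up curve for illustration, and kneeIndex, kneeWindowSize and distToChord are hypothetical names.

```matlab
% Hypothetical SAD curve that rises quickly and then flattens out.
windowSizes = 3 : 2 : 51;
sad = 1000 * (1 - exp(-(windowSizes - 3) / 6));

% Normalize both axes to [0, 1] so neither axis dominates the distance.
xn = (windowSizes - windowSizes(1)) / (windowSizes(end) - windowSizes(1));
yn = (sad - min(sad)) / (max(sad) - min(sad));

% Perpendicular distance of each point from the chord joining the
% first and last points of the normalized curve.
dx = xn(end) - xn(1);
dy = yn(end) - yn(1);
distToChord = abs(dy * (xn - xn(1)) - dx * (yn - yn(1))) / hypot(dx, dy);

% The knee is taken to be the point farthest from the chord.
[~, kneeIndex] = max(distToChord);
kneeWindowSize = windowSizes(kneeIndex);
fprintf('Knee at window size %d\n', kneeWindowSize);
```

Like the 90% rule, this is still a heuristic: it replaces one judgment call (the threshold) with another (the definition of the knee).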
My SAD looks like this, what does it mean?
You probably have periodic structures in your signal (which you forgot to attach).

More Answers (3)

A moving average filter is one of the varieties of discrete lowpass filter. You can choose your width according to your attenuation needs. See http://ptolemy.eecs.berkeley.edu/eecs20/week12/freqResponseRA.html
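To relate the width to attenuation, the magnitude response of an N-point moving average can be computed directly from its impulse response. The following is a rough sketch using only base MATLAB; N, nfft and the -3 dB criterion are example choices, not values from this thread.

```matlab
N = 5;                         % Window size (example value).
nfft = 4096;                   % Number of FFT points (assumed, sets the frequency resolution).
h = ones(1, N) / N;            % Impulse response of the N-point moving average.
H = abs(fft(h, nfft));         % Magnitude response (zero-padded FFT).
H = H(1 : nfft/2 + 1);         % Keep frequencies from 0 up to Nyquist.
f = (0 : nfft/2) / nfft;       % Frequency in cycles per sample.

% First frequency at which the gain falls below -3 dB, i.e. 1/sqrt(2).
cutoffIndex = find(H < 1/sqrt(2), 1, 'first');
cutoffFreq = f(cutoffIndex);

plot(f, 20*log10(H + eps), 'b-', 'LineWidth', 2);
grid on;
xlabel('Frequency (cycles/sample)');
ylabel('Gain (dB)');
title(sprintf('%d-point moving average, -3 dB near %.3f cycles/sample', N, cutoffFreq));
```

Rerunning with a larger N moves the cutoff to a lower frequency, which is the frequency-domain view of "more smoothing".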
How can we select a window size for reading a DNA sequence like
ATCGGGCTTACGG
with a window length of 5? Please share the code.

3 Comments

I don't understand the question. How is a sequence of letters analogous to a noisy numerical signal?
Let's say the window size was 5. What would you expect the output to be for that short sequence you gave? Explain why/how you got that output.
Basically, I am working on the classification of DNA sequences using neural networks. I have a DNA sequence like
ATCGTGGCCAATGGTAACCG...... up to 500 or more nucleotides
I converted it to binary. Now I want my network to read a stream of five characters, 10 and 15, so how do I write the code for it?
I don't know what that means. Do you want to read 5 characters from somewhere? Or 10? Or 15? Where is this stream coming from? A file? If so, have you seen fread()?
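If the intent is simply to slide a fixed-length window along the character sequence, one character at a time, a sketch could look like the following. The one-character stride and the variable names are assumptions, since the question does not specify them.

```matlab
seq = 'ATCGGGCTTACGG';              % Sequence from the question.
windowLength = 5;                   % Window length from the question.
numWindows = numel(seq) - windowLength + 1;

% Each row holds one window; consecutive windows are shifted by
% one character (assumed stride).
windows = repmat(' ', numWindows, windowLength);
for k = 1 : numWindows
    windows(k, :) = seq(k : k + windowLength - 1);
end
disp(windows);
```

Changing windowLength to 10 or 15 gives the longer windows mentioned in the comment, provided the sequence is at least that long.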

I'm very surprised that none of the previous responses mentioned
1. Determine characteristic self correlation lengths using output autocorrelation functions
2. Determine characteristic cross correlation lengths using input-output crosscorrelation functions
Hope this helps.
Greg
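One possible reading of the first suggestion, sketched with base MATLAB only: compute the autocorrelation of the signal and take the first lag at which it drops below 1/e as a characteristic length. The 1/e threshold is an assumed convention, and the test signal is the same kind of noisy cosine used in the accepted answer.

```matlab
rng(0);                                        % Fixed seed so the sketch is repeatable.
numPoints = 5000;
x = linspace(0, 300, numPoints);
period = 100;
noisySignal = cos(2*pi*x / period) + rand(1, numPoints);

% Biased autocorrelation of the mean-removed signal.
s = noisySignal - mean(noisySignal);
fullAcf = conv(s, fliplr(s));                  % Lags -(numPoints-1) .. (numPoints-1).
acf = fullAcf(numPoints : end);                % Keep lags 0 .. numPoints-1.
acf = acf / acf(1);                            % Normalize so the lag-0 value is 1.

% Assumed definition: correlation length is the first lag (in samples)
% where the normalized autocorrelation falls below 1/e.
corrLength = find(acf < exp(-1), 1, 'first') - 1;
fprintf('Correlation length: %d samples\n', corrLength);
```

A window that is a small fraction of this length mostly averages noise, whereas a window comparable to it starts to average away the underlying signal as well.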

2 Comments

Hi Greg, what does it mean to use the correlation length?
For example, let's say I have an image of size 400 [vertical pixels] x 600 [horizontal pixels]. How can I find the optimal window size for a moving average filter among 3x3, 5x5, 7x7, 9x9, 11x11, 13x13, and 15x15?
PS: Could you check whether my approach is correct?
What I am doing now is this:
[step 1]
A: original input image
B3: the result by using averaging filter with 3x3 window size
B5: the result by using averaging filter with 5x5 window size
B7: the result by using averaging filter with 7x7 window size
B9, B11, B13, B15.
I can get seven binary images (the intensity of each pixel is 0 or 1) for B3 to B15 using a fixed threshold. FYI, the object area changes gradually depending on the window size.
[step 2]
I calculated a numerical value between two adjacent filters such as mean squared error
C35: MSE between B3 and B5
C57: MSE between B5 and B7
C79: MSE between B7 and B9
C911: MSE between B9 and B11
C1113: MSE between B11 and B13
C1315: MSE between B13 and B15
[step 3]
Find where the gradient is largest among C35, C57, ..., C1315.
Thank you in advance
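The three steps described above can be sketched as follows, using a synthetic image in place of A. The threshold value, the test image, and the reading of "maximum gradient" as the largest absolute change between consecutive C values are all assumptions made for illustration.

```matlab
% Synthetic stand-in for image A: a bright disk on a dark background plus noise.
rng(0);
[cols, rows] = meshgrid(1:600, 1:400);
A = double(hypot(cols - 300, rows - 200) < 80) + 0.3 * randn(400, 600);

windowSizes = 3 : 2 : 15;          % 3x3, 5x5, ..., 15x15.
threshold = 0.5;                   % Fixed threshold (assumed value).
B = false(400, 600, numel(windowSizes));

% Step 1: box-filter with each window size, then binarize.
for k = 1 : numel(windowSizes)
    w = windowSizes(k);
    smoothed = conv2(A, ones(w) / w^2, 'same');
    B(:, :, k) = smoothed > threshold;
end

% Step 2: MSE between binary images from adjacent window sizes (C35, C57, ...).
C = zeros(1, numel(windowSizes) - 1);
for k = 1 : numel(C)
    d = double(B(:, :, k)) - double(B(:, :, k + 1));
    C(k) = mean(d(:) .^ 2);
end

% Step 3: locate the largest change between consecutive C values.
[~, idx] = max(abs(diff(C)));
fprintf('Largest change is between C%d%d and C%d%d\n', ...
    windowSizes(idx), windowSizes(idx + 1), windowSizes(idx + 1), windowSizes(idx + 2));
```

This only reproduces the procedure; whether the largest change marks the best window size is still a judgment call, as with the 1-D SAD curve.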
Could you please explain what to do if, for example, the gradient between C35 and C57 is the largest compared to the rest? Which window size should then be considered the best: 3, 5, or 7?
