# Linear regression on data with asymmetric measurement error


I am looking to perform a linear regression on measured data that takes into account an asymmetric error in the data. I've created some dummy data to illustrate what I mean:

The blue curve represents the measured data, while the red curve is the lower bound and is notably closer to the measured data than the orange curve, which represents the upper bound.

Snippet of code to create dummy data:

xdata = linspace(0,10, 20);

ydata = 2*xdata+1.5*rand(1,length(xdata));

y_err_low = 0.3*xdata+1.5*rand(1,length(xdata));

y_err_high = 0.6*xdata+1.5*rand(1,length(xdata));

ylowbnd = ydata - y_err_low;

yupbnd = ydata + y_err_high;

plot(xdata, ydata,'o-', 'LineWidth', 2, 'DisplayName', 'measured data')

hold on

plot(xdata, ylowbnd, 'x--', 'LineWidth', 2, 'DisplayName', 'lower bound')

plot(xdata, yupbnd, 's--', 'LineWidth', 2, 'DisplayName', 'upper bound')

xlabel('x')

ylabel('y')

legend('Location','northwest')

I have linear regression approaches that rely on the error in y being symmetric about the measured datapoint, but am struggling to find a way to weight my regression based on an asymmetric error.
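For reference, the symmetric case can be sketched as an ordinary weighted least-squares fit. This is only an illustration, not the exact approach referred to above: averaging the two error bars into a single `sigma` is an assumption made here, and `A`, `w` and `p_wls` are illustrative names. `lscov` is the base MATLAB weighted least-squares solver.

```matlab
% Dummy data as in the snippet above (seeded so the run is repeatable)
rng(0);
xdata = linspace(0, 10, 20);
ydata = 2*xdata + 1.5*rand(1, numel(xdata));
y_err_low  = 0.3*xdata + 1.5*rand(1, numel(xdata));
y_err_high = 0.6*xdata + 1.5*rand(1, numel(xdata));

% ASSUMPTION: collapse the two error bars into one symmetric sigma per point
sigma = (y_err_low + y_err_high) / 2;

% Design matrix for y = p(1)*x + p(2) and inverse-variance weights
A = [xdata(:), ones(numel(xdata), 1)];
w = 1 ./ sigma(:).^2;

% Weighted least squares: minimises sum(w .* (ydata(:) - A*p).^2)
p_wls = lscov(A, ydata(:), w);   % p_wls(1) = slope, p_wls(2) = intercept
```

This treats a point with a tight lower bound and a loose upper bound the same as one with two medium bounds, which is exactly the information the symmetric approach throws away.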

Things I've been digging into:

- fmincon (for both fmincon and lsqcurvefit, the bounds, equalities, and inequalities do not appear to accept a bound, etc., given as a vector, e.g., , where the anonymous function to fit the data would be and the objective for fmincon would be )
- lsqcurvefit
- Method of Maximum Likelihood (here the examples I've been seeing rely on Gaussian distribution around each ydata point, so not asymmetric)

I would appreciate any help in how I can go about giving the fit more (or less) freedom to roam as matches with the asymmetric error associated with each data point.
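One way the maximum-likelihood idea might be made asymmetric is to swap the Gaussian for a split-normal (two-piece Gaussian) density around each data point. The sketch below is a hedged illustration under stated assumptions, not an established recipe for this problem: it assumes `y_err_low` and `y_err_high` can be read as one-sigma widths below and above each measurement, and the names `model`, `sigma_of`, `cost`, `p0` and `p_fit` are made up for the example. Because the normalising constant of the split normal does not depend on the line parameters, maximising the likelihood reduces to minimising the sum of squared residuals, each scaled by the error on the side the line falls on.

```matlab
% Dummy data as in the snippet above (seeded so the run is repeatable)
rng(0);
xdata = linspace(0, 10, 20);
ydata = 2*xdata + 1.5*rand(1, numel(xdata));
y_err_low  = 0.3*xdata + 1.5*rand(1, numel(xdata));
y_err_high = 0.6*xdata + 1.5*rand(1, numel(xdata));

% Straight-line model: p(1) = slope, p(2) = intercept
model = @(p, x) p(1)*x + p(2);

% ASSUMPTION: error bars are one-sigma widths of a split normal centred on
% each measurement. r = model - measurement; when the line lies above the
% point (r >= 0) the upper error applies, otherwise the lower error.
sigma_of = @(r) y_err_high .* (r >= 0) + y_err_low .* (r < 0);

% Negative log-likelihood up to a parameter-independent constant and factor
cost = @(p) sum(((model(p, xdata) - ydata) ./ ...
                 sigma_of(model(p, xdata) - ydata)).^2);

% Start from the ordinary least-squares line and refine with a
% derivative-free search (the cost has kinks where a residual changes sign)
p0    = polyfit(xdata, ydata, 1);
p_fit = fminsearch(cost, p0);
```

`fminsearch` is used here because it needs no extra toolbox and tolerates the non-smooth cost. The same `cost` handle could be passed to `fmincon` if bounds on the slope or intercept are needed.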

Thanks!


### Answers (2)

Mathieu NOE
on 10 Nov 2023

hello Katrina

maybe this ?

you can force the mean curve to move closer to either the upper or the lower bound by adjusting the coefficient a

a = 0.7; % a = 1 is equivalent to standard linear averaging (no weighting)

% a<1 shift the mean towards the lower bound, a>1 towards the upper bound
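To see what a does in isolation, here is a small sketch of the weighting applied to a single window, using the same `linspace(1/a, a, ...)` construction as the function in the full code. The window values and the names `weightedMean` and `plainMean` are made up for illustration.

```matlab
a   = 0.7;
tmp = sort([4 1 3 2 5]);               % one window of samples, ascending order

% Linear weights from 1/a (smallest sample) to a (largest sample),
% normalised so that they sum to 1
win = linspace(1/a, a, numel(tmp));
win = win / sum(win);

weightedMean = sum(win .* tmp);        % pulled towards the small samples for a < 1
plainMean    = mean(tmp);              % what a = 1 would give

% With a = 1 all weights are equal, so the weighted mean is the plain mean
winFlat = linspace(1, 1, numel(tmp));
winFlat = winFlat / sum(winFlat);
flatMean = sum(winFlat .* tmp);
```

Since the samples are sorted before weighting, a below 1 puts more weight on the low end of the window and a above 1 on the high end.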

full code (dummy data slightly different from your version, sorry !)

% "true" data

x2 = (0:30);

y2 = 2*x2+1.5*rand(1,length(x2));

dx = mean(diff(x2));

% upper bound

x1 = x2 + dx/3;

y1 = 2.6*x1+1.5*rand(1,length(x1));

% lower bound

x3 = x2 + dx*2/3;

y3 = 1.7*x3-1.5*rand(1,length(x3));

% measurement = all data (concatenated)

x = [x1 x2 x3];

[x,ind] = sort(x);

y = [y1 y2 y3];

y = y(ind);

%%%% main loop %%%%

n = 15; % buffer size

a = 0.7; % a = 1 is equivalent to standard linear averaging (no weighting)

% a<1 shift the mean towards the lower bound, a>1 towards the upper bound

yy = myspecialavg(y, n ,a);

plot(x2, y2,'b',x, y,'*-c',x,yy,'r', 'LineWidth', 2)

xlabel('x')

ylabel('y')

legend('"true data"','noisy data','my solution', 'Location','northwest')

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function out = myspecialavg(in, N, a)

% OUT = MYSPECIALAVG(IN, N, A)

%

% The function 'myspecialavg' applies a one-dimensional sliding window to a sequence. The values inside each window are sorted in ascending order and the center value is

% replaced by their weighted average, with weights running linearly from 1/A (smallest value) to A (largest value) and normalised to sum to 1. When the sliding window

% would exceed the lower or upper boundary of IN, it is shrunk symmetrically to the available points. Indicating with nx the length of the input sequence, note that

% for values of N larger than or equal to 2*(nx - 1), every value of the output array is set to the plain mean(in) and A is ignored.

%

% * The input argument IN is the numerical data array to be processed.

% * The input argument N is the number of neighboring data points in the window for each point of IN.

% * The input argument A sets the asymmetry: A = 1 is a plain moving average, A < 1 shifts the result towards the smallest values in the window, A > 1 towards the largest.

%

% * The output argument OUT is the output data array.

if isempty(in) || (N<=0) % If the input array is empty or N is non-positive,

disp('myspecialavg: (Error) empty input data or N null.'); % an error is reported to the standard output,

out = in; % the input is returned unchanged so that the output argument is always assigned,

return; % and the execution of the routine is stopped.

end % if

if (N==1) % If the number of neighbouring points over which the sliding

out = in; % average will be performed is '1', then no average actually occur and

return; % OUTPUT_ARRAY will be the copy of INPUT_ARRAY and the execution of the routine

end % if % is stopped.

nx = length(in); % The length of the input data structure is acquired to later evaluate the 'mean' over the appropriate boundaries.

if (N>=(2*(nx-1))) % If the number of neighbouring points over which the sliding

out = mean(in)*ones(size(in)); % average will be performed is large enough, then the average actually covers all the points

return; % of INPUT_ARRAY, for each index of OUTPUT_ARRAY and some CPU time can be gained by such an approach.

end % if % The execution of the routine is stopped.

out = zeros(size(in)); % In all the other situations, the initialization of the output data structure is performed.

if rem(N,2)~=1 % When N is even, then we proceed in taking the half of it:

m = N/2; % m = N / 2.

else % Otherwise (N >= 3, N odd), N-1 is even ( N-1 >= 2) and we proceed taking the half of it:

m = (N-1)/2; % m = (N-1) / 2.

end % if

for i=1:nx, % For each element (i-th) contained in the input numerical array, a check must be performed:

dist2start = i-1; % index distance from current index to start index (1)

dist2end = nx-i; % index distance from current index to end index (nx)

if dist2start<m || dist2end<m % if we are close to start / end of data, reduce the mean calculation on centered data vector reduced to available samples

dd = min(dist2start,dist2end); % min of the two distance (start or end)

else

dd = m;

end % if

tmp = sort(in(i-dd:i+dd)); % buffered data , reduced to available samples at both ends of the data vector

win = linspace(1/a,a,numel(tmp));

win = win/sum(win);

out(i) = sum(win.*tmp); % mean of weighted data , reduced to available samples at both ends of the data vector

end % for i

end

##### 4 Comments

Mathieu NOE
on 14 Nov 2023

hello Katrina

sorry but for the time being I have no other solution to suggest

Jeff Miller
on 14 Nov 2023
