
Hi,

I have a dataset (upper and lower limits for a given X coordinate) and I want to generate a batch of curves that fit inside those limits, to explore the worst-case possible representations.

The dataset is:

X Value   Lower Limit   Upper Limit
0.0       14.96         15.94
0.2       14.98         15.91
0.4       13.47         15.94
0.6       10.90         13.66
0.8        5.60         12.38
1.0        0.00          7.84
1.2        0.00          2.35
1.4        0.00          0.51

How would I go about it?

Many thanks, Mark.

Image Analyst
on 24 Jan 2021

Here's an example. Adapt as needed:

clc;        % Clear the command window.
close all;  % Close all figures (except those of imtool.)
clear;      % Erase all existing variables. Or clearvars if you want.
workspace;  % Make sure the workspace panel is showing.
format long g;
format compact;
fprintf('Beginning to run %s.m ...\n', mfilename);
% X Value, Lower Limit, Upper Limit
x = [...
    0.0 14.96 15.94
    0.2 14.98 15.91
    0.4 13.47 15.94
    0.6 10.90 13.66
    0.8  5.60 12.38
    1.0  0.00  7.84
    1.2  0.00  2.35
    1.4  0.00  0.51];
[rows, columns] = size(x);
% Get colors for all the lines.
cmap = jet(rows);
legendStrings = cell(rows, 1);
for row = 1 : rows
    % Make 500 points between the lower and upper limits.
    thisXAxis = linspace(x(row, 2), x(row, 3), 500);
    % Assume the first column is the period, since we have no idea what it should be.
    % Guard against a period of 0 (row 1), which would cause division by zero.
    period = max(x(row, 1), eps);
    % Get a random amplitude.
    amplitude = rand;
    % Get y.
    y = amplitude * cos(2 * pi * thisXAxis / period);
    plot(thisXAxis, y, '-', 'Color', cmap(row, :), 'LineWidth', 4);
    grid on;
    hold on;
    % Set up legend.
    legendStrings{row} = sprintf('Curve %d', row);
end
fontSize = 20;
xlabel('X', 'FontSize', fontSize);
ylabel('Y', 'FontSize', fontSize);
legend(legendStrings, 'Location', 'north');
g = gcf;
g.WindowState = 'maximized';
fprintf('Done running %s.m.\n', mfilename);

Adam Danz
on 24 Jan 2021

Edited: Adam Danz
on 24 Jan 2021

1. A sophisticated way would be to fit the curve, perhaps to a sigmoid or logistic function, and then adjust the fit parameters to produce a set of smooth curves that encompass the range of y values. That will likely require a lot of fine tuning. If you're interested in the slope and its variation, this would be the way to go.
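A minimal sketch of this idea, using only base MATLAB (fminsearch). The logistic form, the starting guesses, and fitting to the midpoints of the bounds are all assumptions; adapt them to whatever model actually describes your process:

```matlab
% Fit a logistic curve to the midpoints of the bounds (assumed model).
x = (0 : 0.2 : 1.4)';
yMid = [15.45 15.445 14.705 12.28 8.99 3.92 1.175 0.255]'; % (lower+upper)/2
% Logistic model: y = L ./ (1 + exp(k*(x - x0)))
model = @(p, x) p(1) ./ (1 + exp(p(2) * (x - p(3))));
sse = @(p) sum((model(p, x) - yMid).^2);       % sum of squared errors
pFit = fminsearch(sse, [15; 5; 0.8]);          % starting guesses are assumptions
% Perturb the steepness parameter k to get a family of smooth curves.
hold on
for k = pFit(2) * [0.7 1 1.3]
    plot(x, model([pFit(1); k; pFit(3)], x))
end
```

After fitting, you would check each perturbed curve against the lower/upper limits and discard any that escape the bounds.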

2. Another suggestion is to implement a bootstrap process with whatever generated the data to begin with. You can resample from the distribution that gave you the bounds, with replacement, many times to generate many curves from the raw, resampled data. The slope can then be computed from each bootstrap iteration, producing a distribution of slopes from which you can compute the mean and standard deviation.

3. A lower-level solution is to linearly space y values within each range:

T = array2table([
    0.0 14.96 15.94
    0.2 14.98 15.91
    0.4 13.47 15.94
    0.6 10.90 13.66
    0.8  5.60 12.38
    1.0  0.00  7.84
    1.2  0.00  2.35
    1.4  0.00  0.51], ...
    'VariableNames', {'x','lower','upper'});
% Add means since they weren't provided.
T.y = mean([T.lower, T.upper], 2)
% Plot mean curve.
errorbar(T.x, T.y, T.y - T.lower, T.upper - T.y, 'LineWidth', 3)
% Add various lines within the bounds.
hold on
nLines = 10; % <--- number of lines
yvals = cell2mat(arrayfun(@(i){linspace(T.lower(i), T.upper(i), nLines)}, 1:height(T))');
xvals = repmat(T.x, 1, nLines);
h = plot(xvals, yvals);

4. But the variation doesn't have to be only vertical. If you want noisy curves that lie anywhere within the bounds, you can generate random y values within each bound:

% Plot mean curve.
figure()
errorbar(T.x, T.y, T.y - T.lower, T.upper - T.y, 'LineWidth', 3)
% Add various lines within the bounds.
hold on
boundRange = range([T.lower, T.upper], 2);
rng('default') % for reproducibility
for i = 1 : 50 % <--- number of lines
    randYVals = rand(1, numel(T.x)) .* boundRange' + T.lower';
    plot(T.x, randYVals)
end

Adam Danz
on 25 Jan 2021

If you want smooth curves you'll need to either fit the curve or produce it from a known function.

It's unclear why you want these curves.

> I am trying to generate a bunch of curves that represent steep and shallow inflections that cover all of the areas between upper and lower limits. Also to explore bias towards the lower or to the upper.

In this case, generating a bunch of random curves and then analyzing the bias in your randomly fabricated data isn't a good approach and probably wouldn't pass peer review if it were submitted to a journal. Those error bounds were computed somehow, likely based on some distributions at each x value. If possible, you should use those data to compute the possible curves within the bounds. You could also use standard bootstrapping techniques to resample from the population of data to generate thousands of curves, all of which could be analyzed for bias; the resulting bias estimates would form normal distributions that you can use to estimate the bias of the entire population.

Adam Danz
on 25 Jan 2021

Thanks for the description. If those intervals were supplied by the supplier and you don't have access to the underlying data that were used to compute the intervals, then the bootstrap method as I described it isn't available.

If the curve is something you generate and has some variation each time you generate it, you could store all of the curves you generated and bootstrap those data.

Brief description of bootstrapping

The main idea behind bootstrapping is this: your curve is based on 8 coordinates. Let's say that curve is generated 20 times; now you have 20x8 y-values (assuming the x values do not change, but it's not a problem if they do). Assuming the x-values are independent, step 1 is to resample those 20x8 values with replacement to generate 1000 curves based on the same data. So column 1 is sampled 1000 times randomly, same with column 2, and so on. Now you have a 1000x8 matrix containing resampled data from the original 20x8 matrix. You also now have 1000 curves. You can measure the bias in each of the 1000 curves to get 1000 bias values. Thanks to the central limit theorem, that distribution will be normal provided that you have enough bootstrap iterations (in this case, 1000). From that distribution of bias measurements, you can compute the mean and 95% confidence intervals (the 97.5 and 2.5 percentiles). That gives you a completely valid, scientific estimate of bias and error from the 20x8 original sample.
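The procedure above can be sketched as follows. Here Y (a 20x8 matrix of y-values from 20 generated curves) and biasOf (your bias measure) are assumptions you would supply; prctile requires the Statistics and Machine Learning Toolbox:

```matlab
% Column-wise bootstrap of 20 curves sampled at 8 x-locations.
nBoot = 1000;
[nCurves, nPoints] = size(Y);      % Y is assumed: 20x8 matrix of y-values
bootY = zeros(nBoot, nPoints);
for col = 1 : nPoints
    % Resample each column (each x location) with replacement.
    bootY(:, col) = Y(randi(nCurves, nBoot, 1), col);
end
% Measure bias for each of the 1000 resampled curves.
biases = zeros(nBoot, 1);
for b = 1 : nBoot
    biases(b) = biasOf(bootY(b, :));   % biasOf is assumed: your bias measure
end
meanBias = mean(biases);
ciBias = prctile(biases, [2.5 97.5]);  % 95% confidence interval
```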

Alternatively, you could measure the bias from the original 20 curves and then bootstrap those values. That starts with a 20x1 (or 1x20) vector of biases from the 20 original curves; then you randomly sample it 1000 times to get 1000x1 biases and compute the mean and CIs. That might be the better approach since your x's aren't independent from the y's.
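A sketch of this alternative, assuming a vector biasVec of the 20 original bias values. The bootci call requires the Statistics and Machine Learning Toolbox; the rest is base MATLAB:

```matlab
% Bootstrap the 20 bias values directly (biasVec is assumed: 20x1 vector).
nBoot = 1000;
idx = randi(numel(biasVec), nBoot, numel(biasVec)); % resample with replacement
bootMeans = mean(biasVec(idx), 2);   % mean bias of each bootstrap sample
meanBias = mean(bootMeans);
ci = prctile(bootMeans, [2.5 97.5]); % 95% confidence interval
% Or equivalently, in one call:
% ci = bootci(nBoot, @mean, biasVec);
```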

Problems with my suggestions #3 and #4

The problem with my suggestion #3 is that the curves vary mainly vertically with very little variation otherwise and that may not be how the actual data vary.

The problem with my suggestion #4 is that the variation is too wild and likely generates curves that wouldn't actually occur naturally (I'm guessing; I have no idea what process produced the data).

So, basing estimates on unnatural curves won't help you understand how the real data may vary.
