# What can I do to further vectorise this code?

2 views (last 30 days)
O.Ib on 30 Jan 2020
Edited: O.Ib on 2 Feb 2020
Some context first:
I have several sets of experimental data that look like this: I'll be restricting my discussion to just one set of data. My objective is to extract the peaks (maximum and minimum) of each "cycle." findpeaks didn't help with that since the data doesn't always ascend/descend monotonously over a quarter-cycle, especially near the peaks: However, that's not the case near the x-axis (time) intersects, so I used the sign change in the strain (y-axis) that happens there to collect the strain values of each "cycle" and calculate, iteratively, the maximum and minimum of each set. This is the code I wrote after some optimisation (in the form of a function):
function Umkehrpunkte = ump(x)
% Prepare data for processing (Calculate Stress and Strain; Area always constant)%
Area = 24.5;
x(:,3) = x(:,3)*100;
x(:,2) = x(:,2)/Area;
%Initialise index(es) for loops and create a zeros-matrix with a reasonable number of rows for storage of maxima and minima%
z = 1;
Umkehrpunkte = zeros(100,2);
i_umk = 1;
length_x = length(x); %Slightly faster. Compare in profiler.
while z < length_x %z corresponds to row number
t = z; %Setting lower row index to the first element being processed; marks the starting point of a cycle
while z < length_x && sign(x(z,2)) == 1 %Using change of sign at x-axis as criterion for terminating incrementation of z
z = z+1; %Increment until last positive value (positive half-cycle) has been reached.
end
%Termination checkpoint: Store the maxima of the last data set and terminate the outer while-loop.
%(LAST DATA SET IS ALWAYS A POSITIVE HALF CYCLE!)
if z == length_x;
Umkehrpunkte(i_umk, [1 2]) = [max(x(t:z,2)) max(x(t:z,3))]; %Locate and assign last maximum; run only once before termination.
i_umk = i_umk +1;
break;
end
while sign(x(z,2)) == -1; %Negative half-cycle, see comment at first inner while-loop for details
z = z+1;
end
end
%Extract the maximum and the minimum of the current FULL cycle.
Umkehrpunkte([i_umk i_umk+1], [1 2]) = [max(x(t:z,2)) max(x(t:z,3)); min(x(t:z,2)) min(x(t:z,3))]; i_umk = i_umk +2; %
end
%Trimming off the zeros
Umkehrpunkte = Umkehrpunkte(1:i_umk-1, [1 2]);
My first thought was to vectorise the collection of positive and negative half-cycles using the critera I set here for the inner while-loops (signum function) as boolean conditions, but that would collect all positive or all negative data points without showing which half-cycle they belong to. Any approach I came up with to separate the data involved loops, just different ones. So my first question along this line of thought is: is there any way I can achieve what I want with something like the following?
Umkehrpunkte = max(all positive values of x(all rows, specific column(s)))
%And then the same here for all negative values of x to calculate the minima.
What, if anything, can be added to this besides loops that would make it collect the data until it finds a positive/negative value, the way it does in the code above, and then stop and process it?
Feel free to suggest more broad overhauls or entirely different approaches. Thanks in advance.
Edit: More context: The "x" that's passed to the function is a 24000+ by 3 matrix. The first column contains the time, second and third columns the stress and strain, respectively.

Show 1 older comment
O.Ib on 31 Jan 2020
I have R2018b, not 2019. I can't use anything that requires interaction or manual intervention, since I will have millions of similar data sets. The function needs to be fully automated. I think I failed to clarify one crucial point in my question:
This code already works. I have already extracted the data I wanted out of it. What I'm trying to do now is optimise and vectorise this code as much as possible to reduce processing time when I call the function one million times with one million different data sets.
Stephen Cobeldick on 31 Jan 2020
"findpeaks didn't help with that since the data doesn't always ascend/descend monotonously over a quarter-cycle..."
I don't see any reason why it shouldn't, once you set the following options to something suitable for your data:
• MinPeakHeight
• MinPeakProminence
• Threshold
• MinPeakDistance
• You might also find the peak width options useful.
1. show us exactly how you called findpeaks,
2. upload some sample data by clicking the paperclip button.
O.Ib on 31 Jan 2020
Hello, Stephen.
It might be possible to set these parameters to something that fits this particular set of data plotted here, which is one set of 37 others that form a unit, which in turn is part of an even larger set that contains about 54100 sets of (approximately) 37 "blocks" each. You can see why it wouldn't be helpful in the long run to extract the peaks using findpeaks with parameters that are specific to just one of over 2 million data sets. The variance of the "frequency" and amplitude of the peaks with time makes it particularly difficult to find suitable parameters for the findpeaks() function that would be valid even across just one set (37 blocks).
I called findpeaks a number of different ways, starting with:
findpeaks(x(:,Block_18*100))
and then I experimented with a bunch of different options that did refine the results for this particular set, but then I decided I would need something different if I wanted to process millions of data sets. That's not to say the current code shown above already does the job for all sets, but it's a good starting point.
I'll upload one set of 37 "blocks" of data as a text file and this particular set, which is block number 18, which I extracted from the text file. The file "Aufzeichnung.txt" contains 5 columns, separated by spaces. The columns represent, in order: time, block number, load change (which you can delete and ignore), force, and strain. I'm leaving the units out since they're not relevant to what we're investigating.
Edit: The Block_18 variable I uploaded doesn't contain all the values from the 18th block, but about half of them. The peaks I am looking for in each block appear twice; the entire data set would be that graph above mirrored, approximately, about t = 2165 s.

Mohammad Sami on 31 Jan 2020
Have you tried the function islocalmax and islocalmin ?
maxIndices = islocalmax(x(:,2));
minIndices = islocalmin(x(:,2));

O.Ib on 1 Feb 2020
I understood how the prominence option works, but sadly the data has different ranges. It's a very unwieldy set of data.
Mohammad Sami on 1 Feb 2020
Another option
sign_18 = sign(Block_18(:,2));
half_cycle_id = sign_18 ~= circshift(sign_18,1); % detect sign change
half_cycle_id = cumsum(half_cycle_id); % assign id to each half cycle
if half_cycle_id(1) == 0
half_cycle_id = half_cycle_id + 1;
end
min_18 = accumarray(half_cycle_id,Block_18(:,2),[],@min,NaN);
max_18 = accumarray(half_cycle_id,Block_18(:,2),[],@max,NaN);
summ = table(unique(half_cycle_id),min_18,max_18);
summ.isnegative_cyc = sign(summ.min_18) == -1;
summ.val(summ.isnegative_cyc) = summ.min_18(summ.isnegative_cyc);
summ.val(~summ.isnegative_cyc) = summ.max_18(~summ.isnegative_cyc);
O.Ib on 2 Feb 2020
Mohammad, this is exactly what I was hoping to learn when I asked this question: a more vectorised approach that involves detecting, and storing for subsequent processing, locations of sign change in the data. Gave me more insight into the versatility of logical arrays. Thank you very much.