28 views (last 30 days)

Hi there!

Heads up: Technically this is a homework-problem, but I'm done and just want your thoughts on if the code could be faster/better.

The task is to find how many same-digit-sequences of different lengths that appear in the data. I.E. in data=[1,1,1,2,2,3,3,4,4] sequcens of length 3 appears once, sequences of length 2 appears thrice. So i want appearances=[0,3,1,0,...] (indices indicate length of sequence)

The data is typically a vector of length 5000000 and you don't know how long the longest sequences are (otherwise I'd probably use strfind and decimate the data). Further this is done in a context where the starting-index of the sequences might later become important, even though I don't extract them atm.

The code submitted below should only have to run over the data vector once to withdraw all neccesary information. But still it's at least two times slower than what I've got reason to belive that it could be.

Do you have any ideas on how to solve this task in a quicker/cleaner fashion? Would love to see your take on it. /FB

%find longest similar sequence in data

%data size typically 1x5000000 (first five million digits in pi or e for example)

data=[1,1,1,2,1,1,3,3,2,4,4,4,5,6,7,8,8,9,9,9,8,8,7,7,5,5,4,6,6,2,5,9,9,9,9,7,7,7,7,8,8,4,4,5,5,6,6,4,4,5,5,1,1,2,1,3,3,1];

appearances=zeros(10,1); %count how many times sequences of specific lengths 1:10 appear.

%assumes that we wont find similar sequences longer than 10 such as 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 in data

i=2; %counter s.t while approximate a for-loop. Not using for since can't manually change i within for-loop

%starting at 2 since looking at 2 numbers at a time to compare if equal or not

while i<length(data)

isEqual=(data(i-1)==data(i)); %true if in a sequence of equal digits. 1,1 gives true, 1,2 gives false etc.

switch isEqual

case true

%save where sequence started and keep incrementing while checking if equal

start_i=i-1;

while isEqual&&(i<length(data))

i=i+1;

isEqual=(data(i-1)==data(i));

end

%calculate length of sequence and record as new appearance of that length

end_i=i-1;

sequenceLength=(end_i-start_i+1);

appearances(sequenceLength)=appearances(sequenceLength)+1;

case false

%keep incrementing and look for sequence

while~isEqual&&(i<length(data))

i=i+1;

isEqual=(data(i-1)==data(i));

end

end

end

%to find out sequences of length one subtract the sum of all other

%sequences from length of data

appearances(1)=length(data)-sum(appearances);

Image Analyst
on 23 Jan 2021

Here's how I thought of to do it, using regionprops to count the length of each consequtive run of numbers:

data = [1,1,1,2,1,1,3,3,2,4,4,4,5,6,7,8,8,9,9,9,8,8,7,7,5,5,4,6,6,2,5,9,9,9,9,7,7,7,7,8,8,4,4,5,5,6,6,4,4,5,5,1,1,2,1,3,3,1]

uniqueData = unique(data)

% Preallocate an array for what we think the longest possible run length would be, say a 1000 or so

appearances = zeros(1000, 2); %count how many times sequences of specific lengths 1:10 appear.

% Put lengths into column 2 for convenience for when we look at our output matrix.

appearances(:, 1) = [1 : size(appearances, 1)]';

for k = 1 : length(uniqueData)

% Get this one unique number.

thisValue = uniqueData(k);

% Find regions with k

props = regionprops(data == thisValue, 'Area');

% Increment the count of each length of data.

for k2 = 1 : length(props)

appearances(props(k2).Area, 2) = appearances(props(k2).Area, 2) + 1;

end

end

appearances % Show in command window.

% Crop off bins that has no runs of that length.

lastBin = find(appearances(:, 2) > 0, 1, 'last');

appearances = appearances(1:lastBin, :) % Show in command window.

You'll see the final result in the command window:

appearances =

1 11

2 15

3 3

4 2

So there were 11 sequences of length 1, 15 sequences of length 2, 3 sequences of length 3, and 2 sequences of length 4. Same result as Matt's (if you use the same vector) - it's just a different approach.

Image Analyst
on 23 Jan 2021

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!
## 0 Comments

Sign in to comment.