Comparing a million data from csv files takes too much time

Hello everyone,
I am quite new to this program and need some help regarding this problem. I want to compare 1 million number to make sure there are no same number meet each other (n-1 ~= n). I tried to program the code, and using tic toc to measure time, elapsed time recorded is 40944.541765 seconds. This amount of time just for one csv file. actually i do want to make the code run for every csv file in the folder, but it is quite complicated so i just tried to focus to make calculation to one csv file first. How could i optimize this piece of code and make the calculation more accurate ? Thank You
data = csvread('data.csv',9); % Read the csv
a = zeros(1,999999); % Initialize a variable
for i=1:999998
t = data(i) ~= data(i+1); % make sure that n != n+1
a(i) = t; % Saving t value to a array
v=sum(a(:)==0); % Counting boolean 0 in a array
end
csvwrite('count.csv',v); % Writing the number to new csv file

 Accepted Answer

I assume, you want to calculate the number of nonzero difference data from one value to next to that value
We can do without loops, this may help you
tic
data = randi(100, [1, 999999]); % taken a randon data of your data length
v = sum(diff(data) ~= 0);
toc
Elapsed time is 0.022935 seconds.

1 Comment

Thank you sir. Actually I've tried to calculate it in microsoft excel first to make sure the matlab output is correct using =a2<>a1 in column B and =COUNTIF(B1:B1000000;"false"). Your answer is insightful. I've tried your answer but the adjustment i need to do is change the
v = sum(diff(data) ~= 0);
to
v = sum(diff(data8a) == 0);
to output the same output as microsoft excel. I will accept your answer. Thank you

Sign in to comment.

More Answers (0)

Products

Release

R2016a

Asked:

on 17 Sep 2022

Commented:

on 17 Sep 2022

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!