Optimizing my code using vectorization or avoiding for loops

Hi everyone, I'm wondering whether there is any way to make this code run faster by using vectorization, avoiding loops, or any other method. I have a very large text file (up to 2.5 GB) that must be read and compared line by line with another file in .xlsx format. It takes forever to run, and I'm also worried that memory will not be enough, because the result of the calculation will be much bigger than the .txt file.
The text file looks like this:
1567683075.081675 800002C1 1100000000000000
1567683075.082312 80000189 7437060000843B00
Every line has the same structure: three columns (a timestamp followed by two hex fields), for about 10 million rows, which comes to about 2 GB.
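A minimal sketch of parsing that format, assuming three whitespace-separated columns exactly as shown: reading the first column with %f makes textscan store the timestamp as a double rather than as a cell of character vectors, which costs far less memory per row. The two sample lines are passed as a string so the example runs without the real file; the variable names are illustrative only.

```matlab
% The two sample lines from above (timestamp, ID, data)
sample = sprintf(['1567683075.081675 800002C1 1100000000000000\n' ...
                  '1567683075.082312 80000189 7437060000843B00\n']);

% %f reads the timestamp as a double; the two hex columns stay as text
C = textscan(sample, '%f %s %s');

timestamps = C{1};   % 2x1 double
ids        = C{2};   % 2x1 cell array of character vectors
payloads   = C{3};   % 2x1 cell array of character vectors
```

With the real file you would pass the file ID from fopen instead of the sample string.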
The Excel file has 800 rows and 17 columns, like this:
'1' '800002CA' 'EBC1' 'nodata' 'ASRECA_1' 'nodata ' '1' '0' 'NaN' 'NaN' 'NaN' 'NaN' 'NaN' 'NaN' 'NaN' '4' '0,5'
As I said, the second column of the text file is compared with the second column of the Excel file, and then some calculations happen. This has to be done for every row of the text file, and the results are stored in a structure.
Here is my code so far. I want to know how I can replace these two for loops, because the inner one iterates 10 million times (the length of c equals the number of rows in the text file) for each of the 800 outer iterations.
Thank you
fileID = fopen('day_29_08.txt');
text = textscan(fileID, '%s %s %s');   % timestamp, ID, data (all read as text)
fclose(fileID);
length_text = length(text{1,3});       % about 10 million rows
excel_data = readtable('List.xlsx');
excel_id = table2cell(excel_data(:,2));
excel_signal_name = table2cell(excel_data(:,5));
length_excel = height(excel_data);     % 800 rows; height() counts table rows explicitly
for i = 1:length_excel
    % compare one Excel ID with all ~10 million text IDs in one call
    c = strcmp(excel_id{i}, text{1,2});
    for j = 1:length(c)
        if c(j)
            % here I need the index j for calculations and assignments
        end
    end
end
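On the memory concern: if the whole file does not fit in memory, textscan accepts a repeat count N, so each call reads at most N lines and the next call continues from where the previous one stopped. A minimal sketch, assuming the three-column format above; chunk_size is a hypothetical value, and the per-chunk processing is left as a placeholder. It writes a small temporary file so that it runs on its own.

```matlab
% Write a small stand-in for day_29_08.txt so the sketch is self-contained
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '1567683075.081675 800002C1 1100000000000000\n');
fprintf(fid, '1567683075.082312 80000189 7437060000843B00\n');
fprintf(fid, '1567683075.083000 800002C1 2200000000000000\n');
fclose(fid);

chunk_size = 2;          % hypothetical; something like 1e6 for the real file
rows_read  = 0;
fileID = fopen(fname);
while ~feof(fileID)
    % Read at most chunk_size lines; the file position advances each call
    chunk = textscan(fileID, '%f %s %s', chunk_size);
    if isempty(chunk{1})
        break;           % nothing left (e.g. only a trailing newline)
    end
    rows_read = rows_read + numel(chunk{1});
    % process chunk{1}, chunk{2}, chunk{3} here and keep only the results
end
fclose(fileID);
delete(fname);
```

Only one chunk is held in memory at a time, so peak memory depends on chunk_size rather than on the file size.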

Accepted Answer

Spencer Chen on 3 Feb 2020
I would first use the "profile" function to check which part of the code is actually taking a long time. Then tackle those lines.
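A minimal sketch of that workflow: profile on starts collecting timing data, profile off stops it, and profile viewer (commented out here because it opens a window) shows a line-by-line report. The toy loop is only a stand-in for the script in the question.

```matlab
profile on                      % start collecting timing data
x = zeros(1, 1e5);
for k = 1:numel(x)              % stand-in for the code you want to time
    x(k) = sqrt(k);
end
profile off                     % stop collecting
p = profile('info');            % timing data as a struct
% profile viewer                % interactive line-by-line report
```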
Secondly, maybe some of your for-loops are unavoidable, but you can make them more efficient. One way to do that is to reduce the number of iterations.
For example, take your big i-for-loop. You are looping over your text file, which according to you needs about 10 million iterations. Consider instead: what if you loop over your Excel file? That has only 800 rows, far fewer than 10 million. How can you modify your code to loop that way?
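A minimal sketch of that idea with small made-up data (text_ids and excel_id stand in for text{1,2} and excel_id from the question): strcmp already compares one Excel ID against every text row at once, so find can return the matching row indices directly and the inner j-loop disappears. The loop then runs only once per Excel row.

```matlab
% Made-up stand-ins for text{1,2} and excel_id from the question
text_ids = {'800002C1'; '80000189'; '800002C1'; '800002CA'};
excel_id = {'800002C1'; '800002CA'};

matches = cell(numel(excel_id), 1);   % one index vector per Excel row
for i = 1:numel(excel_id)
    % strcmp compares excel_id{i} with all text rows in one call;
    % find turns the logical mask into row indices (no inner loop)
    matches{i} = find(strcmp(excel_id{i}, text_ids));
end
```

matches{i} then holds every text-file row that belongs to Excel row i, which can be used to index the timestamp and data columns in one step.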
Thirdly, you have many conditional assignments within your loop. Instead of doing them item by item inside the loop, think about how to do them outside the loop in vectorized form. For example, use your for-loop to identify matched_data_indices, then vectorize your assignments:
matched_data_indices; % from loop
flag = isnan(excel_bit(matched_data_indices)) & isnan(excel_signalbyte_2(matched_data_indices)); % element-wise &, since && needs scalar operands
dec_struct.hex = ...
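A minimal sketch of that vectorized step. excel_bit and excel_signalbyte_2 follow the names above, but their contents here are invented, and the dec_struct field row_index is hypothetical, chosen only to show a single assignment covering all matched rows.

```matlab
% Hypothetical per-row Excel columns (NaN = no value)
excel_bit          = [NaN; 3; NaN; NaN];
excel_signalbyte_2 = [NaN; NaN; 5; NaN];

matched_data_indices = [1; 2; 4];   % e.g. collected by the loop

% Element-wise & works on whole vectors; && needs scalar operands
flag = isnan(excel_bit(matched_data_indices)) & ...
       isnan(excel_signalbyte_2(matched_data_indices));

% Hypothetical assignment done once for all matched rows, not per row
dec_struct.row_index = matched_data_indices(flag);
```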
Lastly, you may not need a for-loop to find matching data at all.
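One way to find all matches without a loop is ismember, which checks every text-file ID against the Excel ID list in a single call. A minimal sketch with made-up data: tf marks which text rows have a matching Excel ID, and loc gives the index of that Excel row (0 where there is no match), so per-row Excel parameters can then be looked up by indexing instead of looping.

```matlab
% Made-up stand-ins for text{1,2} and excel_id from the question
text_ids = {'800002C1'; '80000189'; '800002C1'; '800002CA'};
excel_id = {'800002C1'; '800002CA'};

% tf(k) is true if text_ids{k} appears in excel_id;
% loc(k) is the index of the matching excel_id entry (0 if none)
[tf, loc] = ismember(text_ids, excel_id);

% Text rows that matched, and the Excel row each one matched
matched_rows  = find(tf);
excel_row_for = loc(tf);
```

With excel_row_for in hand, an Excel column such as a scale factor can be expanded to all matched text rows with plain indexing, e.g. scale(excel_row_for), where scale is a hypothetical numeric column read from the spreadsheet.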
Blessings,
Spencer
