Reading an N-column table which sometimes has N+1 columns

Question

DocWalo on 7 May 2020


https://au.mathworks.com/matlabcentral/answers/523675-reading-a-n-columns-table-which-sometimes-have-n-1-columns

Commented: DocWalo on 11 May 2020

Accepted Answer: Sindar


Hi Everyone,

I searched the forum a lot without finding a solution. And pardon my English, I am French :)

My problem is the following:

I have a log file with a lot of information. The log file can be up to 500 MB (sometimes even bigger). It is separated into 2 main parts: a header part, from which it is easy and fast to retrieve information line by line, and a data part.

The data part consists of several tasks, each containing a table of data with text before and after it.

The data table structure is the following :

   25.870     1.000     lhc       0.000     0.000   -20.140    24.449      1.42061512
   25.870     1.000     lhc       0.000     0.000   -20.520    24.519      1.35075912
   25.870     1.000     lhc       0.000     0.000   -20.951    24.582      1.28833133
   25.870     1.000     lhc       0.000     0.000   -21.434    24.638      1.23204173
   25.870     1.000     lhc       0.000     0.000   -21.958    24.689      1.18086597
   25.870     1.000     lhc       0.000     0.000   -22.503    24.735      1.13498198
   25.870     1.000     lhc       0.000     0.000   -23.148    24.781      1.08854135
   25.870     1.000     lhc       0.000     0.000   -23.741    24.824      1.04623596
   25.870     1.000     lhc       0.000     0.000   -24.244    24.863      1.00744521
   25.870     1.000     lhc       0.000     0.000   -24.626    24.898      0.97159033
   25.870     1.000     lhc       0.000     0.000   -24.876    24.932      0.93839531
   25.870     1.000     lhc       0.000     0.000   -25.010    24.962      0.90779039
   25.870     1.000     lhc       0.000     0.000   -25.057    24.990      0.87971152
   25.870     1.000     lhc       0.000     0.000   -25.063    25.016      0.85443812
   25.870     1.000     lhc       0.000     0.000   -25.072    25.038      0.83238819
   25.870     1.000     lhc       0.000     0.000   -25.115    25.056      0.81396378
   25.870     1.000     lhc       0.000     0.000   -25.220    25.070      0.79981872
   25.870     1.000     lhc       0.000     0.000   -25.406    25.079      0.79060410
   25.870     1.000     lhc       0.000     0.000   -25.611    25.078      0.79173920
   25.870     1.000     lhc       0.000     0.000   -25.936    25.068      0.80208976
   25.870     1.000     lhc       0.000     0.000   -26.373    25.047      0.82291587
   25.870     1.000     lhc       0.000     0.000   -26.891    25.014      0.85576164
   25.870     1.000     lhc       0.000     0.000   -27.437    24.969      0.90124460
   25.870     1.000     lhc       0.000     0.000   -27.928    24.910      0.96048807
   25.870     1.000     lhc       0.000     0.000   -28.254    24.835      1.03468974
   25.870     1.000     lhc       0.000     0.000   -28.317    24.746      1.12353854
   25.870     1.000     lhc       0.000     0.000   -28.070    24.642      1.22847010
   25.870     1.000     lhc       0.000     0.000   -27.662    24.552      1.31821801
   25.870     1.000     lhc       0.000     0.000   -27.101    24.452      1.41784749
   25.870     1.000     lhc       0.000     0.000   -26.466    24.343      1.52711338
   25.870     1.000     lhc       0.000     0.000   -25.820    24.224      1.64568471 **   

As you can see in line 31 of the sample above, ** appears randomly as an extra column at the end of a line. This is just a part of the data; it goes on for thousands of lines.

I am using the following code to retrieve those data. It works fine, but I have a performance problem with big files: it takes too long. Do you have a solution to help me improve performance? My problem is the interruption caused by these **. The more of them there are, the slower it gets.

Here fid is the identifier of the currently open file.

    % Read the whole file into one variable in order to find the first and last lines of each task
    % and to make searching faster
    outFile = textscan(fid, '%s', 'Delimiter', '\n');
    frewind(fid);
    
    %Variable
    taskSummaryFlagOn='No.     goal     weight     pol.      rot.      att.    1. comp.  2. comp.      residue';
    taskSummaryFlagOff='Maximum of 1. component:';
    
    % Find the rows where tasks results are
    needle=strfind(outFile{1}, taskSummaryFlagOn);
    rowsStartTask= find(~cellfun('isempty', needle));
    needle=strfind(outFile{1}, taskSummaryFlagOff);
    rowsEndTask= find(~cellfun('isempty', needle));
    nbStartLine=0;nbEndLine=2;
      
    % Preallocate variables for better performance
    nbLineData=zeros(numel(rowsStartTask),1);% nbLineData is used to check that all the data are correctly retrieved
    dataSimu=cell(numel(rowsStartTask),9);% one row per task, one column per field
    % Loop over the tasks
    for i=1:numel(rowsStartTask)
        nbLineData(i)=rowsEndTask(i)-rowsStartTask(i)-nbStartLine-nbEndLine;
        dataSimu(i,:)=textscan(fid,'%f %f %f %s %f %f %f %f %f','headerlines', rowsStartTask(i));
        % Handle the case where a data line ends with **
        while size(dataSimu{i,1},1)~=nbLineData(i)
            fgetl(fid);% reading the final '**'
            buff=textscan(fid,'%f %f %f %s %f %f %f %f %f');
            for j=1:max(size(buff))
                dataSimu{i,j}=[dataSimu{i,j};buff{:,j}];
            end
        end
        frewind(fid);
    end

If you need more information to understand my problem, I will provide you more details.

Thanks for the time you will spend to help me :)


Answer 1

Sindar on 7 May 2020


https://au.mathworks.com/matlabcentral/answers/523675-reading-a-n-columns-table-which-sometimes-have-n-1-columns#answer_431075


Assuming you don't need the '**' info, you could try this solution from the fscanf examples which skips the remainder of the line after the data you expect:

    dataSimu(i,:)=textscan(fid,'%f %f %f %s %f %f %f %f %f %*[^\n]','headerlines', rowsStartTask(i));
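
To see the effect of the trailing `%*[^\n]` in isolation, here is a minimal sketch that parses three rows taken from the sample in the question. It is only an illustration under a few assumptions: the rows are passed to textscan as text instead of through fid, the row ending in ** is deliberately placed in the middle, and the format has 8 fields to match the 8 columns visible in the sample (the real file apparently has 9). The variable names rows, txt, fmt and C are made up for this example.

```matlab
% Three rows copied from the sample table; only the middle one ends with '**'.
rows = { ...
    '   25.870     1.000     lhc       0.000     0.000   -27.101    24.452      1.41784749', ...
    '   25.870     1.000     lhc       0.000     0.000   -25.820    24.224      1.64568471 **', ...
    '   25.870     1.000     lhc       0.000     0.000   -26.466    24.343      1.52711338'};

% Join the rows into one text block, one row per line.
txt = sprintf('%s\n', rows{:});

% 8 data fields (assumption: matches the columns shown in the sample),
% followed by %*[^\n], which reads and discards whatever is left on the line.
fmt = '%f %f %s %f %f %f %f %f %*[^\n]';

% textscan also accepts text as its first argument.
C = textscan(txt, fmt);

% Each cell of C holds one column; the skipped field produces no output cell.
disp(numel(C{1}));   % number of rows parsed
disp(C{8});          % last numeric column
```

If this behaves as intended, all three rows end up in the same call, so the while loop that restarts textscan after each ** should no longer be needed.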


DocWalo on 11 May 2020

Thanks Sindar! With that change alone, my computation time dropped by a factor of 10.

If you see other ways to speed up the computation, don't hesitate to add some tips :)
