How to build a structure that is easier to work with (i.e. for looping through and adding to)

Question

Alex Mason on 4 Sep 2023

Commented: dpb on 6 Sep 2023

structure.zip

I am writing an app that builds a structure filled with test data for various things. There will be evaluation data and validation data for battery cells. Within each of these there is a list of cells, for each cell a list of months, and for each month some data which I want as a table (I think).

So the full thing looks like this:

structure.Evaluation.Cell_1.Month_1.RawData

Now I've come to realise that whilst this looks nice when you're interacting with the structure in the workspace, it's horrible to loop through, because I need a way of generating "Cell_1" and "Month_1", then on the next loop "Cell_1" and "Month_2", and so on for each cell. So at the moment I am leaning heavily on "eval" to do this, which just feels wrong.

So I think I want it to be more like:

structure.Evaluation.Cells.Months.RawData

So then indexing becomes the easy way to just loop through all the bits. But I am struggling with how this would look.

The RawData is a table and it could be any size, but it will usually have 8-10 columns and thousands of rows. Each month has its own table of raw data. There are multiple months of data for one cell, and then multiple cells. I can't visualise how this would look if I didn't use the first method, which is effectively dynamically naming my variables, and I know that is a bit of a no-no.

Can I have the raw data table held in, say, one cell? So "Months(1) = a cell or block containing a 10 x 10,000 data table"?

Then going one level up to "Cells", there would be a cell for each month.

I am sorry if this is poorly explained. I can't really get my head round it. I have attached an example of the structure as it is now.
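For reference, one way the index-based layout asked about here could look is a pair of nested struct arrays, where each `Months(m)` element holds a whole table in its `RawData` field. This is only a minimal sketch: the table variable names (`Time`, `Voltage`) and the sizes are made up for illustration and are not taken from the attached file.

```matlab
% Sketch: nested struct arrays indexed by cell number and month number.
% Table variable names and sizes are hypothetical placeholders.
nCells  = 2;
nMonths = 3;
data = struct();
for c = 1:nCells
    for m = 1:nMonths
        % Each month stores a complete table in a single struct field
        T = table((1:5)', rand(5,1), 'VariableNames', {'Time','Voltage'});
        data.Evaluation.Cells(c).Months(m).RawData = T;
    end
end

% Looping needs only numeric indices: no eval and no generated names
for c = 1:numel(data.Evaluation.Cells)
    for m = 1:numel(data.Evaluation.Cells(c).Months)
        T = data.Evaluation.Cells(c).Months(m).RawData;
        fprintf('Cell %d, month %d: %d rows\n', c, m, height(T));
    end
end
```

A table of any size fits in one field, so "a cell for each month" becomes simply one element of the `Months` struct array.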

Alex Mason on 5 Sep 2023

@Stephen23 as I mentioned to Steven, I think the single flat table might just work. When I load the data files (these are kept in one folder and contain "month_x" and "cell_y" in their names) I could vertically concatenate the data sets whilst also adding 3 new columns: Month, Cell, Type. I will know all of this information because it is contained in the filenames, and I can already extract it. Then I can use logical masks to single out chunks of data.
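A rough sketch of that flat-table idea follows. The file naming pattern, the `Time` and `Voltage` variables, and the fixed "Evaluation" type are all assumptions for illustration, and synthetic tables stand in for the real `readtable` call so the snippet runs on its own.

```matlab
% Sketch: build one flat table, tagging each row with Cell, Month and Type
% parsed from a (hypothetical) file name pattern.
fileNames = {'eval_cell_1_month_1.csv', 'eval_cell_1_month_2.csv', ...
             'eval_cell_2_month_1.csv'};
parts = cell(numel(fileNames), 1);
for k = 1:numel(fileNames)
    % Stand-in for T = readtable(fileNames{k}); synthetic data keeps this runnable
    T = table((1:4)', rand(4,1), 'VariableNames', {'Time','Voltage'});

    % Pull the cell and month numbers out of the file name
    tok = regexp(fileNames{k}, 'cell_(\d+)_month_(\d+)', 'tokens', 'once');
    T.Cell  = repmat(str2double(tok{1}), height(T), 1);
    T.Month = repmat(str2double(tok{2}), height(T), 1);
    T.Type  = repmat("Evaluation", height(T), 1);

    parts{k} = T;
end
% Concatenate once at the end instead of growing the table inside the loop
allData = vertcat(parts{:});

% A logical mask singles out one chunk
mask  = allData.Cell == 1 & allData.Month == 2;
chunk = allData(mask, :);
```

Collecting the pieces in a cell array and calling `vertcat` once avoids repeatedly reallocating a table that may end up with millions of rows.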

My concern would be how big the table gets but I assume Matlab is OK with the potential for millions of rows?

This is battery cell characterisation data. The cells are monitored for around 9 months. So depending on logging rate there could be a lot of data.

Stephen23 on 5 Sep 2023

"My concern would be how big the table gets but I assume Matlab is OK with the potential for millions of rows?"

MATLAB has no problem with this; the limit depends more on your available computer memory.

Another option would be to use a datastore / tall arrays:

https://www.mathworks.com/help/matlab/large-files-and-big-data.html
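As a minimal sketch of the datastore / tall array route, assuming the data lives in CSV files in one folder: the folder, file names and the `Voltage` variable below are invented, and the files are written to a temporary directory only so the example is self-contained.

```matlab
% Sketch: read many CSV files through a datastore instead of loading them all at once.
% Folder, file names and variable names are hypothetical.
folder = tempname;
mkdir(folder);
for k = 1:3
    T = table((1:4)', k*ones(4,1), 'VariableNames', {'Time','Voltage'});
    writetable(T, fullfile(folder, sprintf('cell_1_month_%d.csv', k)));
end

ds = tabularTextDatastore(folder, 'FileExtensions', '.csv');

% Option 1: process one chunk at a time
nRows = 0;
while hasdata(ds)
    chunk = read(ds);
    nRows = nRows + height(chunk);
end

% Option 2: treat the datastore as a tall table; work is deferred until gather
reset(ds);
tt = tall(ds);
meanV = gather(mean(tt.Voltage));
```

Either option keeps only part of the data in memory at a time, which matters mainly if the full set does not fit in RAM.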

Answer 1

Bruno Luong on 4 Sep 2023

Edited: Bruno Luong on 5 Sep 2023

If I were you, I would organize the data like this, as just a linear array of structs:

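load('structure.mat');
NewDataStruct = struct('DataInfo', shareData.DataInfo);
NewDataStruct.DataRecord = ConvertRawData(shareData, struct())
function DataRecord = ConvertRawData(s, info)
% Recursively walk the nested struct s and flatten it into a struct array.
% info accumulates the Type, Cell and Month found on the way down.
f = fieldnames(s);
DataRecord = [];
for k=1:length(f)
    fk = f{k};
    Tmp = [];
    switch fk
        case 'RawData'
            % Leaf: attach the table to the info collected so far
            Tmp = info;
            Tmp.RawData = s.(fk);
        case {'EvaluationData', 'ValidationData'}
            info.Type = fk;
        otherwise
            % Turn field names such as 'Cell_3' or 'Month_2' into numeric info
            N = regexp(fk,'Month_(\d+)|Cell_(\d+)', 'tokens', 'once');
            if ~isempty(N)
                N = str2double(N{1});
                fbase = fk(1:find(fk=='_',1)-1); % 'Cell' or 'Month'
                info.(fbase) = N;
            end
    end
    % Descend into nested structs, carrying the accumulated info
    if isstruct(s.(fk))
        Tmp = ConvertRawData(s.(fk), info);
    end
    % Append whatever records were found under this field
    if ~isempty(Tmp)
        if isempty(DataRecord)
            DataRecord = Tmp;
        else
            DataRecord = [DataRecord; Tmp]; %#ok
        end
    end
end
end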
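The conversion above relies on dynamic field names, `s.(fk)`. The same syntax can stand in for `eval` if the original `Cell_x` / `Month_y` layout is kept. A minimal sketch with placeholder table contents:

```matlab
% Sketch: dynamic field names in place of eval for the Cell_x / Month_y layout.
% The table contents are placeholders.
s = struct();
for c = 1:2
    for m = 1:3
        cellName  = sprintf('Cell_%d', c);    % e.g. 'Cell_1'
        monthName = sprintf('Month_%d', m);   % e.g. 'Month_2'
        s.Evaluation.(cellName).(monthName).RawData = ...
            table((1:3)', 'VariableNames', {'Time'});
    end
end

% Reading back works without knowing the names in advance
cellNames = fieldnames(s.Evaluation);
for i = 1:numel(cellNames)
    monthNames = fieldnames(s.Evaluation.(cellNames{i}));
    fprintf('%s has %d months\n', cellNames{i}, numel(monthNames));
end
```

This removes the `eval` calls but still encodes numbers in names, so the struct array above remains the easier form to filter.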

Bruno Luong on 5 Sep 2023

"If I want to loop through and load/work on the cell 1 data, how do i index or reference that?"

filter = [NewDataStruct.DataRecord.Cell] == 1;
DataFiltered = NewDataStruct.DataRecord(filter)
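The same pattern extends to combined conditions and to looping over the selected records. The sketch below builds a small synthetic struct array shaped like `DataRecord` (fields `Type`, `Cell`, `Month`, `RawData`); the `Voltage` variable and the values are invented for illustration.

```matlab
% Sketch: filter a struct array on two fields, then loop over the matches.
% rec mimics the shape of DataRecord; its contents are synthetic.
rec = struct('Type', {}, 'Cell', {}, 'Month', {}, 'RawData', {});
for c = 1:2
    for m = 1:3
        rec(end+1) = struct('Type', 'EvaluationData', 'Cell', c, 'Month', m, ...
            'RawData', table(rand(4,1), 'VariableNames', {'Voltage'})); %#ok<SAGROW>
    end
end

% [rec.Cell] gathers a scalar field across all elements into a numeric vector
pick   = [rec.Cell] == 1 & [rec.Month] >= 2;
subset = rec(pick);

for k = 1:numel(subset)
    fprintf('Cell %d, month %d, mean voltage %.3f\n', ...
        subset(k).Cell, subset(k).Month, mean(subset(k).RawData.Voltage));
end
```

The mask is named `pick` here rather than `filter` to avoid shadowing MATLAB's built-in `filter` function.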

Alex Mason on 5 Sep 2023

@Bruno Luong perfect. I had just figured this out. Thanks for confirming I am on the right lines.

Answer 2

Steven Lord on 4 Sep 2023

I'd probably store this either as a timetable (with the date and time data stored as the RowTimes, and as many data variables as you need) or as a table with multiple columns for your cell and month data. Then you could use logical indexing into the rows of the tabular array, either using matches or startsWith on the column containing your month "names", or using the month function on the RowTimes and selecting the appropriate month numbers.
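A minimal sketch of the timetable variant, assuming one row per sample with a numeric `Cell` column; the timestamps, the `Voltage` variable and the values are synthetic.

```matlab
% Sketch: timetable with a Cell column; select rows by month and cell number.
% Timestamps and values are synthetic.
t  = datetime(2023,1,1) + days(0:89)';          % daily samples, Jan to Mar 2023
tt = timetable(t, rand(90,1), repmat([1;2], 45, 1), ...
               'VariableNames', {'Voltage','Cell'});

% month() applied to the row times returns the month number (1 to 12)
isFeb    = month(tt.Properties.RowTimes) == 2;
febCell1 = tt(isFeb & tt.Cell == 1, :);
```

Because the month is derived from the row times, no separate Month column is needed in this layout.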

Alex Mason on 5 Sep 2023

Hi Steven, I will explore this.

Not sure about the timetable and having row names as the time/date? I guess what I could do is literally concatenate 2 new columns "month" and "cell" to the data array and just sift through in that fashion. I could just use logical checks to mask out the chunks of data needed based on month number AND cell number.

I will have a read about timetables though. I am not familiar with them at all which is perhaps why I can't imagine how it might work.

Answer 3

Bruno Luong on 5 Sep 2023

Edited: Bruno Luong on 5 Sep 2023

This is for the case where you want to organize the data as a single giant table.

IMO, if you don't need to mix parts of the tables, you should not do it this way. Keeping an array of tables, as in my other solution, is better.

load('structure.mat');
NewDataStruct = struct('DataInfo', shareData.DataInfo, ...
                       'Data', ConvertRawData2SingleTable(shareData, struct()))
function DataRecord = ConvertRawData2SingleTable(s, info)
f = fieldnames(s);
DataRecord = [];
for k=1:length(f)
    fk = f{k};
    Tmp = [];
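    switch fk
        case 'RawData'
            % Leaf: copy each accumulated info field (Type, Cell, Month)
            % into a new table variable, repeated on every row
            T =  s.(fk);
            infof = fieldnames(info);
            for j=1:length(infof)
                T.(infof{j})(:) = info.(infof{j});
            end
            Tmp = T;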
        case {'EvaluationData', 'ValidationData'}
            info.Type = string(fk);
        otherwise
            N = regexp(fk,'Month_(\d+)|Cell_(\d+)', 'tokens', 'once');
            if ~isempty(N)
                N = str2double(N{1});
                fbase = fk(1:find(fk=='_',1)-1);
                info.(fbase) = N;
            end
    end
    if isstruct(s.(fk))
        Tmp = ConvertRawData2SingleTable(s.(fk), info);
    end
    if ~isempty(Tmp)
        if isempty(DataRecord)
            DataRecord = Tmp;
        else
            DataRecord = [DataRecord; Tmp]; %#ok
        end
    end
end
end

Bruno Luong on 5 Sep 2023

Edited: Bruno Luong on 6 Sep 2023

Note the extra memory required by the single-table storage after conversion:

>> whos
  Name               Size                Bytes  Class     Attributes
  NewDataStruct      1x1             362504847  struct              
  shareData          1x1             165267346  struct 

dpb on 6 Sep 2023

IF you were to go to a timetable, use a datetime for the date rather than augmenting with extra month/day columns; use lookups within it for time selection when processing; retime might be of use.

In a table, the extra memory is compensated for by the handy nature of rowfun and grouping variables to do all kinds of magical analyses in very few lines of code -- again IF the nature of the analysis is by some set of variables.

If it's simply iterating through each dataset one at a time, not a whole lot to be gained as Bruno says...but we've no knowledge of what your end objectives are with which to guide the tools to use.
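To illustrate the grouping-variable point, here is a small sketch on a flat table with `Cell` and `Month` columns. The data are made up, and `groupsummary` is shown alongside `rowfun` as one further built-in option.

```matlab
% Sketch: per-cell, per-month statistics on a flat table using grouping variables.
% All values are synthetic.
Cell    = [1;1;1;1;2;2;2;2];
Month   = [1;1;2;2;1;1;2;2];
Voltage = [3.0;3.2;3.4;3.6;4.0;4.2;4.4;4.6];
T = table(Cell, Month, Voltage);

% One row per (Cell, Month) group with the mean voltage
G = groupsummary(T, {'Cell','Month'}, 'mean', 'Voltage');

% rowfun with grouping variables applies an arbitrary function to each group
R = rowfun(@(v) max(v) - min(v), T, 'InputVariables', 'Voltage', ...
           'GroupingVariables', {'Cell','Month'}, 'OutputVariableNames', 'Range');
```

Both calls return one row per group, so the per-dataset loop disappears when the analysis really is "by cell and month".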
