Is mapreduce suitable for analyzing a large dataset with an iterative function?
Currently, I have a large timetable of more than 1 billion rows x 3 columns.
Some of the highlighted functions I use include (a small sketch of this pipeline follows the list):
unstack: turns my timetable into 1 billion rows x 1000 columns.
fillmissing(data, 'previous'): fills each NaN with the value from the previous row.
retime: in some cases can increase my number of rows tenfold.
cumsum: adds all the previous data together into a running total.
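For illustration, here is roughly what that pipeline looks like on a tiny in-memory timetable (the sensor/value names are made up for this example):
t  = datetime(2021,9,1) + minutes([0; 1; 1; 2; 3]);
tt = timetable(t, categorical(["a"; "b"; "a"; "b"; "a"]), [1; NaN; 2; 4; NaN], ...
    'VariableNames', {'Sensor', 'Value'});
wide = unstack(tt, 'Value', 'Sensor');       % one column per sensor
wide = fillmissing(wide, 'previous');        % carry the last observed value forward
wide = retime(wide, 'secondly', 'previous'); % finer time grid -> many more rows
wide{:,:} = cumsum(wide{:,:});               % running total per column (leading NaNs propagate)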
I am able to process small datasets using standard MATLAB functions, but for some of the larger datasets (> 1 billion rows) I run into memory issues.
I am planning to break my timetable into smaller pieces, record all the "states" at the end of each section, and repopulate them at the beginning of the next batch (rough sketch below).
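Rough, untested pseudocode of that plan. readChunk, writeChunk, and nChunks are placeholders for however the slices get loaded and saved, and 'Value' stands in for my actual column name:
lastRow  = [];   % carry-over state: final filled row of the previous chunk
runTotal = 0;    % carry-over state: running total entering the next chunk
for k = 1:nChunks
    chunk = readChunk(k);                         % hypothetical loader for one slice
    if isempty(lastRow)
        padded = chunk;
    else
        padded = [lastRow; chunk];                % seed 'previous' fill across the boundary
    end
    padded  = fillmissing(padded, 'previous');
    chunk   = padded(end-height(chunk)+1:end, :); % drop the seed row again
    lastRow = chunk(end, :);                      % state #1: last filled row
    chunk.Value = runTotal + cumsum(chunk.Value); % 'Value' is a made-up column name
    runTotal = chunk.Value(end);                  % state #2: running total
    writeChunk(k, chunk);                         % hypothetical writer
end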
Can mapreduce help me in this situation?
Any pseudocode is appreciated. Thank you
4 Comments
Ive J
on 10 Sep 2021
Regardless, in general you can use mapreduce, something like:
ds  = datastore(...);                      % chunked access to the raw file(s)
out = mapreduce(ds, @myMapper, @myReducer);
raw = readall(out); % may not fit into memory (maybe tall?)
function myMapper(data, info, intermKVStore)
    % 'info' describes the current chunk (Filename, Offset)
    data = fillmissing(data, 'previous');  % fills within this chunk only
    % other filters go here
    % do whatever
    add(intermKVStore, info.Offset, data)  % key each chunk by its offset in the file
end
function myReducer(intermKey, intermValsIter, outKVStore)
    data = [];
    while hasnext(intermValsIter)
        data = [data; getnext(intermValsIter)]; % gather all values for this key
    end
    add(outKVStore, intermKey, data);
end
However, you should be careful with fillmissing (and maybe unstack too) in cases where the first rows of a chunk are missing, because the previous rows that would fill them live in another chunk. So this approach works well only if the chunks can be treated independently of each other.
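For example, filling a small vector whole versus in two chunks shows the boundary issue:
x = [1; NaN; NaN; 4];
whole = fillmissing(x, 'previous');  % -> [1; 1; 1; 4]
a = fillmissing(x(1:2), 'previous'); % chunk 1 -> [1; 1]
b = fillmissing(x(3:4), 'previous'); % chunk 2 -> [NaN; 4], the NaN survives
split = [a; b];                      % differs from 'whole' at the boundary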
Answers (0)