Search for minimal value in column of table when value in other columns are the same

2 views (last 30 days)
Hi,
I am trying to use Matlab to find the fastest route to travel from an origin to a destination. In the attached image I am showing a few rows of the table that originally contains 70000 rows. So what I need is to get a '1' in column '7. fastest_route' when the value in column '5. route_with_delay' is the smallest value of all values in column 5 with the same origin (column 2) and destination (column 3).
Maybe good to mention is that it is not always the case that each origin and destination have two different routes, but that could also be 1 or 10 or something.
It would be great if someone could help me out!
Best regards,
Aron
  4 Comments
dpb
dpb on 10 Apr 2021
Edited: dpb on 10 Apr 2021
What to do in case of a tie?
>> RouteTT_BPR40(ig==3790,:)
ans =
4×7 table
route_id origin_junction destination_junction route_tt2Bpr40 route_with_delay Junctiondelay RowNum
________ _______________ ____________________ ______________ ________________ _____________ ______
6860 104709 126249 11.437 113.44 102 9320
35766 104709 126249 18.296 60.296 42 9321
54486 104709 126249 18.296 60.296 42 9322
65445 104709 126249 11.707 123.71 112 9323
>> tFastest(tFastest.origin_junction=='104709'&tFastest.destination_junction=='126249',:)
ans =
1×5 table
origin_junction destination_junction GroupCount MiniumTime
_______________ ____________________ __________ __________
104709 126249 4 60.296
>>
Almost got there, there's a problem in my anonymous function yet to work out and have to go do other things for a while, but here's the general idea of how --
>> tFastest=rowfun(@(d,r) deal([min(d) r(d==min(d),1)]),RouteTT_BPR40, ...
'InputVariables',{'route_with_delay','RowNum'}, ...
'GroupingVariables',{'origin_junction','destination_junction'}, ...
'separateinputs',1, ...
'outputvariablenames',{'MiniumTime','MinimumRow'});
Error using tabular/rowfun>dfltErrHandler (line 517)
Applying the function '@(d,r)deal([min(d),r(d==min(d),1)])' to the 3790th group of rows in A generated the following error:
Dimensions of arrays being concatenated are not consistent.
.
Error in tabular/rowfun>@(s,varargin)dfltErrHandler(grouped,funName,s,varargin{:}) (line 262)
errHandler = @(s,varargin) dfltErrHandler(grouped,funName,s,varargin{:});
Error in tabular/rowfun (line 282)
[b_data{igrp,:}] = errHandler(struct('identifier',ME.identifier, 'message',ME.message, 'index',igrp),inArgs{:});
>>
Applying the above without the attempt to find the matching row ID results in the table tFastest (the above output shown is a subset using the grouping variable to find the dataset the caused the above error).
RowNum is just a serial number from 1:height(table) to have the index into the entire table of the row with the minimum since the index if returned by using the two output arguments from min() would just be the row inside the group, not the location in the overall table.
The brute force way is to take the output of the table then do an external find to locate the rows in the table and populate those in a second step. That's just not as elegant as one-step... :)
dpb
dpb on 10 Apr 2021
Took a break; problem is
>> sum(isnan(RouteTT_BPR40.route_with_delay))
ans =
482
>> isbad=(isnan(RouteTT_BPR40.route_with_delay));
>> tBad=RouteTT_BPR40(isbad,:);
>> head(tBad)
ans =
8×7 table
route_id origin_junction destination_junction route_tt2Bpr40 route_with_delay Junctiondelay RowNum
________ _______________ ____________________ ______________ ________________ _____________ ______
1017 100155 111931 NaN NaN NaN 114
52302 100155 137837 NaN NaN NaN 221
1125 100248 111931 NaN NaN NaN 354
52359 100248 137837 NaN NaN NaN 462
1433 101060 111931 NaN NaN NaN 594
32291 101060 111931 NaN NaN NaN 595
32306 101060 137837 NaN NaN NaN 711
1541 101536 111931 NaN NaN NaN 844
>>
There are 482 records for which your time values aren't finite.

Sign in to comment.

Accepted Answer

dpb
dpb on 11 Apr 2021
Edited: dpb on 11 Apr 2021
function [f,ix]=fastest(d,r)
% return minimum of input time of travel array, d and
% corresponding row index of minimum from row array, r
f=min(d);
ix=r(find(d==f,1));
% fixup if f returns NaN so is no match for ix
if isempty(ix)
ix=0;
end
end
SIDENOTE:
The NaN elements in the input mean can't write a working function as an anonymous function if must handle those; either a separate error handler or a fixup inside the function as below will work...
The first attempt was to just put into a try...catch block as
function [f,ix]=fastest(d,r)
% return minimum of input time of travel array, d and
% corresponding row index of minimum from row array, r
try
f=min(d);
ix=r(find(d==f,1));
catch
% fixup if f returns NaN so is no match for ix
ix=0;
end
end
but the error wasn't catchable owing to the work being passed off to the function inside rowfun; the catch clause error handler never gets triggered because the error doesn't actually occur until the attempted assignment; it isn't an error just to return the empty result, only when try to use it later.
ENDNOTE
The above returns
>> [head(tFastest);tail(tFastest)]
ans =
16×5 table
origin_junction destination_junction GroupCount MiniumTime MinimumRow
_______________ ____________________ __________ __________ __________
100155 103311 2 115.16 1
100155 103315 2 70.633 4
100155 103320 3 125.99 6
100155 103397 3 124.34 9
100155 103586 3 126.04 12
100155 103764 2 174.75 14
100155 103847 2 66.167 17
100155 103857 2 67.675 19
142610 140105 2 65.171 73919
142610 140161 3 14.78 73923
142610 140538 2 69.402 73924
142610 141012 4 31.059 73928
142610 141046 1 101.33 73930
142610 141091 3 53.786 73932
142610 142288 3 13.644 73936
142610 142539 2 8.4156 73937
>>
for
tFastest=rowfun(@fastest,RouteTT_BPR40,'InputVariables',{'route_with_delay','RowNum'}, ...
'GroupingVariables',{'origin_junction','destination_junction'}, ...
'separateinputs',1, ...
'outputvariablenames',{'MiniumTime','MinimumRow'});
To put the indicator flag back into the original table if needs must do that, use the .MinimumRow location as indexing vector --
fastest_route=zeros(height(RouteTT_BPR40),1); % allocate, prefill
fastest_route(tFastest.RowNum)=1; % set indicator flag
RouteTT_BPR40.fastest_route=fastest_route; % store in original table
It's hard to convince people don't need to loop or break up large tables or arrays into multiple pieces, but rowfun along with grouping variables can often do in only a few steps what seems remarkable.

More Answers (0)

Categories

Find more on Numeric Types in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!