Slow Triple for loop

Hi,
I have created a program that has a tripple for loop but runs very slowly and was hoping for some tips to speed it up... Here is the code:
if WeightingVectorWells(1) > 0 || WeightingVectorWells(4) > 0
for LatCounter = 1:length(handles.Latitude)
for LongCounter = 1:length(handles.Longitude)
for WellDataCounter = 1:size(handles.WellData,1)
if sqrt(((handles.WellData{WellDataCounter,2} - handles.Latitude(LatCounter)) ^ 2 + (handles.WellData{WellDataCounter,3} - handles.Longitude(LongCounter)) ^ 2)) < .01067
handles.ScoreSheetBPDOil(LatCounter,LongCounter) = handles.WellData{WellDataCounter,12} * WeightingVectorWells(1) + ...
handles.ScoreSheetBPDOil(LatCounter,LongCounter);
handles.ScoreSheetCumOil(LatCounter,LongCounter) = handles.WellData{WellDataCounter,13} * WeightingVectorWells(4) + ...
handles.ScoreSheetCumOil(LatCounter,LongCounter);
end
end
end
if LatCounter == length(handles.Latitude)
LatCounter = LatCounter - 1;
end
progressbar(LatCounter / length(handles.Latitude))
end
end
I created a GUI for this program (first time using GUIDE and handles and handles was the easiest way I found to talk to different functions and share data so sorry if that's bad pratice). The handles.ScoreSheets are preallocated. The LatCounter and LongCounter are both about 300 in size and the handles.WellData is a 19000 by 26 matrix (19000 being the max, it can be as little as 1 but is usually around 500 or so). Whenever I stop this mid processing (ctrl-c) it always seems to be doing the sqrt function. Also, as a side note I'm trying to get my company to get parallel processing tool box because I have six of these loops. progressbar is just a function I picked up off the file exchange and takes very little processing and the if statement at the bottom inside the outer most loop is just so that progressbar doesn't bug out.
Thanks for any help, I'm an engineer, not a programmer so I really do appreciate anything.

 Accepted Answer

Cedric
Cedric on 17 Jul 2013
Edited: Cedric on 17 Jul 2013
The very first thing is that you should not modify the loop index LatCounter within the loop. ( EDIT: the following sentence is wrong! see Iain's comment) What you are doing with the -1 will create an infinite loop blocked on LatCounter == length(handles.Latitude).
Then you can probably get rid of the two outer loops by creating a meshgrid of longitudes and latitudes. Example:
lat = 1:2:7
lon = 10:2:14
[LAT, LON] = meshgrid(lat, lon)
normCoordSq = LAT.^2 + LON.^2
cumResult = zeros(size(LAT)) ;
data = [2, 7, 5, 9] ;
for dataId = 1 : length(data)
mask = sqrt(normCoordSq + cumResult + data(dataId)^2) < 16 ;
cumResult(mask) = cumResult(mask) + data(dataId)^2 ;
end
cumResult
This leads to the following output:
lat =
1 3 5 7
lon =
10 12 14
LAT =
1 3 5 7
1 3 5 7
1 3 5 7
LON =
10 10 10 10
12 12 12 12
14 14 14 14
normCoordSq =
101 109 125 149
145 153 169 193
197 205 221 245
cumResult =
78 78 78 78
78 78 78 53
53 29 29 4
As you can see, there is only one loop, and the conditional statement is replaced by logical indexing based on the outcome of a relational operation. Then you could probably get rid of the last loop using ARRAYFUN, CELLFUN, or BSXFUN, but I doubt that it would bring much improvement (in the sense that these are essentially hidden loops which bring more conciseness than efficiency in term of computation time).

11 Comments

"What you are doing with the -1 will create an infinite loop blocked on LatCounter == length(handles.Latitude)."
That is incorrect. Matlab's for loops don't work on the premise of "from x in steps of y to z, they work on the premise of working through the columns of a matrix. - Really.
for i = [1:5; 5:9]
disp(i')
end
Cedric
Cedric on 17 Jul 2013
Edited: Cedric on 17 Jul 2013
Thank you! I'm am switching a lot between languages and I didn't re-sync' (think?) with MATLAB.
Cedric,
Thanks for the response. I had some questions about script you posted. I see how your LAT and LON would correspond to my constant Lat and Long that I'm counting through in my loop but then how would I incorporate the other lat and long that I'm using to measure the distance between the for loop lat and long? I also don't really get what this 'data' variable is and what you are dong with it, I'm assuming it's suppose to represent my
handles.WellData{WellDataCounter,12} * WeightingVectorWells(1)
but I dont really know. Sorry, I'm pretty new to all this but thanks for the help.
Cedric
Cedric on 17 Jul 2013
Edited: Cedric on 17 Jul 2013
No problem James; here is a more specific illustration with comments:
>> lat = 1:2:7 ;
>> lon = 10:2:14 ;
>> [LAT, LON] = meshgrid(lat, lon)
LAT =
1 3 5 7
1 3 5 7
1 3 5 7
LON =
10 10 10 10
12 12 12 12
14 14 14 14
We build a single data entry which has the structure [value, lat, lon]:
>> data = [9, 2, 11] ;
Now by computing the following:
>> dist = sqrt( (data(2)-LAT).^2 + (data(3)-LON).^2 )
dist =
1.4142 1.4142 3.1623 5.0990
1.4142 1.4142 3.1623 5.0990
3.1623 3.1623 4.2426 5.8310
we get in one shot the distance from the data point to all points of the grid of coordinates (you can see that with the coordinates of the point in the upper-left region, distances make sense).
We can e.g. test values of this array using a relational operator, and we obtain an array of logicals that indicates, for each value of the distance array, the outcome of the test:
>> mask = dist < 4
mask =
1 1 1 0
1 1 1 0
1 1 0 0
This array of logicals (that I name "mask" because we will use it as a mask) can be used for indexing another array (this is called logical indexing), e.g.
>> results = zeros(size(LAT))
results =
0 0 0 0
0 0 0 0
0 0 0 0
>> results(mask) = data(1)
results =
9 9 9 0
9 9 9 0
9 9 0 0
Now we can increase a bit the complexity and get closer to your setup by looping over a few data points (the second point is this time in the lower-right region):
results = zeros(size(LAT)) ;
data = [9, 2, 11; ...
8, 6, 13] ;
for dId = 1 : size(data, 1)
dist = sqrt( (data(dId,2)-LAT).^2 + (data(dId,3)-LON).^2 ) ;
mask = dist < 4 ;
results(mask) = results(mask) + data(dId) ;
end
results
which leads to:
results =
9 9 17 8
9 17 17 8
9 17 8 8
Let me know if this is still unclear.
Cedric
You've been a huge help, thank you so much! I did have one other question about a little change I made and wanted to see if it makes sense.
Let's say that I want rows to equal latitude and columns to equal longitude (the same way you would look at a map) and I have 10 lat values and 15 long values. If I were to do
[LAT,LON] = meshgrid(lat,long)
I would get 15 rows and 10 columns so to keep in line with having rows as latitude and columns as longitude, should I do
[LON,LAT] = meshgrid(long,lat)
? This will give me a 10 by 15 matrix for the 'results'.
Cedric
Cedric on 17 Jul 2013
Edited: Cedric on 17 Jul 2013
Yes, I initialized results based on size(LAT) and not as zeroes(length(lat),length(lon)) for this reason actually. This way you you can permute the arguments of MESHGRID if you want to, without having to update the rest of the code.
Note that as MATLAB displays arrays with increasing row indices pointing downwards, you still have growing latitudes pointing downwards. If all you do is displaying arrays in the command window, you can use e.g. FLIPUD for managing this point of "cosmetic".
James
James on 17 Jul 2013
Alright, well thank you very much for the help today. You made my program run probably 100 times faster and I learned about logical indexing. Both very valuable.
Cedric
Cedric on 17 Jul 2013
Edited: Cedric on 17 Jul 2013
You're most welcome! And you won't regret having learned logical indexing, because it's probably the most useful way to work with "indexable objects". In fact, using numeric indices is often unnecessary and less efficient. You'll often see people doing things like:
>> id = find(a>4) ;
>> a(id) = 0 ;
or even worse
>> id = find(a>4) ;
>> for k = 1 : length(id)
a(id(k)) = 0 ;
end
when they want to set all elements of vector a which are greater than 4 to 0 (driven by the idea that they need to loop over all indices of elements above 4), when
>> a(a>4) = 0 ;
would be simpler and more efficient. In the latter, the evaluation of a>4 generates an array of logicals, which is directly used (without storing it explicitly into an intermediary variable) for indexing. And this works with whatever size/dimension of a.
Very nice, after you said this I went through the rest of my code and got rid of most all my for loops (I have a filter process in which I was using lots of double for loops to do it). I did want to see if there was a way to get rid of this one...
CountyList = get(handles.listCountyLeft,'String');
if isempty(get(handles.listCountyRight,'String')) == 0
for ii = 1:length(CountyList)
%the 5 at the end is found from the Excel sheet column number
handles.WellData((strcmp(CountyList{ii},handles.WellData(:,5))),:) = [];
end
end
I have two list boxes and the user chooses some counties from one list box and puts them in the other for the counties he wants to delete which makes 'CountyList'. I was wonderig if there is a way to not have to loop through each county. It doesn't take that long (maybe 10 - 15 seconds) and is tons faster than the double loop I had going but was just curious if this is about the best I can do?
Cedric
Cedric on 18 Jul 2013
Edited: Cedric on 18 Jul 2013
I am not programming GUI in MATLAB actually (just tried once), but based on what I am doing in other languages, I would look for a way to work with IDs instead of strings, and keep strings either for display or for when users are allowed to type county names (which have to be compared using e.g. STRCMPI to the list of all counties for validation). Here are a few remarks..
If multiple entries/rows in the Excel file can be associated with similar counties, I would use UNIQUE to get a list of unique county names. UNIQUE outputs not only the list, but also arrays of indices that allow you to go back and forth between the original list and the list of unique names. Note that UNIQUE sorts too by default.
SORT could also be used, and it also outputs an array of indices.
In the following, I assume that you have a cell array of unique, sorted county names, named countyName. Let's also assume that you have four counties, named 'A', 'B', 'C', and 'D'. One way to work with IDs is as follows: you create arrays of IDs and you work with them, leaving countyName unchanged. To illustrate
county_displayID = 1 : length(countyName) ; % Display in left listbox.
county_deleteID = [] ; % Display in right listbox.
Now for building the left listbox:
if ~isempty(county_displayID)
set(handles.listboxLeft, 'string', countyName(county_displayID)) ;
end
For building the right listbox:
if ~isempty(county_deleteID)
set(handles.listboxRight, 'string', countyName(county_deleteID)) ;
end
For moving a county from the left to the right listbox:
% Get left listbox index and corresponding county ID.
listboxID_left = get(handles.listboxLeft, 'value')
countyID = county_displayID(listboxID_left) ;
% Remove this element from displayIDs.
county_displayID(listboxID_left) = [] ;
% Add it to deleteIDs.
county_deleteID = [county_deleteID, countyID] ;
In your context, if WellData (I let you add the "handles.") is what I call countyName but with extra columns, you can delete chosen rows just by doing:
WellData(county_deleteID,:) = [] ;
If WellData is unsorted or not unique, you have just one additional step to do using arrays of indices outputted by UNIQUE or SORT to translate county_deleteID back to the original indexing scheme.
Hope it helps,
Cedric
James
James on 19 Jul 2013
Thanks for the help Cedric. Looks like a good solution to me.

Sign in to comment.

More Answers (1)

Iain
Iain on 17 Jul 2013
1. Do the calculations as few times as possible: The following should be in the outer loop. - Unless you want latcounter to be allowed to be the maximum value for a single calculation.
if LatCounter == length(handles.Latitude)
LatCounter = LatCounter - 1;
end
2. Minimize the amount of calcs:
sqrt(x) < y, could be x < y^2 - given that y^2 is a constant, you've removed the sqrt calc from the loop.
3. Vectorise wherever possible.
Your inner loop can be REMOVED and replaced with.
index = ((([handles.WellData{:,2}] - handles.Latitude(LatCounter)) ^ 2 + ([handles.WellData{:,3}] - handles.Longitude(LongCounter)) ^ 2)) < .01067^2;
handles.ScoreSheetBPDOil(LatCounter,LongCounter) = sum([handles.WellData{index,12}]) * WeightingVectorWells(1);
handles.ScoreSheetCumOil(LatCounter,LongCounter) = sum( [handles.WellData{index,13}]) * WeightingVectorWells(4);
3a. Its easier to vectorise if your variables are straightforward numeric arrays (handles.WellData is not) 3b. With thought, you can use elementwise operations and bsxfun, to remove all three loops.

Categories

Find more on Programming in Help Center and File Exchange

Asked:

on 17 Jul 2013

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!