Numerical packing

Sven on 19 Dec 2011
Commented: Sven on 14 May 2015
Hi all, here's a quick one. I've got an array and a limit:
A = [235 235 235 234 235 235 234 234 234 220 280 215 234]
lim = 1000
I want to group the elements of A so that:
1. The sum of each group's elements is below my lim
2. The group numbers of consecutive elements of A never decrease
3. A minimum number of groups is used. That is, if a valid solution exists using 3 groups, all solutions using 4 or more groups are inferior.
4. The maximum of the sums within each group is minimized (clarified below)
As an example:
grps = zeros(size(A));
while any(grps==0)
    openIdxs = find(grps==0);
    mask = cumsum(A(openIdxs)) < lim;
    grps(openIdxs(mask)) = max(grps) + 1;
end
The above returns:
grps =
1 1 1 1 2 2 2 2 3 3 3 3 4
This satisfies (1), (2) and (3), but note that the 4th group has only one element, and the sums of each group are:
grpSums = arrayfun(@(i)sum(A(grps==i)),1:max(grps))
grpSums =
939 938 949 234
Can anyone think of a nice way to "balance" the group sums so that the maximum group sum is minimised? Note that I want to keep the group count minimised (at 4 in this case)... I don't want to just make lots of groups of 1 element.
Bonus votes for simplicity of code over speed and/or exactness of solution... I don't mind if something is close to optimal but not quite there, and I'm only working with tiny arrays similar to the example above.
Clarification:
Point (4) refers to the grpSums variable calculated above. The command max(grpSums) in this example returns 949. If I move some elements from group 3 to group 4:
grps = [1 1 1 1 2 2 2 2 3 3 4 4 4]
my grpSums becomes [939 938 734 449], making max(grpSums) give 939. An even better (and perhaps optimal for this case?) solution is:
grps = [1 1 1 2 2 2 3 3 3 3 4 4 4]
which gives grpSums = [705 704 922 729]. Note that it's not so much the range of numbers in grpSums that I care about - it's specifically its maximum.
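The two candidate groupings above can be compared with a couple of lines, reusing the grpSums expression from earlier:

```matlab
A = [235 235 235 234 235 235 234 234 234 220 280 215 234];
% Sum of each group for a given grouping vector
grpSums = @(grps) arrayfun(@(i)sum(A(grps==i)), 1:max(grps));
max(grpSums([1 1 1 1 2 2 2 2 3 3 4 4 4]))  % 939
max(grpSums([1 1 1 2 2 2 3 3 3 3 4 4 4]))  % 922
```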
  8 Comments
Sven on 19 Dec 2011
Sean, I've added this point specifically above. Don't worry, I'm not moving goalposts... that point was part of my original example, just not explicitly put down with the other rules :)
Sean de Wolski on 19 Dec 2011
Oh it's fine, and a great simplification (if what I'm doing makes any sense at all (which it might (probably does) not)).


Accepted Answer

Sven on 29 Dec 2011
Thank you Sean and Elige for your responses, and thanks Walter for naming the problem I'd posed. Even after seeing it as a "knapsack" problem, I had difficulty adapting one of the FEX knapsack solutions to this particular problem out of the box, so I also made a solution of my own (below).
Here are some codes, times, and comments:
My solution: (sorry about the line wrapping... I use a wider MATLAB page size than this website)
function grps = orderedMinOfMaxKnapsackSums(A, lim)
% Discover how many knapsacks (or groups) to use
grps = zeros(size(A));
while any(grps==0)
    openIdxs = find(grps==0);
    mask = cumsum(A(openIdxs)) < lim;
    grps(openIdxs(mask)) = max(grps) + 1;
end
numGrps = max(grps);
% Build all possible index pairs for each group
maxInds = arrayfun(@(i)find(grps==i,1,'last'), 1:numGrps);
grpIndPairs = arrayfun(@(mn,mx)[mn mn; nchoosek(mn:mx,2); mx mx], 1:numGrps, maxInds, 'Un',0);
% Ensure that the first group "starts" at 1, and the last group "ends" at the length of A
grpIndPairs{1}(grpIndPairs{1}(:,1)>1,:) = [];
grpIndPairs{end}(grpIndPairs{end}(:,2)<length(A),:) = [];
% Remove the "possible" index pairs that over-fill that knapsack
grpIndPairs = cellfun(@(pair)pair(arrayfun(@(fr,to)sum(A(fr:to)), pair(:,1),pair(:,2))<lim, :), grpIndPairs, 'Un',0);
% Reduce the "possible" index pairs to those with valid adjacent knapsack possibilities. For
% example, if knapsack 1 only holds indices 1:2 or 1:3, knapsack 2 must start at element 3 or 4.
while 1
    oldNumels = cellfun(@numel, grpIndPairs);
    grpIndPairs(1:end-1) = cellfun(@(this,next)this(ismember(this(:,2), next(:,1)-1),:), ...
        grpIndPairs(1:end-1), grpIndPairs(2:end), 'Un',0);
    grpIndPairs(2:end) = cellfun(@(this,last)this(ismember(this(:,1), last(:,2)+1),:), ...
        grpIndPairs(2:end), grpIndPairs(1:end-1), 'Un',0);
    if isequal(oldNumels, cellfun(@numel, grpIndPairs)), break; end
end
% Build the set of all possible combinations from each of the groups
elNumsCell = cellfun(@(x)1:size(x,1), grpIndPairs, 'UniformOutput', false);
grpIndsCell = cell(size(elNumsCell));
[grpIndsCell{:}] = ndgrid(elNumsCell{:});
allCombs = cell2mat(arrayfun(@(i)grpIndPairs{i}(grpIndsCell{i}(:),:), 1:numGrps, 'Un',0));
% Remove combinations where knapsack elements don't immediately follow the previous knapsack
combsDiff = diff(allCombs,[],2);
allCombs = allCombs(all(combsDiff(:,2:2:end)==1, 2), :);
% Calculate the size of each knapsack for all the combinations that remain
combSums = zeros(size(allCombs,1), numGrps);
for i = 1:numGrps
    combSums(:,i) = arrayfun(@(fr,to)sum(A(fr:to)), allCombs(:,2*(i-1)+1), allCombs(:,2*(i-1)+2));
end
% Choose the best combination
[~, bestCombIdx] = min(max(combSums,[],2));
grps = cell2mat(arrayfun(@(i)i*ones(1,1+diff(allCombs(bestCombIdx,2*(i-1)+(1:2)))), 1:numGrps, 'Un',0));
end
And here's Elige's solution:
function best_combo = bruteForce(A, lim)
intmax = prod(A);
% WEIGHT
% >0.5 weight STD of sum more heavily
% <0.5 weight difference of lim and MEAN of sum more heavily
weight = 0.5;
best_combo = zeros(1,length(A));
best_fit = intmax;
for i = 1 : length(A)-1
    B = nchoosek(1:length(A)-1,i);
    for j = 1 : nchoosek(length(A)-1,i)
        C = (i+1)*ones(1,length(A));
        for k = sort(1 : i,'descend')
            C(1:length(A)<=B(j,k)) = k;
        end
        fit = (std(arrayfun(@(k)sum(A(C==k)),1:i+1))*(weight))^2 + ...
            ((lim-mean(arrayfun(@(k)sum(A(C==k)),1:i+1)))*(1-weight))^2;
        if sum(arrayfun(@(k)sum(A(C==k)),1:i+1)>lim) == 0 && fit < best_fit
            best_fit = fit;
            best_combo = C;
        end
    end
end
end
Here are some timings:
A = [235 235 235 234 235 235 234 234 234 220 280 215 234]
lim = 1000
tic, for i = 1:10
    orderedMinOfMaxKnapsackSums(A, lim);
end, toc
tic, for i = 1:10
    bruteForce(A, lim);
end, toc
Elapsed time is 0.053719 seconds.
Elapsed time is 338.265262 seconds.
Elige, I know I stated that I was dealing with small data sets so efficiency wasn't an issue - but I think the brute force method here ran for a bit longer than I was anticipating.
Sean, I don't have the GO toolbox so I can't test out your solution. Can you post a little speed summary here for us?
Walter, I had a bit of trouble fitting my particular problem into one of the FEX knapsack entries. It seems that they had built-in criteria to optimise (whereas my criterion was to minimise the maximum knapsack sum), so I went ahead making my own solution as an exercise. Did I miss something obvious that could have saved time?
So anyway Sean, you get the accepted answer... I'm tempted to accept my own but it doesn't meet my own criteria (of short and sweet) and want to at least give you guys the credit for tackling someone else's problem :)
  3 Comments
Sean de Wolski on 30 Dec 2011
For the example above, timing for mine on R2011b 64bit Windows:
Elapsed time is 0.313661 seconds.
You could also speed yours up a little bit by using 'prodofsize' as the function for cellfun. ( http://www.mathworks.com/matlabcentral/answers/10966-is-there-any-process-to-convert-cell-type-array-to-numarical-array Jan's comment and my time test)
(And don't worry about the Answer; at least you didn't delete this question :) and ask it again)
Sean de Wolski on 30 Dec 2011
And 0.55 seconds for A = [A;A];
After that it's not obeying constraints.


More Answers (4)

Walter Roberson on 19 Dec 2011
After the clarifications, this is a knapsack problem. Knapsack problems are generally NP-complete (exponential time) to solve exactly, but in the restricted case of non-negative integer "weights", one can use dynamic programming; the algorithm is hinted at over here.
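For the contiguous-groups variant posed here (minimise the maximum sum over k consecutive groups), a common alternative to a full DP table is to binary-search the answer and check feasibility greedily. A rough sketch of that idea (my own illustration, not the algorithm Walter links to):

```matlab
function best = minMaxContiguousSum(A, k)
% Binary search on the largest allowed group sum; a cap is feasible if
% greedy first-fit packing needs at most k consecutive groups.
lo = max(A); hi = sum(A);
while lo < hi
    cap = floor((lo+hi)/2);
    if groupsNeeded(A, cap) <= k
        hi = cap;      % cap is achievable; try a smaller one
    else
        lo = cap + 1;  % cap is too tight
    end
end
best = lo;
end

function n = groupsNeeded(A, cap)
% Count groups used when packing greedily left-to-right under a sum cap
n = 1; s = 0;
for v = A
    if s + v > cap
        n = n + 1; s = v;  % start a new group
    else
        s = s + v;
    end
end
end
```

For the example data and k = 4 this returns 922, matching the brute-force and intlinprog results below.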

Seth DeLand on 10 Mar 2014
Seeing how Optimization Toolbox has a mixed-integer solver in R2014a, I figured I would post this alternate solution that uses intlinprog to solve this problem. If you examine the output structure from intlinprog, the absolute gap is zero, meaning the solver has proven that this is the optimal solution to this problem.
A few things to note:
  1. This finds the same solution found in the brute force approach, with a max group size of 922.
  2. This code runs in a few hundredths of a second, likely much faster than brute force or global optimization methods for this problem.
%%Data
AA = [235 235 235 234 235 235 234 234 234 220 280 215 234];
lim = 1000; % maximum allowable in any group
ng = 4; % number of groups
%%Useful sizes
na = length(AA);
nx = ng*na;
nz = 1;
%%Variables
% There are two types of variables in this formulation:
% x: ng x na matrix of binary variables. x_i,j = 1 if A(i) is in group j.
% z: auxiliary variable used to keep track of maximum size of any group.
% The goal is to minimize z.
%%Objective function coefficients
f = zeros(nx + nz, 1);
f(end) = 1; % Minimize z, which is last in f
%%Sum of each group is less than limit
% A*x <= b
A = zeros(ng,nx+nz);
for ii = 1:na
    A(:,ng*(ii-1)+1:ng*ii) = diag(repmat(AA(ii),ng,1));
end
b = repmat(lim,ng,1);
%%Minimax formulation
% Variable z changes the minimization problem into a minimax problem:
% min z s.t. Sum Ax - z <=0 (z is greater than or equal to size of each
% group)
Atmp = A;
Atmp(:,end) = -1;
A = [A; Atmp];
b = [b; zeros(ng,1)];
%%Groups of consecutive elements never increase (linear inequality)
Atmp = zeros(na-1,nx+nz);
for ii = 1:na-1
    Atmp(ii,ng*(ii-1)+1:ng*ii) = 1:ng;
    Atmp(ii,ng*ii+1:ng*(ii+1)) = -(1:ng)';
end
A = [A; Atmp];
b = [b; zeros(na-1,1)];
%%Equality constraint that assigns each member to exactly 1 group
Aeq = zeros(na,nx+nz);
for ii = 1:na
    Aeq(ii,ng*(ii-1)+1:ng*ii) = 1;
end
beq = ones(na,1); % exactly 1 group per element
%%Bounds
lb = zeros(nx+nz,1); % all variables positive
ub = ones(nx+nz,1); % upper bounds for binary variables
ub(end) = Inf; % no upper bound on z
%%Integer variables
intIdxs = 1:nx; % All x variables are binary integers
%%Solve
[xopt,fval,eflag,output] = intlinprog(f, intIdxs, A, b, Aeq, beq, lb, ub);
% If the problem is infeasible (negative eflag), might need to add more
% bins.
%%Post-process
x = xopt(1:end-1); % Extract x from solution vector
grpMax = xopt(end); % grpMax is z
x = round(reshape(x,ng,na));
optBins = zeros(size(AA)); % assigned bins
for ii = 1:na
    optBins(ii) = find(x(:,ii));
end
binVals = A(1:ng,:)*xopt; % sum of each group
The found solution:
binVals =
705.0000
704.0000
922.0000
729.0000
  1 Comment
Sven on 14 May 2015
By the way, Seth, this answer is simply excellent. I had encountered a larger problem (same scenario, more groups to the input data) that would hit memory limits on my brute force method but was still just fractions of a second via this integer linprog call.
Thanks!



Dr. Seis on 19 Dec 2011
UGLY/BRUTE-FORCE: I assumed you were trying to minimize the standard deviation of your group sum AND the difference between your lim and mean group sum. I also assumed that there would always be at least 2 groups. I check all possible group sizes/combinations. Here goes:
A = [235 235 235 234 235 235 234 234 234 220 280 215 234];
lim = 1000;
intmax = prod(A);
% WEIGHT
% >0.5 weight STD of sum more heavily
% <0.5 weight difference of lim and MEAN of sum more heavily
weight = 0.5;
best_combo = zeros(1,length(A));
best_fit = intmax;
for i = 1 : length(A)-1
    B = nchoosek(1:length(A)-1,i);
    for j = 1 : nchoosek(length(A)-1,i)
        C = (i+1)*ones(1,length(A));
        for k = sort(1 : i,'descend')
            C(1:length(A)<=B(j,k)) = k;
        end
        fit = (std(arrayfun(@(k)sum(A(C==k)),1:i+1))*(weight))^2 + ...
            ((lim-mean(arrayfun(@(k)sum(A(C==k)),1:i+1)))*(1-weight))^2;
        if sum(arrayfun(@(k)sum(A(C==k)),1:i+1)>lim) == 0 && fit < best_fit
            best_fit = fit;
            best_combo = C;
        end
    end
end
grpsum=arrayfun(@(k)sum(A(best_combo==k)),1:max(best_combo));
This results in:
best_combo =
1 1 1 2 2 2 3 3 3 3 4 4 4
grpsum =
705 704 922 729
  2 Comments
Sven on 19 Dec 2011
Nice one Elige - you get a vote for correctness but I'll hold off on accepting this one as final. Even though it's a small dataset so brute force won't be taxing on resources, I'm still holding out for "neat and tidy". I'm wondering if some accumarray() or diag() trick is lurking beyond my reach that some clever answer can bring closer :)
And a little note that "k=sort(1:i,'descend')" can be simplified as "k=i:-1:1" to avoid a sort.
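For example, the two forms produce the same descending vector:

```matlab
% i:-1:1 builds the descending range directly, with no sort needed
isequal(sort(1:5,'descend'), 5:-1:1)
```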
Dr. Seis on 20 Dec 2011
Oooh, I learn a neat trick every day. Thanks!



Sean de Wolski on 20 Dec 2011
Alright, here goes!! accumarray is certainly a part of it, as is ga() the awesome genetic algorithm function. So yes, Global Optimization Toolbox is required :)
main.m:
%12/19/2011
%
%Data
A = [235 235 235 234 235 235 234 234 234 220 280 215 234]'; %column vector (because they're cool)
lim = 1000; %upper limit.
szA = size(A);
% Get bounds and number of variables:
[bounds, n] = getValidGroup(A,lim);
% Engine:
ObjectiveFunction = @(x)binWeight(A,index2groups(x,szA),lim); %get the weight
[x,fval] = ga(ObjectiveFunction,n,[],[],[],[],ones(1,n),bounds(2,:),[],1:n);
index = index2groups(x,szA);
fprintf('Groups: %s \n',num2str(index));
fprintf('Weight is: %i!\n',binWeight(A,index,lim));
binWeight.m:
function weight = binWeight(A,index,lim)
%
%data nx1 vector of values
%index = nx1 vector of groupings
%lim = threshold of group sum
%
groups = accumarray(index,A); %sum bins
groups(groups>lim) = lim*10; %lim is soft constraint with stiff penalty.
weight = max(groups); %worst one
getValidGroup.m:
function [bounds cnt] = getValidGroup(A,lim)
%
%inputs:
%A - nx1 vector of values
%lim - the group limit
%
%outputs:
%bounds - 2x(cnt) lower/upper bounds
%cnt - number of bins
%
cnt = 0;
szA = size(A);
index = zeros(szA);
A2 = A;
while ~isempty(A)
    cnt = cnt+1;
    index(cnt) = find(cumsum(A)<lim,1,'last');
    A(1:index(cnt)) = [];
end
index = index(1:cnt);
index(end) = find(cumsum(flipud(A2))<lim,1,'last');
bounds = 1:cnt; %lower bounds
bounds(2,:) = index; %upper bounds
index2groups.m:
function groups = index2groups(index,szA)
%index is a vector cumulative max values
index = index(:);
index = cumsum([1;index(1:end-1)]);
groups = zeros(szA(1),1);
groups(index) = 1; %min(index,szA(1))
groups = cumsum(groups);
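A quick check of how index2groups decodes a ga individual (my own worked example, not part of Sean's post): each entry of x is a group length, and the last entry is ignored, so the optimal grouping decodes as:

```matlab
% Assumes index2groups.m as defined above is on the path
x = [3 3 4 3];                   % candidate individual: group lengths (last entry unused)
groups = index2groups(x, [13 1]);
disp(groups')                    % 1 1 1 2 2 2 3 3 3 3 4 4 4
```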
And executing main.m:
Optimization terminated: average change in the penalty fitness value less than
options.TolFun and constraint violation is less than options.TolCon.
Groups: 1  1  1  2  2  2  3  3  3  3  4  4  4
Weight is: 922!
I have to say this is one of the most fun/educational/rewarding MATLAB problems I've ever worked on. Thanks!
  2 Comments
Sven on 20 Dec 2011
@Sean:
1. I agree that column vecs are much cooler than row vecs.
2. Thanks for posting a great answer.
3. I don't have the GOT, but from your answer I can see other problems that I tackle where it may be helpful.
@all: I'm out tomorrow but will select an answer when I get back on a computer.
To give some background, this problem came up because I'm taking a set of ~20 small figure snapshots and trying to stitch them into one "summary" image. My limit of 1000 comes from the max px height that I have to display things. I want my summary to be as compact as possible (laterally first, then vertically) but keep my small figures in order. My current solution as given in the question does a great job of getting the number of columns correct, but I often see the final column of stitched images containing one measly little image. No matter how hard I shake my fist at it, it doesn't change. I'm about to go through around 10,000 of these sets remaking the stitched images so I thought I should ask if anyone knows a nicer way to organise them :)
Walter Roberson on 20 Dec 2011
Yup, a knapsack problem ;-)

