How to get the average of a subset of data in a matrix?

I am having some difficulty finding the average of some subsets of data. For example, given the following matrix:
ItemA 7
ItemA 8
ItemA 9
ItemB 10
ItemB 12
ItemB 14
ItemC 8
ItemC 16
ItemC 0
Given the above data, I would like to find the average of items A, B, and C individually. The result should be 8 for ItemA, 12 for ItemB, and 8 for ItemC.
The above data is just an example. The actual data has many more items, so I need this to work for an unknown number of items.
If you have any questions, feel free to ask.
Thanks
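
For the simplified two-column example above, one possible approach is to map each item name to a group number with unique and then average per group with accumarray. This is only a sketch: the variable names items and vals, and the split into a cell array of names plus a numeric column, are assumptions about how the data is stored.

```matlab
% Assumed layout: item names in a cell array, values in a numeric column.
items = {'ItemA';'ItemA';'ItemA';'ItemB';'ItemB';'ItemB';'ItemC';'ItemC';'ItemC'};
vals  = [7; 8; 9; 10; 12; 14; 8; 16; 0];

% names: the unique item names; idx: group number (1..numel(names)) of each row
[names, ~, idx] = unique(items);

% One mean per group; works for any number of distinct items
avg = accumarray(idx, vals, [], @mean);   % avg is [8; 12; 8]
```

Because unique discovers the groups, nothing in this sketch depends on knowing the number of items in advance.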

7 Comments

What form is the data in? Is it an N-by-2 cell array, where the first column is the item name and the second column is the number?
I think I tried to simplify the data too much. A better representation of my data would be the following:
Item Xcoord YCoord
sample data could be:
ItemA 1 3
ItemA 2 5
ItemA 3 2
ItemB 2 4
ItemB 1 3
ItemC 2 9
ItemC 9 2
Given the above data, what I actually want is to run kmeans on item A, on item B, and on item C separately. For this question, though, I mainly want to know how to run an operation such as averaging or kmeans on each item's data individually. The data is in the format: item name in the first column, Xcoord in the second column, Ycoord in the third column.
It is an Nx3 cell array, but kmeans takes a matrix, so columns 2 and 3 would need to be converted to a matrix before kmeans can use them.
I don't understand how kmeans comes into this. For each number in either column 2 or column 3 you know the category (A, B, or C) it falls into. So what is there to do with kmeans? How can you cluster/classify this data when it already is clustered/categorized? You can do the mean of each category - no problem with that - but kmeans???? I don't know how or why that would apply.
I run kmeans and specify one cluster. Is that the same thing as just taking the mean of all the x's and the mean of all the y's? I thought the advantage of using kmeans with one cluster would be that it is weighted so that outliers do not count as much in the average.
If the mean of the x's and the mean of the y's are the same as kmeans with one cluster, then I would be able to just use accumarray, right?
Also, I think kmeans tries to minimize the Euclidean distances, giving a centroid with the minimal distance to all of the data points. I am open to suggestions, though.
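
On the kmeans question: with a single cluster and the default squared Euclidean distance, the centroid kmeans returns is the plain mean of the points, so it gives no extra protection against outliers. A minimal sketch of running an arbitrary operation on each item's coordinates is shown here. The names groupFun and results are hypothetical, and the mean is used as a stand-in so the example runs without the Statistics Toolbox.

```matlab
% Sample data from the comment above (N-by-3 cell array)
C = {'ItemA' 1 3
     'ItemA' 2 5
     'ItemA' 3 2
     'ItemB' 2 4
     'ItemB' 1 3
     'ItemC' 2 9
     'ItemC' 9 2};

[names, ~, idx] = unique(C(:,1));   % idx(k) is the group number of row k
X = cell2mat(C(:,2:3));             % numeric N-by-2 matrix of [x y] coordinates

% groupFun is a placeholder for any function of an M-by-2 matrix.
% With the Statistics Toolbox, a kmeans call could be wrapped here instead,
% e.g. [~, ctr] = kmeans(M, 1) inside a small helper function.
groupFun = @(M) mean(M, 1);

results = cell(numel(names), 1);
for k = 1:numel(names)
    results{k} = groupFun(X(idx == k, :));   % rows belonging to item k only
end
```

If robustness to outliers is the real goal, swapping in something like @(M) median(M, 1) may be closer to what is wanted than a one-cluster kmeans.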

 Accepted Answer

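C = {'ItemA' 1 3
'ItemA' 2 5
'ItemA' 3 2
'ItemB' 2 4
'ItemB' 1 3
'ItemC' 2 9
'ItemC' 9 2};
% a: unique item names; ii: group number of each row of C
[a,~,ii] = unique(C(:,1));
% (group, column) subscript pairs for every value in columns 2 and 3
[j1,j2] = ndgrid(ii,1:2);
% stack columns 2 and 3 into one numeric column vector (column-major order)
b = cat(1,C{:,2:3});
% mean per (group, column); cell output so it can be joined with the names
out = [a, accumarray([j1(:),j2(:)],b,[],@(x){mean(x)})];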

1 Comment

That worked out perfectly. Is there a way to compute the standard deviation in the accumarray call instead of the mean, with something like @stddev? Thanks a lot for your answer, it was very useful.
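
accumarray accepts any function handle, and the MATLAB function for standard deviation is std, so one way to adapt the accepted answer is to replace mean with std. The sketch below reuses the answer's variable names. Note that std normalizes by N-1 by default.

```matlab
C = {'ItemA' 1 3
     'ItemA' 2 5
     'ItemA' 3 2
     'ItemB' 2 4
     'ItemB' 1 3
     'ItemC' 2 9
     'ItemC' 9 2};

[a, ~, ii] = unique(C(:,1));    % unique names and group number per row
[j1, j2] = ndgrid(ii, 1:2);     % (group, column) subscripts
b = cat(1, C{:,2:3});           % values of columns 2 and 3 as one column vector

% Same call as the accepted answer, with std in place of mean
out = [a, accumarray([j1(:), j2(:)], b, [], @(x){std(x)})];
```

Any other reducing function, such as median or max, can be dropped in the same way.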

Asked: 5 Nov 2013
Commented: 6 Nov 2013