How can I match similar events between 2 matrices?

I have two matrices, each with 5 columns of data. Both contain latitude, longitude, depth, time, and magnitude values. Matrix A has around 30,000 events or rows (each event is represented by lat, lon, dep, time, and mag) and matrix B has around 50,000 events. Both datasets represent the same sequence of earthquake data, but matrix B was created with less stringent error parameters, so more events (earthquakes) were located and included in that matrix. The 30,000 events in matrix A are therefore also in matrix B, along with ~20,000 others.
I need to match the earthquakes from each catalog. That is, an earthquake will have a unique lat, lon, depth, and time. I need to find the events in each catalog that have the same location and time and call those the same event. Now of course earthquakes can happen simultaneously so matching times alone won't cut it. I will need to match times (with some small amount of error) and locations to confidently say the events are the same.
Before I delve much deeper: any suggestions on how to implement this? I have some working code that is slow, so I'm looking to optimize my solution.
I basically need to calculate distances between each lat, lon, depth in matrix A and each lat, lon, depth in matrix B. An event in one catalog should have essentially the same location in the other catalog. There may be some small discrepancies, but anything within a few meters is likely the same event. Right now, I'm using a nearest neighbor search to find the distances from each location in one matrix to the locations in the other.
  3 Comments
per isakson on 14 Dec 2016
Edited: per isakson on 14 Dec 2016
How important is speed?
There is an old trick (by John D'Errico, I think):
  • Create new matrices of whole numbers by converting one column at a time with round(A(:,jj)/tol). This allows for different tolerance values for different columns.
  • Search for matches with intersect(...,'rows'), or possibly ismember(...,'rows').
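The rounding trick above can be sketched as follows. This is only an illustration under stated assumptions: the toy catalogs and the per-column bin widths in tol are made up, and only the first four columns (lat, lon, depth, time) are used.

```matlab
% Toy catalogs (hypothetical values): columns are lat, lon, depth, time
A = [35.0001 -118.2000 5.01 736000.5000;
     36.1000 -117.5000 8.00 736001.2500];
B = [36.1001 -117.5001 8.02 736001.2500;
     35.0001 -118.2000 5.00 736000.5000;
     34.0000 -116.0000 2.00 736002.0000];
tol = [1e-3 1e-3 0.1 1e-4];   % assumed bin width for each column

% Divide each column by its own tolerance and round to whole numbers
Ai = round(bsxfun(@rdivide, A, tol));
Bi = round(bsxfun(@rdivide, B, tol));

% Rows that fall into the same bins in both catalogs;
% ia and ib index the matching rows of A and B respectively
[~, ia, ib] = intersect(Ai, Bi, 'rows');
```

One caveat of binning: two values that are very close but sit on opposite sides of a bin edge round to different whole numbers, so a few genuine matches can be missed.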
psprinks on 15 Dec 2016
Edited: psprinks on 15 Dec 2016
So I'm not having much luck using ismembertol. Well... I don't trust the results yet. I think it's because I'm not using the tool correctly, or I need another tool.
Here is some code and the data file.
P_set and D_set are the matrices with the data columns (lat, lon, depth, time, magnitude). The times are in Matdays. P_set (29399 events) is smaller than D_set (40848 events), but both represent the same earthquakes. There are just extras in D_set. I need to find the events that are common to both matrices by matching locations and times. I have some tolerance values set for time and location, but they could be wrong.
ttol = 0.000001;     % time tolerance
loctol = 0.000308;   % location tolerance
[isinB, rowinB] = ismembertol(P_set(:,4), D_set(:,4), ttol);   % search by times
temp = find(isinB);
[isinB1, rowinB1] = ismembertol(P_set(:,1:3), D_set(:,1:3), loctol, 'ByRows', true);   % search by locations
temp1 = find(isinB1);
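One possible reason for results that look untrustworthy: by default ismembertol treats the tolerance as relative, scaling it by the largest absolute value in the inputs. The sketch below assumes the times are MATLAB serial date numbers (values around 7e5); the time values themselves are made up for illustration.

```matlab
tA = 736000.50;                % hypothetical event time, in days
tB = [736000.90; 736002.00];   % candidates 0.4 and 1.5 days away
ttol = 0.000001;

% Default behaviour: the effective tolerance is ttol*max(abs([tA; tB])),
% which is roughly 0.74 days here, so the event 0.4 days away matches
hitRelative = ismembertol(tA, tB, ttol);

% 'DataScale', 1 makes ttol an absolute tolerance in days
hitAbsolute = ismembertol(tA, tB, ttol, 'DataScale', 1);
```

Note also that searching times and locations in two separate calls does not guarantee that both matches point at the same row of D_set; comparing rowinB with rowinB1, or matching all four columns in a single 'ByRows' call, would check that.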


Accepted Answer

Guillaume on 14 Dec 2016
If I understood correctly, all you need is one line of code using ismembertol:
[isinB, rowinB] = ismembertol(A, B, 'ByRows', true)
You can specify a tolerance and a 'DataScale' vector to vary the amplitude of the tolerance for each column.
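A minimal sketch of the 'DataScale' option, assuming toy catalogs and made-up absolute tolerances for each column. With 'DataScale' given, two rows u and v match when all(abs(u - v) <= tol*DataScale), so dividing the wanted absolute tolerances by tol gives the scale vector.

```matlab
% Toy catalogs (hypothetical values): lat, lon, depth, time, magnitude
A = [35.0001 -118.2000 5.01 736000.500000 2.1;
     36.1000 -117.5000 8.00 736001.250000 3.0;
     33.0000 -115.0000 1.00 736003.000000 1.5];
B = [36.1001 -117.5001 8.02 736001.250001 3.1;
     35.0001 -118.2000 5.00 736000.500000 2.0;
     34.0000 -116.0000 2.00 736002.000000 1.0];

% Assumed absolute tolerances: lat (deg), lon (deg), depth, time (days)
absTol = [3e-4 3e-4 0.05 1e-5];
tol = 1e-5;

% Match on location and time together; magnitude (column 5) is ignored
[isinB, rowinB] = ismembertol(A(:,1:4), B(:,1:4), tol, ...
                              'ByRows', true, 'DataScale', absTol / tol);
```

Here rowinB(k) is the row of B matched to row k of A, or 0 when no row of B is within tolerance.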
  5 Comments
Guillaume on 14 Dec 2016
There is absolutely no reason for the inputs to ismembertol (and ismember) to be the same length. It simply tells you which rows (with the 'rows' / 'ByRows' option) of the first input are found somewhere in the second input.
The link to the documentation of ismembertol is in my answer. As it says at the end, Introduced in R2015a.
You need the tol version since you don't want exact comparison. Time to upgrade? Replicating the full behaviour of ismembertol, particularly with the 'ByRows' option, is not going to be trivial.
Here's an attempt that loses the automatic tolerance, magnitude scaling and other niceties:
function [isfound, where] = ismembertolbyrow(A, B, tol)
%A, B: two matrices with the same number of columns
%tol: a row vector with the same number of columns as A and B
%tol is absolute. u and v are within range if abs(u-v) <= tol
%where: for each row of A, the index of a matching row of B (0 if none)
validateattributes(A, {'numeric'}, {'2d'});
validateattributes(B, {'numeric'}, {'2d', 'ncols', size(A, 2)});
validateattributes(tol, {'numeric'}, {'positive', 'row', 'numel', size(A, 2)});
%compare every row of A with every row of B: size(A,1) x ncols x size(B,1)
diffs = abs(bsxfun(@minus, A, permute(B, [3 2 1])));
%bsxfun expands tol across rows and pages (no implicit expansion before R2016b)
intol = permute(all(bsxfun(@le, diffs, tol), 2), [1 3 2]);
isfound = any(intol, 2);
where = zeros(size(isfound));
[r, c] = find(intol);
where(r) = c;
end
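The function above builds a size(A,1) x ncols x size(B,1) array in one go, which for catalogs of roughly 30,000 and 40,000 rows would be on the order of tens of gigabytes. A loop over the rows of A trades speed for memory. The sketch below is one such variant, not a drop-in replacement; the data and tolerances are made up for illustration.

```matlab
A = [1.00 10.0; 2.00 20.0; 3.00 30.0];   % hypothetical data
B = [2.01 20.1; 5.00 50.0; 1.00 10.0];
tol = [0.05 0.5];                        % absolute tolerance per column

isfound = false(size(A, 1), 1);
where = zeros(size(A, 1), 1);
for k = 1:size(A, 1)
    % Compare row k of A against every row of B, column by column
    d = abs(bsxfun(@minus, B, A(k, :)));
    hit = find(all(bsxfun(@le, d, tol), 2), 1);   % first matching row of B
    if ~isempty(hit)
        isfound(k) = true;
        where(k) = hit;
    end
end
```

Unlike the vectorised version, this keeps the first matching row of B when several rows are within tolerance.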
psprinks on 14 Dec 2016
Ok, thank you Guillaume. I am requesting the upgrade to 2016b from my university now. Waiting on the download link.
