Minimum Spanning Tree - Path or Start-End

12 views (last 30 days)
Hello,
I computed the minimum spanning tree of 3D points and now want to extract the path (i.e. node per node) of the whole MST.
[EDIT]
I did this with the following code:
% Delaunay Triangulation
DT = delaunayTriangulation(pts); % pts: [nx3] Input points
e = DT.edges;
% Distances of the edges for weights
plist1 = pts(e(:,1),:);
plist2 = pts(e(:,2),:);
diffs = plist2-plist1;
dists = vecnorm(diffs');
% create Graph
G = graph(e(:,1), e(:,2), dists);
% Create Minimum spanning tree
[mst, pred] = minspantree(G);
I totally forgot to describe my very special input data. It is data sampled from a rail-bound measurement system (3D Positions), so the MST is almost a perfect path with few exceptions.
The predecessor nodes vector doesnt seem to fit my needs. I would expect the points to be topologically sorted, but this is not the case if I sort the input points using this vector.
Basically, it would be enough for me to just determine the "start" and "end"-nodes of the tree, just like the built-in plot-function does. Then, I could compute the shortest path, which would give me a node list. In other words, I want to extract [19578, 2207] from the mst.
Here is a picture of the plot-function result. If the plot function can do this almost instantly, there must be a way I can do it.
Thank you!
[EDIT]
I came up with this solution for now. It does not take too long, but longer than the plot function, which finds the same node-pair as "start" and "end", so I thought there might be a "trick".
function idx = sortMST(pts)
% function to build the mst like shown above
[mst,~] = buildMST(pts);
% possible endpoints
d1nodes = find(degree(mst)==1);
% the node-pair with the maximum distance is the one I am looking for
dists = distances(mst, d1nodes, d1nodes);
maximum = max(max(dists));
[i,~]=find(dists==maximum);
% Breadth-first graph search, to connect all nodes and find the path
idx = bfsearch(mst,d1nodes(i));
end

Accepted Answer

Christine Tobler
Christine Tobler on 17 Jun 2021
In general, you can't rely on the fact that a minimum spanning tree will just be one path. However, if you have a graph for which that's the case, you can find the two leaf nodes between which the path lies as follows:
rng default;
G = graph(sprandn(10, 10, 0.6), 'upper')
G =
graph with properties: Edges: [22×2 table] Nodes: [10×0 table]
T = minspantree(G);
% Plot G, highlight nodes and edges that are part of T
p = plot(G);
highlight(p, T);
% Compute leaf nodes of T (nodes that only have one edge)
leafNodes = find(degree(T) == 1)
leafNodes = 4×1
1 2 6 8
% If T were just one long line, leafNodes would have two elements,
% and you could use
% shortestpath(t, leafNodes(1), leafNodes(2));
% to find that path
  5 Comments
Christine Tobler
Christine Tobler on 17 Jun 2021
Ah, I see, so then there are branches in that MST graph, just too small to see probably (that is, some nodes in MST have degree greather than 2).
You could try to use the data from the plot, the plot object has XData and YData of those coordinates available. But that algorithms just promises to give a good approximation of the structure, there's no guarantee that the extremes of XData / YData will contain the nodes that are furthest apart.
It's also possible to write up an iterative algorithm to avoid computing distances between all nodes, but it would get quite involved, so not sure this would be worthwhile, and I haven't tried coding any of the following. Proceed with caution.
As a first step, you'd choose one node as the root, and use shortestpathtree to compute a digraph starting at that root node.
Then, move upwards through the nodes, starting at the leaf nodes (reverse order of bfsearch or dfsearch from root).
For each node, keep track of what is the distance to the leaf node below it that's furthest away, and of the distance to that node. Also, globally we want to keep track of the maximum distance between two leaf nodes found as of now, and of those leaf nodes.
When a branch is reached (node with more than one successor), you can sum up the distance between the two successor nodes with the furthest distances from their leaf nodes. If that's larger than the current maximally distant pair of leaf nodes, update that.
When the top of the tree is reached, that global variable will contain the most distant pair of leaf nodes.
It occurs to me we'd probably need to make sure that the root node is not a leaf node in this algorithm.
Gern
Gern on 18 Jun 2021
Thank you for your detailed answer! I will mark this as solved for now but I will keep your idea of the iterative algorithm in mind, in case the distance computation gets to costly. Right now, at 50000 input points, there are only 200 leaf nodes, so its runs quite fast.

Sign in to comment.

More Answers (1)

Steven Lord
Steven Lord on 17 Jun 2021
If the plot function can do this almost instantly
The plot function plays "connect the dots". The start and end of the line to be drawn are the first and last points in the vector you pass into it, so there's no "determining" involved. [Well, if you're plotting a graph or digraph that does try to determine where the nodes should be placed using the various layout functions described in the documentation for the 'Layout' input to the graph and digraph plot functions, but I think that's not really what you meant.]
How are you computing the minimum spanning tree? Did you create a graph or digraph object and call minspantree on it?
What would you consider the "start" and "end" of the minimum spanning tree for the Bucky graph?
g = graph(bucky);
m = minspantree(g);
h = plot(g);
hold on
plot(m, 'XData', h.XData, 'YData', h.YData, 'EdgeColor', 'r', 'LineWidth', 4)
From visual inspection nodes 42, 58, 30, 47, 59, or 57 are all potential candidates as are many other nodes. Where would you say this tree starts and ends?
  1 Comment
Gern
Gern on 17 Jun 2021
Edited: Gern on 17 Jun 2021
Thank you for your answer! I updated my question with details. The plot function (or the layout manager) seems to sort the points somehow, because the order, or the start and end nodes are always exactly the nodes I am looking for. Unfortunately, these points do not simply correspond to the first and last line of the MST output. I have found a solution to my problem, but it is not really optimal. Maybe it is also not optimally possible, but it confuses me that the plot-function somehow manages it.

Sign in to comment.

Categories

Find more on Graph and Network Algorithms in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!