Disjoint sets
Disjoint sets (or union-find data structure) allow you to group vertices in a graph into connected components. Construction typically takes $O(V)$ time and space. There are two common optimizations that can bring down the time complexity: path halving (for optimizing find) and a weighted union by rank. While time complexity can be as bad as $\Theta(\log V)$, amortized time takes time proportional to the inverse Ackermann function, $O(\alpha(V))$, which is nearly constant time.
While working on a couple of LeetCode problems, problems that used disjoint sets also usually have a depth-first answer as well. Runtimes between the two approaches tend to be similar, but space for dfs can sometimes be $O(E + V)$ while disjoint sets use $O(V)$ space.
class UF(object):
def __init__(self, size: int):
self.roots = [i for i in range(size)]
self.ranks = [1] * size
def find(self, x: int) -> int:
rx = self.roots[x]
if x == rx:
return x
self.roots[x] = self.find(rx)
return self.roots[x]
def union(self, x: int, y: int):
rx, ry = self.find(rx), self.find(ry)
if rx == ry:
return
rrx, rry = self.ranks[x], self.ranks[y]
if rrx >= rry:
self.roots[y] = rx
self.ranks[x] += 1
else:
self.roots[x] = ry
self.ranks[y] += 1
Boruvka’s Algorithm
Originally derived in order to efficiently layout power lines, Boruvka’s algorithm calculates a minimum spanning tree in an edge-weighted, undirected graph. Edge weights can be positive, zero, or negative and edge weights can be duplicated. There can be cycles and self-loops. If the graph is not connected, the algorithm will calculate a minimum spanning forest.
The basic idea is to start with each vertex unconnected and then gradually connect vertices using the minimum weight edge between each “little forest”. Boruvka’s runs in $O(E \, logV)$ time and uses $O(V)$ extra space.
public class BoruvkaMST {
private Bag<Edge> mst = new Bag<Edge>();
private double weight;
public BoruvkaMST(EdgeWeightedGraph G) {
UF uf = new UF(G.V());
// repeat at most log V times or until we have V-1 edges
for (int t = 1; t < G.V() && mst.size() < G.V() - 1; t = t + t) {
Edge[] closest = new Edge[G.V()];
// if edge weights are equal, we pick whichever edge is first in G.edges()
for (Edge e: G.edges()) {
int v = e.either(), w = e.other(v);
int i = uf.find(v), j = uf.find(w);
if (i == j) continue; // same tree
if (closest[i] == null || less(e, closest[i])) closest[i] = e;
if (closest[j] == null || less(e, closest[j])) closest[j] = e;
}
// add newly discovered edges to MST
// (Why an indexed for loop? Can't we use in iterator here?)
for (int i; i < G.V(); i++) {
Edge e = closest[i];
if (e != null) {
int v = e.either(), w = e.other(v);
// don't add the same edge twice
if (uf.find(v) != uf.find(w)) {
mst.add(e);
weight += e.weight();
uf.union(v, w);
}
}
}
}
}
}