Disjoint sets

Disjoint sets (or union-find data structure) allow you to group vertices in a graph into connected components. Construction typically takes $O(V)$ time and space. There are two common optimizations that can bring down the time complexity: path halving (for optimizing find) and a weighted union by rank. While time complexity can be as bad as $\Theta(\log V)$, amortized time takes time proportional to the inverse Ackermann function, $O(\alpha(V))$, which is nearly constant time.

While working on a couple of LeetCode problems, problems that used disjoint sets also usually have a depth-first answer as well. Runtimes between the two approaches tend to be similar, but space for dfs can sometimes be $O(E + V)$ while disjoint sets use $O(V)$ space.

class UF(object):
    def __init__(self, size: int):
        self.roots = [i for i in range(size)]
        self.ranks = [1] * size

    def find(self, x: int) -> int:
        rx = self.roots[x]
        if x == rx:
            return x

        self.roots[x] = self.find(rx)
        return self.roots[x]

    def union(self, x: int, y: int):
        rx, ry = self.find(rx), self.find(ry)

        if rx == ry:
            return

        rrx, rry = self.ranks[x], self.ranks[y]
        if rrx >= rry:
            self.roots[y] = rx
            self.ranks[x] += 1
        else:
            self.roots[x] = ry
            self.ranks[y] += 1

Boruvka’s Algorithm

Originally derived in order to efficiently layout power lines, Boruvka’s algorithm calculates a minimum spanning tree in an edge-weighted, undirected graph. Edge weights can be positive, zero, or negative and edge weights can be duplicated. There can be cycles and self-loops. If the graph is not connected, the algorithm will calculate a minimum spanning forest.

The basic idea is to start with each vertex unconnected and then gradually connect vertices using the minimum weight edge between each “little forest”. Boruvka’s runs in $O(E \, logV)$ time and uses $O(V)$ extra space.

public class BoruvkaMST {
    private Bag<Edge> mst = new Bag<Edge>();
    private double weight;

    public BoruvkaMST(EdgeWeightedGraph G) {
        UF uf = new UF(G.V());

        // repeat at most log V times or until we have V-1 edges
        for (int t = 1; t < G.V() && mst.size() < G.V() - 1; t = t + t) {
            Edge[] closest = new Edge[G.V()];

            // if edge weights are equal, we pick whichever edge is first in G.edges()
            for (Edge e: G.edges()) {
                int v = e.either(), w = e.other(v);
                int i = uf.find(v), j = uf.find(w);
                if (i == j) continue; // same tree
                if (closest[i] == null || less(e, closest[i])) closest[i] = e;
                if (closest[j] == null || less(e, closest[j])) closest[j] = e;
            }

            // add newly discovered edges to MST
            // (Why an indexed for loop? Can't we use in iterator here?)
            for (int i; i < G.V(); i++) {
                Edge e = closest[i];
                if (e != null) {
                    int v = e.either(), w = e.other(v);
                    // don't add the same edge twice
                    if (uf.find(v) != uf.find(w)) {
                        mst.add(e);
                        weight += e.weight();
                        uf.union(v, w);
                    }
                }
            }
        }
    }
}