This article has been machine-translated from Chinese. The translation may contain inaccuracies or awkward phrasing. If in doubt, please refer to the original Chinese version.

During last weekend’s Blue Bridge Cup provincial mock contest, the last problem was a minimum spanning tree problem. Since I had just covered this topic on MOOC, I learned Prim’s algorithm on the spot and solved the problem (roughly). After the contest, I decided to study this type of problem more thoroughly. Problem link The minimum spanning tree problem refers to: given an undirected graph G=(V,E), a tree that connects all vertices in G and whose edge set is a subset of E is called a spanning tree of G, and the spanning tree with minimum total weight is called the Minimum Spanning Tree (MST).

I. Kruskal’s Algorithm

Kruskal’s algorithm is easy to implement and very efficient. The Purple Book p356 describes it as follows:

The first step of Kruskal’s algorithm is to sort all edges in ascending order of weight, which can be done directly using library functions qsort or sort. Then examine each edge (u, v) from smallest to largest weight.

Case 1: u and v are in the same connected component, so adding (u, v) would form a cycle, therefore it cannot be selected.
Case 2: If u and v are in different connected components, then adding (u, v) is certainly optimal. Why? We prove by contradiction — if not adding this edge yields an optimal solution T, then T+(u, v) must have exactly one cycle, and at least one edge (u’, v’) in the cycle has weight greater than or equal to that of (u, v). Removing that edge yields a new tree T’ = T + (u, v) - (u’, v’) that is no worse than T. Therefore, adding (u, v) is never worse than not adding it.

Below is the pseudocode:

把所有边(按权值)排序，记第i小的边为e[i](1 <= i < m)
初始化MST为空
初始化连通分量，让每个点自成一个独立的连通分量
for(int i = 0; i < m; i++) {
 if(e[i].u 和 e[i].v 不在同一个连通分量) {
  把边e[i]加入MST
  合并e[i].u和e[i].v所在的连通分量
 }
}

In the pseudocode above, the key part is “querying and merging connected components”: we need to know whether any two vertices are in the same connected component, and also need to merge two connected components. There is a concise and efficient algorithm for this problem — Union-Find Reference: Super Adorable Union-Find~ Each connected component is viewed as a set, containing all vertices in that connected component. These vertices are pairwise connected, and the specific way of connection doesn’t matter. In the graph, each vertex belongs to exactly one connected component, which in set representation means each element belongs to exactly one set. In other words, all connected components of a graph can be represented by several disjoint sets. Here’s my Union-Find template:

#include <iostream>
using namespace std;
#define div 1000000007
typedef long long ll;
const int maxn = 1001;
int fa[maxn];//
inline void init(int n) {
    for (int i = 1; i <= n; i++)
        fa[i] = i;
}
int find(int x) {//查询+路径压缩 把沿途的每个节点的父节点都设为根节点
    return x == fa[x] ? x : (fa[x] = find(fa[x]));
}
inline void merge(int i, int j) {
    fa[find(i)] = find(j);//前者的父节点设为后者
}

With Union-Find, the complete code for Kruskal’s algorithm is straightforward. Assume the two endpoints and weight of the i-th edge are stored in u[i], v[i], and w[i] respectively, and the index of the i-th smallest edge after sorting is stored in r[i] (indirect sorting, meaning the sorting key is the “index” of the object rather than the object itself), p is the Union-Find array.

Complete Code

int cmp(const int i, const int j) { return w[i] < w[j]; } //间接排序函数
int find(int x) { return p[x] == x ? x : p[x] = find(p[x])} //并查集的find
int Kruskal() {
 int ans = 0;
 for(int i = 0; i < n; i++) p[i] = i; //初始化并查集
 for(int i = 0; i < m; i++) r[i] = i;
 sort(r,r+m,cmp);
 for(int i = 0; i < m; i++) {
  int e = r[i];
  int x = find(u[e]);//找出一个端点所在集合编号
  int y = find(v[e]);//找出另一个端点所在集合编号
  if(x != y) {  //若在不同集合，合并
   ans += w[e];
   p[x] = y;
  }
 }
 return ans;
}

Practice Problems

Problem ID	Problem Name (English)	Notes
hdu1863	畅通工程	MST template problem
hdu1879	继续畅通工程	MST template problem
hdu1875	畅通工程再续	MST template problem
洛谷 P3366	【模板】最小生成树	MST template problem

1. hdu1863 畅通工程

Problem Description The provincial government’s “Connectivity Project” aims to make any two villages in the province reachable by road (not necessarily directly connected, as long as they can be reached indirectly through roads). After investigation and assessment, the statistical table lists the costs of several possible roads. Please write a program to calculate the minimum cost needed to make the entire province connected. Input The test input contains several test cases. The first line of each test case gives the number of evaluated roads N and the number of villages M (< 100); the following N lines correspond to road costs between villages, each giving a pair of positive integers representing two village numbers and the road cost between them (also a positive integer). For simplicity, villages are numbered from 1 to M. When N is 0, the input ends, and no corresponding result should be output. Output For each test case, output the minimum cost needed for full connectivity in one line. If the data is insufficient to guarantee connectivity, output ”?“.

#include <iostream>
#include <algorithm>
using namespace std;
const int maxn = 105;
int N,M,cnt;
int u[maxn],v[maxn],w[maxn];
int r[maxn],fa[maxn];
bool cmp(const int r1, const int r2) {
    return w[r1] < w[r2];
}
int find(int x) {
    return fa[x] == x ? x : fa[x] = find(fa[x]);
}
int Kruskal() {
    int ans = 0;
    for(int i = 0; i < M; i++) fa[i] = i;//初始化并查集，让每个点自成一个连通分量
    for(int i = 0; i < N; i++) r[i] = i;//存储边序号
    sort(r, r+N, cmp);//将边从小到大按权值排序
    for(int i = 0; i < N; i++) {
        int e = r[i];
        int x = find(u[e]);
        int y = find(v[e]);
        if(x != y) {
            ans += w[e];
            cnt++;
            fa[x] = y;
        }
    }
    return ans;
}
int main() {
    while(cin >> N >> M) {
        cnt = 0;
        if(N == 0) break;
        for(int i = 0; i < N; i++) {
            cin >> u[i] >> v[i] >> w[i];
        }
        int ans = Kruskal();
        if(cnt != M-1) cout << "?" << endl;
        else cout << ans << endl;
    }
    return 0;
}

2.hdu1879 继续畅通工程

Compared to the previous problem, this one simply sets the cost of already-built roads to 0. Also note that this problem seems to require scanf/printf instead of cin/cout, as the latter causes TLE.

Problem Description The provincial government’s “Connectivity Project” aims to make any two villages in the province reachable by road (not necessarily directly connected, as long as they can be reached indirectly through roads). Now we have the statistics table of city roads, listing the construction cost between any two cities and whether the road has already been built. Please write a program to calculate the minimum cost needed for full provincial connectivity. Input The test input contains several test cases. The first line of each test case gives the number of villages N (1< N < 100); the following N(N-1)/2 lines correspond to road costs and construction status between villages, each giving 4 positive integers: two village numbers (numbered from 1 to N), the road cost between them, and construction status: 1 means built, 0 means not built. When N is 0, input ends. Output Output one line per test case with the minimum cost needed for full provincial connectivity.

#include <iostream>
#include <algorithm>
using namespace std;
const int maxn = 5008;
int fa[101];
int find(int x) {
    return fa[x] == x ? x : fa[x] = find(fa[x]);
}
struct Edge {
public:
    int u, v, w;
    bool operator<(const Edge &a)const{
        return w < a.w;
    }
}E[maxn];

int main() {
    int N;
    while(scanf("%d", &N) != EOF) {
        int ans = 0;
        if(N == 0) break;
        int M = N*(N-1)/2;
        for(int i = 1; i <= M; ++i) {
            int flag;
            scanf("%d%d%d%d", &E[i].u, &E[i].v, &E[i].w, &flag);
            if(flag) E[i].w = 0;
        }
        for(int i = 1; i <= N; ++i) fa[i] = i;//初始化并查集，让每个点自成一个连通分量
        sort(E+1,E+1+M);//将边从小到大按权值排序
        for(int i = 1; i <= M; ++i) {
            int x = find(E[i].u);
            int y = find(E[i].v);
            if(x != y) {
                fa[x] = y;
                ans += E[i].w;
            }
        }
        printf("%d\n",ans);
    }
    return 0;
}

3.hdu1875 畅通工程再续

First input all vertices, then determine whether each pair of vertices can form a valid edge.

Problem Description You’ve probably heard of “Hundred Islands Lake.” The residents of Hundred Islands Lake live on different small islands, and when they want to travel to other islands, they have to row small boats. Now the government has decided to develop Hundred Islands Lake, and the first problem to solve is transportation. The government has decided to achieve full connectivity for Hundred Islands Lake! After thorough investigation, team RPRush decided to build bridges between qualifying islands. The qualifying condition is that the distance between two islands cannot be less than 10 meters or more than 1000 meters. Of course, to save money, only connectivity between any two islands is required. The bridge costs 100 yuan/meter. Input Input includes multiple test cases. First is an integer T (T <= 200) representing T test cases. Each test case starts with an integer C (C <= 100) representing the number of islands, followed by C pairs of coordinates representing each island’s position. These coordinates are integers where 0 <= x, y <= 1000. Output Each test case outputs one line representing the minimum bridge construction cost, rounded to one decimal place. If the project cannot achieve full connectivity, output “oh!“.

#include <iostream>
#include <cstdio>
#include <algorithm>
#include <cmath>
using namespace std;
const int maxn = 5008;
typedef long long ll;
struct Point {
    int x,y;
} V[101];
struct Edge {
public:
    int u, v;
    double w;
    bool operator<(const Edge &a)const{
        return w < a.w;
    }
}E[maxn];
int fa[105];
int find(int x) {
    return fa[x] == -1 ? x : fa[x] = find(fa[x]);
}
int main() {
    int T;
    scanf("%d", &T);
    while(T--) {
        int N;
        scanf("%d", &N);
        for(int i = 0; i < N; ++i) {
            scanf("%d%d", &V[i].x, &V[i].y);
        }
        int k = 0;
        for(int i = 0; i < N-1; ++i) {
            for(int j = i+1; j < N; ++j) {
                double d = sqrt( (V[i].x-V[j].x)*(V[i].x-V[j].x) +
                (V[i].y-V[j].y)*(V[i].y-V[j].y) );
                if (d <= 1000 && d >= 10) {
                    E[k].u = i;
                    E[k].v = j;
                    E[k].w = d*100;
                    k++;
                }
            }
        }
        for(int i = 0; i < N; ++i) fa[i] = -1;//初始化并查集，让每个点自成一个连通分量
        sort(E,E+k);//将边从小到大按权值排序
        double ans = 0;
        int cnt = 0;
        for(int i = 0; i < k; ++i) {
            int x = find(E[i].u);
            int y = find(E[i].v);
            if(x != y) {
                fa[x] = y;
                ans += E[i].w;
                cnt++;
            }
        }
        if (cnt == N-1) printf("%.1f\n",ans);
        else printf("oh!\n");
    }
    return 0;
}

4.洛谷 P3366【模板】最小生成树

Template problem.

#include <iostream>
#include <algorithm>
using namespace std;
const int maxm = 200000;
int fa[5005];//最大顶点数5000
int N,M,cnt;//顶点数 边数
struct edge {
    int u,v,w;
    bool operator<(const edge& a) const {
        return w < a.w;
    }
} E[maxm];//最大边数maxn
int find(int x) {
    return fa[x] == -1 ? x : fa[x] = find(fa[x]);
}
int Kruskal() {
    for(int i = 0; i < N; ++i) fa[i] = -1;
    int ans = 0;
    sort(E,E+M);
    for(int i = 0; i < M; ++i) {
        int x = find(E[i].u);
        int y = find(E[i].v);
        if(x != y) {
            fa[x] = y;
            ans += E[i].w;
            cnt++;
        }
    }
    if(cnt == N-1) return ans;
    else return -1;
}
int main() {
    scanf("%d%d", &N, &M);
    for(int i = 0; i < M; ++i) {
        scanf("%d%d%d", &E[i].u, &E[i].v,&E[i].w);
    }
    int ans = Kruskal();
    if(ans == -1) printf("orz\n");
    else printf("%d\n",ans);
    return 0;
}

II. Prim’s Algorithm

From Professor Chen Yue’s Data Structures The basic idea of Prim’s algorithm is to start from a root node and let a small tree grow. It is similar to Dijkstra’s algorithm. Dijkstra’s algorithm pseudocode:

void Dijkstra(Vertex s) {
 while(1) {
  V = 未收录顶点中dist最小者
  if (这样的V不存在)
   break;
  collected[V] = true;//将V收录
  for(V的每个邻接点W) {
   if(collected[W] == false && dist[V] + E(V,W) < dist[W]) {
    dist[W] = dist[V]+E<V,W>;
    path[W] = V;
   }
  }
}

Prim’s algorithm pseudocode:

void Prim(Vertex s) {
 while(1) {
  V = 未收录顶点中dist最小者
  if (这样的V不存在)
   break;
  将V收录进MST
  for(V的每个邻接点W) {
   if(W未被收录 && E(V,W) < dist[W]) {
    dist[W] = E<V,W>;
    parent[W] = V;
   }
  }
}

Prim’s Algorithm Complete Code

double edge[maxn][maxn];//存边的权值~
int vis[maxn];//该点是否已访问过
double dist[maxn];
double ans;
double Prim() {
    for(int i = 2; i <= n; i++) {
        dist[i] = edge[1][i];//初始化dist为根结点1到所有点的距离
    }
    vis[1] = 1;////收录初始点1 vis[i] == 0表示还未被收录
    for(int i = 2; i <= n; i++) {
        int min = INF;
        int v = -1;
        for(int j = 2; j <= n; j++) {//找v————未收录顶点中dist最小者
            if(!vis[j] && dist[j] < min) {
                min = dist[j];
                v = j;
            }
        }
        if(v != -1) {//找到了v~收入MST
            vis[v] = 1;
            ans += dist[v];
            for(int j = 2;j <= n; j++) {//更新距离dist
                if(!vis[j] && edge[v][j] < dist[j]) {//当这点未被访问且到任意一点的距离比现在到树的距离小就更新
                    dist[j] = edge[v][j];
                }
            }
        }
    }
    return ans;
}

Graph Theory -- Solving the Minimum Spanning Tree Problem (Kruskal's Algorithm & Prim's Algorithm)