This article has been machine-translated from Chinese. The translation may contain inaccuracies or awkward phrasing. If in doubt, please refer to the original Chinese version.

I. Heap

1. What is a Heap

A Heap is an array object that can be viewed as a complete binary tree, with the following properties:

The value of any node is the maximum/minimum value among all nodes in its subtree (ordering property)
A heap is always a complete binary tree represented by an array.

2. Max-Heap Operation Functions

Definition

typedef struct HeapStruct *MaxHeap;
struct HeapStruct {
 ElementType *Elements;//存储堆元素的数组
 int Size;//当前元素个数
 int Capacity;//最大容量
};

(1) Creating an Empty Max-Heap (Create Function)

MaxHeap Create(int MaxSize) {
 MaxHeap H = malloc(sizeof(struct HeapStruct) );
 H->Elements = malloc( (MaxSize+1) * sizeof(ElementType));//+1是由于我们从下标1开始存储
 H->Size = 0;
 H->Capacity = MaxSize;
 H->Elements[0] = MaxData;//下标0设为"哨兵" 为大于堆中所有可能元素的值，便于之后的操作
 return H;
}

(2) Max-Heap Insertion (Insert Function)

When inserting an element, compare it with its parent node. If the inserted element is larger, swap them, then compare with the parent again, and so on until the inserted element is smaller than its parent.

void Insert(MaxHeap H, ElementType item) {
 int i;
 if(IsFull(H)) {
  printf("最大堆已满");
  return;
 }
 i = ++H->Size;//i指向插入后队中最后一个元素的位置。
 for(; H->Elements[i/2] < item; i /= 2) {//当item的父结点的值小于item值循环才继续
  H->Elements[i] = H->Elements[i/2];//向下过滤结点（）
 }
 H->Elements[i] = item;//将item插入
}

(3) Max-Heap Deletion (Delete Function)

Extract the root node (maximum value) element while deleting the root node from the heap, ensuring the new root is still the maximum value in the heap.

Use the last element in the max-heap as the new root node, and delete the original last element
Check if the left and right children of the root are larger than it; if so, continue filtering down

ElementType DeleteMax(MaxHeap H) {
 int Parent,Child;//父结点，孩子结点
 ElementType MaxItem, temp;
 if(IsEmpty(H) ) {
  printf("最大堆已空");
  return;
 }
 MaxItem = H->Elements[1]; //取出根结点最大值，暂存在MaxItem中
 temp = H->Elements[H->Size--];//存储最后一个元素，然后size--
 for (Parent = 1; Parent*2 <= H->Size; Parent = Child) {
  Child = Parent * 2;//Child此时为Parent的左孩子
  if (Child != H->Size && H->Elements[Child] < H->Elements[Child+1] ) {
   Child++; //当且仅当右孩子存在且其值比左孩子大时，Child变成右孩子的下标
  }
  if (temp >= H->Elements[Child] ) break;//temp找到了应该放的地方
  else //用孩子结点的值取代父结点
   H->Elements[Parent] = H->Elements[Child];
 }
 H->Elements[Parent] = temp;
 return MaxItem;//返回删除前最大值
}

(3) Building a Max-Heap from Existing Elements

Store N existing elements in a one-dimensional array according to max-heap requirements.

Method 1: Through insertion operations, insert N elements one by one into an empty max-heap, with maximum time complexity O(NlogN).
Method 2: Build a max-heap in linear time complexity.
(1) Store N elements in input order, first satisfying the structural property of a complete binary tree
(2) Adjust node positions to satisfy the ordering property of a max-heap

Build heap time complexity O(n), which is the sum of heights of all nodes in the tree. Starting from the last node that has children, it must have a left child. Change the Elements array in the definition to a Data array to store existing elements.

void PercDown(MaxHeap H, int p) {//将H中以H->Data[p]为根的子堆调整为最大堆 原理同删除操作
 int Parent,Child;
 ElementType X;
 X = H->Data[p];//取出根结点值
 for(Parent = p; Parent*2 <= H->Size; Parent = Child) {
  Child = Parent * 2;
  if( Child != H->Size && H->Data[Child] < H->Data[Child+1]) {
   Child++;
  }
  if(X >= H->Data[Child]) break;//找到了合适位置
  else
   H->Data[Parent] = H->Data[Child];
 }
 H->Data[Parent] = X;
}
void BuildHeap(MaxHeap H) {//调整H->Data[]中的元素使其满足最大堆的有序性，此处假设所有H->Size个元素都已存在H->Data[]中
 int i;
 //从最后一个结点的父节点开始，到根结点1
 for(i = H->Size/2; i > 0; i--)
  PercDown(H,i);
}

II. Huffman Tree

1. What is a Huffman Tree

Weighted Path Length (WPL): Given a binary tree with n leaf nodes, each leaf node has weight w_k, and the length from the root to each leaf node is l_k, then the sum of weighted path lengths of all leaf nodes is: WPL = $\sum_{k=1}^n$ w_k l_k. Optimal Binary Tree, also called Huffman Tree: The binary tree with minimum WPL, with the following characteristics:

No nodes of degree 1
A Huffman tree with n leaf nodes has a total of 2n-1 nodes
Swapping left and right subtrees of any non-leaf node in a Huffman tree still yields a Huffman tree
The same set of weights may produce two structurally different Huffman trees

2. Huffman Tree Operations

Constructing a Huffman tree: merge the two binary trees with the smallest weights each time. Main problem: How to select the two smallest? Use a min-heap!

typedef struct TreeNode *HuffmanTree;
struct TreeNode {
 int Weight;
 HuffmanTree Left,Right;
};
HuffmanTree Huffman(MinHeap H) {
 //假设H->Size个权值已经存在了H->Elements[]->Wight里
 int i; HuffmanTree T;
 BuildMinHeap(H);//将H->Elements[]按权值调整为最小堆
 for(i = 1; i < H->Size; i++) {//做H->Size-1次合并
  T = malloc(sizeof(struct TreeNode));//建立新结点
  T->Left = DeleteMin(H);//从最小堆中删除一个结点，作为新T的左子结点
  T->Right = DeleteMin(H);//从最小堆中删除一个结点，作为新T的右子结点
  T->Weight = T->Left->Weight + T->Right->Weight;
  Insert(H,T);//将新T插入最小堆
 }
 T = DeleteMin(H);
 return T;
}

3. Application of Huffman Trees -- Huffman Coding

How to encode so that the total encoding space is minimized? Characters that appear frequently should have shorter encodings, while characters that appear infrequently can have longer encodings, while avoiding ambiguity. Prefix code: No character's encoding is a prefix of another character's encoding, which avoids ambiguity. A binary tree can be constructed for encoding, with left and right branches representing 0 and 1 respectively. When all characters are on leaf nodes, this works.

This is where the Huffman tree can be used!

III. Sets

Regarding sets, the main topic is Union-Find. I've studied this before and this blog explains it brilliantly: Super Adorable Union-Find~ (Original blog is down, repost) So I won't elaborate too much here~

~ Union-Find Template

#include <iostream>
#include <set>
using namespace std;
const int maxn = 1000000;
int fa[maxn];
int ans[maxn];
void init(int n) {
    for (int i = 1; i <= n; i++) {
        fa[i] = i;
    }
}
int find(int x) {//查询+路径压缩 找根节点并将沿途每个结点的父节点都设为根节点
    return x == fa[x]? x : (fa[x] = find(fa[x]));
}
inline void merge(int a, int b) {
    fa[find(a)] = find(b);
}
int main() {
    int m, n, k, x;
    cin >> m >> n >> k;
    x = n*m;
    init(x);
    for (int i = 0; i < k; i++) {
        int a,b;
        cin >> a >> b;
        merge(a, b);
    }
    for(int i = 1; i <= x; i++) {
        ans[find(i)] = 1;
    }
    int cnt = 0;
    for (int i = 1; i <= x; i++) {
        if(ans[i] == 1) cnt++;
    }
    cout << cnt << endl;
    return 0;
}

Data Structure Study Notes <6> Heaps, Huffman Trees, and Union-Find