Data Structure Study Notes <6> Heaps, Huffman Trees, and Union-Find

发表于 2020-04-07 23:10 1503 字 8 min read

cos avatar

cos

FE / ACG / 手工 / 深色模式强迫症 / INFP / 兴趣广泛养两只猫的老宅女 / remote

文章介绍了堆和哈夫曼树的基本概念与操作,以及并查集的简要说明。堆是一种基于完全二叉树的数组结构,支持插入、删除和建堆操作,最大堆可通过线性时间复杂度构建;哈夫曼树是带权路径长度最小的二叉树,通过最小堆实现节点合并,用于构造最优编码方案,实现高效压缩;并查集部分简要回顾,重点在于路径压缩与按秩合并。

This article has been machine-translated from Chinese. The translation may contain inaccuracies or awkward phrasing. If in doubt, please refer to the original Chinese version.

I. Heap

1. What is a Heap

A Heap is an array object that can be viewed as a complete binary tree, with the following properties:

  • The value of any node is the maximum/minimum value among all nodes in its subtree (ordering property)
  • A heap is always a complete binary tree represented by an array.
    在这里插入图片描述

2. Max-Heap Operation Functions

Definition

typedef struct HeapStruct *MaxHeap;
struct HeapStruct {
 ElementType *Elements;//存储堆元素的数组
 int Size;//当前元素个数
 int Capacity;//最大容量
};

(1) Creating an Empty Max-Heap (Create Function)

MaxHeap Create(int MaxSize) {
 MaxHeap H = malloc(sizeof(struct HeapStruct) );
 H->Elements = malloc( (MaxSize+1) * sizeof(ElementType));//+1是由于我们从下标1开始存储
 H->Size = 0;
 H->Capacity = MaxSize;
 H->Elements[0] = MaxData;//下标0设为"哨兵" 为大于堆中所有可能元素的值,便于之后的操作
 return H;
}

(2) Max-Heap Insertion (Insert Function)

When inserting an element, compare it with its parent node. If the inserted element is larger, swap them, then compare with the parent again, and so on until the inserted element is smaller than its parent.

在这里插入图片描述

void Insert(MaxHeap H, ElementType item) {
 int i;
 if(IsFull(H)) {
  printf("最大堆已满");
  return;
 }
 i = ++H->Size;//i指向插入后队中最后一个元素的位置。
 for(; H->Elements[i/2] < item; i /= 2) {//当item的父结点的值小于item值循环才继续
  H->Elements[i] = H->Elements[i/2];//向下过滤结点()
 }
 H->Elements[i] = item;//将item插入
}

(3) Max-Heap Deletion (Delete Function)

Extract the root node (maximum value) element while deleting the root node from the heap, ensuring the new root is still the maximum value in the heap.

  • Use the last element in the max-heap as the new root node, and delete the original last element
  • Check if the left and right children of the root are larger than it; if so, continue filtering down
ElementType DeleteMax(MaxHeap H) {
 int Parent,Child;//父结点,孩子结点
 ElementType MaxItem, temp;
 if(IsEmpty(H) ) {
  printf("最大堆已空");
  return;
 }
 MaxItem = H->Elements[1]; //取出根结点最大值,暂存在MaxItem中
 temp = H->Elements[H->Size--];//存储最后一个元素,然后size--
 for (Parent = 1; Parent*2 <= H->Size; Parent = Child) {
  Child = Parent * 2;//Child此时为Parent的左孩子
  if (Child != H->Size && H->Elements[Child] < H->Elements[Child+1] ) {
   Child++; //当且仅当右孩子存在且其值比左孩子大时,Child变成右孩子的下标
  }
  if (temp >= H->Elements[Child] ) break;//temp找到了应该放的地方
  else //用孩子结点的值取代父结点
   H->Elements[Parent] = H->Elements[Child];
 }
 H->Elements[Parent] = temp;
 return MaxItem;//返回删除前最大值
}

(3) Building a Max-Heap from Existing Elements

Store N existing elements in a one-dimensional array according to max-heap requirements.

  • Method 1: Through insertion operations, insert N elements one by one into an empty max-heap, with maximum time complexity O(NlogN).
  • Method 2: Build a max-heap in linear time complexity.
  • (1) Store N elements in input order, first satisfying the structural property of a complete binary tree
  • (2) Adjust node positions to satisfy the ordering property of a max-heap

Build heap time complexity O(n), which is the sum of heights of all nodes in the tree. Starting from the last node that has children, it must have a left child. Change the Elements array in the definition to a Data array to store existing elements.

void PercDown(MaxHeap H, int p) {//将H中以H->Data[p]为根的子堆调整为最大堆 原理同删除操作
 int Parent,Child;
 ElementType X;
 X = H->Data[p];//取出根结点值
 for(Parent = p; Parent*2 <= H->Size; Parent = Child) {
  Child = Parent * 2;
  if( Child != H->Size && H->Data[Child] < H->Data[Child+1]) {
   Child++;
  }
  if(X >= H->Data[Child]) break;//找到了合适位置
  else
   H->Data[Parent] = H->Data[Child];
 }
 H->Data[Parent] = X;
}
void BuildHeap(MaxHeap H) {//调整H->Data[]中的元素使其满足最大堆的有序性,此处假设所有H->Size个元素都已存在H->Data[]中
 int i;
 //从最后一个结点的父节点开始,到根结点1
 for(i = H->Size/2; i > 0; i--)
  PercDown(H,i);
}

II. Huffman Tree

1. What is a Huffman Tree

Weighted Path Length (WPL): Given a binary tree with n leaf nodes, each leaf node has weight wk, and the length from the root to each leaf node is lk, then the sum of weighted path lengths of all leaf nodes is: WPL = k=1n\sum_{k=1}^nwk lk. Optimal Binary Tree, also called Huffman Tree: The binary tree with minimum WPL, with the following characteristics:

  • No nodes of degree 1
  • A Huffman tree with n leaf nodes has a total of 2n-1 nodes
  • Swapping left and right subtrees of any non-leaf node in a Huffman tree still yields a Huffman tree
  • The same set of weights may produce two structurally different Huffman trees
    在这里插入图片描述

2. Huffman Tree Operations

Constructing a Huffman tree: merge the two binary trees with the smallest weights each time. Main problem: How to select the two smallest? Use a min-heap!

typedef struct TreeNode *HuffmanTree;
struct TreeNode {
 int Weight;
 HuffmanTree Left,Right;
};
HuffmanTree Huffman(MinHeap H) {
 //假设H->Size个权值已经存在了H->Elements[]->Wight里
 int i; HuffmanTree T;
 BuildMinHeap(H);//将H->Elements[]按权值调整为最小堆
 for(i = 1; i < H->Size; i++) {//做H->Size-1次合并
  T = malloc(sizeof(struct TreeNode));//建立新结点
  T->Left = DeleteMin(H);//从最小堆中删除一个结点,作为新T的左子结点
  T->Right = DeleteMin(H);//从最小堆中删除一个结点,作为新T的右子结点
  T->Weight = T->Left->Weight + T->Right->Weight;
  Insert(H,T);//将新T插入最小堆
 }
 T = DeleteMin(H);
 return T;
}

3. Application of Huffman Trees -- Huffman Coding

How to encode so that the total encoding space is minimized? Characters that appear frequently should have shorter encodings, while characters that appear infrequently can have longer encodings, while avoiding ambiguity. Prefix code: No character's encoding is a prefix of another character's encoding, which avoids ambiguity. A binary tree can be constructed for encoding, with left and right branches representing 0 and 1 respectively. When all characters are on leaf nodes, this works.

在这里插入图片描述
This is where the Huffman tree can be used!

III. Sets

Regarding sets, the main topic is Union-Find. I've studied this before and this blog explains it brilliantly: Super Adorable Union-Find~ (Original blog is down, repost) So I won't elaborate too much here~

在这里插入图片描述
在这里插入图片描述

~ Union-Find Template

#include <iostream>
#include <set>
using namespace std;
const int maxn = 1000000;
int fa[maxn];
int ans[maxn];
void init(int n) {
    for (int i = 1; i <= n; i++) {
        fa[i] = i;
    }
}
int find(int x) {//查询+路径压缩 找根节点并将沿途每个结点的父节点都设为根节点
    return x == fa[x]? x : (fa[x] = find(fa[x]));
}
inline void merge(int a, int b) {
    fa[find(a)] = find(b);
}
int main() {
    int m, n, k, x;
    cin >> m >> n >> k;
    x = n*m;
    init(x);
    for (int i = 0; i < k; i++) {
        int a,b;
        cin >> a >> b;
        merge(a, b);
    }
    for(int i = 1; i <= x; i++) {
        ans[find(i)] = 1;
    }
    int cnt = 0;
    for (int i = 1; i <= x; i++) {
        if(ans[i] == 1) cnt++;
    }
    cout << cnt << endl;
    return 0;
}

喜欢的话,留下你的评论吧~

© 2020 - 2026 cos @cosine
Powered by theme astro-koharu · Inspired by Shoka