Data Structure Study Notes <8> Sorting

Published 2020-08-26 19:32 · 2879 characters · 15 min read


The article systematically covers two topics: topological sorting of graphs and elementary sorting. It first explains the principle of topological sort on AOV networks, emphasizing that the ordering is produced by repeatedly taking vertices with in-degree 0, and, in connection with the critical path problem, introduces how to compute the earliest completion time, latest completion time, and slack time in an AOE network. It then analyzes a range of sorting algorithms, including bubble, insertion, selection, heap, merge, quick, and radix sort, comparing their time complexity, stability, and applicable scenarios, and concludes by summarizing the strengths, weaknesses, and selection criteria of the different algorithms in practice.

This article has been machine-translated from Chinese. The translation may contain inaccuracies or awkward phrasing. If in doubt, please refer to the original Chinese version.

I. Topological Sort

1. Concept Definition

AOV Network

For example, suppose a computer science student must complete all the courses listed in Figure 3-4. The figure clearly shows the prerequisite and subsequent relationships among the courses. For instance, course C5 has C2 as its prerequisite and C4 and C6 as its subsequent courses. Typically, a directed graph where vertices represent activities and edges represent precedence relationships among activities is called an Activity On Vertex network (AOV network).

(Figure 3-4: prerequisite graph of computer science courses)

Topological Order, DAG

  • If there is a directed path from V to W in the graph, then V must be ordered before W. A vertex sequence satisfying this condition is called a topological order.
  • The process of obtaining a topological order is called topological sorting.
  • If an AOV network has a valid topological order, then it must be a Directed Acyclic Graph (DAG).

2. Topological Sort Approach

The idea of topological sort is to repeatedly find a vertex with in-degree 0, output it, and decrement the in-degree of each of its adjacent vertices by 1. Clearly, finding the vertices with in-degree 0 is the key step. If we scanned all vertices every time to find them, it would waste a lot of time. A smarter approach is to place a vertex into a container (e.g., a queue) the moment its in-degree drops to 0. Pseudocode:

void TopSort() {
    int cnt = 0;
    for (each vertex V in the graph)
        if (Indegree[V] == 0)
            Enqueue(V, Q);
    while (!isEmpty(Q)) {
        V = Dequeue(Q);
        Output V, or record V's output sequence number; cnt++;
        for (each adjacent vertex W of V)
            if (--Indegree[W] == 0)
                Enqueue(W, Q);
    }
    if (cnt != |V|)
        Error("The graph has a cycle");
}

Template code:

#include <cstring>
#include <queue>
using namespace std;

const int maxn = 1005;
int N, M;             //Number of vertices, number of edges (activities)
int edge[maxn][maxn]; //Adjacency matrix of edge weights; -1 means "no edge"
int mint[maxn];       //Earliest completion time for each event (vertex)
int In[maxn];         //In-degree of each vertex
void init() {
    memset(edge, -1, sizeof(edge));
    memset(mint, 0, sizeof(mint));
    memset(In, 0, sizeof(In));
}
bool Topsort() {//Topological sort
    queue<int> q;
    for(int i = 0; i < N; ++i) {
        if(In[i] == 0)
            q.push(i);
    }
    int cnt = 0;
    while(!q.empty()) {
        int v = q.front();
        q.pop();
        cnt++;
        for(int i = 0; i < N; ++i) {
            if(v == i || edge[v][i] == -1) continue;//Check all edges starting from v
            In[i]--;
            //Other operations
            if(In[i] == 0) q.push(i);
        }
    }
    if(cnt != N) return false;
    else return true;
}

Example Problem

08-Graph 8 How Long Does It Take (25 points) This is a variation of topological sort. The program is not overly complex; give it a try! For problem description, code, and solution approach, see the referenced blog post.

3. Solving Practical Problems

Critical Path Problem

AOE (Activity On Edge) Network

  • Generally used for scheduling project tasks
  • In an AOE network, activities are represented on edges; each vertex (event) is annotated with three values: the vertex number, its earliest completion time, and its latest completion time
    Reference: https://www.icourse163.org/learn/ZJU-93001?tid=1459700443#/learn/content?type=detail&id=1235254066&cid=1254945243&replay=true

First derive the earliest completion times in topological order: mint[j] = max(mint[j], mint[i] + edge[i][j]) over all edges (i, j)

Then derive the latest completion times backwards, in reverse topological order: maxt[i] = min(maxt[i], maxt[j] - edge[i][j]) over all edges (i, j)
This gives the slack time of each activity: D[i][j] = maxt[j] - mint[i] - edge[i][j]
The critical path consists of the activities that absolutely cannot be delayed, i.e., the activities with zero slack.
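As a sketch of how these formulas combine with the topological sort template above, the following reuses the edge, In, and mint arrays; the maxt array, the topo[]/cnt_topo bookkeeping, and the CriticalPath function itself are illustrative additions rather than code from the referenced problems:

#include <cstdio>

int maxt[maxn]; //Latest completion time for each event (illustrative addition)
int topo[maxn]; //Assumed: Topsort() has been extended to record the topological order here
int cnt_topo;   //Assumed: number of vertices recorded in topo[]

bool CriticalPath() {
    if(cnt_topo != N) return false;  //The graph has a cycle: no valid schedule
    //Forward pass in topological order: earliest completion times
    for(int k = 0; k < cnt_topo; ++k) {
        int i = topo[k];
        for(int j = 0; j < N; ++j)
            if(i != j && edge[i][j] != -1 && mint[i] + edge[i][j] > mint[j])
                mint[j] = mint[i] + edge[i][j];
    }
    int finish = 0;                  //Project length = largest earliest completion time
    for(int i = 0; i < N; ++i)
        if(mint[i] > finish) finish = mint[i];
    for(int i = 0; i < N; ++i) maxt[i] = finish;
    //Backward pass in reverse topological order: latest completion times
    for(int k = cnt_topo - 1; k >= 0; --k) {
        int j = topo[k];
        for(int i = 0; i < N; ++i)
            if(i != j && edge[i][j] != -1 && maxt[j] - edge[i][j] < maxt[i])
                maxt[i] = maxt[j] - edge[i][j];
    }
    //An activity <i, j> is critical when its slack D[i][j] = maxt[j] - mint[i] - edge[i][j] is 0
    for(int i = 0; i < N; ++i)
        for(int j = 0; j < N; ++j)
            if(i != j && edge[i][j] != -1 && maxt[j] - mint[i] - edge[i][j] == 0)
                printf("Critical activity: %d -> %d (duration %d)\n", i, j, edge[i][j]);
    return true;
}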

Example Problems

08-Graph 8 How Long Does It Take (25 points) A variation of topological sort to find the earliest completion time. 08-Graph 9 Critical Activities (30 points) Find the critical path. For code and solution approach, see the blog post: PTA Data Structure Problem Set Week 8 -- Graphs (Part 2)

II. Simple Sorting

1. Prerequisites

void X_Sort(ElementType A[], int N);
  • For simplicity, we discuss sorting integers in ascending order
  • N is a positive integer
  • Only comparison-based sorting is discussed (>, =, < are defined)
  • Only internal sorting is discussed
  • Stability: The relative positions of any two equal elements remain unchanged after sorting
  • No single sorting algorithm performs best in all cases!
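The code snippets below are fragments. Presumably the full programs began with declarations along the following lines; this is an assumption, shown only so the fragments read on their own (the exact header choice and bound are not from the original):

#include <bits/stdc++.h>   //the snippets rely on <queue>, <cstring>, <algorithm>, <cstdio>
using namespace std;
typedef long long ll;      //element type used by the sorting functions below
const int maxn = 100005;   //assumed array bound, large enough for the test data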

2. Sorting Algorithms

Test problem: 09-Sorting 1 Sort (25 points)

Bubble Sort

It repeatedly steps through the list of elements to be sorted, compares two adjacent elements, and swaps them if they are in the wrong order (e.g., from large to small, or alphabetically from Z to A). The pass through the list is repeated until no adjacent elements need to be swapped, meaning the list is sorted. The name of this algorithm comes from the fact that smaller elements "bubble" to the top of the list (in ascending or descending order), just like carbon dioxide bubbles eventually rise to the top of a carbonated drink, hence the name "Bubble Sort."

-- Excerpted from Baidu Encyclopedia

void Bubble_Sort(ll a[], int N) {
    for(int P = N-1; P >= 0; P--) {
        bool flag = false;
        //One pass of bubble: compare from top to bottom, swap if upper > lower
        for(int i = 0; i < P; ++i) {
            if(a[i] > a[i+1]) {
                swap(a[i], a[i+1]);
                flag = true;
            }
        }
        if(!flag) break; //Already sorted after one pass, no swaps occurred
    }
}

Time Complexity

Best case: already sorted, T = O(N)
Worst case: completely reversed, T = O(N²)

Pros and Cons

Pros: simple to write; only swaps adjacent elements, so it can even sort a singly linked list directly; stable (the relative order of equal elements is preserved).
Cons: high time complexity, slow!

Test Results

Test results shown below, 3 test cases failed:

(test result screenshot)

Insertion Sort

Insertion sort, also commonly known as direct insertion sort, is an efficient algorithm for sorting a small number of elements. It is one of the simplest sorting methods. The basic idea is to insert a record into an already sorted list, resulting in a new sorted list with one more record. The implementation uses a double loop: the outer loop iterates over all elements except the first, and the inner loop searches for the insertion position within the already sorted portion and performs shifting.

-- Excerpted from Baidu Encyclopedia

void Insertion_Sort(ll a[], int N) {
    for(int P = 1; P < N; P++) {
        ll t = a[P];//Pick up the next card
        int i;
        for(i = P; i > 0 && a[i-1] > t; --i)
            a[i] = a[i-1];  //Make room until the preceding element is less than the current element
        a[i] = t;   //Place the new card
    }
}

Time Complexity

Best case: already sorted, T = O(N)
Worst case: completely reversed, T = O(N²)
General-case bound: swapping two adjacent elements eliminates exactly 1 inversion pair, so if the input has I inversion pairs, T(N, I) = O(N + I).

Pros and Cons

Pros: stable.
Cons: the number of comparisons depends on the input; the earlier the insertion point is found (fewer comparisons), the more elements after it must be shifted, which is costly when the data volume is large.

Test Results

Test results shown below, quite impressive:

(test result screenshot)

How to Improve Efficiency

The following theorems hold:

  • Any sequence of N distinct elements has an average of N(N-1)/4 inversion pairs
  • Therefore, any sorting algorithm that only swaps adjacent elements removes inversions one at a time, so its average time complexity is Ω(N²)

Therefore, to improve algorithm efficiency, we must:

  • Eliminate more than 1 inversion pair at a time!
  • Swap elements that are far apart each time!

Shell Sort

Takes advantage of insertion sort's simplicity while overcoming the limitation that insertion sort can only swap adjacent elements.

Shell sort groups records by a certain increment of their indices, uses direct insertion sort on each group; as the increment gradually decreases, each group contains more and more elements. When the increment reaches 1, the entire array becomes one group, and the algorithm terminates.

-- Excerpted from Baidu Encyclopedia

Define an increment sequence D_M > D_{M-1} > ... > D_1 = 1. For each D_k (k = M, M-1, ..., 1), perform a "D_k-sort" (insertion sort on elements D_k apart). Note: a sequence that has been D_k-sorted remains D_k-sorted after it is subsequently D_{k-1}-sorted!

Shell Sort Increment Sequence Selection

  • Original (Shell's) increment sequence: D_M = ⌊N/2⌋, D_k = ⌊D_{k+1} / 2⌋
    • If the increments are not mutually coprime, the smaller increments may have no effect!
void Shell_Sort(ll a[], int N) {
    for(int D = N/2; D > 0; D /= 2) { //Shell increment sequence
        for(int P = D; P < N; ++P) { //Insertion sort
            ll t = a[P];
            int i;
            for(i = P; i >= D && a[i-D] > t; i -= D)
                a[i] = a[i-D];
            a[i] = t;
        }
    }
}
  • Hibbard increment sequence
    • D_k = 2^k - 1 -- adjacent increments are coprime
  • Sedgewick increment sequence, etc.

Pros and Cons

Pros: fast, with relatively little data movement; suitable for large data volumes.
Cons: different increment sequences give different complexities, and choosing one is largely a matter of experience; unstable.

Test Results

As you can see, execution time never exceeded 100ms. The speed for these test cases is very satisfactory:

(test result screenshot)

Selection Sort

Before introducing heap sort, let's introduce selection sort, an old friend.

void Selection_Sort(ll a[], int N) {
    for(int i = 0; i < N; ++i) {
        int mini = i;   //Position of the minimum found so far
        ll ans = a[i];
        //Find the minimum element from position i onward and record its position in mini
        for(int j = i + 1; j < N; ++j) {
            if(a[j] < ans) {
                ans = a[j];
                mini = j;
            }
        }
        //Swap the minimum of the unsorted portion to the end of the sorted portion
        swap(a[i], a[mini]);
    }
}

Time Complexity

The complexity is O(N²) regardless of the input.

Test Results

Test results shown below. Although all test cases pass, the last few cases take a long time:

(test result screenshot)

Heap Sort

Here we use ascending order as an example. We need to adjust the original array into a max-heap starting from index 0, then swap the max-heap's top with the current last element (equivalent to deleting the max-heap's top) and readjust.

void swap(ll& x, ll& y) {
    ll t = x;
    x = y;
    y = t;
}
void PercDown(ll a[], int N, int rt) {
    //Adjust the subtree rooted at a[rt] in an array of N elements into a max-heap
    int father, son;
    ll tmp = a[rt];
    for(father = rt; (father*2+1) < N; father = son) {
        son = father * 2 + 1;//Left child
        if(son != N-1 && a[son] < a[son+1]) //Right child exists and is larger than left child
            son++;
        if(tmp >= a[son]) break;//Found the right place
        else a[father] = a[son];//Percolate down
    }
    a[father] = tmp;
}
inline void BuildHeap(ll a[], int N) {
    for(int i = N/2-1; i >= 0; --i) {
        PercDown(a, N, i);
    }
}
void Heap_Sort(ll a[], int N) {
    BuildHeap(a, N);
    for(int i = N-1; i > 0; --i) {
        swap(a[0], a[i]);//Swap max-heap top a[0] with a[i]
        PercDown(a, i, 0);//Readjust after deletion
    }
}

Time Complexity

Heap sort offers excellent average performance:
Best case: O(N log N)
Worst case: O(N log N)
Average: O(N log N)

Pros and Cons

Pros: fast! Even in the worst case performance is excellent, and it uses little auxiliary space.
Cons: unstable, not suitable for sorting objects.

Test Results

Test results shown below, seems even better than Shell sort:

(test result screenshot)

Merge Sort

The core is merging sorted subsequences. Here is the recursive implementation. For the non-recursive implementation, see Merge Sort Iterative Implementation

void Merge(ll a[], int s, int m, int e, ll tmp[]) {
    //Merge the sorted subarrays a[s..m] and a[m+1..e] into tmp, keeping tmp sorted,
    //then copy the result back to a[s..e].  Time complexity O(e-s+1), i.e., O(n)
    int pb = s;           //pb is the write index into tmp
    int p1 = s, p2 = m+1; //p1 scans the first half, p2 scans the second half
    while (p1 <= m && p2 <= e) {
        if (a[p1] <= a[p2])        //<= so that equal elements keep their order (stability)
            tmp[pb++] = a[p1++];
        else
            tmp[pb++] = a[p2++];
    }
    while(p1 <= m)
        tmp[pb++] = a[p1++];
    while(p2 <= e)
        tmp[pb++] = a[p2++];
    for (int i = s; i <= e; ++i)
        a[i] = tmp[i];             //Copy the merged segment back into a
}
void MergeSort(ll a[], int s, int e, ll tmp[]) {
    if (s < e) {//Do nothing if s >= e
        int m = s + (e-s)/2;
        MergeSort(a, s, m, tmp);//Sort the first half
        MergeSort(a, m+1, e, tmp);//Sort the second half
        Merge(a, s, m, e, tmp);//Merge: merge the sorted subarrays a[s..m] and a[m+1..e]
    }
}

Time Complexity

Best case: O(N log N)
Worst case: O(N log N)
Average: O(N log N)

Pros and Cons

Pros: stable, fast.
Cons: requires extra space for the auxiliary array.

Test Results

Test results shown below, take a close look:

(test result screenshot)

Quick Sort

  1. Set k = a[s] (the pivot, i.e., the first element of the range) and move k to the position where all elements smaller than k are on its left and all elements larger than k are on its right (done in O(n) time).
  2. Recursively quick sort the part to the left of k.
  3. Recursively quick sort the part to the right of k.
void QuickSort(ll a[], int s, int e){//Quick sort a[s,e]
    if (s >= e)
        return;
    ll k = a[s];//Pivot: the first element of the range (ll to match the element type)
    int i = s,j = e;
    while (i != j) {
        while (j > i && a[j] >= k)  --j;
        swap(a[i],a[j]);
        while (i < j && a[i] <= k)  ++i;
        swap(a[i],a[j]);
    }//After processing, a[i] = k;
    QuickSort(a, s, i-1);//Quick sort the left part
    QuickSort(a, i+1, e);//Quick sort the right part
}

Time Complexity

Best case: a balanced partition every time, O(N log N)
Worst case: O(N²) (e.g., an already-sorted input when the first element is always chosen as the pivot)
Average: O(N log N)

Pros and Cons

Pros: in practice the fastest of all the internal sorting algorithms.
Cons: unstable, and slow in the worst case!

Test Results

Test results shown below:

(test result screenshot)

Table Sort

When the data volume is large and the elements to be sorted are big records that are expensive to move, we can sort indirectly: define a table of indices (pointers) to the records and sort the table entries instead of moving the records themselves.
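A minimal sketch of the idea, assuming the records are large strings; the names rec and table are illustrative, and std::sort stands in for whichever sort is applied to the index table:

#include <algorithm>
#include <string>
#include <vector>
using namespace std;

int main() {
    vector<string> rec = {"delta", "alpha", "charlie", "bravo"}; //large records that are costly to move
    int n = (int)rec.size();

    vector<int> table(n);                 //the "table" of indices
    for(int i = 0; i < n; ++i) table[i] = i;

    //Sort the small index table; the large records themselves never move here
    sort(table.begin(), table.end(),
         [&](int x, int y) { return rec[x] < rec[y]; });

    //rec[table[0]], rec[table[1]], ... is now the sorted order.
    //If the physical order is required, rearrange once at the end by following
    //the permutation cycles (at most 3N/2 record moves, as the analysis below notes).
    return 0;
}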


Time Complexity

Best case: the records are already sorted, so no records need to move.
Worst case: the permutation consists of N/2 cycles with 2 records each; swapping two records takes 3 moves, so 3N/2 record moves are required in total. T = O(mN), where m is the time to copy one record.

Radix Sort (Generalization of Bucket Sort)

The algorithms discussed so far all rely on comparisons, and any comparison-based sort has a worst-case lower bound of Ω(N log N). Can we go faster? Suppose we have N students with grades between 0 and 100 (so there are M = 101 distinct grade values). How can we sort the students by grade in linear time? Two variants of the idea are LSD (Least Significant Digit first) and MSD (Most Significant Digit first).
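For the grades example this is just bucket (counting) sort; a minimal sketch follows, where the names BucketSortGrades, grade, and cnt are illustrative. It distributes the N grades into M = 101 buckets and collects them back in order, so it runs in O(N + M):

void BucketSortGrades(int grade[], int N) {
    const int M = 101;
    int cnt[M] = {0};
    for(int i = 0; i < N; ++i)  //distribute: count how many students got each grade
        cnt[grade[i]]++;
    int k = 0;
    for(int g = 0; g < M; ++g)  //collect: write the grades back in ascending order
        while(cnt[g]-- > 0)
            grade[k++] = g;
}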


I referenced this blog post for the radix sort code, which uses a very clever method: Radix Sort

ll getMax(ll a[], int n) {//Find the maximum in array a of n elements
    ll maxx = a[0];
    for(int i = 1; i < n; ++i) {
        if(a[i] > maxx) maxx = a[i];
    }
    return maxx;
}
void radixsort(ll a[], int n, int exp) { //Sort array a of n elements by "a certain digit" (bucket sort), radix 10
    ll tmp[maxn];
    ll T[20] = {0}; //With negative numbers in decimal, 20 buckets
    for(int i = 0; i < n; ++i) //T stores how many numbers are in each bucket
        T[(a[i]/exp)%10 + 10]++;
    for(int i = 1; i < 20; ++i) //Make T values represent positions in tmp
        T[i] += T[i-1];
    for(int i = n - 1; i >= 0; --i) {
        int now = T[(a[i]/exp)%10 + 10];//The position where this number should go
        tmp[now-1] = a[i];
        T[(a[i]/exp)%10 + 10]--;
    }
    for(int i = 0; i < n; ++i)
        a[i] = tmp[i]; //Copy the sorted tmp back to a
}
void Radix_Sort(ll a[], int n) {
    ll maxnum = getMax(a, n);
    for(int exp = 1; maxnum/exp > 0; exp *= 10)
        radixsort(a, n, exp);
}

Time Complexity

N is the number of elements to sort, and B is the number of buckets. O(P(N+B)): One pass of distribution takes O(N), one pass of collection takes O(B), and P passes of distribution and collection are performed in total.

Pros and Cons

Pros: fast; well suited to keys with few digit positions and a radix that is not too large.
Cons: trades space for time.

Test Results

Test results shown below, super fast:

(test result screenshot)

III. Comparison of Sorting Algorithms

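The properties of the algorithms discussed above can be summarized as follows (standard textbook values; for Shell sort the exponent d and the worst case depend on the increment sequence, and O(N²) is the worst case for the original increments):

Algorithm        Average time    Worst-case time   Extra space   Stable
Bubble sort      O(N²)           O(N²)             O(1)          Yes
Insertion sort   O(N²)           O(N²)             O(1)          Yes
Shell sort       O(N^d)          O(N²)             O(1)          No
Selection sort   O(N²)           O(N²)             O(1)          No
Heap sort        O(N log N)      O(N log N)        O(1)          No
Merge sort       O(N log N)      O(N log N)        O(N)          Yes
Quick sort       O(N log N)      O(N²)             O(log N)      No
Radix sort       O(P(N+B))       O(P(N+B))         O(N+B)        Yes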
