What is a maximum heap

Binary heap

For the code related to this article, see Algorithms / BinaryHeap.

Definition

A binary heap is essentially a nearly complete binary tree stored in an array. The array elements correspond one-to-one to the nodes of the tree visited in level order (BFS).

The figure above, taken from CLRS, shows the correspondence between a 1-indexed array and the binary tree. Out of personal programming habit, this article uses 0-based array indexing (the 1-based formulas can be derived from the same index relationship). The root node is stored at index 0, and the indexes of a parent node and its left and right children are related as follows:

$$\mathrm{parent}(i) = \lfloor (i - 1) / 2 \rfloor, \quad \mathrm{left}(i) = 2i + 1, \quad \mathrm{right}(i) = 2i + 2$$
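
As a minimal sketch, the 0-indexed relationship can be written in Python as below. The function names parent, left, and right follow the notation used in this article; the sample array is made up for illustration.

```python
def parent(i: int) -> int:
    """Index of the parent of node i in a 0-indexed heap array (i > 0)."""
    return (i - 1) // 2


def left(i: int) -> int:
    """Index of the left child of node i."""
    return 2 * i + 1


def right(i: int) -> int:
    """Index of the right child of node i."""
    return 2 * i + 2


# Level-order layout of this tree:
#         16
#       /    \
#     14      10
#    /  \
#   8    7
heap = [16, 14, 10, 8, 7]
i = 1                                  # the node holding 14
print(heap[parent(i)])                 # 16
print(heap[left(i)], heap[right(i)])   # 8 7
```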

Maximum heap, minimum heap

When every child node in a binary heap is no larger than its parent node, the heap is called a max-heap:

Max-heap: array[parent(i)] >= array[i]

Conversely, when every child node in a binary heap is no smaller than its parent node, the heap is called a min-heap:
Min-heap: array[parent(i)] <= array[i]
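
The two properties can be checked directly from their definitions. The following is a small sketch assuming a 0-indexed array, where parent(i) = (i - 1) // 2; the helper names is_max_heap and is_min_heap are hypothetical and not taken from the article's code.

```python
def is_max_heap(array) -> bool:
    """True if array[parent(i)] >= array[i] holds for every non-root index i."""
    return all(array[(i - 1) // 2] >= array[i] for i in range(1, len(array)))


def is_min_heap(array) -> bool:
    """True if array[parent(i)] <= array[i] holds for every non-root index i."""
    return all(array[(i - 1) // 2] <= array[i] for i in range(1, len(array)))


print(is_max_heap([16, 14, 10, 8, 7]))  # True
print(is_max_heap([1, 2, 3]))           # False
print(is_min_heap([1, 2, 3]))           # True
```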

Building a heap

Question: Given an array, how do we build a max-heap (or min-heap) from it, and what is the time complexity?


Prerequisite: maintaining the heap property with heapify
Given an array A and an index i, suppose the subtrees rooted at left(i) and right(i) both satisfy the max-heap (min-heap) property, while the three nodes i, left(i), and right(i) may not. How do we rearrange the data so that the subtree rooted at i satisfies the heap property? Take the max-heap as an example:
We only need to compare the three values and swap the largest one into the root position. Note that after the swap, the heap property of the affected child subtree may be broken, which creates a sub-problem of the same form. Solving the sub-problem recursively completes the maintenance of the whole heap.
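
The step described above can be sketched in Python as follows. This is an illustrative version, not the article's own code: the name max_heapify and the optional heap_size parameter are assumptions, and the array is 0-indexed.

```python
def max_heapify(a, i, heap_size=None):
    """Restore the max-heap property at index i.

    Assumes the subtrees rooted at left(i) and right(i) are already max-heaps.
    """
    if heap_size is None:
        heap_size = len(a)
    l, r = 2 * i + 1, 2 * i + 2
    largest = i
    # Compare the node with its two children and remember the largest.
    if l < heap_size and a[l] > a[largest]:
        largest = l
    if r < heap_size and a[r] > a[largest]:
        largest = r
    if largest != i:
        # Move the largest value up, then fix the subtree that received a[i].
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, largest, heap_size)


a = [4, 14, 10, 8, 7, 9, 3]   # only the root violates the property
max_heapify(a, 0)
print(a)  # [14, 8, 10, 4, 7, 9, 3]
```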

The min-heap case is symmetric, so it is omitted here.
Let's analyze the complexity. In the worst case, the size of the sub-problem is 2/3 of the original problem. This happens when the bottom level of the sub-problem's subtree is full while the bottom level of the sibling subtree in the symmetric position is completely empty. With k + 1 full levels plus a half-full bottom level of 2^k nodes:

$$\frac{\left(\sum_{i=0}^{k} 2^i\right)/2 + 2^k}{\sum_{i=0}^{k} 2^i + 2^k} \approx \frac{2}{3}$$

Finding the largest of the three values and performing the swap take constant time O(1), so we have:

$$T(n) \le T(2n/3) + O(1)$$

By the master theorem for divide-and-conquer recurrences (a separate post will explain it), the time complexity is O(lg n).
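
For readers who want the intermediate step, here is a sketch of how the master theorem gives this bound. It assumes the standard form T(n) = a T(n/b) + f(n) and the heapify recurrence as stated in CLRS.

```latex
% Heapify recurrence: one sub-problem of size at most 2n/3, constant extra work.
T(n) \le T\!\left(\tfrac{2n}{3}\right) + \Theta(1)
\quad\Longrightarrow\quad a = 1,\; b = \tfrac{3}{2},\; f(n) = \Theta(1)

% Compare f(n) with n^{\log_b a}:
n^{\log_b a} = n^{\log_{3/2} 1} = n^{0} = 1
\quad\Longrightarrow\quad f(n) = \Theta\!\left(n^{\log_b a}\right)

% Case 2 of the master theorem applies:
T(n) = O\!\left(n^{\log_b a} \lg n\right) = O(\lg n)
```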

Building the heap: buildHeap
We use heapify to build the heap bottom-up, so that the precondition of heapify holds at every step.
A leaf node trivially satisfies the heap property, since it has no children. So we call heapify on each node in turn, starting from the last non-leaf node (the first node from the bottom that may violate the heap property) and moving up to the root. The result is a complete heap.

An example of buildHeap is shown below

Its correctness is easily proven with the loop invariant "the precondition of heapify always holds".
Since buildHeap performs maxHeapify on n / 2 nodes and each maxHeapify takes O(lg n) in the worst case, it seems that the whole buildHeap needs O(n lg n). That is indeed an upper bound, but not a tight one, because not every node needs a recursion depth of O(lg n) to restore the heap property. For example, heapify on a lowest non-leaf node finishes after a single call. A tight bound is computed as follows:
Let h be the height of a node, that is, its distance from the bottom level. A node of height h needs O(h) calls to heapify, and there are about n / 2^{h+1} such nodes, so the time complexity is:

$$\sum_{h=0}^{\lg n} \frac{n}{2^{h+1}}\, O(h) = O\!\left(n \sum_{h=0}^{\lg n} \frac{h}{2^{h+1}}\right) \le O(2n) = O(n)$$

So the tight bound of buildHeap on a given array is O(n).
But note: if you build the heap top-down by inserting the items one at a time, the time complexity is O(n lg n).
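
A minimal sketch of the bottom-up construction, assuming a 0-indexed array so that the last non-leaf node sits at index n // 2 - 1. The names build_max_heap and max_heapify are illustrative, and heapify is written iteratively here for brevity.

```python
def max_heapify(a, i, heap_size):
    """Sift a[i] down until the subtree rooted at i is a max-heap."""
    while True:
        l, r = 2 * i + 1, 2 * i + 2
        largest = i
        if l < heap_size and a[l] > a[largest]:
            largest = l
        if r < heap_size and a[r] > a[largest]:
            largest = r
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def build_max_heap(a):
    """Turn a into a max-heap in place, bottom-up, in O(n) time."""
    n = len(a)
    # Leaves are already heaps; start at the last non-leaf node and go to the root.
    for i in range(n // 2 - 1, -1, -1):
        max_heapify(a, i, n)


a = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
build_max_heap(a)
print(a)  # [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
```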

Applications

Heap sort

In a max-heap, the first element of the array is the maximum of the whole array. We take it out and move the last element to the head of the array. The precondition of heapify now holds, so one heapify call yields a heap again (with one element fewer than before). Repeating this process extracts the maximum of the remaining array one at a time, which gives the sorted result. There are n calls to heapify, so the complexity is O(n lg n). Note that the sort can be done in place (just swap each extracted maximum with the last element of the heap), which is a very nice property.
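
The in-place procedure can be sketched like this, again assuming a 0-indexed array. The function names are illustrative and the heapify helper is repeated so the example runs on its own.

```python
def max_heapify(a, i, heap_size):
    """Sift a[i] down within a[:heap_size] until the max-heap property holds."""
    while True:
        l, r = 2 * i + 1, 2 * i + 2
        largest = i
        if l < heap_size and a[l] > a[largest]:
            largest = l
        if r < heap_size and a[r] > a[largest]:
            largest = r
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def heap_sort(a):
    """Sort a in ascending order, in place."""
    n = len(a)
    # Build the max-heap bottom-up.
    for i in range(n // 2 - 1, -1, -1):
        max_heapify(a, i, n)
    # Repeatedly swap the maximum to the end and shrink the heap by one.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        max_heapify(a, 0, end)


data = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
heap_sort(data)
print(data)  # [1, 2, 3, 4, 7, 8, 9, 10, 14, 16]
```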

Top-k problem

Question: How do we find the k largest values in a data stream?
Maintain a min-heap of size k, and compare each new item with the top of the heap as it arrives. If the new item is larger than the top, replace the top with it and restore the heap property. (The top is the smallest element of the whole heap, so if an incoming element is larger than it, the top cannot be one of the k largest.)
In the same way, a max-heap is used to find the k smallest values in a data stream.
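
As a sketch of this idea, the example below uses Python's standard heapq module, which implements a min-heap on a plain list. The function name top_k and the sample stream are made up for illustration.

```python
import heapq


def top_k(stream, k):
    """Return the k largest items seen in stream, largest first."""
    if k <= 0:
        return []
    heap = []  # min-heap holding the k largest items seen so far
    for x in stream:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            # x beats the smallest of the current top k: replace the heap top.
            heapq.heapreplace(heap, x)
    return sorted(heap, reverse=True)


print(top_k([5, 1, 9, 3, 7, 8, 2], 3))  # [9, 8, 7]
```

Each item costs at most O(lg k), so a stream of n items is processed in O(n lg k) time with O(k) memory.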

Priority queue

Question: An ordinary queue is first-in, first-out. Sometimes we need a queue that removes items selectively according to their key (priority): the item with the highest priority leaves first, regardless of insertion order. For example, patients queuing in a hospital are seen by the urgency of their illness, not by their order of arrival.
Below we implement the priority queue with a heap, since a heap naturally maintains the property that its head is the maximum of the whole array.
For the priority queue, we need to add the following four operations to the heap:
1. Enqueue: push
First append the new element to the end of the array, then restore the heap property from the bottom up. Its complexity is O(lg n).
(Of course, you can also use the increase key operation described below: first set the new element's priority to negative infinity, then increase its key to the desired value. The code does not use this implementation, because a negative-infinity value is awkward to define generically.)

2. Front of the queue: max (min)
The first element of the heap array is the maximum (minimum) element. Just return the first item.

3. Dequeue: pop
In fact, you can delete an item at any position in the priority queue: swap the item with the last element of the heap, delete the last element, and then restore the heap property at that position. Its complexity is O(lg n).
Dequeuing is a special case: first fetch the largest item, then delete the item at index 0.

4. Raise the priority: increase key (depending on how priority is defined, this can also be decrease key, that is, the smaller the key, the higher the priority)
This operation is necessary because the priority of an item already in the queue sometimes has to change, for example when the condition of a patient waiting in a hospital queue suddenly worsens. Another very common case is the single-source shortest path (SSSP) problem: when the edge weights are non-negative, Dijkstra's algorithm uses this operation to update a vertex's current shortest-path value.
For a max-heap, after a key is increased, only the node and its parent can violate the heap property, so it is enough to restore the property upward one level at a time. Its complexity is O(lg n).
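
The four operations can be sketched together as a small class. This is an illustrative design, not the article's own code: the class name MaxPriorityQueue, the method names, and addressing items by array index in remove and increase_key are all assumptions (a practical version would track each item's position with a handle).

```python
class MaxPriorityQueue:
    """Max-priority queue backed by a 0-indexed binary max-heap."""

    def __init__(self):
        self._a = []

    def __len__(self):
        return len(self._a)

    def _sift_up(self, i):
        # Swap with the parent while the parent is smaller.
        while i > 0:
            p = (i - 1) // 2
            if self._a[p] >= self._a[i]:
                break
            self._a[p], self._a[i] = self._a[i], self._a[p]
            i = p

    def _sift_down(self, i):
        n = len(self._a)
        while True:
            l, r, largest = 2 * i + 1, 2 * i + 2, i
            if l < n and self._a[l] > self._a[largest]:
                largest = l
            if r < n and self._a[r] > self._a[largest]:
                largest = r
            if largest == i:
                return
            self._a[i], self._a[largest] = self._a[largest], self._a[i]
            i = largest

    def push(self, key):
        """Append at the end, then restore the heap from the bottom up."""
        self._a.append(key)
        self._sift_up(len(self._a) - 1)

    def max(self):
        """Return the largest key without removing it."""
        return self._a[0]

    def remove(self, i):
        """Delete the item at index i: swap with the last item, drop it, repair."""
        last = len(self._a) - 1
        self._a[i], self._a[last] = self._a[last], self._a[i]
        removed = self._a.pop()
        if i < len(self._a):
            # The moved item may be too large or too small for position i.
            self._sift_up(i)
            self._sift_down(i)
        return removed

    def pop(self):
        """Dequeue the largest key (the special case remove(0))."""
        return self.remove(0)

    def increase_key(self, i, new_key):
        """Raise the key at index i and restore the heap upward."""
        if new_key < self._a[i]:
            raise ValueError("new key is smaller than the current key")
        self._a[i] = new_key
        self._sift_up(i)


pq = MaxPriorityQueue()
for key in [3, 10, 5, 1]:
    pq.push(key)
print(pq.max())                            # 10
pq.increase_key(3, 7)                      # the key 1 at index 3 becomes 7
print([pq.pop() for _ in range(len(pq))])  # [10, 7, 5, 3]
```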

Conclusion

This article introduces two types of binary heap: the max-heap and the min-heap. Using the max-heap as an example, it describes how to build a heap and analyzes the complexity of the related operations. Finally, three applications of heaps to practical problems are presented.
That basically covers the binary heap, but the story of heaps is far from over. Consider the following two questions:
1. How can two heaps be merged quickly (O(lg n))? Answer: the binomial heap.
2. How can a key be decreased quickly? (O(lg n) isn't fast enough; I would like amortized O(1).) Answer: the Fibonacci heap.
Later posts will introduce these two data structures.