1. heapq堆排序演算法
堆(heap)是一個樹形資料結構,其中子節點與父節點有一種有序關系,二叉堆(binary heap)可以使用一個有組織的串列或陣串列示,其中元素N的子元素位于2*N+1和2*N+2(索引從0開始),這種布局允許原地重新組織堆,從而不必再添加或洗掉元素時重新分配大量記憶體,
最大堆(max-heap)確保父節點大于或等于其兩個子節點,最小堆(min-heap)要求父節點小于或等于其子節點,Python的heapq模塊實作了一個最小堆,
1.1 創建堆
創建堆有兩種基本方式:heappush()和heapify(),
import heapq import math from io import StringIO data = [19, 9, 4, 10, 11] def show_tree(tree, total_width=36, fill=' '): """Pretty-print a tree.""" output = StringIO() last_row = -1 for i, n in enumerate(tree): if i: row = int(math.floor(math.log(i + 1, 2))) else: row = 0 if row != last_row: output.write('\n') columns = 2 ** row col_width = int(math.floor(total_width / columns)) output.write(str(n).center(col_width, fill)) last_row = row print(output.getvalue()) print('-' * total_width) print() heap = [] print('random :', data) print() for n in data: print('add {:>3}:'.format(n)) heapq.heappush(heap, n) show_tree(heap)
使用heappush(),從資料源增加新元素時會保持元素的堆排序順序,

如果資料已經在記憶體中,那么使用heapify()原地重新組織串列中的元素會更高效,
import heapq import math from io import StringIO data = [19, 9, 4, 10, 11] def show_tree(tree, total_width=36, fill=' '): """Pretty-print a tree.""" output = StringIO() last_row = -1 for i, n in enumerate(tree): if i: row = int(math.floor(math.log(i + 1, 2))) else: row = 0 if row != last_row: output.write('\n') columns = 2 ** row col_width = int(math.floor(total_width / columns)) output.write(str(n).center(col_width, fill)) last_row = row print(output.getvalue()) print('-' * total_width) print() print('random :', data) heapq.heapify(data) print('heapified :') show_tree(data)
如果按堆順序一次一個元素地構建串列,那么結果與構建一個無序串列再呼叫heapify()是一樣的,

1.2 訪問堆內容
一旦堆已經被正確組織,則可以使用heappop()洗掉有最小值的元素,
import heapq import math from io import StringIO data = [19, 9, 4, 10, 11] def show_tree(tree, total_width=36, fill=' '): """Pretty-print a tree.""" output = StringIO() last_row = -1 for i, n in enumerate(tree): if i: row = int(math.floor(math.log(i + 1, 2))) else: row = 0 if row != last_row: output.write('\n') columns = 2 ** row col_width = int(math.floor(total_width / columns)) output.write(str(n).center(col_width, fill)) last_row = row print(output.getvalue()) print('-' * total_width) print() print('random :', data) heapq.heapify(data) print('heapified :') show_tree(data) print() for i in range(2): smallest = heapq.heappop(data) print('pop {:>3}:'.format(smallest)) show_tree(data)
這個例子是由標準庫檔案改寫的,其中使用heapify()和heappop()對一個數字佇列進行排序,

如果希望在一個操作中洗掉現有元素并替換為新值,則可以使用heapreplace(),
import heapq import math from io import StringIO data = [19, 9, 4, 10, 11] def show_tree(tree, total_width=36, fill=' '): """Pretty-print a tree.""" output = StringIO() last_row = -1 for i, n in enumerate(tree): if i: row = int(math.floor(math.log(i + 1, 2))) else: row = 0 if row != last_row: output.write('\n') columns = 2 ** row col_width = int(math.floor(total_width / columns)) output.write(str(n).center(col_width, fill)) last_row = row print(output.getvalue()) print('-' * total_width) print() heapq.heapify(data) print('start:') show_tree(data) for n in [0, 13]: smallest = heapq.heapreplace(data, n) print('replace {:>2} with {:>2}:'.format(smallest, n)) show_tree(data)
通過原地替換元素,就這樣可以維持一個固定大小的堆,如按優先級排序的作業佇列,

1.3 堆的資料極值
heapq還包括兩個檢查可迭代物件(iterable)的函式,可以查找其中包含的最大或最小值的范圍,
import heapq import math from io import StringIO data = [19, 9, 4, 10, 11] def show_tree(tree, total_width=36, fill=' '): """Pretty-print a tree.""" output = StringIO() last_row = -1 for i, n in enumerate(tree): if i: row = int(math.floor(math.log(i + 1, 2))) else: row = 0 if row != last_row: output.write('\n') columns = 2 ** row col_width = int(math.floor(total_width / columns)) output.write(str(n).center(col_width, fill)) last_row = row print(output.getvalue()) print('-' * total_width) print() print('all :', data) print('3 largest :', heapq.nlargest(3, data)) print('from sort :', list(reversed(sorted(data)[-3:]))) print('3 smallest:', heapq.nsmallest(3, data)) print('from sort :', sorted(data)[:3])
只有當n值(n>1)相對小時使用nlargest()和nsmallest()才算高效,不過有些情況下這兩個函式會很方便,

1.4 高效合并有序序列
對于小資料集,將多個有序序列合并到一個新序列很容易,
list(sorted(itertools.chain(*data)))
對于較大的資料集,這個技術可能會占用大量記憶體,merge()不是對整個合并后的序列排序,而是使用一個堆一次一個元素地生成一個新序列,利用固定大小的記憶體確定下一個元素,
import heapq import random random.seed(2016) data = [] for i in range(4): new_data = list(random.sample(range(1, 101), 5)) new_data.sort() data.append(new_data) for i, d in enumerate(data): print('{}: {}'.format(i, d)) print('\nMerged:') for i in heapq.merge(*data): print(i, end=' ') print()
由于merge()的實作使用了一個堆,所以它會根據所合并的序列個數消耗記憶體,而不是根據這些序列中的元素個數,

轉載請註明出處,本文鏈接:https://www.uj5u.com/houduan/190363.html
標籤:Python
上一篇:Python 類 初學者筆記
