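1. collections: Container Data Types

The collections module includes container data types beyond the built-in types list, dict, and tuple.

1.1 ChainMap: Search Multiple Dictionaries

The ChainMap class manages a sequence of dictionaries and searches through them, in the order they were given, to find the value associated with a key. A ChainMap makes a good "context" container because it can be treated as a stack: changes happen as the stack grows, and those changes are discarded again as the stack shrinks.

1.1.1 Accessing Values

ChainMap supports the same API as a regular dictionary for accessing existing values.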

import collections

a = {'a': 'A', 'c': 'C'}
b = {'b': 'B', 'c': 'D'}

m = collections.ChainMap(a, b)

print('Individual Values')
print('a = {}'.format(m['a']))
print('b = {}'.format(m['b']))
print('c = {}'.format(m['c']))
print()

print('Keys = {}'.format(list(m.keys())))
print('Values = {}'.format(list(m.values())))
print()

print('Items:')
for k, v in m.items():
    print('{} = {}'.format(k, v))
print()

print('"d" in m: {}'.format(('d' in m)))

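The child mappings are searched in the order they are passed to the constructor, so the value reported for the key 'c' comes from the a dictionary.

1.1.2 Reordering

ChainMap stores the list of mappings it searches in its maps attribute. This list is mutable, so new mappings can be added directly, or the order of the elements can be changed to control lookup and update behavior.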

import collections

a = {'a': 'A', 'c': 'C'}
b = {'b': 'B', 'c': 'D'}

m = collections.ChainMap(a, b)

print(m.maps)
print('c = {}\n'.format(m['c']))

# reverse the list
m.maps = list(reversed(m.maps))

print(m.maps)
print('c = {}'.format(m['c']))

When the list of mappings is reversed, the value associated with 'c' changes.
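Because maps is an ordinary list, mappings can also be inserted or appended after the ChainMap has been created. The following is a minimal sketch of that idea; the dictionary names (defaults, overrides, fallback) and their contents are invented for illustration.

```python
import collections

# Hypothetical configuration layers, invented for this sketch.
defaults = {'color': 'red', 'user': 'guest'}
overrides = {'user': 'admin'}
fallback = {'color': 'blue', 'timeout': 30}

m = collections.ChainMap(defaults)

# Mappings at the front of the list are searched first.
m.maps.insert(0, overrides)
# Mappings at the end are consulted only when no earlier one has the key.
m.maps.append(fallback)

print(m['user'])     # admin: overrides is searched before defaults
print(m['color'])    # red: defaults is searched before fallback
print(m['timeout'])  # 30: only fallback defines this key
```

The position of a mapping in maps is its priority, so insert(0, ...) adds an override and append(...) adds a last-resort default.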

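1.1.3 Updating Values

A ChainMap does not cache the values in the child mappings. If their contents are modified, the change is therefore reflected the next time the ChainMap is accessed.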

import collections

a = {'a': 'A', 'c': 'C'}
b = {'b': 'B', 'c': 'D'}

m = collections.ChainMap(a, b)
print('Before: {}'.format(m['c']))
a['c'] = 'E'
print('After : {}'.format(m['c']))

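Changing the value associated with an existing key and adding a new element work the same way.

It is also possible to set values through the ChainMap directly, although only the first mapping in the chain is actually modified.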

import collections

a = {'a': 'A', 'c': 'C'}
b = {'b': 'B', 'c': 'D'}

m = collections.ChainMap(a, b)
print('Before:', m)
m['c'] = 'E'
print('After :', m)
print('a:', a)

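When the new value is stored through m, the a mapping is updated.

ChainMap provides a convenience method for creating a new instance with one extra mapping at the front of the maps list, which makes it easy to avoid modifying the existing underlying data structures.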

import collections

a = {'a': 'A', 'c': 'C'}
b = {'b': 'B', 'c': 'D'}

m1 = collections.ChainMap(a, b)
m2 = m1.new_child()

print('m1 before:', m1)
print('m2 before:', m2)

m2['c'] = 'E'

print('m1 after:', m1)
print('m2 after:', m2)

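This stacking behavior is what makes ChainMap instances convenient as template or application contexts. Specifically, it is easy to add or update values in one iteration and then discard those changes for the next iteration.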
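The per-iteration pattern described here might look like the sketch below. The template string, the default values, and the rows are all hypothetical; only new_child(), update(), and parents are part of the ChainMap API.

```python
import collections

# Hypothetical template defaults and per-row overrides.
base = collections.ChainMap({'title': 'Report', 'author': 'unknown'})
rows = [{'author': 'Ann'}, {'title': 'Summary'}]

rendered = []
for row in rows:
    # Each iteration gets a fresh, empty mapping in front of the defaults.
    context = base.new_child()
    context.update(row)  # writes land in the new first mapping only
    rendered.append('{title} by {author}'.format_map(context))
    # When context goes out of use, this iteration's changes go with it.

print(rendered)       # ['Report by Ann', 'Summary by unknown']
print(dict(base))     # the defaults are unchanged
print(context.parents == base)  # True: parents skips the first mapping
```

Since every write goes to the child's first mapping, the shared defaults never need to be copied or restored between iterations.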

When the new context is already known or built in advance, a mapping can also be passed to new_child().

import collections

a = {'a': 'A', 'c': 'C'}
b = {'b': 'B', 'c': 'D'}
c = {'c': 'E'}

m1 = collections.ChainMap(a, b)
m2 = m1.new_child(c)

print('m1["c"] = {}'.format(m1['c']))
print('m2["c"] = {}'.format(m2['c']))

This is equivalent to:

m2 = collections.ChainMap(c, *m1.maps)

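Both forms produce the same result: m1['c'] is still C, while m2['c'] is E.

1.2 Counter: Count Hashable Objects

A Counter is a container that keeps track of how many times equivalent values are added. It can be used to implement the same algorithms for which other languages commonly use bag or multiset data structures.

1.2.1 Initializing

Counter supports three forms of initialization. Its constructor can be called with a sequence of items, with a dictionary containing keys and counts, or with keyword arguments that map string names to counts.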

import collections

print(collections.Counter(['a', 'b', 'c', 'a', 'b', 'b']))
print(collections.Counter({'a': 2, 'b': 3, 'c': 1}))
print(collections.Counter(a=2, b=3, c=1))

The results of all three forms of initialization are the same.

An empty Counter can be constructed with no arguments and then populated via the update() method.

import collections

c = collections.Counter()
print('Initial :', c)

c.update('abcdaab')
print('Sequence:', c)

c.update({'a': 1, 'd': 5})
print('Dict    :', c)

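The count values are increased based on the new data rather than replaced. In the example above, the count for a goes from 3 to 4.

1.2.2 Accessing Counts

Once a Counter is populated, its values can be retrieved using the dictionary API.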

import collections

c = collections.Counter('abcdaab')

for letter in 'abcde':
    print('{} : {}'.format(letter, c[letter]))

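Counter does not raise KeyError for unknown items. If a value has not been seen in the input (as with e in this example), its count is 0.

The elements() method returns an iterator that produces all of the items known to the Counter.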

import collections

c = collections.Counter('extremely')
c['z'] = 0
print(c)
print(list(c.elements()))

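Since Python 3.7, elements are produced in the order in which they were first encountered; earlier versions gave no guarantee about the order. Items with counts less than or equal to zero are not included.

Use most_common() to produce a sequence of the n most frequently encountered input values and their respective counts.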

import collections

c = collections.Counter()
with open('test.txt', 'rt') as f:
    for line in f:
        c.update(line.rstrip().lower())

print('Most common:')
for letter, count in c.most_common(3):
    print('{}: {:>7}'.format(letter, count))

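This example counts the letters that appear on every line of the input file (test.txt) to produce a frequency distribution, then prints the three most common letters. Leaving out the argument to most_common() produces a list of all the items, ordered by frequency.

1.2.3 Arithmetic

Counter instances support arithmetic and set operations for aggregating results. The following example shows the standard operators for creating new Counter instances, but the in-place operators +=, -=, &=, and |= are also supported.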

import collections

c1 = collections.Counter(['a', 'b', 'c', 'a', 'b', 'b'])
c2 = collections.Counter('alphabet')

print('C1:', c1)
print('C2:', c2)

print('\nCombined counts:')
print(c1 + c2)

print('\nSubtraction:')
print(c1 - c2)

print('\nIntersection (taking positive minimums):')
print(c1 & c2)

print('\nUnion (taking maximums):')
print(c1 | c2)

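Each time a new Counter is produced through an operation, any items with zero or negative counts are discarded. The count for a is the same in c1 and c2, so subtraction leaves it at zero and it is dropped from the result.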
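The in-place operators behave like their binary counterparts, including the removal of non-positive counts. The sketch below uses made-up counts to show each one, and contrasts them with the subtract() method, which keeps negative values.

```python
import collections

total = collections.Counter(a=2, b=3)

total += collections.Counter(a=1, c=4)   # add counts: a=3, b=3, c=4
after_add = dict(total)

total -= collections.Counter(b=5)        # b would be -2, so it is dropped
after_sub = dict(total)

total &= collections.Counter(a=1, c=10)  # minimum of each count: a=1, c=4
after_and = dict(total)

total |= collections.Counter(a=2, d=1)   # maximum of each count: a=2, c=4, d=1
after_or = dict(total)

# subtract() is a method, not an operator, and keeps negative counts.
balance = collections.Counter(a=1)
balance.subtract(collections.Counter(a=3))

print(after_add, after_sub, after_and, after_or)
print(balance['a'])  # -2
```

Use subtract() when negative counts are meaningful (for example, a running balance), and the operators when only positive counts should survive.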

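1.3 defaultdict: Missing Keys Return a Default Value

The standard dictionary includes the setdefault() method for retrieving a value and establishing a default if the value does not exist. By contrast, defaultdict lets the caller specify the default up front, when the container is initialized.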

import collections

def default_factory():
    return 'default value'

d = collections.defaultdict(default_factory, foo='bar')
print('d:', d)
print('foo =>', d['foo'])
print('bar =>', d['bar'])

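This approach works well as long as it is appropriate for all keys to have the same default. It is especially useful when the default is a type used for aggregating or accumulating values, such as list, set, or int. The standard library documentation includes several examples of defaultdict used in this way.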
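As a small illustration of the aggregating case, the sketch below groups words by first letter with a list default and tallies word lengths with an int default. The word list is invented for the example.

```python
import collections

# Hypothetical input data.
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

# list as the factory: a missing key starts out as an empty list.
by_letter = collections.defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

# int as the factory: a missing key starts out as 0.
length_counts = collections.defaultdict(int)
for word in words:
    length_counts[len(word)] += 1

print(dict(by_letter))
print(dict(length_counts))
```

Neither loop needs to check whether the key already exists, which is the main convenience over a plain dict with setdefault().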

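1.4 deque: Double-Ended Queue

A double-ended queue, or deque, supports adding and removing elements from either end. The more commonly used stacks and queues are degenerate forms of deques in which the inputs and outputs are restricted to a single end.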

import collections

d = collections.deque('abcdefg')
print('Deque:', d)
print('Length:', len(d))
print('Left end:', d[0])
print('Right end:', d[-1])

d.remove('c')
print('remove(c):', d)

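Because a deque is a type of sequence container, it supports some of the same operations as list, such as examining the contents with __getitem__(), determining the length, and removing an element from the middle of the queue by matching its value.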
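The remark that stacks and queues are restricted forms of a deque can be made concrete with a short sketch. Both containers below are filled the same way; only the end used for removal differs.

```python
import collections

# Stack (last in, first out): add and remove at the same end.
stack = collections.deque()
for item in 'abc':
    stack.append(item)
stack_order = [stack.pop() for _ in range(3)]

# Queue (first in, first out): add at one end, remove at the other.
queue = collections.deque()
for item in 'abc':
    queue.append(item)
queue_order = [queue.popleft() for _ in range(3)]

print(stack_order)  # ['c', 'b', 'a']
print(queue_order)  # ['a', 'b', 'c']
```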

1.4.1 Populating

A deque can be populated from either end, termed "left" and "right" in the Python implementation.

import collections

# Add to the right
d1 = collections.deque()
d1.extend('abcdefg')
print('extend    :', d1)
d1.append('h')
print('append    :', d1)

# Add to the left
d2 = collections.deque()
d2.extendleft(range(6))
print('extendleft:', d2)
d2.appendleft(6)
print('appendleft:', d2)

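The extendleft() function iterates over its input and performs the equivalent of an appendleft() for each item. The end result is that the deque contains the input sequence in reverse order.

1.4.2 Consuming

Similarly, the elements of a deque can be consumed from both ends or from either end, depending on the algorithm being applied.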

import collections

print('From the right:')
d = collections.deque('abcdefg')
while True:
    try:
        print(d.pop(), end='')
    except IndexError:
        break
print()

print('\nFrom the left:')
d = collections.deque(range(6))
while True:
    try:
        print(d.popleft(), end='')
    except IndexError:
        break
print()

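Use pop() to remove an item from the right end of the deque and popleft() to take an item from the left end.

Because appends and pops at either end of a deque are thread-safe, the contents can even be consumed from both ends at the same time by separate threads.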

import collections
import threading
import time

candle = collections.deque(range(5))

def burn(direction, next_source):
    # Keep taking items from one end until the deque is empty.
    while True:
        try:
            item = next_source()
        except IndexError:
            break
        else:
            print('{:>8}: {}'.format(direction, item))
            time.sleep(0.1)
    print('{:>8} done'.format(direction))

left = threading.Thread(target=burn,
                        args=('Left', candle.popleft))
right = threading.Thread(target=burn,
                         args=('Right', candle.pop))

left.start()
right.start()

left.join()
right.join()

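The threads in this example alternate between the two ends, removing items until the deque is empty.

1.4.3 Rotating

Another useful aspect of the deque is the ability to rotate it in either direction, so as to skip over some items.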

import collections

d = collections.deque(range(10))
print('Normal        :', d)

d = collections.deque(range(10))
d.rotate(2)
print('Right rotation:', d)

d = collections.deque(range(10))
d.rotate(-2)
print('Left rotation :', d)

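Rotating the deque to the right (using a positive rotation) takes items from the right end and moves them to the left end. Rotating to the left (using a negative value) takes items from the left end and moves them to the right end. It may help to picture the items in the deque as being engraved along the edge of a dial.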
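One common use of rotation is round-robin selection. The following sketch assumes a hypothetical pool of workers and a list of tasks; after each assignment the deque is rotated left by one so the next worker moves to the front.

```python
import collections

# Hypothetical workers and tasks.
workers = collections.deque(['w1', 'w2', 'w3'])
tasks = ['t1', 't2', 't3', 't4']

assignments = []
for task in tasks:
    assignments.append((task, workers[0]))
    workers.rotate(-1)  # move the worker just used to the right end

print(assignments)
# [('t1', 'w1'), ('t2', 'w2'), ('t3', 'w3'), ('t4', 'w1')]
```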

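1.4.4 Constraining the Queue Size

A deque instance can be configured with a maximum length so that it never grows beyond that size. When the queue reaches the specified length, existing items are discarded as new items are added. This behavior is useful for finding the last n items in a stream of undetermined length.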

import collections
import random

# Set the random seed so we see the same output each time
# the script is run.
random.seed(1)

d1 = collections.deque(maxlen=3)
d2 = collections.deque(maxlen=3)

for i in range(5):
    n = random.randint(0, 100)
    print('n =', n)
    d1.append(n)
    d2.appendleft(n)
    print('D1:', d1)
    print('D2:', d2)

The deque length is maintained regardless of which end the items are added to.
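The "last n items of a stream" use case can be sketched in a few lines. The helper name tail and the generator of squares are assumptions made for this example.

```python
import collections


def tail(iterable, n):
    """Return the last n items of iterable as a list (hypothetical helper)."""
    # The bounded deque silently drops older items as newer ones arrive.
    return list(collections.deque(iterable, maxlen=n))


# A generator stands in for a stream whose length is not known up front.
last_three = tail((i * i for i in range(10)), 3)
print(last_three)  # [49, 64, 81]
```

Only n items are ever held in memory, no matter how long the input is.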

1.5 namedtuple: Tuple Subclass with Named Fields

The standard tuple uses numerical indexes to access its members.

bob = ('Bob', 30, 'male')
print('Representation:', bob)

jane = ('Jane', 29, 'female')
print('\nField by index:', jane[0])

print('\nFields by index:')
for p in [bob, jane]:
    print('{} is a {} year old {}'.format(*p))

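This makes a tuple a convenient container for simple uses.

On the other hand, remembering which index should be used for each value can lead to errors, especially if the tuple has a lot of fields and is constructed far from where it is used. A namedtuple assigns names, as well as the numerical index, to each member.

1.5.1 Defining

namedtuple instances are just as memory efficient as regular tuples because they do not have per-instance dictionaries.

Each kind of namedtuple is represented by its own class, which is created with the namedtuple() factory function. The arguments are the name of the new class and a string containing the names of the elements.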

import collections

Person = collections.namedtuple('Person', 'name age')

bob = Person(name='Bob', age=30)
print('\nRepresentation:', bob)

jane = Person(name='Jane', age=29)
print('\nField by name:', jane.name)

print('\nFields by index:')
for p in [bob, jane]:
    print('{} is {} years old'.format(*p))

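As the example illustrates, the fields of a namedtuple can be accessed by name using dotted notation (obj.attr) as well as by the positional indexes of a standard tuple.

Just like a regular tuple, a namedtuple is immutable. This restriction allows tuple instances to have a consistent hash value, which makes it possible to use them as keys in dictionaries and to include them in sets.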

import collections

Person = collections.namedtuple('Person', 'name age')

pat = Person(name='Pat', age=12)
print('\nRepresentation:', pat)

pat.age = 21

Trying to change a value through its named attribute results in an AttributeError.
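The hashing property mentioned above can be seen in a brief sketch. The Point type and the labels are hypothetical; the point is that equal namedtuple values hash alike, so they work as dictionary keys and set members.

```python
import collections

# Hypothetical record type for this sketch.
Point = collections.namedtuple('Point', 'x y')

labels = {Point(0, 0): 'origin'}
labels[Point(1, 2)] = 'waypoint'

# A separately constructed but equal instance finds the same entry.
found = labels[Point(x=1, y=2)]

# Duplicates collapse in a set because equal instances hash alike.
unique = {Point(0, 0), Point(0, 0), Point(1, 2)}

# A namedtuple also compares and hashes like the plain tuple it extends.
same_as_tuple = labels[(0, 0)]

print(found, len(unique), same_as_tuple)  # waypoint 2 origin
```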

1.5.2 Invalid Field Names

Field names are invalid if they are repeated or conflict with Python keywords.

import collections

try:
    collections.namedtuple('Person', 'name class age')
except ValueError as err:
    print(err)

try:
    collections.namedtuple('Person', 'name age age')
except ValueError as err:
    print(err)

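As the field names are parsed, invalid values cause ValueError exceptions.

When a namedtuple is created from values outside the control of the program (such as to represent the rows returned by a database query, where the schema is not known in advance), the rename option should be set to True so that invalid fields are renamed.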

import collections

with_class = collections.namedtuple(
    'Person', 'name class age',
    rename=True)
print(with_class._fields)

two_ages = collections.namedtuple(
    'Person', 'name age age',
    rename=True)
print(two_ages._fields)

The new names for renamed fields depend on their index in the tuple, so the field named class becomes _1 and the duplicate age field becomes _2.
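For the database-style case, a sketch might build the row type from a list of column names that is only known at run time. The column names and row values below are invented, and a real program would take them from its query result rather than from literals.

```python
import collections

# Hypothetical column names, as they might arrive from an external source.
columns = ['id', 'class', 'name', 'id']

# rename=True replaces the keyword 'class' and the repeated 'id'
# with names based on their positions.
Row = collections.namedtuple('Row', columns, rename=True)
print(Row._fields)  # ('id', '_1', 'name', '_3')

# _make() builds an instance from any iterable of values.
row = Row._make([7, 'B', 'Ann', 7])
print(row.name, row._1)  # Ann B
```

The renamed fields remain reachable by position as well, so row[1] and row._1 refer to the same value.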

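1.5.3 Special Attributes

namedtuple provides several useful attributes and methods for working with subclasses and instances. All of these built-in properties have names prefixed with an underscore (_), which by convention in most Python programs indicates a private attribute. For namedtuple, however, the prefix is intended to keep the name from colliding with user-provided attribute names.

The names of the fields passed to namedtuple to define the new class are saved in the _fields attribute.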

import collections

Person = collections.namedtuple('Person', 'name age')

bob = Person(name='Bob', age=30)
print('Representation:', bob)
print('Fields:', bob._fields)

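Although the argument is a single space-separated string, the stored value is a sequence of the individual names.

namedtuple instances can be converted to dictionaries using _asdict(). Up to Python 3.7 the result is an OrderedDict; since Python 3.8 it is a regular dict.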

import collections

Person = collections.namedtuple('Person', 'name age')

bob = Person(name='Bob', age=30)
print('Representation:', bob)
print('As Dictionary:', bob._asdict())

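The keys of the resulting dictionary are in the same order as the fields of the namedtuple.

The _replace() method builds a new instance, replacing the values of some fields in the process.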

import collections

Person = collections.namedtuple('Person', 'name age')

bob = Person(name='Bob', age=30)
print('\nBefore:', bob)
bob2 = bob._replace(name='Robert')
print('After:', bob2)
print('Same?:', bob is bob2)

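Although the name suggests that it modifies the existing object, namedtuple instances are immutable, so the method actually returns a new object.

1.6 OrderedDict: Remember the Order Keys Are Added to a Dictionary

An OrderedDict is a dictionary subclass that remembers the order in which its contents are added.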

import collections

print('Regular dictionary:')
d = {}
d['a'] = 'A'
d['b'] = 'B'
d['c'] = 'C'

for k, v in d.items():
    print(k, v)

print('\nOrderedDict:')
d = collections.OrderedDict()
d['a'] = 'A'
d['b'] = 'B'
d['c'] = 'C'

for k, v in d.items():
    print(k, v)

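Before Python 3.7, a regular dict did not track insertion order. Iterating over it produced the values in an order based on how the keys were stored in the hash table, which in turn was influenced by a random value used to reduce collisions. Since Python 3.7, preserving insertion order is a guaranteed part of dict behavior, so both loops in this example print the keys in the same order on current interpreters. An OrderedDict remembers the order in which items are inserted and uses it when creating an iterator, and it still differs from dict in its order-sensitive equality test and in methods such as move_to_end().

1.6.1 Equality

A regular dict looks at its contents when testing for equality. An OrderedDict also considers the order in which the items were added.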

import collections

print('dict       :', end=' ')
d1 = {}
d1['a'] = 'A'
d1['b'] = 'B'
d1['c'] = 'C'

d2 = {}
d2['c'] = 'C'
d2['b'] = 'B'
d2['a'] = 'A'

print(d1 == d2)

print('OrderedDict:', end=' ')

d1 = collections.OrderedDict()
d1['a'] = 'A'
d1['b'] = 'B'
d1['c'] = 'C'

d2 = collections.OrderedDict()
d2['c'] = 'C'
d2['b'] = 'B'
d2['a'] = 'A'

print(d1 == d2)

In this case, because the two ordered dictionaries are created from values in a different order, they are considered to be different.
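One detail worth checking is what happens when only one side of the comparison is an OrderedDict. The sketch below, using made-up pairs, suggests that order matters only when both operands are ordered dictionaries.

```python
import collections

pairs = [('a', 'A'), ('b', 'B')]
forward = collections.OrderedDict(pairs)
backward = collections.OrderedDict(reversed(pairs))

both_ordered = forward == backward      # order is compared
mixed = forward == dict(backward)       # falls back to plain dict equality

print(both_ordered)  # False
print(mixed)         # True
```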

1.6.2 Reordering

The order of the keys in an OrderedDict can be changed by moving them to either the beginning or the end of the sequence with move_to_end().

import collections

d = collections.OrderedDict(
    [('a', 'A'), ('b', 'B'), ('c', 'C')]
)

print('Before:')
for k, v in d.items():
    print(k, v)

d.move_to_end('b')

print('\nmove_to_end():')
for k, v in d.items():
    print(k, v)

d.move_to_end('b', last=False)

print('\nmove_to_end(last=False):')
for k, v in d.items():
    print(k, v)

The last argument tells move_to_end() whether to move the item to be the last item in the key sequence (when True) or the first (when False).
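A typical application of move_to_end() is a small least-recently-used cache. The class below is a simplified sketch, not a library API: the name LRUCache, its methods, and the capacity of 2 are assumptions for the example. It relies on move_to_end() to mark a key as recently used and on popitem(last=False) to evict the oldest key.

```python
import collections


class LRUCache:
    """Minimal least-recently-used cache (illustrative sketch only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = collections.OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used

    def keys(self):
        return list(self._data)


cache = LRUCache(2)
cache.put('a', 1)
cache.put('b', 2)
cache.get('a')       # 'a' is now more recent than 'b'
cache.put('c', 3)    # exceeds capacity, so 'b' is evicted
print(cache.keys())  # ['a', 'c']
```

For caching function results, functools.lru_cache in the standard library is usually the better choice; this sketch only illustrates the reordering methods.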

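1.7 collections.abc: Abstract Base Classes for Containers

The collections.abc module contains abstract base classes that define the APIs for the container data structures built into Python and those provided by the collections module.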

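Class	Base Class(es)	API Purpose
Container		Basic container features, such as the in operator
Hashable		Adds support for providing a hash value for the container instance
Iterable		Can create an iterator over the container contents
Iterator	Iterable	Is an iterator over the container contents
Generator	Iterator	Extends iterators with the generator protocol
Sized		Adds methods for containers that know how big they are
Callable		For containers that can be invoked as a function
Sequence	Sized, Iterable, Container	Supports retrieving individual items, iterating, and iterating in reverse order
MutableSequence	Sequence	Supports adding and removing items after an instance has been created
ByteString	Sequence	Combined API of bytes and bytearray (deprecated since Python 3.12)
Set	Sized, Iterable, Container	Supports set operations such as intersection and union
MutableSet	Set	Adds methods for manipulating the set contents after it is created
Mapping	Sized, Iterable, Container	Defines the read-only API used by dict
MutableMapping	Mapping	Defines the methods for manipulating the contents of a mapping after it is created
MappingView	Sized	Defines the view API for accessing a mapping from an iterator
ItemsView	MappingView, Set	Part of the view API
KeysView	MappingView, Set	Part of the view API
ValuesView	MappingView	Part of the view API
Awaitable		API for objects that can be used in await expressions, such as coroutines
Coroutine	Awaitable	API for classes that implement the coroutine protocol
AsyncIterable		API for iterables compatible with async for
AsyncIterator	AsyncIterable	API for asynchronous iterators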

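In addition to clearly defining the APIs for different kinds of containers, these abstract base classes can be used with isinstance() to test whether an object supports an API before invoking it. Some of the classes also provide method implementations, so they can be used as mix-ins to build custom container types without implementing every method from scratch.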
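Both uses can be sketched with a small custom container. The Deck class and its contents are hypothetical; it implements only __getitem__() and __len__(), and the Sequence base class supplies the remaining read-only methods as mix-ins.

```python
import collections.abc


class Deck(collections.abc.Sequence):
    """Read-only sequence of cards (hypothetical example class)."""

    def __init__(self, cards):
        self._cards = list(cards)

    # The two abstract methods that Sequence requires.
    def __getitem__(self, index):
        return self._cards[index]

    def __len__(self):
        return len(self._cards)


deck = Deck(['A', 'K', 'Q'])

# Mix-in methods inherited from Sequence, not written by hand.
print('K' in deck)           # True
print(deck.index('Q'))       # 2
print(list(reversed(deck)))  # ['Q', 'K', 'A']

# isinstance() checks which API an object supports.
print(isinstance(deck, collections.abc.Sequence))         # True
print(isinstance(deck, collections.abc.MutableSequence))  # False
print(isinstance([], collections.abc.MutableSequence))    # True
```

Checking against the abstract base class rather than a concrete type such as list lets code accept any object that implements the same API.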

