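1. zlib: GNU zlib Compression
The zlib module provides a low-level interface to many of the functions in the zlib compression library from the GNU project.
1.1 Working with Data in Memory
The simplest way to work with zlib requires holding all of the data to be compressed or decompressed in memory.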
    import zlib
    import binascii

    original_data = b'This is the original text.'
    print('Original     :', len(original_data), original_data)

    compressed = zlib.compress(original_data)
    print('Compressed   :', len(compressed),
          binascii.hexlify(compressed))

    decompressed = zlib.decompress(compressed)
    print('Decompressed :', len(decompressed), decompressed)
The compress() and decompress() functions both take a byte sequence argument and return a byte sequence.
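Because both functions work on bytes, text has to be encoded before it is compressed and decoded after it is decompressed. The following is a minimal sketch of that round trip; the choice of UTF-8 and the variable names are assumptions made for illustration, not part of the original example.

```python
import zlib

text = 'This is the original text.'

# Passing a str directly is rejected; compress() wants a bytes-like object.
try:
    zlib.compress(text)
    str_rejected = False
except TypeError:
    str_rejected = True

# Encode first (UTF-8 is an assumption here), then compress.
compressed = zlib.compress(text.encode('utf-8'))

# decompress() also returns bytes, so decode to get the str back.
restored = zlib.decompress(compressed).decode('utf-8')

print('str rejected :', str_rejected)
print('round trip ok:', restored == text)
```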

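As the previous example shows, the compressed version of a small amount of data can be larger than the uncompressed version. The exact result depends on the input data, but it is interesting to observe the compression overhead for small data sets.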
    import zlib

    original_data = b'This is the original text.'

    template = '{:>15} {:>15}'
    print(template.format('len(data)', 'len(compressed)'))
    print(template.format('-' * 15, '-' * 15))

    for i in range(5):
        data = original_data * i
        compressed = zlib.compress(data)
        # Mark the rows where compression made the data larger
        highlight = '*' if len(data) < len(compressed) else ''
        print(template.format(len(data), len(compressed)), highlight)
The * in the output highlights the lines where the compressed data takes up more memory than the uncompressed version.
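Part of that fixed overhead comes from the zlib container format (RFC 1950), which wraps the compressed stream in a short header and ends it with a 4-byte big-endian Adler-32 checksum of the uncompressed data. The sketch below checks the trailer and measures what an empty input costs; the variable names are illustrative only.

```python
import zlib

data = b'This is the original text.'
compressed = zlib.compress(data)

# The last four bytes are the Adler-32 of the uncompressed data, big-endian.
trailer = compressed[-4:]
expected = zlib.adler32(data).to_bytes(4, 'big')
print('trailer matches adler32:', trailer == expected)

# Even an empty input still pays for the header and the trailer.
empty = zlib.compress(b'')
print('size of compressed empty input:', len(empty))
```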

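zlib supports several compression levels, allowing a balance between computational cost and the amount of space saved. The default compression level, zlib.Z_DEFAULT_COMPRESSION, is -1 and corresponds to a hard-coded value that compromises between performance and compression outcome. This currently corresponds to level 6.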
    import zlib

    input_data = b'Some repeated text.\n' * 1024
    template = '{:>5} {:>5}'

    print(template.format('Level', 'Size'))
    print(template.format('-----', '----'))

    for i in range(0, 10):
        data = zlib.compress(input_data, i)
        print(template.format(i, len(data)))
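A level of 0 means no compression at all. A level of 9 requires the most computation and produces the smallest output. As the example above shows, several compression levels can yield the same size reduction for a given input.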
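The relationship between the default level and level 6 can be checked directly. This is a small sketch that assumes the underlying zlib build maps the default to level 6, as described above; it also confirms that level 0 stores the data without shrinking it.

```python
import zlib

input_data = b'Some repeated text.\n' * 1024

# Three ways of asking for what should be the same compression level.
default_out = zlib.compress(input_data)
explicit_default = zlib.compress(input_data, zlib.Z_DEFAULT_COMPRESSION)
level_six = zlib.compress(input_data, 6)

# Level 0 only wraps the data, so the result is larger than the input.
stored = zlib.compress(input_data, 0)

print('Z_DEFAULT_COMPRESSION =', zlib.Z_DEFAULT_COMPRESSION)
print('default == level 6    :', default_out == level_six)
print('level 0 size vs input :', len(stored), len(input_data))
```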

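1.2 Incremental Compression and Decompression
The in-memory approach has drawbacks, primarily that the system needs enough memory to hold both the uncompressed and compressed versions at the same time, which makes it impractical for many real-world use cases. The alternative is to use Compress and Decompress objects to process data incrementally, so that the entire data set does not have to fit into memory.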
    import zlib
    import binascii

    compressor = zlib.compressobj(1)

    with open('lorem.txt', 'rb') as input:
        while True:
            block = input.read(64)
            if not block:
                break
            compressed = compressor.compress(block)
            if compressed:
                print('Compressed: {}'.format(
                    binascii.hexlify(compressed)))
            else:
                print('buffering...')
        # Force out whatever the compressor is still holding
        remaining = compressor.flush()
        print('Flushed: {}'.format(binascii.hexlify(remaining)))
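This example reads small blocks of data from a plain text file and passes each block to compress(). The compressor maintains an internal buffer of compressed data. Since the compression algorithm depends on checksums and minimum block sizes, the compressor may not be ready to return data each time it receives more input. If it does not have a complete compressed block ready, it returns an empty byte string. When all of the data has been fed in, flush() forces the compressor to finish the final block and return the rest of the compressed data.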
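The decompression side can be driven the same way. The following sketch feeds each compressed piece to a Decompress object as soon as it is produced; it uses an in-memory io.BytesIO as a stand-in for a large file, which is an assumption made to keep the example self-contained.

```python
import io
import zlib

# Stand-in for a large input file (hypothetical data).
source = io.BytesIO(b'Some repeated text.\n' * 1024)
restored = io.BytesIO()

compressor = zlib.compressobj(1)
decompressor = zlib.decompressobj()

while True:
    block = source.read(64)
    if not block:
        break
    compressed = compressor.compress(block)
    if compressed:
        # Decompress each piece as soon as the compressor emits it.
        restored.write(decompressor.decompress(compressed))

# Flush both ends so no buffered data is lost.
restored.write(decompressor.decompress(compressor.flush()))
restored.write(decompressor.flush())

print('round trip ok:', restored.getvalue() == source.getvalue())
```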

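1.3 Mixed Content Streams
The Decompress class returned by decompressobj() can also be used in situations where compressed and uncompressed data are mixed together.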
    import zlib

    lorem = open('lorem.txt', 'rb').read()
    compressed = zlib.compress(lorem)
    # Compressed data followed directly by uncompressed data
    combined = compressed + lorem

    decompressor = zlib.decompressobj()
    decompressed = decompressor.decompress(combined)

    decompressed_matches = decompressed == lorem
    print('Decompressed matches lorem:', decompressed_matches)

    unused_matches = decompressor.unused_data == lorem
    print('Unused data matches lorem:', unused_matches)
After all of the data has been decompressed, the unused_data attribute contains any data that was not used.
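A self-contained variant that does not need lorem.txt might look like the sketch below. The payload and trailer values are made up for illustration; the eof attribute of the Decompress object reports whether the end of the compressed stream was reached.

```python
import zlib

# Hypothetical data: a compressed payload followed by plain bytes.
payload = b'compressed part ' * 10
trailer = b'plain trailing bytes'
combined = zlib.compress(payload) + trailer

decompressor = zlib.decompressobj()
decompressed = decompressor.decompress(combined)

print('payload recovered :', decompressed == payload)
print('end of stream seen:', decompressor.eof)
print('unused data       :', decompressor.unused_data)
```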

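1.4 Checksums
In addition to compression and decompression functions, zlib includes two functions for computing checksums of data, adler32() and crc32(). Neither checksum is cryptographically secure; they are intended only for data-integrity verification.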
    import zlib

    data = open('lorem.txt', 'rb').read()

    cksum = zlib.adler32(data)
    print('Adler32: {:12d}'.format(cksum))
    print('       : {:12d}'.format(zlib.adler32(data, cksum)))

    cksum = zlib.crc32(data)
    print('CRC-32 : {:12d}'.format(cksum))
    print('       : {:12d}'.format(zlib.crc32(data, cksum)))
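Both functions take the same arguments: a byte string containing the data and an optional value to be used as the starting point for the checksum. They return a 32-bit integer value (always unsigned in Python 3), which can be passed back on subsequent calls as a new starting-point argument to produce a running checksum.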
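A running checksum computed block by block should match the checksum of the whole input computed in one call. The sketch below checks this with made-up data and an arbitrary block size of 64; the starting values 0 and 1 are the documented defaults for crc32() and adler32().

```python
import zlib

data = b'Some repeated text.\n' * 1024

running_crc = 0    # default starting value for crc32()
running_adler = 1  # default starting value for adler32()

# Feed the data in blocks, passing the previous result back in each time.
for start in range(0, len(data), 64):
    block = data[start:start + 64]
    running_crc = zlib.crc32(block, running_crc)
    running_adler = zlib.adler32(block, running_adler)

print('CRC-32 matches :', running_crc == zlib.crc32(data))
print('Adler32 matches:', running_adler == zlib.adler32(data))
```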

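1.5 Compressing Network Data
The server in the next listing uses the stream compressor to respond to requests consisting of filenames by writing a compressed version of the file to the socket used to communicate with the client.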
    import zlib
    import logging
    import socketserver
    import binascii

    BLOCK_SIZE = 64


    class ZlibRequestHandler(socketserver.BaseRequestHandler):

        logger = logging.getLogger('Server')

        def handle(self):
            compressor = zlib.compressobj(1)

            # Find out what file the client wants
            filename = self.request.recv(1024).decode('utf-8')
            self.logger.debug('client asked for: %r', filename)

            # Send chunks of the file as they are compressed
            with open(filename, 'rb') as input:
                while True:
                    block = input.read(BLOCK_SIZE)
                    if not block:
                        break
                    self.logger.debug('RAW %r', block)
                    compressed = compressor.compress(block)
                    if compressed:
                        self.logger.debug(
                            'SENDING %r',
                            binascii.hexlify(compressed))
                        self.request.send(compressed)
                    else:
                        self.logger.debug('BUFFERING')

            # Send any data being buffered by the compressor
            remaining = compressor.flush()
            while remaining:
                to_send = remaining[:BLOCK_SIZE]
                remaining = remaining[BLOCK_SIZE:]
                self.logger.debug('FLUSHING %r',
                                  binascii.hexlify(to_send))
                self.request.send(to_send)
            return


    if __name__ == '__main__':
        import socket
        import threading
        from io import BytesIO

        logging.basicConfig(
            level=logging.DEBUG,
            format='%(name)s: %(message)s',
        )
        logger = logging.getLogger('Client')

        # Set up a server, running in a separate thread
        address = ('localhost', 0)  # let the kernel assign a port
        server = socketserver.TCPServer(address, ZlibRequestHandler)
        ip, port = server.server_address  # what port was assigned?

        t = threading.Thread(target=server.serve_forever)
        t.daemon = True  # replaces the deprecated setDaemon(True)
        t.start()

        # Connect to the server as a client
        logger.info('Contacting server on %s:%s', ip, port)
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect((ip, port))

        # Ask for a file
        requested_file = 'lorem.txt'
        logger.debug('sending filename: %r', requested_file)
        len_sent = s.send(requested_file.encode('utf-8'))

        # Receive a response
        buffer = BytesIO()
        decompressor = zlib.decompressobj()
        while True:
            response = s.recv(BLOCK_SIZE)
            if not response:
                break
            logger.debug('READ %r', binascii.hexlify(response))

            # Include any unconsumed data when
            # feeding the decompressor.
            to_decompress = decompressor.unconsumed_tail + response
            while to_decompress:
                decompressed = decompressor.decompress(to_decompress)
                if decompressed:
                    logger.debug('DECOMPRESSED %r', decompressed)
                    buffer.write(decompressed)
                    # Look for unconsumed data due to buffer overflow
                    to_decompress = decompressor.unconsumed_tail
                else:
                    logger.debug('BUFFERING')
                    to_decompress = None

        # Deal with data remaining inside the decompressor buffer
        remainder = decompressor.flush()
        if remainder:
            logger.debug('FLUSHED %r', remainder)
            buffer.write(remainder)

        full_response = buffer.getvalue()
        lorem = open('lorem.txt', 'rb').read()
        logger.debug('response matches file contents: %s',
                     full_response == lorem)

        # Clean up
        s.close()
        server.socket.close()
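The listing has some artificial chunking in place to illustrate the buffering behavior that occurs when the data passed to compress() or decompress() does not result in a complete block of compressed or uncompressed output.
The client connects to the socket and requests a file. Then it loops, receiving blocks of compressed data. Since a block may not contain enough information to decompress it entirely, the remainder of any data received earlier is combined with the new data and passed to the decompressor. As the data is decompressed, it is appended to a buffer, which is compared against the file contents at the end of the processing loop.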
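As far as the documented behavior goes, unconsumed_tail is only populated when decompress() is called with a max_length argument that limits how much output a single call may return. The sketch below shows that pattern in isolation, without sockets; MAX_OUTPUT and the sample data are hypothetical values chosen for illustration.

```python
import zlib

data = b'Some repeated text.\n' * 1024
compressed = zlib.compress(data)

MAX_OUTPUT = 256  # illustrative cap on bytes returned per call

decompressor = zlib.decompressobj()
chunks = []
pending = compressed
while pending:
    # Return at most MAX_OUTPUT bytes; unread input lands in unconsumed_tail.
    chunk = decompressor.decompress(pending, MAX_OUTPUT)
    chunks.append(chunk)
    pending = decompressor.unconsumed_tail

# Collect anything still held inside the decompressor.
chunks.append(decompressor.flush())
restored = b''.join(chunks)

print('number of chunks:', len(chunks))
print('round trip ok   :', restored == data)
```

Capping the output this way keeps memory use bounded even when a small compressed input expands to a very large result.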

