1. zlib GNUzlib壓縮

zlib模塊為GNU專案zlib壓縮庫中的很多函式提供了底層介面，

1.1 處理記憶體中的資料

使用zlib最簡單的方法要求把所有將要壓碩訓解壓縮的資料存放在記憶體中，

import zlib
import binascii

original_data = b'This is the original text.'
print('Original     :', len(original_data), original_data)

compressed = zlib.compress(original_data)
print('Compressed   :', len(compressed),
      binascii.hexlify(compressed))

decompressed = zlib.decompress(compressed)
print('Decompressed :', len(decompressed), decompressed)

compress()和decompress()函式都取一個位元組序列引數，并且回傳一個位元組序列，

從前面的例子可以看到，少量資料的壓縮版本可能比未壓縮的版本還要大，具體的結果取決于輸入資料，不過觀察小資料集的壓縮開銷很有意思，

import zlib

original_data = b'This is the original text.'

template = '{:>15}  {:>15}'
print(template.format('len(data)', 'len(compressed)'))
print(template.format('-' * 15, '-' * 15))

for i in range(5):
    data = original_data * i
    compressed = zlib.compress(data)
    highlight = '*' if len(data) < len(compressed) else ''
    print(template.format(len(data), len(compressed)), highlight)

輸出中的*突出顯示了哪些行的壓縮資料比未壓縮版本占用的記憶體更多，

zlib支持不同的壓縮級別，允許在計算成本和空間縮減量之間有所平衡，默認壓縮級別zlib.Z_DEFAULT_COMPRESSION為-1，這對應一個硬編碼值，表示性能和壓縮結果之間的一個折中，當前這對應級別6，

import zlib

input_data = b'Some repeated text.\n' * 1024
template = '{:>5}  {:>5}'

print(template.format('Level', 'Size'))
print(template.format('-----', '----'))

for i in range(0, 10):
    data = zlib.compress(input_data, i)
    print(template.format(i, len(data)))

壓縮級別為0意味著根本沒有壓縮，級別9要求的計算最多，同時會生成最小的輸出，如下面的例子，對于一個給定的輸入，可以多個壓縮級別得到的空間縮減量是一樣的，

1.2 增量壓縮與解壓縮

這種記憶體中的壓縮方法有一些缺點，主要是系統需要有足夠的記憶體，可以在記憶體中同時駐留未壓縮和壓縮版本，因此這種方法對于真實世界的用例并不實用，另一種方法是使用Compress和Decompress物件以增量方式處理資料，這樣就不需要將整個資料集都放在記憶體中，

import zlibimport binascii

compressor = zlib.compressobj(1)

with open('lorem.txt','rb') as input:
    while True:
        block = input.read(64)
        if not block:
            break
        compressed = compressor.compress(block)
        if compressed:
            print('Compressed: {}'.format(
                binascii.hexlify(compressed)))
        else:
            print('buffering...')
    remaining = compressor.flush()
    print('Flushed: {}'.format(binascii.hexlify(remaining)))

這個例子從一個純文本檔案讀取小資料塊，并把這個資料集傳至compress()，壓縮器維護壓縮資料的一個記憶體緩沖區，由于壓縮演算法依賴于校驗和以及最小塊大小，所以壓縮器每次接收更多輸入時可能并沒有準備好回傳資料，如果它沒有準備好一個完整的壓縮塊，那便會回傳一個空位元組串，當所有

1.3 混合內容流

在壓縮和未壓縮資料混合在一起的情況下，還可以使用decompressobj()回傳的Decompress類，

import zlib

lorem = open('lorem.txt','rb').read()
compressed = zlib.compress(lorem)
combined = compressed +lorem

decompressor = zlib.decompressobj()
decompressed = decompressor.decompress(combined)

decompressed_matches = decompressed == lorem
print('Decompressed matches lorem:',decompressed_matches)

unused_matches = decompressor.unused_data =https://www.cnblogs.com/liuhui0308/p/= lorem
print('Unused data matches lorem:',unused_matches)

解壓縮所有資料后，unused_data屬性會包含未用的所有資料，

1.4 校驗和

除了壓縮和解壓縮函式，zlib還包括兩個用于計算資料的校驗和的函式，分別是adler32()和crc32()，這兩個函式計算出的校驗和都不能認為是密碼安全的，它們只用于資料完整性驗證，

import zlib

data = open('lorem.txt','rb').read()

cksum = zlib.adler32(data)
print('Adler32: {:12d}'.format(cksum))
print('       : {:12d}'.format(zlib.adler32(data,cksum)))

cksum = zlib.crc32(data)
print('CRC-32: {:12d}'.format(cksum))
print('       : {:12d}'.format(zlib.crc32(data,cksum)))

這兩個函式取相同的引數，包括一個包含資料的位元組串和一個可選值，這個值可作為校驗和的起點，這些函式會回傳一個32位有符號整數值，這個值可以作為一個新的起點引數再傳回給后續的呼叫，以生成一個動態變化的校驗和，

1.5 壓縮網路資料

下一個代碼清單中的服務器使用流壓縮器來回應檔案名請求，它將檔案的一個壓縮版本寫至與客戶通信的套接字中，

import zlib
import logging
import socketserver
import binascii

BLOCK_SIZE = 64


class ZlibRequestHandler(socketserver.BaseRequestHandler):

    logger = logging.getLogger('Server')

    def handle(self):
        compressor = zlib.compressobj(1)

        # Find out what file the client wants
        filename = self.request.recv(1024).decode('utf-8')
        self.logger.debug('client asked for: %r', filename)

        # Send chunks of the file as they are compressed
        with open(filename, 'rb') as input:
            while True:
                block = input.read(BLOCK_SIZE)
                if not block:
                    break
                self.logger.debug('RAW %r', block)
                compressed = compressor.compress(block)
                if compressed:
                    self.logger.debug(
                        'SENDING %r',
                        binascii.hexlify(compressed))
                    self.request.send(compressed)
                else:
                    self.logger.debug('BUFFERING')

        # Send any data being buffered by the compressor
        remaining = compressor.flush()
        while remaining:
            to_send = remaining[:BLOCK_SIZE]
            remaining = remaining[BLOCK_SIZE:]
            self.logger.debug('FLUSHING %r',
                              binascii.hexlify(to_send))
            self.request.send(to_send)
        return


if __name__ == '__main__':
    import socket
    import threading
    from io import BytesIO

    logging.basicConfig(
        level=logging.DEBUG,
        format='%(name)s: %(message)s',
    )
    logger = logging.getLogger('Client')

    # Set up a server, running in a separate thread
    address = ('localhost', 0)  # let the kernel assign a port
    server = socketserver.TCPServer(address, ZlibRequestHandler)
    ip, port = server.server_address  # what port was assigned?

    t = threading.Thread(target=server.serve_forever)
    t.setDaemon(True)
    t.start()

    # Connect to the server as a client
    logger.info('Contacting server on %s:%s', ip, port)
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((ip, port))

    # Ask for a file
    requested_file = 'lorem.txt'
    logger.debug('sending filename: %r', requested_file)
    len_sent = s.send(requested_file.encode('utf-8'))

    # Receive a response
    buffer = BytesIO()
    decompressor = zlib.decompressobj()
    while True:
        response = s.recv(BLOCK_SIZE)
        if not response:
            break
        logger.debug('READ %r', binascii.hexlify(response))

        # Include any unconsumed data when
        # feeding the decompressor.
        to_decompress = decompressor.unconsumed_tail + response
        while to_decompress:
            decompressed = decompressor.decompress(to_decompress)
            if decompressed:
                logger.debug('DECOMPRESSED %r', decompressed)
                buffer.write(decompressed)
                # Look for unconsumed data due to buffer overflow
                to_decompress = decompressor.unconsumed_tail
            else:
                logger.debug('BUFFERING')
                to_decompress = None

    # deal with data reamining inside the decompressor buffer
    remainder = decompressor.flush()
    if remainder:
        logger.debug('FLUSHED %r', remainder)
        buffer.write(remainder)

    full_response = buffer.getvalue()
    lorem = open('lorem.txt', 'rb').read()
    logger.debug('response matches file contents: %s',
                 full_response == lorem)

    # Clean up
    s.close()
    server.socket.close()

我們人為的將這個代碼清單做了一些劃分，以展示緩沖行為，如果將資料傳遞到compress()或decompress()，但沒有得到完整的壓碩訓未壓縮輸出塊，此時便會進行緩沖，

客戶連接到套接字，并請求一個檔案，然后回圈，接收壓縮資料塊，由于一個塊可能未包含足夠多的資訊來完全解壓縮，所以之前接收的剩余資料將與新資料結合，并且傳遞到解壓縮器，解壓縮資料時，會把它追加到一個緩沖區，處理回圈結束時將與檔案內容進行比較，

轉載請註明出處，本文鏈接：https://www.uj5u.com/houduan/163271.html

標籤：Python

上一篇：Pycharm-漢化的方法