How to compress data

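I'm trying to compress data to reduce its storage footprint, but I'm not sure whether I'm compressing the data incorrectly or measuring its size incorrectly.

I tried the following in a playground.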

import Foundation
import Compression

// Example data
struct MyData: Encodable {
    let property = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum."
}

// I tried using MemoryLayout to measure the size of the uncompressed data
let size = MemoryLayout<MyData>.size
print("myData type size", size) // 16

let myData = MyData()
let myDataSize = MemoryLayout.size(ofValue: myData)
print("myData instance size", myDataSize) // 16

func run() {
    // 1. This shows the size of the encoded data
    guard let encoded = try? JSONEncoder().encode(myData) else { return }
    print("myData encoded size", encoded) // 589 bytes

    /// 2. This shows the size after using a first compression method
    guard let compressed = try? (encoded as NSData).compressed(using: .lzfse) else { return }
    let firstCompression = Data(compressed)
    print("firstCompression", firstCompression) // 491 bytes

    /// 3. Second compression method (just wanted to try a different compression method)
    let secondCompression = compress(encoded)
    print("secondCompression", secondCompression) // 491 bytes
    
    /// 4. Wanted to compare the size difference between compressed and uncompressed for a bigger data so here is the array of uncompressed data.
    var myDataArray = [MyData]()
    for _ in 0 ... 100 {
        myDataArray.append(MyData())
    }
    guard let encodedArray = try? JSONEncoder().encode(myDataArray) else { return }
    print("myData encodedArray size", encodedArray) // 59591 bytes
    print("memory layout", MemoryLayout.size(ofValue: encodedArray)) // 16
    
    /// 5. Compressed array
    var compressedArray = [Data]()
    for _ in 0 ... 100 {
        guard let compressed = try? (encoded as NSData).compressed(using: .lzfse) else { return }
        let data = Data(compressed)
        compressedArray.append(data)
    }
    guard let encodedCompressedArray = try? JSONEncoder().encode(compressedArray) else { return }
    print("myData compressed array size", encodedCompressedArray) // 66661 bytes
    print("memory layout", MemoryLayout.size(ofValue: encodedCompressedArray)) // 16

    /// 6. Compression using lzma
    var differentCompressionArray = [Data]()
    for _ in 0 ... 100 {
        guard let compressed = try? (encoded as NSData).compressed(using: .lzma) else { return }
        let data = Data(compressed)
        differentCompressionArray.append(data)
    }
    guard let encodedCompressedArray2 = try? JSONEncoder().encode(differentCompressionArray) else { return }
    print("myData compressed array size", encodedCompressedArray2) // 60702 bytes
    print("memory layout", MemoryLayout.size(ofValue: encodedCompressedArray2)) // 16
}

run()

// The implementation for the second compression method
func compress(_ sourceData: Data) -> Data {
    let pageSize = 128
    var compressedData = Data()
    
    do {
        let outputFilter = try OutputFilter(.compress, using: .lzfse) { (data: Data?) -> Void in
            if let data = data {
                compressedData.append(data)
            }
        }
        
        var index = 0
        let bufferSize = sourceData.count
        
        while true {
            let rangeLength = min(pageSize, bufferSize - index)
            
            let subdata = sourceData.subdata(in: index ..< index + rangeLength)
            index += rangeLength
            
            try outputFilter.write(subdata)
            
            if (rangeLength == 0) {
                break
            }
        }
    }catch {
        fatalError("Error occurred during encoding: \(error.localizedDescription).")
    }
    
    return compressedData
}

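MemoryLayout doesn't seem to help with measuring the size of the encoded arrays, whether or not they are compressed. I don't know how to measure a struct or an array of structs without encoding them with JSONEncoder, which already compresses the data.

The before/after comparison for a single instance of MyData (#1, #2, and #3) seems to show that the data is compressed correctly, from 589 bytes down to 491 bytes. However, comparing the array of uncompressed data with the array of compressed data (#4, #5) seems to show that the size grew from 59591 to 66661 after compression.

Finally, I tried a different compression algorithm, lzma (#6). It reduced the size to 60702, which is lower than the previous compression but still not smaller than the uncompressed data.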

A uj5u.com user replied:

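First, to clear up some confusion: MemoryLayout gives you compile-time information about the size and structure of a type's layout, but it can't be used to determine how much storage an Array value needs at runtime, because the size of the Array struct itself doesn't depend on how much data it contains.

Highly simplified, the layout of an Array value looks like this: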

┌─────────────────────┐
│        Array        │
├──────────┬──────────┤    ┌──────────────────┐
│  length  │  buffer ─┼───▶│     storage      │
└──────────┴──────────┘    └──────────────────┘
  1 word /   1 word /
  8 bytes    8 bytes
 └─────────┬─────────┘
           └─▶ MemoryLayout<Array<UInt8>>.size

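An Array value stores its length, or count (mixed in with some flags, but we needn't worry about that), and a pointer to the actual space where the items it contains are stored. Those items aren't stored as part of the Array value itself. They live separately, in allocated memory that the Array points to. Whether the Array "contains" 10 values or 100000 values, the size of the Array struct stays the same: 1 word (8 bytes on a 64-bit system) for the length, and 1 word for the pointer to the actual underlying storage. (The size of the storage buffer, however, depends entirely on the number of elements it is able to hold at runtime.)

In practice, Array is considerably more complicated than this because of bridging and other reasons, but this is the basic gist, and it's why you only ever see MemoryLayout.size(ofValue:) return the same number every time. [Incidentally, String has a fixed size for similar reasons, which is why MemoryLayout<MyData>.size also reports 16.]

To find out how many bytes an Array or a Data actually occupies, it's enough to ask for its .count: Array<UInt8> and Data are both collections of UInt8 values (bytes), and their .count reflects the amount of data actually held in their underlying storage.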
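A minimal sketch of that difference, using made-up buffers (small, large, and bytes are illustrative names, not part of the question). The exact layout numbers depend on the platform and standard-library version, so none are given as expected output:

```swift
import Foundation

// Two byte buffers holding very different amounts of content.
let small = Data(repeating: 0, count: 10)
let large = Data(repeating: 0, count: 100_000)

// MemoryLayout only sees the fixed-size value, so both report the same number.
let smallLayout = MemoryLayout.size(ofValue: small)
let largeLayout = MemoryLayout.size(ofValue: large)
print("layout sizes:", smallLayout, largeLayout)

// .count reports how many bytes are actually stored.
print("byte counts:", small.count, large.count)

// The same holds for Array<UInt8>: the layout size ignores the contents.
let bytes = [UInt8](repeating: 0, count: 100_000)
let bytesLayout = MemoryLayout.size(ofValue: bytes)
print("array layout:", bytesLayout, "array count:", bytes.count)
```

For measuring encoded or compressed output, the value to compare is therefore always .count, never a MemoryLayout size.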

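As for the size increase between steps (4) and (5), note that

- Step 4 takes 100 copies of your MyData (strictly 101, since the range 0 ... 100 is inclusive) and joins them together before converting them to JSON, while
- Step 5 takes 100 copies of one individually compressed MyData instance, joins them together, and then re-converts them to JSON

Compared to step 4, step 5 has a few problems: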

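- Compression benefits greatly from repetition in the data: data that is compressed and then repeated 100 times won't be nearly as compact as data that is repeated 100 times and then compressed, because each round of compression can't take advantage of knowing that another copy of the data came before it. As a simple example:
  - Say we want to compress the string Hello with a form of run-length encoding: there isn't much we can do, except maybe turn it into Hel{2}o (where {2} means the preceding character is repeated 2 times)
  - If we compress Hello and join it 3 times, we might get Hel{2}oHel{2}oHel{2}o,
  - But if we first join Hello 3 times and then compress, we could get {Hel{2}o}{3}, which is much more compact
- Compression also typically needs to insert some information about how the data was compressed, so that it can be recognized and decompressed later. By compressing MyData 100 times and joining all of those instances, you repeat that metadata 100 times
- Even after the MyData instances are compressed, re-representing them as JSON makes them less compact, because JSON can't represent binary data directly. Instead, each Data blob has to be converted to a Base64-encoded string, which makes it grow again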
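The Hello example above can be reproduced with a toy encoder. This is only an illustrative sketch: runLengthEncode and encodeRepeatedBlocks are hypothetical helpers written for this example, and they bear no relation to how lzfse or lzma actually work.

```swift
// Character-level run-length encoding: "Hello" becomes "Hel{2}o".
func runLengthEncode(_ text: String) -> String {
    var output = ""
    var previous: Character? = nil
    var runLength = 0

    // Emits the pending character, followed by {n} if it repeated.
    func flush() {
        guard let character = previous else { return }
        output.append(character)
        if runLength > 1 { output += "{\(runLength)}" }
    }

    for character in text {
        if character == previous {
            runLength += 1
        } else {
            flush()
            previous = character
            runLength = 1
        }
    }
    flush()
    return output
}

// Block-level step (toy): if the whole text is one unit repeated n times,
// emit "{unit}{n}"; otherwise fall back to plain run-length encoding.
func encodeRepeatedBlocks(_ text: String) -> String {
    let characters = Array(text)
    guard !characters.isEmpty else { return "" }
    for unitLength in 1...characters.count where characters.count % unitLength == 0 {
        let repeats = characters.count / unitLength
        let unit = Array(characters[0..<unitLength])
        let rebuilt = Array(repeating: unit, count: repeats).flatMap { $0 }
        if repeats > 1 && rebuilt == characters {
            return "{\(runLengthEncode(String(unit)))}{\(repeats)}"
        }
    }
    return runLengthEncode(text)
}

// Compress each copy, then join: the encoder never sees the repetition.
let compressedThenJoined = String(repeating: runLengthEncode("Hello"), count: 3)
// Join first, then compress: the repetition itself can be encoded.
let joinedThenCompressed = encodeRepeatedBlocks(String(repeating: "Hello", count: 3))

print(compressedThenJoined)  // Hel{2}oHel{2}oHel{2}o
print(joinedThenCompressed)  // {Hel{2}o}{3}
```

The first result is 21 characters long and the second is 12, even though both describe the same 15-character input.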

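Between these issues, it's not surprising that your data keeps growing. What you actually want is a modification of step 4, namely compressing the joined data: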

guard let encodedArray = try? JSONEncoder().encode(myDataArray) else { fatalError() }
guard let compressedEncodedArray = try? (encodedArray as NSData).compressed(using: .lzma) else { fatalError() }
print(compressedEncodedArray.count) // => 520

This is significantly better than

guard let encodedCompressedArray = try? JSONEncoder().encode(compressedArray) else { fatalError() }
print(encodedCompressedArray.count) // => 66661
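To separate the effect of compression order from the JSON overhead, the two orders can also be compared on raw bytes. The following is a sketch under stated assumptions: compareCompressionOrder and its sample record are made up for illustration, the code only does anything on Apple platforms (where NSData offers compressed(using:) and decompressed(using:)), and the sizes it prints will vary, so none are given here.

```swift
import Foundation

#if canImport(Darwin)
// Returns the summed size of individually compressed records and the size of
// a single compression pass over the joined records.
// The record text and the `copies` parameter are illustrative only.
func compareCompressionOrder(copies: Int) throws -> (perRecord: Int, whole: Int) {
    let record = Data("Lorem Ipsum is simply dummy text of the printing and typesetting industry.".utf8)
    let records = [Data](repeating: record, count: copies)

    // Compress every record separately and add up the resulting sizes.
    var perRecord = 0
    for item in records {
        perRecord += try (item as NSData).compressed(using: .lzfse).length
    }

    // Join first, then compress once.
    let joined = Data(records.joined())
    let whole = try (joined as NSData).compressed(using: .lzfse)

    // Sanity check: decompressing must give back the joined bytes.
    let restored = try whole.decompressed(using: .lzfse) as Data
    precondition(restored == joined, "round trip failed")

    return (perRecord, whole.length)
}

let sizes = try compareCompressionOrder(copies: 101)
print("compressed individually:", sizes.perRecord, "bytes in total")
print("joined, then compressed:", sizes.whole, "bytes")
#endif
```

Because nothing is re-encoded as JSON here, any difference between the two numbers comes from the compression order alone.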

As an aside: it seems unlikely that you're actually using JSONEncoder in practice to join data in this way, and this was just for measurement here — but if you actually are, consider other mechanisms for doing this. Converting binary data to JSON in this way is very inefficient storage-wise, and with a bit more information about what you might actually need in practice, we might be able to recommend a more effective way to do this.

If what you're actually doing in practice is encoding an Encodable object tree and then compressing that the one time, that's totally fine.
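To put a rough number on the JSON overhead mentioned in the aside, here is a small sketch. The blob is a hypothetical stand-in (491 arbitrary bytes, matching the compressed size reported in the question), not the real compressed data:

```swift
import Foundation

// Stand-in for one compressed blob: 491 arbitrary bytes.
let blob = Data((0..<491).map { UInt8($0 % 256) })

// Base64 turns every group of 3 bytes into 4 characters,
// padding the final group if it is incomplete.
let base64Length = blob.base64EncodedString().count

// JSONEncoder encodes Data as a Base64 string by default, so the JSON form
// is at least that long, plus quotes, brackets, and any escaped characters.
let json = try! JSONEncoder().encode([blob])

print("raw bytes:", blob.count)       // 491
print("base64 chars:", base64Length)  // 656
print("JSON bytes:", json.count)
```

That is roughly a one-third increase per blob before any JSON punctuation is counted, which is enough to cancel out the 589 to 491 byte saving from compressing each instance separately.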

Please credit the source when reposting. Link to this article: https://www.uj5u.com/shujuku/429151.html

Tags: iOS, Swift, compression, memory-layout
