Efficient way to reduce memory (RAM) consumption when writing large amounts of data to XML

I have to write 7 lists to an XML file, and each list is 1 GB to 5 GB in size.

The expected XML file looks like this:

<doc>
    <items1>
        <itemA>..</itemA>
        ..
    </items1>

    <items2>
        <itemB>..</itemB>
        ..
    </items2>

    <items3>
        <itemC>..</itemC>
        ..
    </items3>
    .
    .
    .
    <items7>
        <itemG>..</itemG>
        ..
    </items7>
</doc>

The Java objects look like this:

List<ItemA> items1 = new ArrayList<>(); // 1GB-5GB
List<ItemB> items2 = new ArrayList<>(); // 1GB-5GB
List<ItemC> items3 = new ArrayList<>(); // 1GB-5GB
List<ItemD> items4 = new ArrayList<>(); // 1GB-5GB
List<ItemE> items5 = new ArrayList<>(); // 1GB-5GB
List<ItemF> items6 = new ArrayList<>(); // 1GB-5GB
List<ItemG> items7 = new ArrayList<>(); // 1GB-5GB

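Wrapping all the lists into a single Java object (the catalogue) and marshalling it in one go consumes a lot of memory, and every time the lists grow we have to scale up our infrastructure. Here is the code: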

JAXBContext.newInstance("ta").createMarshaller().marshal(new ObjectFactory().createCatalogue(catalogue), new FileOutputStream(fileName));

Here, catalogue is a Java object that contains all seven lists.

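Is there a smart way to reduce memory consumption by writing the data in chunks? I looked into StAX for this, but I couldn't find a way to write lists of data with it.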

Is there any way in Java to efficiently write up to 20 GB of data to XML without adding RAM to the infrastructure?

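We want to write each list separately, and while writing the next list, the data already written to the file must not be loaded back into the heap.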

Answer:

What is a chunk (Wikipedia):

A chunk is a fragment of information.

Chunking (Wikipedia):

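Chunking refers to strategies for improving performance by using special knowledge of a situation to aggregate related memory-allocation requests. For example, if it is known that a certain kind of object will typically be required in groups of eight, instead of allocating and freeing each object individually, making sixteen calls to the heap manager, one could allocate and free an array of eight objects, reducing the number of calls to two.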

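What you are looking for is a chunking algorithm, which lets you break a large amount of data that cannot fit in RAM into smaller chunks. I suggest using the FastCDC algorithm as the chunking algorithm.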

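Chunking is based on repeatedly cutting off dynamic or fixed-size portions of the data (called "chunks") that are smaller than the total data to be processed and small enough to handle within hardware/software limits.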

Example (pseudocode):

MIN_CHUNK_SIZE = 20KB (20,000 bytes) // The minimum amount of data to process at once
MAX_CHUNK_SIZE = 2MB (2,000,000 bytes) // The maximum amount of data to process at once

byte[] buffer = new byte[MAX_CHUNK_SIZE];

// Read bytes from the database into the buffer until it is full or the data runs out
bytesRead = 0
while (bytesRead < MAX_CHUNK_SIZE and database.hasMoreBytes()):
     buffer[bytesRead] = database.readByte();
     bytesRead += 1;

// Do some processing on buffer[0..bytesRead) (compress, encode, encrypt, etc.)

// What to do after processing - examples:

// Transmitting over the network:
networkSocket.send(buffer[0..bytesRead], chunkId); // chunkId: some ID used to join this chunk with the other chunks that make up the whole

// Storing it in an XML document:
xmldocument.append(buffer[0..bytesRead]);
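The fixed-size variant of the pseudocode above can be sketched in plain Java. This is a minimal sketch and not FastCDC: it cuts an `InputStream` into fixed chunks of at most `MAX_CHUNK_SIZE` bytes (the 2,000,000-byte figure from the pseudocode), and "processing" a chunk here just means writing it to an `OutputStream`. `ChunkedCopy` and `copyInChunks` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper class for illustration only.
final class ChunkedCopy {
    // Illustrative cap taken from the pseudocode (2,000,000 bytes).
    static final int MAX_CHUNK_SIZE = 2_000_000;

    /**
     * Copies in to out one fixed-size chunk at a time, so at most chunkSize bytes
     * of payload are held in memory at once. Returns the number of chunks processed.
     */
    static int copyInChunks(InputStream in, OutputStream out, int chunkSize) throws IOException {
        byte[] buffer = new byte[chunkSize]; // reused for every chunk
        int chunks = 0;
        while (true) {
            // Fill the buffer until it is full or the input is exhausted.
            int filled = 0;
            int n;
            while (filled < chunkSize && (n = in.read(buffer, filled, chunkSize - filled)) != -1) {
                filled += n;
            }
            if (filled == 0) {
                break; // no data left
            }
            // "Process" the chunk: here it is simply written out unchanged.
            out.write(buffer, 0, filled);
            chunks++;
            if (filled < chunkSize) {
                break; // that was the last, partial chunk
            }
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[5_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int chunks = copyInChunks(new ByteArrayInputStream(data), out, MAX_CHUNK_SIZE);
        System.out.println(chunks); // 2,000,000 + 2,000,000 + 1,000,000 bytes -> 3
    }
}
```

Because the single buffer is reused, the copy itself needs one buffer of memory no matter how large the input is; the `ByteArrayOutputStream` in `main` only exists to make the demo self-contained.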

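You want to reduce the RAM consumption of your program, and that is simple: don't store all the data in RAM! Do you need to process it? Use chunks. Do you need to transmit the data over the network? Use chunks! Don't load 7 TB of data into your poor, tiny working memory.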
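The same idea works at the level of objects rather than bytes. A minimal sketch, assuming the items can be obtained lazily through an `Iterator` (for example from a database cursor or a file reader) instead of from a fully built `List`; `BatchProcessor` and `forEachBatch` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.IntStream;

// Hypothetical helper class for illustration only.
final class BatchProcessor {
    /**
     * Pulls items from source at most batchSize at a time and hands each batch to
     * handler. Only one batch is held at once, provided the source itself is lazy.
     * Returns the number of batches handled.
     */
    static <T> int forEachBatch(Iterator<T> source, int batchSize, Consumer<List<T>> handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        int batches = 0;
        while (source.hasNext()) {
            batch.add(source.next());
            if (batch.size() == batchSize) {
                handler.accept(batch);
                batches++;
                // Start a fresh list so the handed-off batch can be garbage-collected
                // once the handler no longer references it.
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {
            handler.accept(batch); // final, partial batch
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        // A lazily generated source of 7 items; nothing is materialized up front.
        Iterator<Integer> source = IntStream.rangeClosed(1, 7).iterator();
        forEachBatch(source, 3, batch -> System.out.println(batch));
        // prints [1, 2, 3], then [4, 5, 6], then [7]
    }
}
```

Each batch could, for example, be written to the output file before the next one is read, so the heap holds at most `batchSize` items at a time.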

For further reading, I recommend this good article I found:

https://towardsdatascience.com/what-to-do-when-your-data-is-too-big-for-your-memory-65c84c600585

Answer:

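Using StAX is probably the best approach, not only because you don't have to keep the whole XML file in memory, but also because you don't have to keep all the items in memory either. I'm not sure where you looked for how to write with StAX, but I found the following in The Java EE 5 Tutorial: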

The following example, taken from the StAX specification, shows how to instantiate an output factory, create a writer, and write XML output:

XMLOutputFactory output = XMLOutputFactory.newInstance();
XMLStreamWriter writer = output.createXMLStreamWriter( ... );
writer.writeStartDocument(); 
writer.setPrefix("c","http://c");
writer.setDefaultNamespace("http://c");
writer.writeStartElement("http://c","a");
writer.writeAttribute("b","blah");
writer.writeNamespace("c","http://c");
writer.writeDefaultNamespace("http://c");
writer.setPrefix("d","http://c");
writer.writeEmptyElement("http://c","d");
writer.writeAttribute("http://c","chris","fry");
writer.writeNamespace("d","http://c"); 
writer.writeCharacters("Jean Arp"); 
writer.writeEndElement(); 
writer.flush();

This code generates the following XML (new lines are non-normative):

<?xml version='1.0' encoding='utf-8'?>
<a b="blah" xmlns:c="http://c" xmlns="http://c">
  <d:d d:chris="fry" xmlns:d="http://c"/>
  Jean Arp
</a>
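Applied to the question, the same `XMLStreamWriter` calls can write `<doc>` and the seven `<itemsN>` sections one element at a time. This is a minimal sketch under stated assumptions: each list is replaced by a lazy `Iterator<String>` (in practice the items would come from a cursor or a file, not from an in-memory `List`), each item is written as plain text content, and `StreamingDocWriter`, `writeSection` and `writeDoc` are hypothetical names. Only JDK classes from `javax.xml.stream` are used.

```java
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper class for illustration only.
final class StreamingDocWriter {
    /** Writes one <itemsN> section, pulling items from the iterator one at a time. */
    static void writeSection(XMLStreamWriter w, String sectionName, String itemName,
                             Iterator<String> items) throws XMLStreamException {
        w.writeStartElement(sectionName);
        while (items.hasNext()) {
            w.writeStartElement(itemName);
            // Hypothetical: real items would write their own child elements here.
            w.writeCharacters(items.next());
            w.writeEndElement();
        }
        w.writeEndElement();
    }

    /** Writes <doc> with sections items1/itemA, items2/itemB, ... in order. */
    static void writeDoc(Writer out, List<Iterator<String>> sections) throws XMLStreamException {
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        try {
            w.writeStartDocument();
            w.writeStartElement("doc");
            for (int i = 0; i < sections.size(); i++) {
                String itemName = "item" + (char) ('A' + i);
                writeSection(w, "items" + (i + 1), itemName, sections.get(i));
                w.flush(); // push the finished section to the underlying Writer
            }
            w.writeEndElement();
            w.writeEndDocument();
        } finally {
            w.close(); // does not close the underlying Writer
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        // Demo data only: small in-memory lists standing in for lazy sources.
        List<Iterator<String>> sections = new ArrayList<>();
        for (int i = 1; i <= 7; i++) {
            sections.add(List.of("x" + i + "a", "x" + i + "b").iterator());
        }
        StringWriter out = new StringWriter();
        writeDoc(out, sections);
        System.out.println(out);
    }
}
```

To write a real file, one option is to pass `Files.newBufferedWriter(Path.of(fileName), StandardCharsets.UTF_8)` as `out` and close it afterwards, since `XMLStreamWriter.close()` leaves the underlying writer open. If the items are JAXB classes, a commonly used variation is to marshal each item into the same `XMLStreamWriter` with `Marshaller.JAXB_FRAGMENT` set to `true`; that needs the JAXB API on the classpath and is not shown here.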


Tags: java xml serialization marshalling stax
