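In my project I work with word vectors as numpy arrays of dimension 300. I want to store the processed arrays in a MongoDB database, base64-encoded, because this saves a lot of storage space.

Python code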

import base64
import numpy as np

vector = np.zeros(300, dtype=np.float32) # represents some word-vector
vector = base64.b64encode(vector) # base64 encoding
# Saving vector to MongoDB...

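In MongoDB the value is stored as binary data. In C++ I want to load this binary data as a std::vector<float>, so I have to decode the data before I can load it correctly. I was able to get the binary data into the C++ program with mongocxx, as a uint8_t* with a size of 1600, but now I don't know how to continue. I would be glad if someone could help. Thanks (:

C++ code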

const bsoncxx::document::element elem_vectors = doc["vectors"];
const bsoncxx::types::b_binary vectors = elem_vectors.get_binary();

const uint32_t b_size = vectors.size; // == 1600
const uint8_t* first = vectors.bytes;

// How To parse this as a std::vector<float> with a size of 300?

Solution

I added these lines to my C++ code and was able to load a vector with 300 elements and all the correct values.

    // decodeBase64 is the asker's own helper; first and b_size come from the snippet above.
    const std::string encoded(reinterpret_cast<const char*>(first), b_size);
    std::string decoded = decodeBase64(encoded);
    std::vector<float> vec(300);
    for (size_t i = 0; i < decoded.size() / sizeof(float); ++i) {
        vec[i] = *(reinterpret_cast<const float*>(decoded.c_str() + i * sizeof(float)));
    }

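Side note: thanks to @Holt's information, base64-encoding a numpy array and then storing it as binary is not a good idea. It is much better to call ".tobytes()" on the numpy array and store the result in MongoDB, because this reduces the size from 1.7 kB (base64) to 1.2 kB (tobytes()) and also saves computation time, since no encoding (or decoding!) has to be performed.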
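If the array is stored as raw bytes instead of base64 (NumPy's method for this is `tobytes()`), the C++ side no longer needs a decoding step. The following is a minimal sketch of that case, not code from the original post. The helper name `floats_from_bytes` is hypothetical, and it assumes the field holds float32 values in the same byte order as the machine reading them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical helper: copy raw float32 bytes into a std::vector<float>.
// Assumes the bytes come from a float32 numpy array and that the host
// uses the same byte order and a 4-byte float.
std::vector<float> floats_from_bytes(const std::uint8_t* data, std::size_t size) {
    static_assert(sizeof(float) == 4, "this sketch assumes a 32-bit float");
    if (size % sizeof(float) != 0) {
        throw std::invalid_argument("byte count is not a multiple of sizeof(float)");
    }
    std::vector<float> out(size / sizeof(float));
    if (!out.empty()) {
        // memcpy sidesteps the alignment and aliasing concerns of casting the pointer
        std::memcpy(out.data(), data, size);
    }
    return out;
}
```

With the variables from the question this would be called as `floats_from_bytes(vectors.bytes, vectors.size)`, and a 1200-byte field would give a vector of 300 floats.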

A uj5u.com user replied:

Thanks to @Holt for pointing out my mistake.

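First, base64 encoding does not save storage space. On the contrary, it wastes it. An array of 300 floats takes only 300 * 4 = 1200 bytes. Base64 turns every 3 input bytes into 4 output characters, so after encoding the same data takes 1600 bytes!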
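The 1200 versus 1600 figures can be checked with a few lines of arithmetic. This is only an illustrative sketch; the function name `base64_encoded_size` is made up for this example and assumes standard padded base64.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: length of standard, padded base64 output for n input bytes.
// Every group of up to 3 input bytes becomes 4 output characters.
constexpr std::size_t base64_encoded_size(std::size_t n) {
    return 4 * ((n + 2) / 3);
}

// 300 float32 values occupy 300 * 4 = 1200 bytes, which encode to 1600 characters.
static_assert(base64_encoded_size(300 * 4) == 1600, "matches the size seen in the question");
```

This matches the `b_size` of 1600 that the question reports for the binary field.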

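Second, you want to parse the bytes as a vector<float>. If you keep the base64 encoding, you need to decode the bytes first. I suggest using a third-party library or an existing implementation from a related question. Let's assume you already have a decode function.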

std::string base64_decode(std::string const& encoded_string); // or something like that.
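If no library is at hand, a function with that signature could look roughly like the sketch below. It is not the implementation the answer refers to, just one way to decode the standard base64 alphabet. It assumes input without line breaks, stops at the first `=`, and rejects any other character outside the alphabet.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Sketch of a decoder for the standard base64 alphabet (A-Z, a-z, 0-9, +, /).
// Assumption: the input has no whitespace or line breaks.
std::string base64_decode(std::string const& encoded_string) {
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string decoded;
    decoded.reserve(encoded_string.size() / 4 * 3);
    std::uint32_t buffer = 0;  // collects 6-bit groups until a full byte is available
    int bits = 0;              // number of not-yet-emitted bits in buffer
    for (char c : encoded_string) {
        if (c == '=') break;  // padding marks the end of the data
        const std::string::size_type value = alphabet.find(c);
        if (value == std::string::npos) {
            throw std::invalid_argument("invalid base64 character");
        }
        buffer = (buffer << 6) | static_cast<std::uint32_t>(value);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            decoded.push_back(static_cast<char>((buffer >> bits) & 0xFF));
        }
    }
    return decoded;
}
```

The result is a std::string used purely as a byte container, so it may contain zero bytes; use `size()` rather than treating `c_str()` as a C string.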

You then need reinterpret_cast to read the values.

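// first is a const uint8_t*, so it has to be cast before building the string
const std::string encoded(reinterpret_cast<const char*>(first), b_size);
std::string decoded = base64_decode(encoded);
std::vector<float> vec(300);
// the data was written as float32, so read it back as float, not double
for (size_t i = 0; i < decoded.size() / sizeof(float); ++i) {
    vec[i] = *(reinterpret_cast<const float*>(decoded.c_str()) + i);
}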


Tags: Python C++ MongoDB C++11 numpy-ndarray

