C ：成對訪問值是否比訪問陣列元素更有效？-有解無憂

假設我們有雙打的陣列x和索引陣列y，我們希望這些指數由相應的值進行排序x（這樣，排序[i in y]的x[i]關鍵）。

然后我們可以創建一個對陣列，其中一個組件是鍵值，一個是索引，例如執行以下操作：

boost::sort::spreadsort::float_sort(sortdata, sortdata   n,
    [](const std::pair<double, int> &a, const unsigned offset) -> boost::int64_t {
        return boost::sort::spreadsort::float_mem_cast<double, boost::int64_t>(a.first) >> offset;
    },
    [](const std::pair<double, int> &a, const std::pair<double, int> &b) -> bool {
        return a.first < b.first;
    });

這需要相當多的記憶體，因此我們可以省略創建這個對陣列并直接使用如下資料：

boost::sort::spreadsort::float_sort(y, y   n,
    [x](const int a, const unsigned offset) -> boost::int64_t {
        return boost::sort::spreadsort::float_mem_cast<double, boost::int64_t>(x[a]) >> offset;
    },
    [x](const int a, const int b) -> bool {
        return x[a] < x[b];
    });

現在，當在非常大的資料（比如說 50000000 個條目）上使用它時，第二種方法花費的時間是第一種方法的兩倍多。據我所知， astd::pair只是一個struct. 那么，陣列訪問是否真的比結構訪問效率低得多？或者，我在這里做錯了什么？

這是一個完整的比較示例：

#include <cstdlib>
#include <ctime>
#include <boost/sort/spreadsort/spreadsort.hpp>
#include <iostream>

int main() {
    int n = 50000000;
    double *x = new double[n];
    int *y = new int[n];
    std::pair<double, int> *sortdata = new std::pair<double, int> [n];
    for (int i=0; i < n; i  ) {
        x[i] = ((double) std::rand()) / ((double) std::rand());
        sortdata[i].first = x[i];
        y[i] = i;
        sortdata[i].second = i;
    }

    std::time_t t = std::time(0);
    boost::sort::spreadsort::float_sort(sortdata, sortdata   n,
        [](const std::pair<double, int> &a, const unsigned offset) -> boost::int64_t {
            return boost::sort::spreadsort::float_mem_cast<double, boost::int64_t>(a.first) >> offset;
        },
        [](const std::pair<double, int> &a, const std::pair<double, int> &b) -> bool {
            return a.first < b.first;
        });

    std::cout << std::time(0)-t << "\n";

    t = std::time(0);
    boost::sort::spreadsort::float_sort(y, y   n,
        [x](const int a, const unsigned offset) -> boost::int64_t {
            return boost::sort::spreadsort::float_mem_cast<double, boost::int64_t>(x[a]) >> offset;
        },
        [x](const int a, const int b) -> bool {
            return x[a] < x[b];
        });

    std::cout << std::time(0)-t << "\n";
}

g -O2在我的系統上運行它會給出第一個 3 秒和最后一個 9 秒。

uj5u.com熱心網友回復：

不是專家，但這是我認為正在發生的事情：

無論使用哪種編程語言，在記憶體中跳轉都是一項代價高昂的操作。這與 CPU 的快取架構有關。

成對的資料交錯存盤在記憶體中。沒有類邊界或類似的東西。它看起來像這樣：

value0 index0 value1 index1 value2 index2 ...

現在，當您將資料存盤在兩個陣列中時，您的資料將如下所示：

value0 value1 value2 .........  index0 index1 index2 .....

好的，現在讓我們看看當排序演算法試圖確定索引 46 處的資料應該放在索引 47 處的資料之前還是之后會發生什么：

在對的情況下：

1. ask ram for value46 (expensive!)
   because of how cachelines work ~ 64bytes of data will be pulled. 
   that is: value46 index46 value47 index47 are pulled, maybe a bit more.
2. ask ram for value47 (cheap, it's already cached)

在兩個陣列的情況下：

1. ask ram for index46 (expensive)
   this will pull index46,index47,...
2. ask ram for index47 (cheap)
3. ask ram for x[index46] (expensive)
   this will pull x[index46],x[index46 1...]
4. ask ram for x[index47] (should be next to x[index46]? not sure... could be cheap, could be expensive)

好吧，我不確定最后一次記憶體獲取，但關鍵是你在記憶體中跳躍的次數大約是兩到三倍，我很確定這就是為什么你測量的時間大約是兩倍運行。

下面是一個可以理解這一點的最后一個例子：

int N = 100000; 
float data[N] = {... random data...};
int idx_1[N] = {0,1,....};
int idx_2[N] = {... random permutation of 0,...N-1};

// this will be very fast. 
// the cache predictor always gets it right. 
float sum_1 = 0; 
for(int i = 0; i < N; i  ) sum_1  = data[idx_1[N]];

// this will be very slow. 
// the cache predictor always gets it wrong. 
float sum_2 = 0; 
for(int i = 0; i < N; i  ) sum_2  = data[idx_2[N]];

轉載請註明出處，本文鏈接：https://www.uj5u.com/caozuo/405953.html

標籤：

上一篇：如何通過id值選擇陣列？

下一篇：PHP將兩個陣列合二為一，公共ID為文本