Inlined function returning a nested array value does not perform as expected

I want to inline the function `MyClass::at()`, but the performance is not what I expected.

`MyClass.cpp`

#include <algorithm>
#include <chrono>
#include <iostream>
#include <vector>
#include <string>

// Making this a lot shorter than in my actual program
std::vector<std::vector<int>> arrarr = 
{
    { 1, 70, 54, 71, 83, 51, 54, 69, 16, 92, 33, 48, 61, 43, 52,  1, 89, 19, 67, 48},
    {24, 47, 32, 60, 99,  3, 45,  2, 44, 75, 33, 53, 78, 36, 84, 20, 35, 17, 12, 50},
    {32, 98, 81, 28, 64, 23, 67, 10, 26, 38, 40, 67, 59, 54, 70, 66, 18, 38, 64, 70},
    {67, 26, 20, 68,  2, 62, 12, 20, 95, 63, 94, 39, 63,  8, 40, 91, 66, 49, 94, 21},
    {24, 55, 58,  5, 66, 73, 99, 26, 97, 17, 78, 78, 96, 83, 14, 88, 34, 89, 63, 72},
    {21, 36, 23,  9, 75,  0, 76, 44, 20, 45, 35, 14,  0, 61, 33, 97, 34, 31, 33, 95},
    {78, 17, 53, 28, 22, 75, 31, 67, 15, 94,  3, 80,  4, 62, 16, 14,  9, 53, 56, 92},
    {16, 39,  5, 42, 96, 35, 31, 47, 55, 58, 88, 24,  0, 17, 54, 24, 36, 29, 85, 57},
    {86, 56,  0, 48, 35, 71, 89,  7,  5, 44, 44, 37, 44, 60, 21, 58, 51, 54, 17, 58},
    {19, 80, 81, 68,  5, 94, 47, 69, 28, 73, 92, 13, 86, 52, 17, 77,  4, 89, 55, 40},
    { 4, 52,  8, 83, 97, 35, 99, 16,  7, 97, 57, 32, 16, 26, 26, 79, 33, 27, 98, 66},
    {88, 36, 68, 87, 57, 62, 20, 72,  3, 46, 33, 67, 46, 55, 12, 32, 63, 93, 53, 69},
    { 4, 42, 16, 73, 38, 25, 39, 11, 24, 94, 72, 18,  8, 46, 29, 32, 40, 62, 76, 36},
    {20, 69, 36, 41, 72, 30, 23, 88, 34, 62, 99, 69, 82, 67, 59, 85, 74,  4, 36, 16},
    {20, 73, 35, 29, 78, 31, 90,  1, 74, 31, 49, 71, 48, 86, 81, 16, 23, 57,  5, 54},
    { 1, 70, 54, 71, 83, 51, 54, 69, 16, 92, 33, 48, 61, 43, 52,  1, 89, 19, 67, 48},
};

class MyClass
{
public:
    MyClass(std::vector<std::vector<int>> arr) : arr(arr)
    {
        rows = arr.size();
        cols = arr.at(0).size();
    }
    inline auto at(int row, int col) const { return arr[row][col]; }
    void arithmetic(int n) const;
private:
    std::vector<std::vector<int>> arr;
    int rows;
    int cols;
};

`MyClass.cpp`:

void MyClass::arithmetic(int n) const
{
    using std::chrono::high_resolution_clock;
    using std::chrono::duration_cast;
    using std::chrono::duration;
    using std::chrono::milliseconds;

    auto t1 = high_resolution_clock::now();
    int highest_product = 0;
    for (auto y = 0; y < rows; ++y)
    {
        for (auto x = 0; x < cols; ++x)
        {
            // Horizontal product
            if (x + n < cols)
            {
                auto product = 1;
                for (auto i = 0; i < n; ++i)
                {
                    product *= at(y, x + i);
                }
                highest_product = std::max(highest_product, product);
            }
        }
    }
    auto t2 = high_resolution_clock::now();
    duration<double, std::milli> ms_double = t2 - t1;
    std::cout << ms_double.count() << "ms\n";

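    // The function is declared void, so print the result instead of returning it;
    // using the value also keeps the optimizer from discarding the loop.
    std::cout << highest_product << '\n';
}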

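What I want to know is why I get better performance when I replace `product *= at(y, x + i);` with `product *= arr[y][x + i];`. When I test the first version on my large array it takes about 6.7 ms, while the second version takes 5.3 ms. I thought that once the function was inlined, it should perform the same as the second version.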

A uj5u.com user replied:

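Member functions defined directly in the class definition (usually in a header file) are implicitly inline, so writing `inline` here is redundant. `inline` does not guarantee that the function is actually inlined; it is only a hint to the compiler. The keyword also matters at link time, where it avoids multiple-definition errors. A function that is not marked `inline` can still be inlined if the compiler can see its code (that is, it is in the same translation unit, or link-time optimization is applied). For more on this, read "Why are class member functions inlined?"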
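To make the two roles of `inline` concrete, here is a minimal sketch. The names `Grid`, `cell_count` and `first` are hypothetical and not taken from the question; the file is meant to stand in for a header included from several translation units.

```cpp
// grid.hpp (hypothetical header, for illustration only)
#include <cstddef>
#include <utility>
#include <vector>

class Grid
{
public:
    explicit Grid(std::vector<int> cells) : cells_(std::move(cells)) {}

    // Defined inside the class body: implicitly inline, no keyword needed.
    std::size_t cell_count() const { return cells_.size(); }

    // Only declared here; the definition follows outside the class body.
    int first() const;

private:
    std::vector<int> cells_;
};

// Defined in the header but outside the class body: the keyword is needed
// here. Without it, every translation unit that includes this header would
// emit its own definition and the linker would report a duplicate symbol.
inline int Grid::first() const
{
    return cells_.empty() ? 0 : cells_.front();
}
```

In both cases the keyword (explicit or implied) only settles the linkage question. Whether the call is actually expanded in place is still up to the optimizer.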

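Note that inlining is normally performed during the compiler's optimization passes (for example `-O1` or `/O1`). Without optimization enabled, most compilers will not inline the function.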

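Using `std::vector<std::vector<int>>` is not efficient, because it is not a contiguous data structure and it takes two indirections to reach an item. Two adjacent sub-vectors can be stored far apart in memory, which may cause more cache misses (and/or cache thrashing due to alignment). Consider using one large flat array and accessing items with `y*cols+x`, where `cols` is the size of a sub-vector (20 here). Alternatively, if the size is fixed at compile time, the type `int[16][20]` should do the job well.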
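The flat layout could look roughly like the sketch below. `FlatGrid` is a hypothetical class name, and the code assumes every row of the nested input has the same length as the first one.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical replacement for the nested-vector storage: one contiguous
// buffer in row-major order, indexed as row * cols + col.
class FlatGrid
{
public:
    // Copies the rows of a nested vector into one contiguous buffer.
    // Assumption: all rows have the same length as the first row.
    explicit FlatGrid(const std::vector<std::vector<int>>& nested)
        : rows_(nested.size()),
          cols_(nested.empty() ? 0 : nested.front().size())
    {
        data_.reserve(rows_ * cols_);
        for (const auto& row : nested)
        {
            data_.insert(data_.end(), row.begin(), row.end());
        }
    }

    // A single indirection: one multiply-add, then one load.
    int at(std::size_t row, std::size_t col) const
    {
        return data_[row * cols_ + col];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<int> data_;
};
```

With this layout, consecutive values of `x + i` in the inner loop of `arithmetic` touch adjacent memory, and a row boundary no longer means jumping to a separately allocated sub-vector.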

`MyClass(std::vector<std::vector<int>> arr)` causes the input argument to be copied (along with all of its sub-vectors). Consider taking the parameter as `const std::vector<std::vector<int>>&` instead.
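A sketch of that constructor change follows. `MyClassByRef`, `row_count` and `col_count` are hypothetical names used only to keep the example separate from the question's class; the empty-input guard is an addition that the original code does not have.

```cpp
#include <cstddef>
#include <vector>

class MyClassByRef
{
public:
    // The parameter is a const reference, so calling the constructor does not
    // copy the argument. The only copy left is the one into the member `arr`.
    explicit MyClassByRef(const std::vector<std::vector<int>>& input)
        : arr(input),
          rows(arr.size()),
          cols(arr.empty() ? 0 : arr.front().size())
    {
    }

    int at(std::size_t row, std::size_t col) const { return arr[row][col]; }
    std::size_t row_count() const { return rows; }
    std::size_t col_count() const { return cols; }

private:
    // Declared before rows and cols: members are initialized in declaration
    // order, and rows/cols read from arr.
    std::vector<std::vector<int>> arr;
    std::size_t rows;
    std::size_t cols;
};
```

This affects construction cost only. It does not change the time measured inside `arithmetic`.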

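While `at` is convenient for checking bounds at run time, that checking can reduce performance considerably. Consider using operator `[]` if you do not need it. You can combine assertions with a flattened array to get fast code in release builds and safe code in debug builds (assertions are compiled out when the `NDEBUG` macro is defined, and active otherwise).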
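The combination of assertions and a flattened array might be sketched like this. `CheckedGrid` is a hypothetical name, and the zero-filled constructor is just a convenient way to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flat grid whose accessors check bounds with assert().
// When NDEBUG is defined (typical for release builds) the assert expands to
// nothing, leaving a plain unchecked index computation.
class CheckedGrid
{
public:
    CheckedGrid(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0)
    {
    }

    int& at(std::size_t row, std::size_t col)
    {
        assert(row < rows_ && col < cols_);
        return data_[row * cols_ + col];
    }

    int at(std::size_t row, std::size_t col) const
    {
        assert(row < rows_ && col < cols_);
        return data_[row * cols_ + col];
    }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<int> data_;
};
```

In a debug build an out-of-range call such as `grid.at(5, 0)` on a 2x3 grid aborts with a diagnostic, while a build with `NDEBUG` defined performs no check at all, so the index must already be valid there.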
