Why is this C++ code only slightly faster than Python?

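I converted some Python code to C++ hoping to gain some performance, but found that the C++ implementation is only slightly faster. The code I'm converting is the sos filter from the scipy library (signal.sosfilt). My Python test is: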

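import numpy as np
from scipy import signal
from time import perf_counter

# Three second-order sections, one per row: [b0, b1, b2, a0, a1, a2]
a_wgt_coefs = [[0.2343006, 0.46860119, 0.2343006, 1., -0.22455845, 0.01260662],
               [1., -2., 1., 1., -1.89387049, 0.89515977],
               [1., -2., 1., 1., -1.99461446, 0.99462171]]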

# Define input signal
fs = 48000
T = 1.0
N = int(fs * T)
t = np.linspace(0, T, N)

ip_signal = np.sin(2 * np.pi * 440 * t)

# Filter signal
num_runs = 1000


def main():
    durations = 0
    for n in range(num_runs):
        t_start = perf_counter()
        op_signal = signal.sosfilt(a_wgt_coefs, ip_signal)
        t_end = perf_counter()
        durations += (t_end - t_start)
    avg_duration = durations / num_runs
    print(f'Average execution time = {avg_duration} seconds')


if __name__ == '__main__':
    main()

The C++ code is a line-by-line translation of scipy's _sosfilt function, which I implemented as follows:

// SAMPLE_RATE is defined elsewhere in the project (not shown in the question).
inline void sosfilt_cls_4(float sos[3][6], float x[SAMPLE_RATE]) {
    float x_n = 0, x_c = 0;
    float zi[3][2] = { 0 };  // filter state: two delay values per section
    // iterate over every sample i
    for (size_t i = 0; i < SAMPLE_RATE; ++i)
    {
        x_c = x[i];
        // pass the sample through each of the 3 sections j in turn
        for (size_t j = 0; j < 3; ++j)
        {
            float* section = sos[j];
            float* zi_n = zi[j];
            x_n = section[0] * x_c + zi_n[0];
            zi_n[0] = section[1] * x_c - section[4] * x_n + zi_n[1];
            zi_n[1] = section[2] * x_c - section[5] * x_n;
            x_c = x_n;
        }
        x[i] = x_c;
    }
    return;
}
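For reference, each pass of the inner loop above is one second-order section in transposed direct form II. Writing section[0..5] as b_0, b_1, b_2, a_0, a_1, a_2 and zi_n[0], zi_n[1] as z_0, z_1 (a_0 = section[3] is never read, so the code assumes it is 1, which holds for every row of the coefficient table), one section maps its input x to its output y as:

```latex
\begin{aligned}
y[n]   &= b_0\,x[n] + z_0[n-1] \\
z_0[n] &= b_1\,x[n] - a_1\,y[n] + z_1[n-1] \\
z_1[n] &= b_2\,x[n] - a_2\,y[n]
\end{aligned}
\qquad\Longrightarrow\qquad
y[n] = b_0\,x[n] + b_1\,x[n-1] + b_2\,x[n-2] - a_1\,y[n-1] - a_2\,y[n-2]
```

The output of section j is the input of section j + 1. Since y[n] depends on y[n-1], each sample has to wait for the previous one, which limits how much a compiler can vectorize this loop across samples.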

I benchmarked it with std::chrono:

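#define _USE_MATH_DEFINES  // MSVC: makes M_PI available from <cmath>
#include <chrono>
#include <cmath>
#include <iostream>

// linspace, input_col_vector (presumably Eigen types), samples and runs
// are defined elsewhere in the project and are not shown in the question.
float input_array[SAMPLE_RATE];
float sum = 0;

float sos_fs_48k_array_flt[3][6] = {{0.2343006f,   0.46860119f,  0.2343006f,   1.f, -0.22455845f,  0.01260662f},
 { 1.f, -2.f,          1.f,          1.f, -1.89387049f,  0.89515977f},
 { 1.f, -2.f,          1.f,          1.f, -1.99461446f,  0.99462171f} };

int main()
{
    auto lin = linspace(0.0, 1.0, double(samples));

    std::cout << "Testing\n\n";
    for (int run = 0; run < runs; run++) {
        // regenerate the input each run, since sosfilt_cls_4 filters in place
        for (int x = 0; x < samples; x++) {
            input_col_vector(x, 0) = sin(2 * M_PI * 440 * lin.coeff(x, 0));
            input_array[x] = sin(2 * M_PI * 440 * lin.coeff(x, 0));
        }
        // time only the filter call
        auto start = std::chrono::steady_clock::now();
        sosfilt_cls_4(sos_fs_48k_array_flt, input_array);
        auto stop = std::chrono::steady_clock::now();
        auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
        sum += elapsed.count();
    }
    // sum is in microseconds; convert the per-run average to seconds
    std::cout << "Average time:\t" << (sum / runs) / 1e6 << std::endl;
    std::cout << "End.\n";
    return 0;
}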
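One way to keep the timing bookkeeping separate from the filter is a small generic helper. This is only a sketch, and mean_seconds is a hypothetical name: it runs an untimed prepare step (for example, regenerating input_array, which sosfilt_cls_4 overwrites) and then times only the work step. It accumulates std::chrono::duration<double>, so individual runs are not truncated to whole microseconds the way duration_cast<microseconds> truncates them.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not from the question): calls prepare() without timing
// it, then times work(), and returns the mean duration of work() in seconds.
template <typename Prepare, typename Work>
double mean_seconds(Prepare prepare, Work work, int runs) {
    assert(runs > 0);
    using steady = std::chrono::steady_clock;
    std::chrono::duration<double> total{0.0};  // floating-point seconds
    for (int r = 0; r < runs; ++r) {
        prepare();  // e.g. regenerate the input buffer (not timed)
        const auto start = steady::now();
        work();     // e.g. run the filter in place (timed)
        const auto stop = steady::now();
        total += stop - start;  // implicitly converted to double seconds
    }
    return total.count() / runs;
}
```

For example, prepare could be a lambda that refills input_array and work a lambda that calls sosfilt_cls_4(sos_fs_48k_array_flt, input_array); mean_seconds(prepare, work, 1000) then returns the average in seconds directly.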

The Python code runs in 0.0002448 seconds on average over 1000 runs. By comparison, the C++ code runs in 0.0002336 seconds on average over 1000 runs.

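In Visual Studio I set the compiler options to favor speed over size and set the /Ox flag. I am also using the fast floating-point model (/fp:fast) and the AVX2 enhanced instruction set (/arch:AVX2).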
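Because Visual Studio keeps separate options for the Debug and Release configurations, it can be worth confirming which settings were actually in effect for the build being timed. A minimal sketch, with a hypothetical function name: it relies on NDEBUG (defined by the default Release configuration), __AVX2__ (defined when /arch:AVX2 is in effect) and MSVC's _M_FP_FAST (defined under /fp:fast). It cannot report the optimization level itself.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: lists which speed-related switches were visible to the
// preprocessor when this file was compiled.
//   NDEBUG     - defined by Visual Studio's default Release configuration
//   __AVX2__   - defined when AVX2 code generation is enabled (/arch:AVX2)
//   _M_FP_FAST - defined by MSVC when /fp:fast is in effect
// None of these reveal whether /Ox or /O2 was used.
std::string build_config_summary() {
    std::string s;
#ifdef NDEBUG
    s += "NDEBUG ";
#else
    s += "no-NDEBUG ";
#endif
#ifdef __AVX2__
    s += "AVX2 ";
#endif
#ifdef _M_FP_FAST
    s += "fp:fast ";
#endif
    return s;
}
```

Printing build_config_summary() next to the timing output makes it obvious when a Debug build, or a configuration without the intended flags, was measured by mistake.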

Other things I've done to improve speed are allocating all input data on the stack and using floats instead of doubles, but they haven't really made a difference.

Oddly enough, over a single run the C++ code is much faster, running in 0.0002333 seconds compared with 0.00045 seconds for Python.

Edit: I am now getting 0.00012 seconds.

I put the function in the main.cpp file and started a fresh project with the default optimizations. The function is now:

// SAMPLE_RATE is defined elsewhere in the project (not shown in the question).
template<typename T>
inline void sosfilt(T sos[3][6], T x[SAMPLE_RATE]) {
    T x_n = 0, x_c = 0;
    T zi[3][2] = { 0 };  // filter state: two delay values per section
    // iterate over every sample i
    for (size_t i = 0; i < SAMPLE_RATE; ++i)
    {
        x_c = x[i];
        // pass the sample through each of the 3 sections j in turn
        for (size_t j = 0; j < 3; ++j)
        {
            T* section = sos[j];
            T* zi_n = zi[j];
            x_n = section[0] * x_c + zi_n[0];
            zi_n[0] = section[1] * x_c - section[4] * x_n + zi_n[1];
            zi_n[1] = section[2] * x_c - section[5] * x_n;
            x_c = x_n;
        }
        x[i] = x_c;
    }
    return;
}
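Since the template still hard-codes SAMPLE_RATE and three sections, a variant that takes the length at run time makes it easy to check the cascade against the textbook difference equation on small inputs. This is a sketch: sosfilt_n and biquad_direct are hypothetical names, not from the question or from scipy, and it assumes a0 = 1 in every section, as in the coefficient table above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: same transposed-direct-form-II cascade as the template
// above, but the section count is deduced from the array and the sample count
// is a run-time argument. Filters x[0..n) in place with zero initial state.
template <typename T, std::size_t Sections>
void sosfilt_n(const T (&sos)[Sections][6], T* x, std::size_t n) {
    for (std::size_t j = 0; j < Sections; ++j)
        assert(sos[j][3] == T(1));  // assumes a0 is normalized to 1

    T zi[Sections][2] = {};  // two state values per section
    for (std::size_t i = 0; i < n; ++i) {
        T x_c = x[i];
        for (std::size_t j = 0; j < Sections; ++j) {
            const T* s = sos[j];
            const T x_n = s[0] * x_c + zi[j][0];
            zi[j][0] = s[1] * x_c - s[4] * x_n + zi[j][1];
            zi[j][1] = s[2] * x_c - s[5] * x_n;
            x_c = x_n;  // output of section j feeds section j + 1
        }
        x[i] = x_c;
    }
}

// Reference for one section, straight from the difference equation
// y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2].
template <typename T>
std::vector<T> biquad_direct(const T (&s)[6], const std::vector<T>& x) {
    std::vector<T> y(x.size());
    for (std::size_t n = 0; n < x.size(); ++n) {
        T acc = s[0] * x[n];
        if (n >= 1) acc += s[1] * x[n - 1] - s[4] * y[n - 1];
        if (n >= 2) acc += s[2] * x[n - 2] - s[5] * y[n - 2];
        y[n] = acc;
    }
    return y;
}
```

Running the three sections one after another through biquad_direct should match sosfilt_n up to rounding, since both describe the same filter. The same call also works with float arrays, in which case T is deduced as float.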

Answer:

The function you are replicating is implemented in Cython; see https://github.com/scipy/scipy/blob/v1.7.1/scipy/signal/_sosfilt.pyx.

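It is also specialized for this float case. So, if it is written carefully (which I believe it is), it probably never has to call back into Python and is effectively pure C code: it is compiled to C, uses somewhat different syntax, and is essentially equivalent to your code.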

Answer:

scipy/numpy basically call optimized numerical libraries written in C/Fortran/C++.

The only extra overhead is converting between Python types, which is very fast with the CPython API.
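As a rough check of that claim against the timings in the question (assuming both versions filter N = 48000 samples, i.e. fs · T from the Python test, and that SAMPLE_RATE in the C++ code has the same value):

```latex
\begin{aligned}
\Delta t &= 0.0002448\ \mathrm{s} - 0.0002336\ \mathrm{s} = 11.2\ \mu\mathrm{s} \approx 4.6\%\ \text{of the Python time} \\
\frac{0.0002336\ \mathrm{s}}{48000\ \text{samples}} &\approx 4.9\ \mathrm{ns\ per\ sample}
\end{aligned}
```

So the per-call difference is small next to the time spent inside the filter loop itself, which fits the answers above: both versions spend most of their time in similar compiled code.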
