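I converted some Python code to C++ hoping for a performance gain, but found that the C++ implementation is only slightly faster. The code I am converting is the sos filter from the scipy library. My Python test is: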
import numpy as np
from scipy import signal
from time import perf_counter

a_wgt_coefs = [[0.2343006, 0.46860119, 0.2343006, 1., -0.22455845, 0.01260662],
               [1., -2., 1., 1., -1.89387049, 0.89515977],
               [1., -2., 1., 1., -1.99461446, 0.99462171]]

# Define input signal
fs = 48000
T = 1.0
N = int(fs * T)
t = np.linspace(0, T, N)
ip_signal = np.sin(2 * np.pi * 440 * t)

# Filter signal
num_runs = 1000

def main():
    durations = 0
    for n in range(num_runs):
        t_start = perf_counter()
        op_signal = signal.sosfilt(a_wgt_coefs, ip_signal)
        t_end = perf_counter()
        # Accumulate the time spent in this run
        durations += t_end - t_start
    avg_duration = durations / num_runs
    print(f'Average execution time = {avg_duration} seconds')

if __name__ == '__main__':
    main()
The C++ code is a line-by-line translation of scipy's _sosfilt function, which I implemented as:
inline void sosfilt_cls_4(float sos[3][6], float x[SAMPLE_RATE]) {
    float x_n = 0, x_c = 0;
    float zi[3][2] = { 0 };  // filter state: two delay elements per section
    // iterate over every sample
    for (size_t i = 0; i < SAMPLE_RATE; i++)
    {
        x_c = x[i];
        // iterate over every section
        for (size_t j = 0; j < 3; j++)
        {
            float* section = sos[j];
            float* zi_n = zi[j];
            x_n = section[0] * x_c + zi_n[0];
            zi_n[0] = section[1] * x_c - section[4] * x_n + zi_n[1];
            zi_n[1] = section[2] * x_c - section[5] * x_n;
            x_c = x_n;  // output of this section feeds the next one
        }
        x[i] = x_c;  // the signal is filtered in place
    }
    return;
}
I benchmarked this using std::chrono:
#include <chrono>
#include <cmath>
#include <iostream>

// linspace, input_col_vector, samples, and runs are not shown in the question.
float input_array[SAMPLE_RATE];
float sum = 0;
float sos_fs_48k_array_flt[3][6] = {{0.2343006f, 0.46860119f, 0.2343006f, 1.f, -0.22455845f, 0.01260662f},
                                    {1.f, -2.f, 1.f, 1.f, -1.89387049f, 0.89515977f},
                                    {1.f, -2.f, 1.f, 1.f, -1.99461446f, 0.99462171f}};

int main()
{
    auto lin = linspace(0.0, 1.0, double(samples));
    std::cout << "Testing\n\n";
    for (int run = 0; run < runs; run++) {
        // Regenerate the input each run, because the filter works in place
        for (int x = 0; x < samples; x++) {
            input_col_vector(x, 0) = sin(2 * M_PI * 440 * lin.coeff(x, 0));
            input_array[x] = sin(2 * M_PI * 440 * lin.coeff(x, 0));
        }
        auto start = std::chrono::steady_clock::now();
        sosfilt_cls_4(sos_fs_48k_array_flt, input_array);
        auto stop = std::chrono::steady_clock::now();
        auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
        sum += elapsed.count();  // accumulate microseconds across runs
    }
    // Convert the mean from microseconds to seconds
    std::cout << "Average time:\t" << (sum / runs) / 1e6 << std::endl;
    std::cout << "End.\n";
    return 0;
}
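The Python code takes 0.0002448 seconds per run, averaged over 1000 runs. By comparison, the C++ code takes 0.0002336 seconds per run, also averaged over 1000 runs.
I set my compiler options in Visual Studio to favor speed over size and set the /Ox flag. I am also using the /fp:fast floating-point model and the AVX2 enhanced instruction set.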
Other things I've done to improve speed are allocating all input data on the stack and using floats instead of doubles, but they haven't really made a difference.
Oddly enough, over a single run the C++ code is much faster, running in 0.0002333 seconds compared to Python, which takes 0.00045 seconds.
Edit: I am now getting 0.00012 seconds.
I put the function in the main.cpp file and started a fresh project with default optimizations:
template<typename T>
inline void sosfilt(T sos[3][6], T x[SAMPLE_RATE]) {
    T x_n = 0, x_c = 0;
    T zi[3][2] = { 0 };  // filter state: two delay elements per section
    // iterate over every sample
    for (size_t i = 0; i < SAMPLE_RATE; i++)
    {
        x_c = x[i];
        // iterate over every section
        for (size_t j = 0; j < 3; j++)
        {
            T* section = sos[j];
            T* zi_n = zi[j];
            x_n = section[0] * x_c + zi_n[0];
            zi_n[0] = section[1] * x_c - section[4] * x_n + zi_n[1];
            zi_n[1] = section[2] * x_c - section[5] * x_n;
            x_c = x_n;  // output of this section feeds the next one
        }
        x[i] = x_c;  // the signal is filtered in place
    }
    return;
}
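Answer from a uj5u.com user:
The function you are replicating is implemented in Cython; see https://github.com/scipy/scipy/blob/v1.7.1/scipy/signal/_sosfilt.pyx.
It is also specialized for the float case. So, if it is written carefully (which I believe it is), it probably never has to call back into Python and is effectively pure C code: it is compiled to C, uses somewhat different syntax, and is essentially equivalent to your code.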
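To make the comparison concrete, here is a minimal pure-Python sketch of the same recurrence that the C++ code in the question uses (a cascade of second-order sections with two state values each). The name sosfilt_reference is hypothetical and is not part of scipy; the sketch assumes each row is laid out as [b0, b1, b2, a0, a1, a2] with a0 equal to 1, as in the coefficients shown in the question.

```python
from typing import List, Sequence


def sosfilt_reference(sos: Sequence[Sequence[float]],
                      x: Sequence[float]) -> List[float]:
    """Filter x through cascaded second-order sections.

    Hypothetical helper for illustration only (not scipy API).
    Each row of sos is assumed to be [b0, b1, b2, a0, a1, a2] with a0 == 1.
    """
    # Two delay elements per section, all starting at rest.
    state = [[0.0, 0.0] for _ in sos]
    y = []
    for sample in x:
        current = sample
        for (b0, b1, b2, _a0, a1, a2), z in zip(sos, state):
            out = b0 * current + z[0]
            z[0] = b1 * current - a1 * out + z[1]
            z[1] = b2 * current - a2 * out
            current = out  # output of this section feeds the next one
        y.append(current)
    return y
```

Run in the interpreter, this loop would be far slower than either version timed in the question, because every multiply and add goes through Python objects. The compiled Cython routine avoids exactly that cost, which is consistent with its timing landing close to the hand-written C++.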
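Answer from a uj5u.com user:
scipy/numpy basically call optimized numerical libraries written in C/Fortran/C++.
The only additional overhead is converting between Python types, which is very fast with the CPython API.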
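One way to see how small the fixed per-call overhead is, and to separate it from first-call effects like the slower single-run figure reported in the question, is to record every run instead of only a running sum. The following is a rough sketch using only the standard library; time_calls is a hypothetical helper name, and the choice of statistics to report is an assumption rather than anything prescribed by the question.

```python
import statistics
from time import perf_counter
from typing import Callable, Dict


def time_calls(func: Callable[[], object], runs: int = 1000) -> Dict[str, float]:
    """Time repeated calls to func, reporting the first call separately.

    Hypothetical helper for illustration; all values are in seconds.
    """
    if runs < 2:
        raise ValueError("runs must be at least 2")
    durations = []
    for _ in range(runs):
        start = perf_counter()
        func()
        durations.append(perf_counter() - start)
    steady = durations[1:]  # exclude the first call from the summary
    return {
        "first": durations[0],
        "mean": statistics.fmean(steady),
        "median": statistics.median(steady),
        "min": min(steady),
    }
```

With the names from the question, a call would look like time_calls(lambda: signal.sosfilt(a_wgt_coefs, ip_signal)). If the "first" value is much larger than the median, the single-run figure is mostly measuring warm-up rather than the filter itself.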