How to efficiently check whether an integer falls within any of multiple ranges in Python

The goal is to run a specific procedure whenever a counter (i.e. idx) falls within any one of several ranges.

In this case, the ranges come from a df, as shown below:

df=pd.DataFrame(dict(rbot=[4,20],rtop=[8,25]))

For example, some activity should be triggered if the integer counter falls within the range (4-8) OR (20-25), bounds included.

The following code achieves this goal:

import pandas as pd

df=pd.DataFrame(dict(rbot=[4,20],rtop=[8,25]))

r_bot=df['rbot'].values.tolist()
r_top=df['rtop'].values.tolist()
for idx in range (120):
    h=[True for x,y in zip(r_bot,r_top) if x <= idx <=y ]

    if True in h:
        print(f'Do some operation with  {idx}')

It produces the following output:

Do some operation with  4
Do some operation with  5
Do some operation with  6
Do some operation with  7
Do some operation with  8
Do some operation with  20
Do some operation with  21
Do some operation with  22
Do some operation with  23
Do some operation with  24
Do some operation with  25

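In the actual implementation, there can be hundreds of range pairs, and the counter can reach hundreds of thousands. So I am wondering whether there is a more efficient way to do this?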

Reply from a uj5u.com user:

One option is to use pandas cut with an IntervalIndex:

import numpy as np

arr = np.arange(120)
intervals = pd.IntervalIndex.from_arrays(df.rbot, df.rtop, closed='both')

out = pd.cut(arr, intervals)

out = arr[pd.notna(out)]

for idx in out:
    print(f'Do some operation with  {idx}')


Do some operation with  4
Do some operation with  5
Do some operation with  6
Do some operation with  7
Do some operation with  8
Do some operation with  20
Do some operation with  21
Do some operation with  22
Do some operation with  23
Do some operation with  24
Do some operation with  25

Alternatively, you can skip pd.cut and iterate directly; again, this depends on your end goal:


for idx in arr:
    if intervals.contains(idx).any():
        print(f'Do some operation with  {idx}')

Thanks to @user2246849 for the tests; I suggest you look at them to see whether this approach meets your needs.

Reply from a uj5u.com user:

There are many ways to solve this; here is one:

df = pd.DataFrame(dict(rbot=[4,20],rtop=[8,25]))
df.rtop += 1  # range() excludes its stop value, so shift rtop to keep it inclusive
for idx in range(120):
    if any(idx in range(*df.iloc[x]) for x in df.index):
        print(f'Do some operation with  {idx}')

Output:

Do some operation with  4
Do some operation with  5
Do some operation with  6
Do some operation with  7
Do some operation with  8
Do some operation with  20
Do some operation with  21
Do some operation with  22
Do some operation with  23
Do some operation with  24
Do some operation with  25
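
As a side note on the approach above: `idx in range(a, b)` is a constant-time arithmetic check for integers rather than a scan, so most of the per-iteration cost is likely to come from `df.iloc[x]`. A minimal sketch, assuming the bounds have already been pulled out of the DataFrame into plain lists (`r_bot` and `r_top`, as in the question; the helper name `in_any_range` is hypothetical), builds the range objects once:

```python
# Bounds assumed to have been extracted once, e.g. via df['rbot'].tolist().
r_bot = [4, 20]
r_top = [8, 25]

# Build each range object a single time; +1 makes the upper bound inclusive.
ranges = [range(lo, hi + 1) for lo, hi in zip(r_bot, r_top)]

def in_any_range(idx, ranges):
    """Return True if idx lies in at least one of the range objects."""
    # `idx in range(...)` is an O(1) check for ints; any() stops at the first hit.
    return any(idx in r for r in ranges)

hits = [idx for idx in range(120) if in_any_range(idx, ranges)]
print(hits)
```

Each lookup still checks the ranges one by one, so this only removes the repeated DataFrame access, not the linear scan over the ranges.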

Reply from a uj5u.com user:

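You can try NumPy broadcasting to create a Boolean mask that is True for the indices lying between each pair of rbot and rtop values. Then multiply it by the range to get the relevant values. Finally, use flatnonzero to select the True values. Because arr starts at 0, positions and values coincide; note that the multiplication would drop a counter value of 0 if it fell inside a range.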

import numpy as np
arr = np.arange(120)
msk = ((df[['rbot']].to_numpy() <= arr) & (arr <= df[['rtop']].to_numpy())).sum(axis=0)
out = np.flatnonzero(msk*arr)
for idx in out:
    print(f'Do some operation with  {idx}')

Output:

Do some operation with  4
Do some operation with  5
Do some operation with  6
Do some operation with  7
Do some operation with  8
Do some operation with  20
Do some operation with  21
Do some operation with  22
Do some operation with  23
Do some operation with  24
Do some operation with  25
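
The broadcast comparison above builds a Boolean array with one row per range and one column per counter value, so its memory use grows with the product of the two sizes. A minimal alternative sketch, assuming the ranges are inclusive, sorted by lower bound, and non-overlapping (assumptions not stated in the question; the helper name `in_sorted_ranges` is hypothetical), uses a binary search from the standard-library `bisect` module instead:

```python
from bisect import bisect_right

# Assumption: ranges are inclusive, sorted by lower bound, and do not overlap.
r_bot = [4, 20]
r_top = [8, 25]

def in_sorted_ranges(idx, r_bot, r_top):
    """Binary-search the lower bounds, then check the matching upper bound."""
    pos = bisect_right(r_bot, idx) - 1  # last range whose lower bound <= idx
    return pos >= 0 and idx <= r_top[pos]

hits = [idx for idx in range(120) if in_sorted_ranges(idx, r_bot, r_top)]
print(hits)
```

Each lookup costs O(log n) in the number of ranges and needs no intermediate array. If the ranges can overlap, they would have to be merged first for this check to be valid.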

Reply from a uj5u.com user:

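FYI, if you only want to perform an operation for each valid index and do not intend to do any further aggregation that needs pandas afterwards, this is faster and more memory-efficient: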

import pandas as pd

rbot = [i*1000 for i in range(10000)]
rtop = [(i + 1)*1000-2 for i in range(10000)]
main_range = (0, 120)

df=pd.DataFrame(dict(rbot=[4,20],rtop=[8,25]))

intervals = zip(df['rbot'], df['rtop'])
for i in intervals:
    overlap = range(max(main_range[0], i[0]), min(main_range[1], i[-1]) + 1)
    for idx in overlap:
        print(f'Do some operation with  {idx}')

Simply compute the overlap of the main range with each sub-range.

Do some operation with  4
Do some operation with  5
Do some operation with  6
Do some operation with  7
Do some operation with  8
Do some operation with  20
Do some operation with  21
Do some operation with  22
Do some operation with  23
Do some operation with  24
Do some operation with  25
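
One caveat with walking each sub-range directly is that an index would be visited more than once if two sub-ranges overlap. The example data does not overlap, but if that cannot be guaranteed, a minimal sketch is to merge the inclusive ranges first. The helper name `merge_ranges` and the overlapping sample data are hypothetical, and `main_range` is treated here as half-open, like `range(120)`:

```python
def merge_ranges(pairs):
    """Merge inclusive (lo, hi) integer ranges that overlap or touch."""
    merged = []
    for lo, hi in sorted(pairs):
        if merged and lo <= merged[-1][1] + 1:
            # Overlaps or is adjacent to the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return [tuple(p) for p in merged]

main_range = (0, 120)  # treated here as half-open, like range(120)
pairs = [(4, 8), (6, 10), (20, 25)]  # illustrative data with one overlap

visited = []
for lo, hi in merge_ranges(pairs):
    # Clip to the main range; range() is empty if there is no overlap.
    for idx in range(max(main_range[0], lo), min(main_range[1], hi + 1)):
        visited.append(idx)
print(visited)
```

After merging, every index in the overlap is visited exactly once, and the merged list is also sorted, which is what a binary-search lookup would need.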

Runtime with a larger dataset:

import pandas as pd
import numpy as np

rbot = [i*1000 for i in range(10000)]
rtop = [(i + 1)*1000-2 for i in range(10000)]
main_range = (0, 120000)

df = pd.DataFrame({'rbot': rbot, 'rtop': rtop})

def python():
    intervals = zip(df['rbot'], df['rtop'])
    for i in intervals:
        overlap = range(max(main_range[0], i[0]), min(main_range[1], i[-1]) + 1)
        for idx in overlap:
            pass#print(idx)

# 5.03 ms ± 58 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit python()

def pandas():
    arr = np.arange(*main_range)
    
    intervals = pd.IntervalIndex.from_arrays(df.rbot, df.rtop, closed='both')

    out = pd.cut(arr, intervals)

    out = arr[pd.notna(out)]
    
    for idx in out:
        pass#print(idx)

# 67 ms ± 467 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit pandas()


def numpy():
    arr = np.arange(*main_range)
    msk = ((df[['rbot']].to_numpy() <= arr) & (arr <= df[['rtop']].to_numpy())).sum(axis=0)
    out = np.flatnonzero(msk*arr)
    for idx in out:
        pass#print(idx)

# 2.77 s ± 7.48 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)    
%timeit numpy()
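
`%timeit` is an IPython magic, so the benchmark above only runs in IPython or Jupyter. In a plain Python script, a comparable measurement can be sketched with the standard-library `timeit` module. The function below is a pandas-free stand-in for the overlap approach with smaller, illustrative sizes (the name `overlap_loop` is hypothetical), so its numbers are not comparable to the figures quoted above:

```python
import timeit

# Illustrative sizes, smaller than those in the benchmark above.
rbot = [i * 1000 for i in range(100)]
rtop = [(i + 1) * 1000 - 2 for i in range(100)]
main_range = (0, 120000)

def overlap_loop():
    """Visit every index of main_range that lies in some inclusive sub-range."""
    count = 0
    for lo, hi in zip(rbot, rtop):
        for idx in range(max(main_range[0], lo), min(main_range[1], hi) + 1):
            count += 1
    return count

# repeat() returns one total time per run; the minimum is the least noisy.
times = timeit.repeat(overlap_loop, number=10, repeat=3)
print(f'{min(times) / 10 * 1e3:.3f} ms per call')
```

Returning the count makes it easy to confirm that the timed function does the expected amount of work before trusting the timing.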
