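The goal is to run a specific routine whenever a counter (i.e. idx) falls within any one of several ranges.
In this case, the ranges come from a DataFrame (df), as shown below:
df=pd.DataFrame(dict(rbot=[4,20],rtop=[8,25]))
For example, some activity is triggered if the integer value of the counter falls within the range (4-8) OR (20-25).
The following code achieves this goal: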
import pandas as pd
df=pd.DataFrame(dict(rbot=[4,20],rtop=[8,25]))
r_bot=df['rbot'].values.tolist()
r_top=df['rtop'].values.tolist()
for idx in range(120):
    h=[True for x,y in zip(r_bot,r_top) if x <= idx <= y]
    if True in h:
        print(f'Do some operation with {idx}')
This produces the following output:
Do some operation with 4
Do some operation with 5
Do some operation with 6
Do some operation with 7
Do some operation with 8
Do some operation with 20
Do some operation with 21
Do some operation with 22
Do some operation with 23
Do some operation with 24
Do some operation with 25
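In the actual implementation, the number of range pairs can reach the hundreds and the counter can reach the hundreds of thousands. So I would like to know whether there is a more efficient way to do this.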
A uj5u.com user replied:
One option is to use pandas cut with an IntervalIndex:
import numpy as np
arr = np.arange(120)
intervals = pd.IntervalIndex.from_arrays(df.rbot, df.rtop, closed='both')
out = pd.cut(arr, intervals)
out = arr[pd.notna(out)]
for idx in out:
    print(f'Do some operation with {idx}')
Do some operation with 4
Do some operation with 5
Do some operation with 6
Do some operation with 7
Do some operation with 8
Do some operation with 20
Do some operation with 21
Do some operation with 22
Do some operation with 23
Do some operation with 24
Do some operation with 25
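To see why filtering with pd.notna works, here is a minimal sketch on a handful of values. The array name `sample` and its contents are made up for illustration. Values that fall in no interval come back from pd.cut as missing, so the notna mask keeps only the hits.

```python
import numpy as np
import pandas as pd

# Same two closed intervals as in the question: [4, 8] and [20, 25].
intervals = pd.IntervalIndex.from_arrays([4, 20], [8, 25], closed='both')

# Hypothetical sample values, chosen to sit on and around the boundaries.
sample = np.array([3, 4, 8, 9, 20, 25, 26])

# pd.cut labels each value with its interval, or NaN if none contains it.
binned = pd.cut(sample, intervals)

# The notna mask is True exactly where a value landed in some interval.
hits = sample[pd.notna(binned)]
```

Because closed='both' is used, the boundary values 4, 8, 20 and 25 count as inside their intervals.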
Alternatively, you can skip that step and simply iterate, which again depends on your end goal:
for idx in arr:
    if intervals.contains(idx).any():
        print(f'Do some operation with {idx}')
Thanks to @user2246849 for the tests; I think you should take a look at them and see whether this meets your needs.
A uj5u.com user replied:
There are many ways to solve this; here is one:
df = pd.DataFrame(dict(rbot=[4,20],rtop=[8,25]))
# range() excludes its stop value, so shift rtop up by one to keep it inclusive
df.rtop += 1
for idx in range(120):
    if any(idx in range(*df.iloc[x]) for x in df.index):
        print(f'Do some operation with {idx}')
Output:
Do some operation with 4
Do some operation with 5
Do some operation with 6
Do some operation with 7
Do some operation with 8
Do some operation with 20
Do some operation with 21
Do some operation with 22
Do some operation with 23
Do some operation with 24
Do some operation with 25
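A variation on the same idea, sketched here under the assumption that the bounds are plain integers: build the range objects once, outside the loop, so that each check avoids a DataFrame row lookup. The helper name `in_any_range` is hypothetical. Membership tests of an int against a range object run in constant time.

```python
# Hypothetical bounds, matching the example DataFrame (inclusive on both ends).
r_bot = [4, 20]
r_top = [8, 25]

# Build each range once; +1 because range() excludes its stop value.
ranges = [range(lo, hi + 1) for lo, hi in zip(r_bot, r_top)]

def in_any_range(idx):
    """Return True if idx lies in at least one of the prebuilt ranges."""
    return any(idx in r for r in ranges)

# Collect every counter value that would trigger the operation.
matched = [idx for idx in range(120) if in_any_range(idx)]
```

The cost per counter value still grows with the number of ranges, since any() may have to try each one.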
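A uj5u.com user replied:
You can try using numpy broadcasting to create a boolean mask that is True for the indices lying between each pair of rbot and rtop values. Then multiply it by the range to get the relevant values. Finally, use flatnonzero to select the True values: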
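import numpy as np
arr = np.arange(120)
# Broadcast each (rbot, rtop) pair against arr, then count the matches per index
msk = ((df[['rbot']].to_numpy() <= arr) & (arr <= df[['rtop']].to_numpy())).sum(axis=0)
# Note: multiplying by arr zeroes out index 0, so 0 is never selected
out = np.flatnonzero(msk*arr)
for idx in out:
    print(f'Do some operation with {idx}')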
Output:
Do some operation with 4
Do some operation with 5
Do some operation with 6
Do some operation with 7
Do some operation with 8
Do some operation with 20
Do some operation with 21
Do some operation with 22
Do some operation with 23
Do some operation with 24
Do some operation with 25
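The shapes involved in the broadcast are easier to see with the bounds written out as column vectors. The following is a sketch only, with the names `lo`, `hi`, `grid` and `mask` chosen for illustration. It swaps the sum-and-multiply step for any() plus boolean indexing, which returns the values directly.

```python
import numpy as np

# Bounds from the example DataFrame, as column vectors of shape (2, 1).
lo = np.array([4, 20])[:, None]
hi = np.array([8, 25])[:, None]

arr = np.arange(120)  # shape (120,)

# Broadcasting yields a (2, 120) boolean grid: one row per interval.
grid = (lo <= arr) & (arr <= hi)

# Collapse the rows: True where any interval contains the value.
mask = grid.any(axis=0)

# Boolean indexing returns the values themselves, including 0 if it matches.
out = arr[mask]
```

The intermediate grid holds one entry per (interval, value) pair, so its memory use grows with the product of the two sizes.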
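A uj5u.com user replied:
FYI, if you only want to perform an operation on each valid index and do not intend to do any further aggregation that needs pandas afterwards, then this is faster and more memory-efficient: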
import pandas as pd
rbot = [i*1000 for i in range(10000)]
rtop = [(i+1)*1000-2 for i in range(10000)]
main_range = (0, 120)
df=pd.DataFrame(dict(rbot=[4,20],rtop=[8,25]))
intervals = zip(df['rbot'], df['rtop'])
for i in intervals:
    # Clip each interval to the main range; +1 keeps the upper bound inclusive
    overlap = range(max(main_range[0], i[0]), min(main_range[1], i[-1]) + 1)
    for idx in overlap:
        print(f'Do some operation with {idx}')
This simply computes the overlap between the main range and each sub-range.
Do some operation with 4
Do some operation with 5
Do some operation with 6
Do some operation with 7
Do some operation with 8
Do some operation with 20
Do some operation with 21
Do some operation with 22
Do some operation with 23
Do some operation with 24
Do some operation with 25
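The overlap computation can also be wrapped in a small generator so that it is reusable. This is a sketch, not part of the answer above: the function name `iter_overlaps` and its parameters are hypothetical, and both the main range and the intervals are assumed to be inclusive on both ends.

```python
def iter_overlaps(main_start, main_stop, intervals):
    """Yield every integer in [main_start, main_stop] that lies in an interval.

    `intervals` is an iterable of (low, high) pairs, inclusive on both ends.
    """
    for low, high in intervals:
        # Clip the interval to the main range; an empty range yields nothing.
        yield from range(max(main_start, low), min(main_stop, high) + 1)

# Hypothetical call using the bounds from the example DataFrame.
found = list(iter_overlaps(0, 120, [(4, 8), (20, 25)]))
```

If two intervals overlap each other, the shared values are yielded once per interval, so deduplicate first if that matters.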
Runtimes with a larger dataset:
import pandas as pd
import numpy as np
rbot = [i*1000 for i in range(10000)]
rtop = [(i+1)*1000-2 for i in range(10000)]
main_range = (0, 120000)
df = pd.DataFrame({'rbot': rbot, 'rtop': rtop})
def python():
    intervals = zip(df['rbot'], df['rtop'])
    for i in intervals:
        overlap = range(max(main_range[0], i[0]), min(main_range[1], i[-1]) + 1)
        for idx in overlap:
            pass#print(idx)
# 5.03 ms ± 58 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit python()
def pandas():
    arr = np.arange(*main_range)
    intervals = pd.IntervalIndex.from_arrays(df.rbot, df.rtop, closed='both')
    out = pd.cut(arr, intervals)
    out = arr[pd.notna(out)]
    for idx in out:
        pass#print(idx)
# 67 ms ± 467 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit pandas()
def numpy():
    arr = np.arange(*main_range)
    msk = ((df[['rbot']].to_numpy() <= arr) & (arr <= df[['rtop']].to_numpy())).sum(axis=0)
    out = np.flatnonzero(msk*arr)
    for idx in out:
        pass#print(idx)
# 2.77 s ± 7.48 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit numpy()
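If the counter values arrive one at a time rather than as a range known in advance, a different technique from the ones timed above is a binary search over the interval starts. The following is a sketch using the standard-library bisect module. It assumes the intervals are sorted by start and do not overlap, and the helper name `in_sorted_intervals` is hypothetical. It has not been benchmarked against the functions above.

```python
from bisect import bisect_right

# Assumed: intervals sorted by start and non-overlapping, inclusive bounds.
starts = [4, 20]
stops = [8, 25]

def in_sorted_intervals(idx):
    """Return True if idx lies in one of the sorted, disjoint intervals."""
    # Position of the last interval whose start is <= idx (or -1 if none).
    pos = bisect_right(starts, idx) - 1
    return pos >= 0 and idx <= stops[pos]

# Collect every counter value that would trigger the operation.
matched = [idx for idx in range(120) if in_sorted_intervals(idx)]
```

Each lookup costs a logarithmic number of comparisons in the number of intervals, instead of one comparison per interval.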
