編輯:根據要求,我提供了一個更接近我正在使用的真實資料的示例。
所以我有一張data看起來像的桌子
value0 value1 value2
run step
0 0 0.12573 -0.132105 0.640423
1 0.1049 -0.535669 0.361595
2 1.304 0.947081 -0.703735
3 -1.265421 -0.623274 0.041326
4 -2.325031 -0.218792 -1.245911
5 -0.732267 -0.544259 -0.3163
1 0 0.411631 1.042513 -0.128535
1 1.366463 -0.665195 0.35151
2 0.90347 0.094012 -0.743499
3 -0.921725 -0.457726 0.220195
4 -1.009618 -0.209176 -0.159225
5 0.540846 0.214659 0.355373
(想想:時間序列的集合)和第二張表 valid_range
start stop
run
0 1 3
1 2 5
對于每個run我想洗掉所有不滿足的行start≤step≤stop。
我嘗試了以下(最后生成表格的代碼)
for idx in valid_range.index:
slc = data.loc[idx]
start, stop = valid_range.loc[idx]
cond = (start <= slc.index) & (slc.index <= stop)
data.loc[idx] = data.loc[idx][cond]
但是,這會導致:
value0 value1 value2
run step
0 0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
5 NaN NaN NaN
1 0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
5 NaN NaN NaN
我也試過了,data.loc[idx].drop(slc[cond].index, inplace=True)但是沒有效果。。。
為表生成代碼
import numpy as np
from pandas import DataFrame, MultiIndex, Index
rng = np.random.default_rng(0)
valid_range = DataFrame({"start": [1, 2], "stop":[3, 5]}, index=Index(range(2), name="run"))
midx = MultiIndex(levels=[[],[]], codes=[[],[]], names=["run", "step"])
data = DataFrame(columns=[f"value{k}" for k in range(3)], index=midx)
for run in range(2):
for step in range(6):
data.loc[(run, step), :] = rng.normal(size=(3))
)
uj5u.com熱心網友回復:
首先合并data并valid range基于'run',使用merge方法
>>> data
value0 value1 value2
run step
0 0 0.12573 -0.132105 0.640423
1 0.1049 -0.535669 0.361595
2 1.304 0.947081 -0.703735
3 -1.26542 -0.623274 0.041326
4 -2.32503 -0.218792 -1.24591
5 -0.732267 -0.544259 -0.3163
1 0 0.411631 1.04251 -0.128535
1 1.36646 -0.665195 0.35151
2 0.90347 0.0940123 -0.743499
3 -0.921725 -0.457726 0.220195
4 -1.00962 -0.209176 -0.159225
5 0.540846 0.214659 0.355373
>>> valid_range
start stop
run
0 1 3
1 2 5
>>> merged = data.reset_index().merge(valid_range, how='left', on='run')
>>> merged
run step value0 value1 value2 start stop
0 0 0 0.12573 -0.132105 0.640423 1 3
1 0 1 0.1049 -0.535669 0.361595 1 3
2 0 2 1.304 0.947081 -0.703735 1 3
3 0 3 -1.26542 -0.623274 0.041326 1 3
4 0 4 -2.32503 -0.218792 -1.24591 1 3
5 0 5 -0.732267 -0.544259 -0.3163 1 3
6 1 0 0.411631 1.04251 -0.128535 2 5
7 1 1 1.36646 -0.665195 0.35151 2 5
8 1 2 0.90347 0.0940123 -0.743499 2 5
9 1 3 -0.921725 -0.457726 0.220195 2 5
10 1 4 -1.00962 -0.209176 -0.159225 2 5
11 1 5 0.540846 0.214659 0.355373 2 5
然后使用 選擇滿足條件的行eval。使用布爾陣列來屏蔽data
>>> cond = merged.eval('start < step < stop').to_numpy()
>>> data[cond]
value0 value1 value2
run step
0 2 1.304 0.947081 -0.703735
1 3 -0.921725 -0.457726 0.220195
4 -1.00962 -0.209176 -0.159225
或者如果你愿意,這里有一個類似的方法使用 query
res = (
data.reset_index()
.merge(valid_range, on='run', how='left')
.query('start < step < stop')
.drop(columns=['start','stop'])
.set_index(['run', 'step'])
)
uj5u.com熱心網友回復:
我會像這樣繼續 groupby:
(df.groupby(level=0)
.apply(lambda x: x[x['small']>1])
.reset_index(level=0, drop=True) # remove duplicate index
)
這使:
big small
animal animal attribute
cow cow speed 30.0 20.0
weight 250.0 150.0
falcon falcon speed 320.0 250.0
lama lama speed 45.0 30.0
weight 200.0 100.0
轉載請註明出處,本文鏈接:https://www.uj5u.com/net/349894.html
