我有這兩個資料集:
val_1 = [81.322, 81.342, 81.452, 81.552, 82.522, 82.562, 82.722, 81.723, 81.322, 81.332 ]
val_2 = [0.234, 0.231, 0.333, 0.324, 0.666, 0.645, 0.334, 0.345, 0.878, 0.888]
data_1 = pd.DataFrame({'val_1': val_1, 'val_2': val_2})
data_1
val_1 val_2
0 81.322 0.234
1 81.342 0.231
2 81.452 0.333
3 81.552 0.324
4 82.522 0.666
5 82.562 0.645
6 82.722 0.334
7 81.723 0.345
8 81.322 0.878
9 81.332 0.888
和
val_1_start = [81.100,81.41, 81.99, 81.320 ]
val_2_start = [0.230,0.331,0.32, 0.875 ]
val_1_end = [81.400,81.554,82.8, 81.333 ]
val_2_end = [0.281,0.335,0.68, 0.890]
value = [24,21,34,11]
data_2 = pd.DataFrame({'val_1_start': val_1_start, 'val_2_start': val_2_start,
'val_1_end': val_1_end, 'val_2_end':val_2_end, 'value': value})
data_2
val_1_start val_2_start val_1_end val_2_end value
0 81.10 0.230 81.400 0.281 24
1 81.41 0.331 81.554 0.335 21
2 81.99 0.320 82.800 0.680 34
3 81.32 0.875 81.333 0.890 11
我試圖找到 val_1 在 val_1_start 和 val_1_end 之間的位置,而 val_2 在 val_2_start 和 val_2_end 之間。然后我會將值添加到第一個資料集。
我不介意這里的任何輸出
val_1 val_2 value
81.322 0.234 24
81.342 0.231 24
81.452 0.333 21
81.552 0.324 NA
82.522 0.666 34
82.562 0.645 34
82.722 0.334 34
81.723 0.345 NA
81.322 0.878 11
81.332 0.888 11
val_1 val_2 value
81.322 0.234 24
81.342 0.231 24
81.452 0.333 21
82.522 0.666 34
82.562 0.645 34
82.722 0.334 34
81.322 0.878 11
81.332 0.888 11
請讓我知道我如何做到這一點
uj5u.com熱心網友回復:
使用交叉連接DataFrame.merge,然后過濾boolean indexing:
df = data_1.merge(data_2, how='cross')
m = (df['val_1'].between(df['val_1_start'], df['val_1_end']) &
df['val_2'].between(df['val_2_start'], df['val_2_end']))
df = df.loc[m, ['val_1','val_2','value']]
print (df)
val_1 val_2 value
0 81.322 0.234 24
4 81.342 0.231 24
9 81.452 0.333 21
18 82.522 0.666 34
22 82.562 0.645 34
26 82.722 0.334 34
35 81.322 0.878 11
39 81.332 0.888 11
uj5u.com熱心網友回復:
一種選擇是conditional_join從pyjanitor ; 它試圖避免非等連接的笛卡爾連接(以減少記憶體消耗):
# pip install git https://github.com/pyjanitor-devs/pyjanitor.git
import pandas as pd
import janitor
(data_1.conditional_join(
data_2,
# column from left, column from right, join type
('val_1', 'val_1_start', '>='),
('val_1', 'val_1_end', '<='),
('val_2', 'val_2_start', '>='),
('val_2', 'val_2_end', '<=')
)
.loc(axis=1)['val_1', 'val_2', 'value']
)
val_1 val_2 value
0 81.322 0.234 24
1 81.342 0.231 24
2 81.452 0.333 21
3 82.522 0.666 34
4 82.562 0.645 34
5 82.722 0.334 34
6 81.322 0.878 11
7 81.332 0.888 11
對于左連接,傳遞left給how引數:
(data_1.conditional_join(
data_2,
('val_1', 'val_1_start', '>='),
('val_1', 'val_1_end', '<='),
('val_2', 'val_2_start', '>='),
('val_2', 'val_2_end', '<='),
how = 'left'
)
.loc(axis=1)['val_1', 'val_2', 'value']
)
val_1 val_2 value
0 81.322 0.234 24.0
1 81.342 0.231 24.0
2 81.452 0.333 21.0
3 81.552 0.324 NaN
4 82.522 0.666 34.0
5 82.562 0.645 34.0
6 82.722 0.334 34.0
7 81.723 0.345 NaN
8 81.322 0.878 11.0
9 81.332 0.888 11.0
請注意,目前您必須安裝 pyjanitor dev 版本才能使用此功能。 pip install git https://github.com/pyjanitor-devs/pyjanitor.git
此頁面還包含您可能會發現有用/相關的選項。
轉載請註明出處,本文鏈接:https://www.uj5u.com/gongcheng/355123.html
上一篇:如何在回圈內正確解決這個問題?
