請幫助我根據其他資料框值創建另一列。
這是我的 df 示例:
data = {'Well':[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3],
'Depth':[50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210]}
data2 = {'Well': [1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3], 'Layer': ['A', 'Bot-A', 'B', 'Bot-B', 'C', 'Bot-C','A', 'Bot-A', 'B', 'Bot-B', 'C', 'Bot-C','A', 'Bot-A', 'B', 'Bot-B', 'C', 'Bot-C'], 'Depth': [75,100,125,150,175,200,61,78,89,141,152,189,50,68,98,135,155,189]}
df2 = pd.DataFrame(data2)
df = pd.DataFrame(data)
條件是,我需要在每個深度將圖層填充到 df。
例如在df2中,我們可以發現1號井的A層為75,Bottom-A的深度為100。所以如果df1中的深度大于75但小于100,我想用A填充它們。
下一個條件是,如果每個井的 df1 中的深度 < df2 中的深度 A,我需要用 GG 填充它們。如果深度 > Bot-C,我需要用 JJ 填充它們。
這是我想要的結果。
data3 = {'Well':[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3],
'Depth':[50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210],
'Layer': ['GG','GG','GG','A','A','Bot-A','Bot-A','Bot-A','B','B','Bot-B','Bot-B','Bot-B','C','C','Bot-C','JJ','GG','GG','GG','A','Bot-A','B','B','B','B','B','B','Bot-B','C','C','C','JJ','JJ','A','A','Bot-A','Bot-A','Bot-A','B','B','B','B','Bot-B','Bot-B','C','C','C','JJ','JJ','JJ']}
df3 = pd.DataFrame(data3)
非常感謝,新年快樂!
uj5u.com熱心網友回復:
這是一種解決方案。您可以df2使用自定義函式創建查找字典,并在自定義函式的幫助下get_layer使用查找字典將深度映射到層,您可以獲得預期的結果。
def get_layer(data):
depths = lookup[data['Well']]
if data['Depth'] < list(depths.keys())[-1]:
return 'GG'
elif data['Depth'] > list(depths.keys())[0]:
return 'JJ'
else:
for lim, layer in depths.items():
if data['Depth'] >= lim:
return layer
lookup = {}
for (k1,k2),v in df2.set_index(['Well','Depth'])['Layer'][::-1].to_dict().items():
lookup.setdefault(k1, {}).update({k2:v})
df['Layer'] = df.apply(get_layer, axis=1)
輸出:
Well Depth Layer
0 1 50 GG
1 1 60 GG
2 1 70 GG
3 1 80 A
4 1 90 A
5 1 100 Bot-A
6 1 110 Bot-A
7 1 120 Bot-A
8 1 130 B
9 1 140 B
10 1 150 Bot-B
11 1 160 Bot-B
12 1 170 Bot-B
13 1 180 C
14 1 190 C
15 1 200 Bot-C
16 1 210 JJ
17 2 40 GG
18 2 50 GG
19 2 60 GG
20 2 70 A
21 2 80 Bot-A
22 2 90 B
23 2 100 B
24 2 110 B
25 2 120 B
26 2 130 B
27 2 140 B
28 2 150 Bot-B
29 2 160 C
30 2 170 C
31 2 180 C
32 2 190 JJ
33 2 200 JJ
34 3 50 A
35 3 60 A
36 3 70 Bot-A
37 3 80 Bot-A
38 3 90 Bot-A
39 3 100 B
40 3 110 B
41 3 120 B
42 3 130 B
43 3 140 Bot-B
44 3 150 Bot-B
45 3 160 C
46 3 170 C
47 3 180 C
48 3 190 JJ
49 3 200 JJ
50 3 210 JJ
uj5u.com熱心網友回復:
使用merge_asof,
df = df.sort_values("Depth")
df2 = df2.sort_values("Depth")
dfmerge = pd.merge_asof(df, df2, by="Well",on = "Depth").fillna("GG")
dfmerge = pd.merge_asof(dfmerge, df2, by = "Well", on="Depth", direction ="forward")
dfmerge.loc[dfmerge["Layer_y"].isnull(), "Layer_x"] = "JJ"
dfmerge = dfmerge.sort_values(["Well", "Depth"]).drop(columns = "Layer_y")
uj5u.com熱心網友回復:
由于您正在使用井和地質層,因此您將來可能需要條件連接。沒有直接的方法可以在 pandas 中進行條件連接,但是您可以使用 pandasql 庫。
from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())
query = '''
with
q1 as (
select *,
lead(Depth) over(partition by Well order by Depth) as Depth_lead,
ifnull(lag(Depth) over(partition by Well order by Depth), 0) as Depth_lag
from df2 )
,q2 as (select Well, min(Depth) as min, max(Depth) as max from q1 group by Well )
, q3 as (select t1.*, (CASE WHEN t2.Layer is null and t1.Depth < min THEN 'GG'
WHEN t2.Layer is null and t1.Depth > max THEN 'JJ'
ELSE t2.Layer END) as Layer
from df as t1
left join q1 as t2
on t1.Well = t2.Well and t1.Depth>=t2.Depth and t1.Depth < t2.Depth_lead
left join q2 as t3 on t1.well = t3.well)
Select * from q3
'''
data = sqldf(query)
轉載請註明出處,本文鏈接:https://www.uj5u.com/ruanti/399112.html
