Create a new column based on other column values in Python

Please help me create a new column in one DataFrame based on the values in another DataFrame.

Here is an example of my df:

import pandas as pd

data = {'Well':[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3], 
  'Depth':[50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210]}
data2 = {'Well': [1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3], 'Layer': ['A', 'Bot-A', 'B', 'Bot-B', 'C', 'Bot-C','A', 'Bot-A', 'B', 'Bot-B', 'C', 'Bot-C','A', 'Bot-A', 'B', 'Bot-B', 'C', 'Bot-C'], 'Depth': [75,100,125,150,175,200,61,78,89,141,152,189,50,68,98,135,155,189]}
df2 = pd.DataFrame(data2)
df = pd.DataFrame(data)

The requirement is to fill in the layer for every depth in df.

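For example, in df2 we can see that for well 1 the top of layer A is at depth 75 and Bot-A is at depth 100. So if a depth in df is greater than 75 but less than 100, I want to fill it with A.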

The other conditions: for each well, if a depth in df is less than the depth of A in df2, fill it with GG; if it is greater than the depth of Bot-C, fill it with JJ.
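Read together with the expected result below, the rule is: a depth takes the label of the deepest boundary at or above it (so a depth exactly on a boundary, such as 200 in well 1, gets that boundary's label, Bot-C), GG if it is shallower than A, and JJ only if it is strictly deeper than Bot-C. A minimal sketch of that rule for a single well, using the standard-library bisect module; `classify` and the `(depth, label)` list format are hypothetical, not from the question:

```python
from bisect import bisect_right

# Hypothetical helper: boundaries are (depth, label) pairs for ONE well, sorted by depth.
def classify(depth, boundaries):
    tops = [d for d, _ in boundaries]
    if depth < tops[0]:   # shallower than the top of layer A
        return "GG"
    if depth > tops[-1]:  # strictly deeper than Bot-C
        return "JJ"
    # Index of the last boundary whose depth is <= depth (a depth on a boundary takes its label)
    i = bisect_right(tops, depth) - 1
    return boundaries[i][1]

# Well 1 boundaries from df2 in the question
well1 = [(75, "A"), (100, "Bot-A"), (125, "B"), (150, "Bot-B"), (175, "C"), (200, "Bot-C")]
```

For example, `classify(80, well1)` gives A and `classify(200, well1)` gives Bot-C; applied per well, this rule agrees with the Layer values in data3.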

This is the result I want:

data3 = {'Well':[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3], 
  'Depth':[50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210], 
  'Layer': ['GG','GG','GG','A','A','Bot-A','Bot-A','Bot-A','B','B','Bot-B','Bot-B','Bot-B','C','C','Bot-C','JJ','GG','GG','GG','A','Bot-A','B','B','B','B','B','B','Bot-B','C','C','C','JJ','JJ','A','A','Bot-A','Bot-A','Bot-A','B','B','B','B','Bot-B','Bot-B','C','C','C','JJ','JJ','JJ']}
df3 = pd.DataFrame(data3)

Thank you very much, and Happy New Year!

Reply from a uj5u.com user:

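Here is one solution. Build a lookup dictionary from df2, then write a custom function, get_layer, that uses the lookup dictionary to map each depth to a layer. Applying it row by row with df.apply gives the expected result.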

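def get_layer(data):
    # Boundaries of this row's well, ordered from deepest to shallowest
    depths = lookup[data['Well']]
    if data['Depth'] < list(depths.keys())[-1]:    # shallower than the top of A
        return 'GG'
    elif data['Depth'] > list(depths.keys())[0]:   # deeper than Bot-C
        return 'JJ'
    else:
        # Walking up from the deepest boundary, the first one at or above this depth gives the layer
        for lim, layer in depths.items():
            if data['Depth'] >= lim:
                return layer

# Build {Well: {Depth: Layer}}; df2 is reversed so each inner dict is ordered deepest first
# (this relies on df2 being sorted by Depth within each well, as in the sample)
lookup = {}
for (k1, k2), v in df2.set_index(['Well', 'Depth'])['Layer'][::-1].to_dict().items():
    lookup.setdefault(k1, {}).update({k2: v})

df['Layer'] = df.apply(get_layer, axis=1)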
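The loop above gets its deepest-first ordering from reversing df2, so it silently depends on df2 already being sorted by Depth within each well. A sketch of an order-independent way to build the same lookup (an alternative, not the answerer's code), assuming df2 as defined in the question; get_layer can use it unchanged:

```python
import pandas as pd

# df2 from the question
df2 = pd.DataFrame({
    "Well": [1] * 6 + [2] * 6 + [3] * 6,
    "Layer": ["A", "Bot-A", "B", "Bot-B", "C", "Bot-C"] * 3,
    "Depth": [75, 100, 125, 150, 175, 200, 61, 78, 89, 141, 152, 189,
              50, 68, 98, 135, 155, 189],
})

# {Well: {Depth: Layer}} with each inner dict ordered deepest boundary first;
# sorting per well means the row order of df2 no longer matters
lookup = {
    well: dict(sorted(zip(g["Depth"], g["Layer"]), reverse=True))
    for well, g in df2.groupby("Well")
}
```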

Output:

    Well  Depth  Layer
0      1     50     GG
1      1     60     GG
2      1     70     GG
3      1     80      A
4      1     90      A
5      1    100  Bot-A
6      1    110  Bot-A
7      1    120  Bot-A
8      1    130      B
9      1    140      B
10     1    150  Bot-B
11     1    160  Bot-B
12     1    170  Bot-B
13     1    180      C
14     1    190      C
15     1    200  Bot-C
16     1    210     JJ
17     2     40     GG
18     2     50     GG
19     2     60     GG
20     2     70      A
21     2     80  Bot-A
22     2     90      B
23     2    100      B
24     2    110      B
25     2    120      B
26     2    130      B
27     2    140      B
28     2    150  Bot-B
29     2    160      C
30     2    170      C
31     2    180      C
32     2    190     JJ
33     2    200     JJ
34     3     50      A
35     3     60      A
36     3     70  Bot-A
37     3     80  Bot-A
38     3     90  Bot-A
39     3    100      B
40     3    110      B
41     3    120      B
42     3    130      B
43     3    140  Bot-B
44     3    150  Bot-B
45     3    160      C
46     3    170      C
47     3    180      C
48     3    190     JJ
49     3    200     JJ
50     3    210     JJ

Reply from a uj5u.com user:

Using merge_asof:

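import pandas as pd

# merge_asof needs both frames sorted on the "on" key
df = df.sort_values("Depth")
df2 = df2.sort_values("Depth")
# Backward match: last boundary at or above each depth; no match means shallower than A -> GG
dfmerge = pd.merge_asof(df, df2, by="Well", on="Depth").fillna("GG")
# Forward match: no boundary at or below the depth means deeper than Bot-C -> JJ
dfmerge = pd.merge_asof(dfmerge, df2, by="Well", on="Depth", direction="forward")
dfmerge.loc[dfmerge["Layer_y"].isnull(), "Layer_x"] = "JJ"
dfmerge = (dfmerge.sort_values(["Well", "Depth"]).drop(columns="Layer_y")
           .rename(columns={"Layer_x": "Layer"}).reset_index(drop=True))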
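To see why the two passes work, here is a toy example (hypothetical data: one well with boundaries at 10 and 20) showing what each direction of merge_asof returns. NaN marks a depth with no boundary in the search direction, which is exactly what the answer turns into GG (backward) and JJ (forward):

```python
import pandas as pd

# Hypothetical toy data: one well, four sample depths, two boundaries
points = pd.DataFrame({"Well": [1, 1, 1, 1], "Depth": [5, 10, 15, 25]})
bounds = pd.DataFrame({"Well": [1, 1], "Depth": [10, 20], "Layer": ["A", "Bot-A"]})

# direction="backward" (default): last boundary with Depth <= sample depth
back = pd.merge_asof(points, bounds, by="Well", on="Depth")
# direction="forward": first boundary with Depth >= sample depth
fwd = pd.merge_asof(points, bounds, by="Well", on="Depth", direction="forward")

print(back["Layer"].tolist())
print(fwd["Layer"].tolist())
```

Exact matches count in both directions by default (`allow_exact_matches=True`), which is why a depth sitting on the last boundary keeps that boundary's label instead of becoming JJ.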

Reply from a uj5u.com user:

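Since you are working with wells and geological layers, you will probably need conditional joins in the future. pandas has no direct way to do a conditional join, but you can use the pandasql library, which runs SQLite queries against DataFrames.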

from pandasql import sqldf

# Run queries against the DataFrames defined at module level (df, df2)
pysqldf = lambda q: sqldf(q, globals())
query = '''
with
q1 as (
    -- each boundary's interval runs down to the next boundary of the same well
    select *,
           lead(Depth) over (partition by Well order by Depth) as Depth_lead
    from df2
),
q2 as (
    select Well, min(Depth) as min_depth, max(Depth) as max_depth
    from q1
    group by Well
),
q3 as (
    select t1.*,
           case when t2.Layer is null and t1.Depth < t3.min_depth then 'GG'
                when t2.Layer is null and t1.Depth > t3.max_depth then 'JJ'
                else t2.Layer end as Layer
    from df as t1
    left join q1 as t2
      on t1.Well = t2.Well
     and t1.Depth >= t2.Depth
     -- the last boundary (Bot-C) has no lead, so it only matches its own depth
     and (t1.Depth < t2.Depth_lead or (t2.Depth_lead is null and t1.Depth = t2.Depth))
    left join q2 as t3 on t1.Well = t3.Well
)
select * from q3 order by Well, Depth
'''
data = pysqldf(query)
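For comparison, the same conditional join can be emulated in plain pandas by merging on the equality key (Well) and then filtering on the range condition, at the cost of materializing every sample/boundary pair within a well. A sketch of that swapped-in technique, assuming df2 from the question and a few hypothetical sample depths for df:

```python
import pandas as pd

# df2 from the question; df holds a few hypothetical sample depths
df2 = pd.DataFrame({
    "Well": [1] * 6 + [2] * 6 + [3] * 6,
    "Layer": ["A", "Bot-A", "B", "Bot-B", "C", "Bot-C"] * 3,
    "Depth": [75, 100, 125, 150, 175, 200, 61, 78, 89, 141, 152, 189,
              50, 68, 98, 135, 155, 189],
})
df = pd.DataFrame({"Well": [1, 1, 1, 2, 2, 3], "Depth": [50, 80, 200, 150, 190, 50]})

# Each boundary's interval runs down to the next boundary in the same well
bounds = df2.sort_values(["Well", "Depth"]).copy()
bounds["Depth_lead"] = bounds.groupby("Well")["Depth"].shift(-1)

# Equality part of the join: pair every sample with every boundary of its well
pairs = df.reset_index().merge(bounds, on="Well", suffixes=("", "_top"))

# Range part: keep the interval containing the depth; the last boundary matches only exactly
in_range = (pairs["Depth"] >= pairs["Depth_top"]) & (
    (pairs["Depth"] < pairs["Depth_lead"])
    | (pairs["Depth_lead"].isna() & (pairs["Depth"] == pairs["Depth_top"]))
)
hits = pairs.loc[in_range].set_index("index")["Layer"]

# Unmatched rows lie either above the first boundary (GG) or below the last (JJ)
out = df.copy()
out["Layer"] = hits  # aligns on the original row labels; unmatched rows become NaN
above = out["Depth"] < out["Well"].map(df2.groupby("Well")["Depth"].min())
out.loc[out["Layer"].isna() & above, "Layer"] = "GG"
out.loc[out["Layer"].isna() & ~above, "Layer"] = "JJ"
print(out)
```

The pair table grows with (samples per well) × (boundaries per well), which is small for a handful of layers but worth keeping in mind for dense logs.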

Please credit the source when reposting. Article link: https://www.uj5u.com/ruanti/399112.html

Tags: Python, pandas, dataframe
