我有兩個資料框:
df1 是一個參考表,其中包含各個代碼及其對應值的串列。
df2 是一個較大資料集的摘錄,其中一列將包含多個代碼示例。它還將包含我想忽略的其他值,例如空白和“不適用”。
我需要從 df2 中拆分出每個單獨的代碼,并從參考表 df1 中找到相應的值。然后我想在 df2 中回傳一個列,其中包含整個代碼字串中的最大值。
import pandas as pd
df1 = [['H302',18],
['H312',17],
['H315',16],
['H316',15],
['H319',14],
['H320',13],
['H332',12],
['H304',11]]
df1 = pd.DataFrame(df1, columns=['Code', 'Value'])
df2 = [['H302,H304'],
['H332,H319,H312,H320,H316,H315,H302,H304'],
['H315,H312,H316'],
['H320,H332,H316,H315,H304,H302,H312'],
['H315,H319,H312,H316,H332'],
['H312'],
['Not Applicable'],
['']]
df2 = pd.DataFrame(df2, columns=['Code'])
我以前使用過以下內容:
df3 = []
for i in range(len(df2)):
df3.append(df2['Code'][i].split(","))
max_values = []
for i in range(len(df3)):
for j in range(len(df3[i])):
for index in range(len(df1)):
if df1['Code'][index] == df3[i][j]:
df3[i][j] = df1['Value'][index]
max_values.append(max(df3[i]))
df2["Max Value"] = max_values
但是,正在洗掉 .append 函式,使用時出現以下錯誤“'>' not supported between 'numpy.ndarray' and 'str'”
uj5u.com熱心網友回復:
代碼
df2['max'] = (
df2['Code']
.str.split(',')
.explode()
.map(df1.set_index('Code')['Value'])
.groupby(level=0).max()
)
這個怎么運作?
- 按分隔符分割
, - 展開以將串列轉換為行
- 用于
map替換值df1 - Groupby on
level=0以查找max每個行組的值
結果
Code max
0 H302,H304 18.0
1 H332,H319,H312,H320,H316,H315,H302,H304 18.0
2 H315,H312,H316 17.0
3 H320,H332,H316,H315,H304,H302,H312 18.0
4 H315,H319,H312,H316,H332 17.0
5 H312 17.0
6 Not Applicable NaN
7 NaN
uj5u.com熱心網友回復:
可以使用pandas.Series.apply自定義 lambda 函式,如下所示
df2['Max_Value'] = df2['Code'].apply(lambda x: max([df1.loc[df1['Code'] == i, 'Value'].values[0] for i in x.split(',') if i != 'Not Applicable' and i != ''], default=0))
[Out]:
Code Max_Value
0 H302,H304 18
1 H332,H319,H312,H320,H316,H315,H302,H304 18
2 H315,H312,H316 17
3 H320,H332,H316,H315,H304,H302,H312 18
4 H315,H319,H312,H316,H332 17
5 H312 17
6 Not Applicable 0
7 0
鑒于下面的第一個注釋,如果不想使用.apply(),可以使用串列推導如下
df2['Max_Value'] = [max([df1.loc[df1['Code'] == i, 'Value'].values[0] for i in x.split(',') if i != 'Not Applicable' and i != ''], default=0) for x in df2['Code']]
[Out]:
Code Max_Value
0 H302,H304 18
1 H332,H319,H312,H320,H316,H315,H302,H304 18
2 H315,H312,H316 17
3 H320,H332,H316,H315,H304,H302,H312 18
4 H315,H319,H312,H316,H332 17
5 H312 17
6 Not Applicable 0
7 0
筆記:
- 對使用有強烈的意見
.apply(),因此可能需要閱讀此內容。
uj5u.com熱心網友回復:
作為備選:
keys=df1['Code'].to_list()
df2["Code"] = df2["Code"].str.split(',')
def get_max(x):
max_=[]
for i in x:
if i in keys:
max_.append(df1.loc[df1.Code==i, 'Value'].values[0])
else:
pass
if len(max_)>0:
return max(max_)
else:
[]
df2['max_value']=df2['Code'].apply(lambda x:get_max(x))
print(df2)
'''
Code max_value
0 ['H302', 'H304'] 18.0
1 ['H332', 'H319', 'H312', 'H320', 'H316', 'H315', 'H302', 'H304'] 18.0
2 ['H315', 'H312', 'H316'] 17.0
3 ['H320', 'H332', 'H316', 'H315', 'H304', 'H302', 'H312'] 18.0
4 ['H315', 'H319', 'H312', 'H316', 'H332'] 17.0
5 ['H312'] 17.0
6 ['Not Applicable'] nan
7 [''] nan
'''
轉載請註明出處,本文鏈接:https://www.uj5u.com/qukuanlian/524025.html
標籤:Python熊猫数据框
