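I have a problem. Some rows in my materialNumber column are missing a value. If a row has the same price as another row, it should get exactly the same materialNumber. If two or more materialNumbers share that price, the first one should be used. If no materialNumber with the same price is found, the materialNumber of the row with the next closest price should be chosen.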
DataFrame:
   customerId  materialNumber  price
0           1          1234.0    100
1           1          4562.0     20
2           2             NaN    100
3           2          4562.0     30
4           3          1547.0     40
5           3             NaN     37
Code:
import pandas as pd

d = {
    "customerId": [1, 1, 2, 2, 3, 3],
    "materialNumber": [
        1234,
        4562,
        None,
        4562,
        1547,
        None,
    ],
    "price": [100, 20, 100, 30, 40, 37],
}
df = pd.DataFrame(data=d)
print(df)
import numpy as np

def find_next(x):
    if(x['materialNumber'] == None):
        # if the price occurs only once, it should find the next nearest price
        if(x['price'].value_counts().shape[0] == 1):
            return x.drop_duplicates(subset=['price'], keep="first")
        else:
            return x.iloc[(x['price']-input).abs().argsort()[:2]]

df['materialNumber'] = df.apply(lambda x: find_next(x), axis=1)
What I want:
   customerId  materialNumber  price
0           1          1234.0    100
1           1          4562.0     20
2           2          1234      100  # from index 0: 1234.0, 100 (same price)
3           2          4562.0     30
4           3          1547.0     40
5           3          1547       37  # from index 4: 1547.0, 40 (nearest price)
Reply from a uj5u.com user:
Use merge_asof to match each row that has a missing materialNumber against the rows that have one, then assign the matched values with DataFrame.loc:
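# True for rows whose materialNumber is missing
m = df['materialNumber'].isna()
# both sides must be sorted by price; reset_index keeps the original labels in 'index'
new = pd.merge_asof(df[m].reset_index().sort_values('price'),
                    df[~m].sort_values('price'), on='price', direction='nearest')
# materialNumber_y is the value taken from the matched non-missing row
df.loc[m, 'materialNumber'] = new.set_index('index')['materialNumber_y']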
print(df)
   customerId  materialNumber  price
0           1          1234.0    100
1           1          4562.0     20
2           2          1234.0    100
3           2          4562.0     30
4           3          1547.0     40
5           3          1547.0     37
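The question also asks that the first materialNumber wins when several rows share the same price. The merge_asof documentation does not spell out which row is picked for duplicate keys with direction='nearest', so the following is a minimal sketch that makes the choice explicit by dropping duplicate prices before the merge. The helper name fill_material_number is hypothetical, and the sketch assumes the DataFrame has a default, unnamed index (so that reset_index produces a column called 'index').

```python
import pandas as pd

df = pd.DataFrame({
    "customerId": [1, 1, 2, 2, 3, 3],
    "materialNumber": [1234, 4562, None, 4562, 1547, None],
    "price": [100, 20, 100, 30, 40, 37],
})

def fill_material_number(frame):
    """Return a copy with missing materialNumber filled from the nearest price.

    Hypothetical helper; assumes a default, unnamed index.
    """
    out = frame.copy()
    missing = out["materialNumber"].isna()
    # Lookup table: one row per price, keeping the first materialNumber seen.
    known = (out.loc[~missing, ["price", "materialNumber"]]
                .drop_duplicates(subset="price", keep="first")
                .sort_values("price"))
    # Keep the original row labels in an 'index' column so we can assign back.
    todo = out.loc[missing, ["price"]].reset_index().sort_values("price")
    matched = pd.merge_asof(todo, known, on="price", direction="nearest")
    # Assignment aligns on the original row labels.
    out.loc[missing, "materialNumber"] = matched.set_index("index")["materialNumber"]
    return out

filled = fill_material_number(df)
print(filled)
```

On the sample data this fills index 2 with 1234.0 and index 5 with 1547.0, the same result as above. Because the lookup table has one row per price, an exact price match can only return the first materialNumber recorded for that price.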
Reply from a uj5u.com user:
IIUC, you can use a merge_asof to find the equal or closest price value, then update your DataFrame:
# mask to split the DataFrame into NaN/non-NaN rows for materialNumber
m = df['materialNumber'].isna()
# sort by price (required for merge_asof)
df2 = df.sort_values(by='price')
# fill missing values; select the NaN rows before reset_index so the mask
# aligns on the original index labels
missing = pd.merge_asof(df2.loc[m, ['price']].reset_index(),
                        df2.loc[~m, ['price', 'materialNumber']],
                        on='price',
                        direction='nearest')  # direction='forward' for next only
# update in place
df.update(missing.set_index('index')['materialNumber'])
Output:
   customerId  materialNumber  price
0           1          1234.0    100
1           1          4562.0     20
2           2          1234.0    100
3           2          4562.0     30
4           3          1547.0     40
5           3          1547.0     37
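To see what the direction argument changes, here is a small sketch that runs the same lookup with each of the three values merge_asof accepts. The known table reuses the non-missing rows from the question; the query frame and the results dict are illustrative names only.

```python
import pandas as pd

# Rows with a known materialNumber, sorted by price (merge_asof requires sorted keys).
known = pd.DataFrame({"price": [20, 30, 40, 100],
                      "materialNumber": [4562, 4562, 1547, 1234]})
# Prices of the rows that need a materialNumber, also sorted.
query = pd.DataFrame({"price": [37, 100]})

results = {}
for direction in ("backward", "forward", "nearest"):
    merged = pd.merge_asof(query, known, on="price", direction=direction)
    results[direction] = merged["materialNumber"].tolist()
    print(direction, results[direction])
```

For price 37, 'backward' takes the row at price 30 (4562), while 'forward' and 'nearest' both take the row at price 40 (1547). The exact match at price 100 returns 1234 in all three cases, since exact matches are allowed by default.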
