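I want to remove the NaN values from my pandas DataFrame and shift the remaining values up within each groupby on Category and Gender. Here is an example I created that mimics the data I'm working with: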
import pandas as pd
test = {'Price':
[20, 10, 'NaN', 'NaN', 'NaN', 'NaN',21, 11,'NaN', 'NaN', 'NaN','NaN'],
'Gender':
['womens-clothing','womens-clothing','womens-clothing','womens-clothing','womens-clothing','womens-clothing','mens-clothing','mens-clothing','mens-clothing','mens-clothing','mens-clothing','mens-clothing'],
'Category':['dresses','dresses','dresses', 'dresses', 'dresses', 'dresses', 'jackets','jackets', 'jackets', 'jackets', 'jackets', 'jackets'],
'Title':['NaN', 'NaN', 'Cheap Dress', 'First Dress', 'NaN', 'NaN','NaN', 'NaN','Main Jacket', 'Black Jacket','NaN', 'NaN'],
'Review':['NaN','NaN','NaN','NaN',203,12,'NaN','NaN','NaN','NaN',201, 15]}
df = pd.DataFrame(test)
This is what it looks like:
Price Gender Category Title Review
0 20 womens-clothing dresses NaN NaN
1 10 womens-clothing dresses NaN NaN
2 NaN womens-clothing dresses Cheap Dress NaN
3 NaN womens-clothing dresses First Dress NaN
4 NaN womens-clothing dresses NaN 203
5 NaN womens-clothing dresses NaN 12
6 21 mens-clothing jackets NaN NaN
7 11 mens-clothing jackets NaN NaN
8 NaN mens-clothing jackets Main Jacket NaN
9 NaN mens-clothing jackets Black Jacket NaN
10 NaN mens-clothing jackets NaN 201
11 NaN mens-clothing jackets NaN 15
Within each Gender and Category group, I want to drop the NaN values and shift the remaining cells up, so that the result matches the following:
Price Gender Category Title Review
0 20 womens-clothing dresses Cheap Dress 203
2 10 womens-clothing dresses First Dress 12
3 21 mens-clothing jackets Main Jacket 201
4 11 mens-clothing jackets Black Jacket 15
I tried:
data = df.apply(lambda x: pd.Series(x.drop(index=x[x[0] == 'NaN'], inplace=True).values))
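But I can't seem to drop specific rows this way. The NaNs here are strings (in my real data they are actual NA values; I just don't know how to produce them in a dictionary that I can use for reproducible code).
How can I get the expected output, given that the NaNs are actual NAs? I have tried including a groupby in the function above, but I can't use it on a NumPy array. I can include it outside the function, but that didn't help.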
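On the side question of producing real missing values in a reproducible dictionary, one possible approach is sketched below. It assumes NumPy is available; the shortened `test` dictionary and the name `df_na` are illustrative and not part of the original post. Either write `np.nan` directly in the dictionary, or convert the `'NaN'` strings after the frame is built.

```python
import numpy as np
import pandas as pd

# Option 1: put real missing values straight into the dictionary.
test = {
    "Price": [20, 10, np.nan, np.nan],
    "Title": [np.nan, np.nan, "Cheap Dress", "First Dress"],
}
df_direct = pd.DataFrame(test)

# Option 2: start from the 'NaN' strings and mask them out afterwards.
test_str = {
    "Price": [20, 10, "NaN", "NaN"],
    "Title": ["NaN", "NaN", "Cheap Dress", "First Dress"],
}
df_str = pd.DataFrame(test_str)
# DataFrame.mask replaces every cell where the condition is True with NaN.
df_na = df_str.mask(df_str == "NaN")

print(df_direct.isna().sum())
print(df_na.isna().sum())
```

With real missing values in place, `dropna()` and `isna()` behave as expected, which is what the commented-out `dropna` variant in the answer relies on.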
A uj5u.com user replied:
For the ideal sample data, where every column has the same number of non-NaN values in each group, use:
f = lambda x: x.apply(lambda x: x[x!='NaN'])
df = df.set_index(['Gender','Category']).groupby(['Gender','Category'], group_keys=False).apply(f).reset_index()
print (df)
Gender Category Price Title Review
0 mens-clothing jackets 21 Main Jacket 201
1 mens-clothing jackets 11 Black Jacket 15
2 womens-clothing dresses 20 Cheap Dress 203
3 womens-clothing dresses 10 First Dress 12
For general data, where the number of non-NaN values may differ between columns:
test = {'Price':
[20, 10, 'NaN', 'NaN', 'NaN', 'NaN',21, 11,45, 'NaN', 'NaN','NaN'],
'Gender':
['womens-clothing','womens-clothing','womens-clothing','womens-clothing','womens-clothing','womens-clothing','mens-clothing','mens-clothing','mens-clothing','mens-clothing','mens-clothing','mens-clothing'],
'Category':['dresses','dresses','dresses', 'dresses', 'dresses', 'dresses', 'jackets','jackets', 'jackets', 'jackets', 'jackets', 'jackets'],
'Title':['NaN', 'NaN', 'Cheap Dress', 'First Dress', 'NaN', 'NaN','NaN', 'NaN','Main Jacket', 'Black Jacket','NaN', 'NaN'],
'Review':['NaN','NaN','NaN','NaN',203,12,'NaN','NaN','NaN','NaN',201, 15]}
df = pd.DataFrame(test)
f = lambda x: x.apply(lambda x: pd.Series(x[x!='NaN'].to_numpy()))
#if NaNs are missing values
#f = lambda x: x.apply(lambda x: pd.Series(x.dropna().to_numpy()))
df = (df.set_index(['Gender','Category'])
.groupby(['Gender','Category'])
.apply(f)
.droplevel(-1)
.reset_index())
print (df)
Gender Category Price Title Review
0 mens-clothing jackets 21 Main Jacket 201
1 mens-clothing jackets 11 Black Jacket 15
2 mens-clothing jackets 45 NaN NaN
3 womens-clothing dresses 20 Cheap Dress 203
4 womens-clothing dresses 10 First Dress 12
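If the NaNs are real missing values, the same idea can also be written with an explicit loop over the groups instead of `groupby(...).apply`. The following is only a sketch under that assumption, not the answerer's code: the helper `compact` and the names `keys`, `value_cols`, and `result` are made up for illustration. Each column is compacted with `dropna()` and re-indexed from 0, so shorter columns are padded with NaN at the bottom of the group.

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame({
    "Price": [20, 10, nan, nan, nan, nan, 21, 11, 45, nan, nan, nan],
    "Gender": ["womens-clothing"] * 6 + ["mens-clothing"] * 6,
    "Category": ["dresses"] * 6 + ["jackets"] * 6,
    "Title": [nan, nan, "Cheap Dress", "First Dress", nan, nan,
              nan, nan, "Main Jacket", "Black Jacket", nan, nan],
    "Review": [nan, nan, nan, nan, 203, 12, nan, nan, nan, nan, 201, 15],
})

keys = ["Gender", "Category"]
value_cols = [c for c in df.columns if c not in keys]

def compact(group):
    # Drop missing values column by column and renumber from 0, so the
    # surviving values of every column line up at the top of the group.
    return pd.DataFrame(
        {col: group[col].dropna().reset_index(drop=True) for col in group.columns}
    )

pieces = []
# sort=False keeps the groups in their order of first appearance.
for (gender, category), group in df.groupby(keys, sort=False):
    block = compact(group[value_cols])
    block.insert(0, "Category", category)
    block.insert(0, "Gender", gender)
    pieces.append(block)

result = pd.concat(pieces, ignore_index=True)
print(result)
```

Because `sort=False` is used here, the womens-clothing rows come first, unlike the sorted group order in the output above. The numeric columns are floats, since they held NaN in the input.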
