I'm a Python noob. My goal is to take each name and carry it over to the rows below it.
import pandas as pd
import numpy as np
df = pd.DataFrame({"1": ['Alfred', 'car', 'bike', 'Alex', 'car'],
                   "2": [np.nan, 'Ford', 'Giant', np.nan, 'Toyota'],
                   "3": [pd.NaT, pd.Timestamp("2018-01-01"),
                         pd.Timestamp("2018-07-01"), np.nan, pd.Timestamp("2021-01-01")]})
        1       2          3
0  Alfred     NaN        NaT
1     car    Ford 2018-01-01
2    bike   Giant 2018-07-01
3    Alex     NaN        NaT
4     car  Toyota 2021-01-01
My target result is as follows:
df = pd.DataFrame({"transportation": ['car', 'bike', 'car'],
                   "Mark": ['Ford', 'Giant', 'Toyota'],
                   "BuyDate": [pd.Timestamp("2018-01-01"),
                               pd.Timestamp("2018-07-01"), pd.Timestamp("2021-01-01")],
                   "Name": ['Alfred', 'Alfred', 'Alex']
                   })
  transportation    Mark    BuyDate    Name
0            car    Ford 2018-01-01  Alfred
1           bike   Giant 2018-07-01  Alfred
2            car  Toyota 2021-01-01    Alex
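I have been searching for a way to do this but could not solve it. Thank you for reading my post and for any help.
A helpful uj5u.com user replied:
You can do this with a helper column followed by a forward fill: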
# rename columns
df.columns = ["transportation", "Mark", "BuyDate"]
# assumption: the rows where "Mark" is NaN defines the name for the following rows
df["is_name"] = df["Mark"].isna()
# create a new column which is NaN everywhere except for the name rows
df["name"] = np.where(df.is_name, df["transportation"], np.nan)
# do a forward fill to extend the names to all rows
df["name"] = df["name"].ffill()
# filter by non-name rows and drop the temporary is_name column
df = df.loc[~df.is_name].drop("is_name", axis=1)
print(df)
Out:
  transportation    Mark    BuyDate    name
1            car    Ford 2018-01-01  Alfred
2           bike   Giant 2018-07-01  Alfred
4            car  Toyota 2021-01-01    Alex
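For reference, the same helper-column idea can be written as one self-contained function. This is only a sketch: the function name `attach_names` is made up for illustration, it assumes a row is a name row exactly when its second column is missing, and it uses `Series.where` plus `.ffill()` rather than `np.where` and `fillna`.

```python
import numpy as np
import pandas as pd


def attach_names(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: copy each name row's value onto the rows below it."""
    out = raw.copy()
    out.columns = ["transportation", "Mark", "BuyDate"]
    # Assumption: a row with no "Mark" is a name row.
    is_name = out["Mark"].isna()
    # Keep the first column only on name rows, then forward fill downwards.
    out["Name"] = out["transportation"].where(is_name).ffill()
    # Drop the name rows themselves and renumber the index.
    return out.loc[~is_name].reset_index(drop=True)


raw = pd.DataFrame({
    "1": ["Alfred", "car", "bike", "Alex", "car"],
    "2": [np.nan, "Ford", "Giant", np.nan, "Toyota"],
    "3": [pd.NaT, pd.Timestamp("2018-01-01"), pd.Timestamp("2018-07-01"),
          pd.NaT, pd.Timestamp("2021-01-01")],
})
result = attach_names(raw)
print(result)
```

Working on a copy leaves the original frame untouched, which makes it easier to rerun the step while experimenting.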
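A helpful uj5u.com user replied:
The idea is to build a mask from the missing values in the Mark column, forward fill Name, and then filter the rows with the same mask: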
df.columns = ["Transportation", "Mark", "BuyDate"]
m = df["Mark"].notna()
df["Name"] = df["Transportation"].mask(m).ffill()
df = df[m].reset_index(drop=True)
print(df)
  Transportation    Mark    BuyDate    Name
0            car    Ford 2018-01-01  Alfred
1           bike   Giant 2018-07-01  Alfred
2            car  Toyota 2021-01-01    Alex
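To see what `mask(m).ffill()` does step by step, here is a small sketch on two standalone Series. The variable names `first`, `mark`, `masked` and `names` are illustrative only and do not appear in the answer above.

```python
import numpy as np
import pandas as pd

first = pd.Series(["Alfred", "car", "bike", "Alex", "car"])
mark = pd.Series([np.nan, "Ford", "Giant", np.nan, "Toyota"])

m = mark.notna()        # True on vehicle rows, False on name rows
masked = first.mask(m)  # vehicle rows become NaN, only the names are kept
names = masked.ffill()  # each name is carried down to the rows below it

print(masked.isna().tolist())  # [False, True, True, False, True]
print(names.tolist())          # ['Alfred', 'Alfred', 'Alfred', 'Alex', 'Alex']
print(names[m].tolist())       # ['Alfred', 'Alfred', 'Alex']
```

Filtering with the same mask `m` at the end is what removes the name rows once their value has been copied down.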
A helpful uj5u.com user replied:
You can use this pipeline:
m = df.iloc[:, 1].notna()
(df.assign(Name=df.iloc[:, 0].mask(m).ffill())  # add new column
   .loc[m]  # keep only the rows with info
   # below: rework df to fit output
   .rename(columns={'1': 'transportation', '2': 'Mark', '3': 'BuyDate'})
   .reset_index(drop=True)
)
Output:
  transportation    Mark    BuyDate    Name
0            car    Ford 2018-01-01  Alfred
1           bike   Giant 2018-07-01  Alfred
2            car  Toyota 2021-01-01    Alex
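One way to check a pipeline like this against the target frame from the question is `pandas.testing.assert_frame_equal`. The sketch below assumes the original column labels `"1"`, `"2"`, `"3"`; the names `result` and `expected` are illustrative, and `check_dtype=False` is used so that only labels and values are compared.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal

df = pd.DataFrame({
    "1": ["Alfred", "car", "bike", "Alex", "car"],
    "2": [np.nan, "Ford", "Giant", np.nan, "Toyota"],
    "3": [pd.NaT, pd.Timestamp("2018-01-01"), pd.Timestamp("2018-07-01"),
          pd.NaT, pd.Timestamp("2021-01-01")],
})
expected = pd.DataFrame({
    "transportation": ["car", "bike", "car"],
    "Mark": ["Ford", "Giant", "Toyota"],
    "BuyDate": [pd.Timestamp("2018-01-01"), pd.Timestamp("2018-07-01"),
                pd.Timestamp("2021-01-01")],
    "Name": ["Alfred", "Alfred", "Alex"],
})

m = df.iloc[:, 1].notna()  # True on rows that describe a vehicle
result = (
    df.assign(Name=df.iloc[:, 0].mask(m).ffill())  # carry names downwards
      .loc[m]                                      # drop the name rows
      .rename(columns={"1": "transportation", "2": "Mark", "3": "BuyDate"})
      .reset_index(drop=True)
)

# Raises AssertionError with a diff if the frames differ.
assert_frame_equal(result, expected, check_dtype=False)
print("result matches the target")
```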
A helpful uj5u.com user replied:
You can do it like this:
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({"1": ['Alfred', 'car', 'bike','Alex','car'],
... "2": [np.nan, 'Ford', 'Giant',np.nan,'Toyota'],
... "3": [pd.NaT, pd.Timestamp("2018-01-01"),
... pd.Timestamp("2018-07-01"),np.nan,pd.Timestamp("2021-01-01")]})
>>>
>>> df
        1       2          3
0  Alfred     NaN        NaT
1     car    Ford 2018-01-01
2    bike   Giant 2018-07-01
3    Alex     NaN        NaT
4     car  Toyota 2021-01-01
>>>
>>> new_df = pd.DataFrame(columns=['Transportation', 'Mark', 'BuyDate', 'Name'])
>>>
>>> j = 0
>>> for i in range(df.shape[0]):
...     if pd.isna(df.iloc[i, 1]):
...         # a row without a Mark holds the name for the rows that follow
...         running_name = df.iloc[i, 0]
...         continue
...     new_df.loc[j] = list(df.iloc[i]) + [running_name]
...     j += 1
...
>>> new_df
  Transportation    Mark    BuyDate    Name
0            car    Ford 2018-01-01  Alfred
1           bike   Giant 2018-07-01  Alfred
2            car  Toyota 2021-01-01    Alex
>>>
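If an explicit loop is easier to follow, one alternative sketch is to collect the rows in a plain list and build the DataFrame once at the end, which avoids growing `new_df` row by row. The names `rows` and `current_name` are illustrative, and a name row is again assumed to be one whose second column is missing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "1": ["Alfred", "car", "bike", "Alex", "car"],
    "2": [np.nan, "Ford", "Giant", np.nan, "Toyota"],
    "3": [pd.NaT, pd.Timestamp("2018-01-01"), pd.Timestamp("2018-07-01"),
          pd.NaT, pd.Timestamp("2021-01-01")],
})

rows = []
current_name = None
# name=None makes itertuples yield plain tuples, so the non-identifier
# column labels "1", "2", "3" are not a problem.
for first, mark, buy_date in df.itertuples(index=False, name=None):
    if pd.isna(mark):
        # Name row: remember the name and skip to the next row.
        current_name = first
        continue
    rows.append((first, mark, buy_date, current_name))

new_df = pd.DataFrame(rows, columns=["Transportation", "Mark", "BuyDate", "Name"])
print(new_df)
```

For small inputs either loop is fine; the vectorised `mask` and `ffill` approaches are usually the better fit for large frames.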
Please credit the source when reposting. Article link: https://www.uj5u.com/gongcheng/346968.html
