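I have two dataframes that I ultimately want to merge so that I can compare alternative spellings of leaders' names.
My first dataframe looks like this: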
year country_isocode country_name leader leader_start_date leader_end_date
20 1986 AFG Afghanistan Mohammed Najibullah 1986-05-04 1992-04-16
21 1987 AFG Afghanistan Mohammed Najibullah 1986-05-04 1992-04-16
22 1988 AFG Afghanistan Mohammed Najibullah 1986-05-04 1992-04-16
23 1989 AFG Afghanistan Mohammed Najibullah 1986-05-04 1992-04-16
24 1990 AFG Afghanistan Mohammed Najibullah 1986-05-04 1992-04-16
25 1991 AFG Afghanistan Mohammed Najibullah 1986-05-04 1992-04-16
26 1992 AFG Afghanistan Burhanuddin Rabbani 1992-06-28 1996-09-27
27 1993 AFG Afghanistan Burhanuddin Rabbani 1992-06-28 1996-09-27
28 1994 AFG Afghanistan Burhanuddin Rabbani 1992-06-28 1996-09-27
While my second looks like this:
leader_start_date leader_end_date LeaderCountryOrIGO LeaderCountryISO LeaderTitle LeaderLastName LeaderFullName
0 1986-05-04 1990-06-28 Afghanistan AFG General Secretary Najibullah Mohammad Najibullah
1 1989-02-21 1990-05-07 Afghanistan AFG Prime Minister Keshtmand Ali Keshtmand
2 1990-05-07 1992-04-15 Afghanistan AFG Prime Minister Khaliqyar Fazal Haq Khaliqyar
3 1992-04-16 1992-04-28 Afghanistan AFG President (Acting) Hatef Abdul Rahim Hatef
4 1992-04-28 1992-06-28 Afghanistan AFG President (Acting) Mojadedi Sibghatullah Mojadedi
5 1992-06-28 1996-09-27 Afghanistan AFG President Rabbani Burhanuddin Rabbani
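The first dataframe has a separate row for every country-year entry in the dataset, whereas the second has a single row for each unique leader and the span of years they held office. My goal is to reshape the second dataframe so that it is comparable in shape to the first.
I want to take the range of years implied by the two datetime columns in the second dataset, "leader_start_date" and "leader_end_date", and "expand" it into a new set of rows, one for each year in the range, with the country and leader name information repeated. I then need to apply this to every unique leader name and year range in the second dataframe.
Although the datasets are not a perfect match, getting both dataframes into the same shape will let me identify the multiple matches.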
Answer from a uj5u.com user:
Use:
import pandas as pd

# convert both columns to datetimes
df['leader_start_date'] = pd.to_datetime(df['leader_start_date'])
df['leader_end_date'] = pd.to_datetime(df['leader_end_date'])
# create a new column holding the start year
df.insert(0, 'year', df['leader_start_date'].dt.year)
# number of repeats per row; a missing end date falls back to the current year
s = (df['leader_end_date'].dt.year.fillna(pd.Timestamp.now().year) - df['year']).astype(int)
# variant 1: stop at the year before leader_end_date
df = df.loc[df.index.repeat(s)].copy()
# variant 2: also include the year of leader_end_date
# df = df.loc[df.index.repeat(s + 1)].copy()
# add a per-leader counter to the start year
df['year'] += df.groupby(level=0).cumcount()
# create default index
df = df.reset_index(drop=True)
print(df)
year leader_start_date leader_end_date LeaderCountryOrIGO \
0 1986 1986-05-04 1990-06-28 Afghanistan
1 1987 1986-05-04 1990-06-28 Afghanistan
2 1988 1986-05-04 1990-06-28 Afghanistan
3 1989 1986-05-04 1990-06-28 Afghanistan
4 1989 1989-02-21 1990-05-07 Afghanistan
5 1990 1990-05-07 1992-04-15 Afghanistan
6 1991 1990-05-07 1992-04-15 Afghanistan
7 1992 1992-06-28 1996-09-27 Afghanistan
8 1993 1992-06-28 1996-09-27 Afghanistan
9 1994 1992-06-28 1996-09-27 Afghanistan
10 1995 1992-06-28 1996-09-27 Afghanistan
LeaderCountryISO LeaderTitle LeaderLastName LeaderFullName
0 AFG General Secretary Najibullah Mohammad Najibullah
1 AFG General Secretary Najibullah Mohammad Najibullah
2 AFG General Secretary Najibullah Mohammad Najibullah
3 AFG General Secretary Najibullah Mohammad Najibullah
4 AFG Prime Minister Keshtmand Ali Keshtmand
5 AFG Prime Minister Khaliqyar Fazal Haq Khaliqyar
6 AFG Prime Minister Khaliqyar Fazal Haq Khaliqyar
7 AFG President Rabbani Burhanuddin Rabbani
8 AFG President Rabbani Burhanuddin Rabbani
9 AFG President Rabbani Burhanuddin Rabbani
10 AFG President Rabbani Burhanuddin Rabbani
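Since the stated goal is to line the two frames up and compare name spellings, here is a minimal sketch of one alternative route: build the year range per row, expand it with `DataFrame.explode`, and then left-merge onto the country-year frame. The frame names `leaders` and `country_years` and the miniature data are assumptions made up for illustration (a subset of the columns shown in the question). The end year is included here, which differs from the first variant in the answer above.

```python
import pandas as pd

# Hypothetical miniature of the second frame (one row per leader).
leaders = pd.DataFrame({
    "leader_start_date": ["1986-05-04", "1992-06-28"],
    "leader_end_date": ["1990-06-28", "1996-09-27"],
    "LeaderCountryISO": ["AFG", "AFG"],
    "LeaderFullName": ["Mohammad Najibullah", "Burhanuddin Rabbani"],
})
leaders["leader_start_date"] = pd.to_datetime(leaders["leader_start_date"])
leaders["leader_end_date"] = pd.to_datetime(leaders["leader_end_date"])

# One list of years per row (end year included), then one row per year.
leaders["year"] = [
    list(range(start, end + 1))
    for start, end in zip(leaders["leader_start_date"].dt.year,
                          leaders["leader_end_date"].dt.year)
]
expanded = leaders.explode("year").reset_index(drop=True)
expanded["year"] = expanded["year"].astype(int)

# Hypothetical miniature of the first frame (one row per country-year).
country_years = pd.DataFrame({
    "year": [1986, 1987, 1992],
    "country_isocode": ["AFG", "AFG", "AFG"],
    "leader": ["Mohammed Najibullah", "Mohammed Najibullah",
               "Burhanuddin Rabbani"],
})

# Left merge keeps every country-year row and attaches the leader(s)
# from the expanded frame whose term covers that year.
merged = country_years.merge(
    expanded,
    left_on=["country_isocode", "year"],
    right_on=["LeaderCountryISO", "year"],
    how="left",
)
print(merged[["year", "leader", "LeaderFullName"]])
```

With the two name columns side by side, differing spellings such as "Mohammed" versus "Mohammad" can be found by filtering on `merged["leader"] != merged["LeaderFullName"]`. If several leaders overlap in the same year, the merge will return more than one row for that country-year, which is the multiple-match case mentioned in the question.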
