I'm stuck on the following problem and would appreciate help writing the code.
I have a DataFrame df1 that looks like this:
Exams
0 exam7 (2017), exam9 (2018), exam3 (2018), exam...
1 exam2 (2017), exam2 (2017), exam8 (2018), exam...
2 exam7 (2017), exam6 (2017), exam2 (2017), exam...
3 exam10 (2019), exam4 (2019), exam4 (2019), exa...
4 exam4 (2019), exam4 (2019), exam4 (2019)
.. ...
95 exam10 (2019), exam4 (2019), exam6 (2017), exa...
96 exam1 (2016), exam8 (2018)
97 exam3 (2018), exam5 (2020), exam6 (2017)
98 exam3 (2018), exam9 (2018), exam3 (2018), exam...
99 exam8 (2018)
[100 rows x 1 columns]
I need to split it into two columns, Exam and Year, and keep only the unique exam/year pairs.
The final output should look like this:
Exam,Year
exam1,2016
exam10,2019
exam2,2017
exam3,2018
exam4,2019
exam5,2020
exam6,2017
exam7,2017
exam8,2018
exam9,2018
I think I need to iterate over each row with iterrows in pandas and then use apply to split the exam and year and keep only the unique values, but my code throws an error:
def operations_exam(exam):
for index, row in df1.iterrows():
new=row[index].strip().split(' ')
exams=new[0]
year=new[1][1:-1]
f = lambda row: row[index].strip().split(' ')
for index, row in df1.iterrows():
df1["Exams"] = df1["Exams"].apply(f,axis=1)
A uj5u.com user replied:
You can do it in two lines (after importing pandas):
import pandas as pd
s = df1['Exams'].str.findall(r'(exam\d+)\s*\((\d+)\)').explode().drop_duplicates().reset_index(drop=True)
new_df = pd.DataFrame({'Exam': s.str[0], 'Year': s.str[1]})
Output:
>>> new_df
Exam Year
0 exam7 2017
1 exam9 2018
2 exam3 2018
3 exam2 2017
4 exam8 2018
5 exam6 2017
6 exam10 2019
7 exam4 2019
8 exam1 2016
9 exam5 2020
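The output above is in order of first appearance and keeps Year as a string, while the question asks for a listing sorted by exam. A minimal sketch of one way to get there, assuming a small stand-in for df1 (the three sample rows below are illustrative, not the asker's full data) and using str.extractall with named groups instead of findall:

```python
import pandas as pd

# Assumed sample data standing in for the asker's df1
df1 = pd.DataFrame({'Exams': [
    'exam7 (2017), exam9 (2018), exam3 (2018)',
    'exam2 (2017), exam2 (2017), exam8 (2018)',
    'exam1 (2016), exam8 (2018)',
]})

new_df = (
    df1['Exams']
    # One output row per match; the named groups become the column names
    .str.extractall(r'(?P<Exam>exam\d+)\s*\((?P<Year>\d+)\)')
    .drop_duplicates()
    # Plain string sort, so 'exam10' would come before 'exam2'
    .sort_values('Exam')
    .reset_index(drop=True)
)
new_df['Year'] = new_df['Year'].astype(int)

# Print as CSV with an Exam,Year header and no index column
print(new_df.to_csv(index=False))
```

The string sort places exam10 right after exam1, which happens to match the ordering shown in the question's desired output.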
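A uj5u.com user replied:
It may be simpler if you use built-in functions:
import pandas as pd
df = pd.DataFrame({'Exams':['exam7 (2017), exam9 (2018), exam3 (2018)', 'exam7 (2017), exam2 (2017), exam8 (2018)']})
# Split each cell on ', ' and concatenate the per-row lists into one flat list
pairs = df['Exams'].str.split(', ').sum()
# Split each 'exam7 (2017)' item into ['exam7', '(2017)']
pairs = [p.split(' ') for p in pairs]
new_df = pd.DataFrame(pairs, columns = ['Exam', 'Year']).drop_duplicates(['Exam', 'Year'])
# Keep only the digits of the year and convert them to integers
new_df['Year'] = new_df['Year'].str.extract(r'(\d+)', expand=False).astype('int')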
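For comparison, the row-by-row idea from the question can also be made to work without iterrows. The question's code likely fails because row[index] looks up the integer row label inside a row whose only label is 'Exams', and because Series.apply does not take an axis argument. The sketch below is only one possible rewrite: parse_cell is a hypothetical helper name, and the two sample rows are assumed data.

```python
import pandas as pd

# Assumed sample data standing in for the asker's df1
df1 = pd.DataFrame({'Exams': [
    'exam7 (2017), exam9 (2018)',
    'exam7 (2017), exam2 (2017)',
]})

def parse_cell(cell):
    """Turn 'exam7 (2017), exam9 (2018)' into [('exam7', 2017), ('exam9', 2018)]."""
    pairs = []
    for item in cell.split(','):
        # Each item looks like 'exam7 (2017)' once surrounding spaces are stripped
        exam, year = item.strip().split(' ')
        pairs.append((exam, int(year.strip('()'))))
    return pairs

# A set comprehension removes duplicate pairs; sorted() orders them by exam, then year
unique_pairs = sorted({pair for cell in df1['Exams'] for pair in parse_cell(cell)})
result = pd.DataFrame(unique_pairs, columns=['Exam', 'Year'])
print(result)
```

This trades the vectorized string methods for a plain Python loop, which may be easier to debug but will usually be slower on large frames.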
