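I have a dataset containing the two biggest cities of each country in Europe. I want to create a column that holds the two cities together in a list. Below is a sample of what the dataset looks like.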
Country BiggestCity SecondBiggestCity
England London Birmingham
Spain Madrid Barcelona
Sweden Stockholm Gothenburg
Germany Berlin Frankfurt
Here is what I tried:
df['BothCities'] = df['BiggestCity'] + df['SecondBiggestCity']
But it doesn't give me the output I want. I want to add an extra column, BothCities, that puts the two cities in a list.
Desired output:
Country BiggestCity SecondBiggestCity BothCities
England London Birmingham [London, Birmingham]
Spain Madrid Barcelona [Madrid, Barcelona]
Sweden Stockholm Gothenburg [Stockholm, Gothenburg]
Germany Berlin Frankfurt [Berlin, Frankfurt]
Any suggestions on how to do this?
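All of the answers assume a DataFrame named df shaped like the sample above. This minimal sketch rebuilds it from the four rows shown (nothing beyond those rows is assumed) and shows what the original + attempt actually produces: element-wise string concatenation, not a list.

```python
import pandas as pd

# Rebuild the four sample rows from the question
df = pd.DataFrame(
    {
        "Country": ["England", "Spain", "Sweden", "Germany"],
        "BiggestCity": ["London", "Madrid", "Stockholm", "Berlin"],
        "SecondBiggestCity": ["Birmingham", "Barcelona", "Gothenburg", "Frankfurt"],
    }
)

# The attempt from the question: + on two string columns glues each pair
# of strings together instead of building a list
attempt = df["BiggestCity"] + df["SecondBiggestCity"]
print(attempt.tolist())
# ['LondonBirmingham', 'MadridBarcelona', 'StockholmGothenburg', 'BerlinFrankfurt']
```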
Answer:
Try this:
df['BothCities'] = df[['BiggestCity', 'SecondBiggestCity']].values.tolist()
print(df)
Country BiggestCity SecondBiggestCity BothCities
0 England London Birmingham [London, Birmingham]
1 Spain Madrid Barcelona [Madrid, Barcelona]
2 Sweden Stockholm Gothenburg [Stockholm, Gothenburg]
3 Germany Berlin Frankfurt [Berlin, Frankfurt]
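A self-contained sketch of the same answer, rebuilding the sample rows and using DataFrame.to_numpy(), which the pandas documentation recommends over .values. Selecting the two columns gives a 2-D array whose tolist() returns one inner list per row, and pandas stores each inner list as a single cell.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Country": ["England", "Spain", "Sweden", "Germany"],
        "BiggestCity": ["London", "Madrid", "Stockholm", "Berlin"],
        "SecondBiggestCity": ["Birmingham", "Barcelona", "Gothenburg", "Frankfurt"],
    }
)

# Two columns -> 2-D NumPy array -> list of row lists -> one list per cell
df["BothCities"] = df[["BiggestCity", "SecondBiggestCity"]].to_numpy().tolist()
print(df["BothCities"].tolist())
```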
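It isn't necessary, but if you want to solve it your way, you can do it like this:
What you tried only gets you part of the way: it concatenates the two strings into one big string. You can join them with a separator instead (e.g. '/'):
df['BothCities'] = df['BiggestCity'] + '/' + df['SecondBiggestCity']
and then split the result into a list on that separator: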
df['BothCities'] = df['BothCities'].str.split('/')
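The join-then-split approach can also be written with Series.str.cat, which takes the separator as sep. This sketch uses two of the sample rows and also shows the approach's main caveat: a value that already contains the separator is split into extra items. The value "A/B" is made up purely to illustrate that failure mode.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "BiggestCity": ["London", "Madrid"],
        "SecondBiggestCity": ["Birmingham", "Barcelona"],
    }
)

# Join the two columns with "/" and split each joined string back into a list
df["BothCities"] = df["BiggestCity"].str.cat(df["SecondBiggestCity"], sep="/").str.split("/")
print(df["BothCities"].tolist())  # [['London', 'Birmingham'], ['Madrid', 'Barcelona']]

# Caveat: a name containing the separator produces too many items
bad = pd.Series(["A/B"]).str.cat(pd.Series(["C"]), sep="/").str.split("/")
print(bad.tolist())  # [['A', 'B', 'C']]
```

Pick a separator that cannot occur in the data, or use one of the list-building approaches that never round-trips through a string.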
Answer:
You need to concatenate lists:
df['BothCities'] = df['BiggestCity'].apply(lambda x: [x]) + df['SecondBiggestCity'].apply(lambda x: [x])
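Why the list-wrapping trick works: after apply(lambda x: [x]) each Series holds Python list objects (object dtype), and + on two object Series applies Python's + to each aligned pair, which for lists means concatenation. A small sketch with two of the sample rows:

```python
import pandas as pd

big = pd.Series(["London", "Madrid"])
second = pd.Series(["Birmingham", "Barcelona"])

# Wrap each string in a one-element list; both Series now hold list objects
big_lists = big.apply(lambda x: [x])
second_lists = second.apply(lambda x: [x])

# Element-wise + on object Series concatenates the two lists in each row
both = big_lists + second_lists
print(both.tolist())  # [['London', 'Birmingham'], ['Madrid', 'Barcelona']]
```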
Or you can apply list row by row:
df['BothCities'] = df[['BiggestCity', 'SecondBiggestCity']].apply(list, axis=1)
print(df)
Country BiggestCity SecondBiggestCity BothCities
0 England London Birmingham [London, Birmingham]
1 Spain Madrid Barcelona [Madrid, Barcelona]
2 Sweden Stockholm Gothenburg [Stockholm, Gothenburg]
3 Germany Berlin Frankfurt [Berlin, Frankfurt]
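With axis=1, apply passes each row to list() as a Series, so the order of the column names in the selection decides the order inside each list. A minimal sketch with two of the sample rows, adding a hypothetical Reversed column only to show that ordering:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Country": ["England", "Spain"],
        "BiggestCity": ["London", "Madrid"],
        "SecondBiggestCity": ["Birmingham", "Barcelona"],
    }
)

cols = ["BiggestCity", "SecondBiggestCity"]

# Each row Series is turned into a plain list; the result is a Series of lists
df["BothCities"] = df[cols].apply(list, axis=1)

# Hypothetical column: same columns in reverse order, so each list is reversed too
df["Reversed"] = df[cols[::-1]].apply(list, axis=1)
print(df[["BothCities", "Reversed"]])
```

Row-wise apply calls a Python function once per row, so on large frames it is usually slower than the zip or to_numpy() variants.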
You can also try zip:
df['BothCities'] = list(map(list, zip(df['BiggestCity'], df['SecondBiggestCity'])))
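The zip approach extends to any number of columns by unpacking a generator of Series into zip. This sketch assumes the column names are kept in a list called cols (an illustrative name, not from the answer):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "BiggestCity": ["London", "Madrid"],
        "SecondBiggestCity": ["Birmingham", "Barcelona"],
    }
)

cols = ["BiggestCity", "SecondBiggestCity"]

# zip(*...) yields one tuple per row across all listed columns;
# list(row) turns each tuple into a list to match the desired output
df["BothCities"] = [list(row) for row in zip(*(df[c] for c in cols))]
print(df["BothCities"].tolist())  # [['London', 'Birmingham'], ['Madrid', 'Barcelona']]
```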
Answer:
If you want to keep it simple, you can use a plain list comprehension with zip:
df["BothCities"] = [[i, j] for i, j in zip(df['BiggestCity'], df['SecondBiggestCity'])]
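Or build a tuple, which timed marginally faster in the benchmark below (the gap is within the measured standard deviation):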
df["BothCities"] = tuple([i, j] for i, j in zip(df['BiggestCity'], df['SecondBiggestCity']))
For your reference (1000 rows of data):
%timeit [[i,j]for i,j in zip(df['BiggestCity'],df['SecondBiggestCity'])]
304 μs ± 9.55 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit tuple([i,j]for i,j in zip(df['BiggestCity'],df['SecondBiggestCity']))
298 μs ± 6.66 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
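The two timings above overlap within their standard deviations, and results vary by machine and pandas version. A sketch for re-running the comparison yourself with the standard timeit module. It assumes 1000 rows built by repeating the four sample rows 250 times, since the answer does not say what its 1000-row data looked like, and it checks that every candidate builds the same row lists before timing them.

```python
import timeit

import pandas as pd

# Assumption: 1000 rows made by repeating the four sample rows 250 times
sample = pd.DataFrame(
    {
        "BiggestCity": ["London", "Madrid", "Stockholm", "Berlin"],
        "SecondBiggestCity": ["Birmingham", "Barcelona", "Gothenburg", "Frankfurt"],
    }
)
df = pd.concat([sample] * 250, ignore_index=True)
cols = ["BiggestCity", "SecondBiggestCity"]

candidates = {
    "list comprehension": lambda: [[i, j] for i, j in zip(df[cols[0]], df[cols[1]])],
    "tuple of lists": lambda: tuple([i, j] for i, j in zip(df[cols[0]], df[cols[1]])),
    "to_numpy().tolist()": lambda: df[cols].to_numpy().tolist(),
    "apply(list, axis=1)": lambda: df[cols].apply(list, axis=1),
}

# Sanity check: every candidate must produce the same row lists
reference = candidates["list comprehension"]()
for name, build in candidates.items():
    assert list(build()) == reference, name

# Best of 3 repeats of 20 calls each, reported per call in microseconds
for name, build in candidates.items():
    best = min(timeit.repeat(build, number=20, repeat=3)) / 20
    print(f"{name:>22}: {best * 1e6:.1f} µs per call")
```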
