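Suppose the "Tags" column is stored as shown below. How can I split it into multiple columns, or collect it into a single list?
The expected result is one combined list with duplicates filtered out.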
"Tags"
['Saudi', 'law', 'Saudi Arabia', 'rules']
['Hindi', 'Tamil', 'imposition', 'cbse', 'neet', 'Tamil Nadu', 'India']
['Stephen', 'Hawkins', 'Tamil', 'predictions', 'future', 'science', 'scientist', 'top 5', 'five']
['Bigg Boss', 'Tamil', 'Kamal', 'big', 'boss']
['Mary', 'real', 'story', 'Tamil', 'history']
['football', 'Tamil', 'FIFA', '2018', 'world cup', 'MG', 'top', '10', 'ten']
['India', 'Tamil', 'poor', 'rich', 'money', 'MG', 'why', 'Indians']
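Answer from a uj5u.com user:
If you need a list without duplicates, flatten the column into a set and convert it back to a list. A set is also a good choice if performance matters: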
L = list(set(y for x in df['Tags'] for y in x))
If the lists might be stored as strings (for example after reading a CSV), parse each one with ast.literal_eval first:
import ast
L = list(set(y for x in df['Tags'] for y in ast.literal_eval(x)))
print(L)
['FIFA', 'Mary', 'world cup', 'rich', 'story', 'Tamil', 'rules', 'neet', 'money', 'Kamal', 'Hindi', 'big', 'cbse', 'imposition', 'football', 'MG', 'history', 'predictions', 'why', 'Tamil Nadu', 'top 5', 'ten', '10', 'Bigg Boss', 'India', 'Stephen', 'top', 'poor', 'law', 'Saudi', 'real', 'Indians', 'future', 'boss', 'five', '2018', 'scientist', 'Saudi Arabia', 'science', 'Hawkins']
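Note that a set has no defined order, which is why the output above does not follow the order of the rows. If first-seen order matters, one possible alternative (not part of the original answer) is to deduplicate through dict keys instead. The sketch below assumes a small made-up two-row sample rather than the full data from the question:

```python
import pandas as pd

# Hypothetical two-row sample that mirrors the question's "Tags" column.
df = pd.DataFrame({"Tags": [
    ["Saudi", "law", "Saudi Arabia", "rules"],
    ["Mary", "real", "story", "Tamil", "law"],
]})

# dict keys are unique and keep insertion order (Python 3.7+),
# so this drops duplicates while preserving first appearance.
ordered = list(dict.fromkeys(tag for tags in df["Tags"] for tag in tags))
print(ordered)
# ['Saudi', 'law', 'Saudi Arabia', 'rules', 'Mary', 'real', 'story', 'Tamil']
```

The second 'law' is dropped because it was already seen in the first row.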
Answer from a uj5u.com user:
Try:
df["Tags"].explode().unique()
Or (this needs NumPy, and returns the unique values sorted):
import numpy as np
np.unique(df["Tags"].sum())
Edit:
If the lists are stored as strings, you may need:
import ast
df["Tags"].apply(ast.literal_eval).explode().unique()
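If it is unclear whether the column holds real lists or their string form, a small helper can handle both cases before exploding. This is only a sketch: `to_list` is a hypothetical helper name, and the two-row frame is made-up sample data, not the data from the question.

```python
import ast
import pandas as pd

# Hypothetical sample: one row is a string-encoded list, the other a real list.
df = pd.DataFrame({"Tags": ["['Saudi', 'law']", ["Tamil", "law"]]})

def to_list(value):
    """Parse string-encoded lists; pass real lists through unchanged."""
    return ast.literal_eval(value) if isinstance(value, str) else value

# explode() puts each tag on its own row; unique() keeps first-seen order.
unique_tags = df["Tags"].apply(to_list).explode().unique().tolist()
print(unique_tags)
# ['Saudi', 'law', 'Tamil']
```

Calling `.tolist()` at the end converts the NumPy array returned by `unique()` into a plain Python list.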
Answer from a uj5u.com user:
You can flatten the lists and use set():
out = []
for lst in df['Tags'].tolist():
    out.extend(lst)
out = list(set(out))
Output:
['cbse', '2018', 'future', 'India', '10', 'Indians', 'money',
'Hindi', 'rules', 'poor', 'Kamal', 'neet', 'top 5', 'world cup',
'five', 'law', 'ten', 'Stephen', 'Tamil', 'Mary', 'Bigg Boss',
'top', 'scientist', 'boss', 'Saudi Arabia', 'big', 'real', 'story',
'why', 'Hawkins', 'predictions', 'football', 'rich', 'science',
'imposition', 'Saudi', 'FIFA', 'history', 'Tamil Nadu', 'MG']
The same code works for the list of lists below:
lsts = [['thamizh', 'kannada', 'karnataka', 'bangalore', 'mysore',
'bengaluru', 'Bengaluru', 'malayalam', 'kerala', 'chennai', 'yash',
'kgf', 'songs', 'kannada songs', 'news', 'today'],
['songs', 'kannada songs', 'news', 'today'],
['mysore', 'bengaluru', 'Bengaluru', 'malayalam',]]
Output:
['today', 'songs', 'malayalam', 'bangalore', 'karnataka', 'kerala',
'bengaluru', 'mysore', 'kgf', 'Bengaluru', 'chennai', 'yash',
'thamizh', 'kannada', 'news', 'kannada songs']
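The flatten-then-deduplicate loop above can also be written with itertools.chain.from_iterable from the standard library. The following is a minimal sketch on a shortened, made-up version of `lsts`. Sorting the result is an extra step added here so the output is reproducible; it is not something the original answer does.

```python
from itertools import chain

# Shortened, hypothetical version of the nested list above.
lsts = [
    ["songs", "kannada songs", "news", "today"],
    ["mysore", "bengaluru", "Bengaluru", "news"],
]

# chain.from_iterable flattens one level; set() removes duplicates;
# sorted() gives a stable, repeatable order.
unique_sorted = sorted(set(chain.from_iterable(lsts)))
print(unique_sorted)
# ['Bengaluru', 'bengaluru', 'kannada songs', 'mysore', 'news', 'songs', 'today']
```

The comparison is case-sensitive, so 'bengaluru' and 'Bengaluru' are kept as two separate tags, just as in the output above.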
