I have the following DataFrame:
import pandas as pd
df = pd.DataFrame([['Coca-Cola','Coca-Cola Ltd Co'], ['BMW','Company BMW Ltd'], ['Nike','Adidas Ltd and Co.']], columns=['Brand','Company Name'])
I need to fill in a "Status" column based on whether the "Brand" and "Company Name" columns share the same wording:
Brand Company Name
Coca-Cola Coca-Cola Ltd Co
BMW Company BMW Ltd
Nike Adidas Ltd and Co.
Ideally, the "Status" column should contain "same" where the brand appears in the company name and "not" where it does not, like this:
Brand Company Name Status
Coca-Cola Coca-Cola Ltd Co same
BMW Company BMW Ltd same
Nike Adidas Ltd and Co. not
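My current approach works, but not for every row, because some company names have the brand in the middle or differ from it (they carry the full company name):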
my_list = []
for brands, names in zip(df.Brand, df["Company Name"]):
    if brands == names:
        my_list.append('same')
    else:
        my_list.append('not')
Please share your suggestions on how I can fill in the status based on matching wording in the two columns. Thanks.
A uj5u.com user replied:
If the brand name is never more than one word, we can simply split Company Name on whitespace and test for membership:
df['Status'] = df.apply(lambda x: 'same' if x['Brand'] in x['Company Name'].split() else 'not', axis=1)
Output:
Brand Company Name Status
0 Coca-Cola Coca-Cola Ltd Co same
1 BMW Company BMW Ltd same
2 Nike Adidas Ltd and Co. not
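The split-and-membership test above stops working once a brand has more than one word, since no single token can equal a two-word brand. One possible workaround, offered only as a sketch, is a whole-word regex search. The helper name `brand_status`, the case-insensitive matching, and the extra "Land Rover" row are my own assumptions for illustration and are not part of the question's data.

```python
import re

import pandas as pd


def brand_status(brand: str, company: str) -> str:
    """Return 'same' if brand occurs in company as a whole word or phrase."""
    # re.escape keeps characters such as '-' literal; the lookarounds reject
    # matches glued to other word characters (e.g. 'Nike' inside 'Nikeson').
    pattern = r"(?<!\w)" + re.escape(brand) + r"(?!\w)"
    # Case-insensitive matching is an assumption, not a stated requirement.
    return "same" if re.search(pattern, company, flags=re.IGNORECASE) else "not"


df = pd.DataFrame(
    [
        ["Coca-Cola", "Coca-Cola Ltd Co"],
        ["BMW", "Company BMW Ltd"],
        ["Nike", "Adidas Ltd and Co."],
        ["Land Rover", "Jaguar Land Rover Ltd"],  # hypothetical multi-word brand
    ],
    columns=["Brand", "Company Name"],
)

df["Status"] = [brand_status(b, c) for b, c in zip(df["Brand"], df["Company Name"])]
print(df)
```

With this sketch the first two rows and the hypothetical "Land Rover" row come out as same, and the Nike row as not.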
A uj5u.com user replied:
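You can use a list comprehension for better timings. zip pairs up the two columns so you can check whether Brand is one of the whitespace-separated words in Company Name: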
In [251]: df['Status'] = ['same' if x in y else 'not' for x,y in zip(df['Brand'], df['Company Name'].str.split())]
In [252]: df
Out[252]:
Brand Company Name Status
0 Coca-Cola Coca-Cola Ltd Co same
1 BMW Company BMW Ltd same
2 Nike Adidas Ltd and Co. not
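A small aside on why the `.str.split()` matters here: `in` against a list tests for an exact token, while `in` against a plain string tests for a substring. The company name below is a made-up value chosen only to show the difference.

```python
brand = "Nike"
company = "Nikeson Ltd"  # hypothetical company name, not from the question

# Substring test: 'Nike' is found inside 'Nikeson'.
substring_hit = brand in company

# Token test: 'Nike' is not one of the tokens ['Nikeson', 'Ltd'].
token_hit = brand in company.split()

print(substring_hit, token_hit)  # True False
```

Dropping the split would therefore flag this row as same, which may or may not be what you want.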
Timings:
In [261]: def f1():
     ...:     df['Status'] = df.apply(lambda x: 'same' if x['Brand'] in x['Company Name'].split() else 'not', axis=1)
     ...:
In [259]: def f2():
     ...:     df['Status'] = ['same' if x in y else 'not' for x,y in zip(df['Brand'], df['Company Name'].str.split())]
In [262]: %timeit f1() # @Manlai's solution
468 μs ± 15.2 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [263]: %timeit f2() # my solution
360 μs ± 6.67 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
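The timings above come from a three-row frame, so the gap may look different on real data. If you want to repeat the comparison outside IPython, something like the following could work. The function names `with_apply` and `with_comprehension`, the 1000x replication factor, and the repeat count are arbitrary choices of mine, and no particular result is implied.

```python
import timeit

import pandas as pd

base = pd.DataFrame(
    [["Coca-Cola", "Coca-Cola Ltd Co"], ["BMW", "Company BMW Ltd"], ["Nike", "Adidas Ltd and Co."]],
    columns=["Brand", "Company Name"],
)
# Repeat the sample rows to get a larger frame (the factor is arbitrary).
df = pd.concat([base] * 1000, ignore_index=True)


def with_apply(frame):
    # Row-wise apply, as in the first answer.
    return frame.apply(
        lambda x: "same" if x["Brand"] in x["Company Name"].split() else "not", axis=1
    ).tolist()


def with_comprehension(frame):
    # List comprehension over zipped columns, as in the second answer.
    return [
        "same" if x in y else "not"
        for x, y in zip(frame["Brand"], frame["Company Name"].str.split())
    ]


# Both approaches should agree before their speed is compared.
assert with_apply(df) == with_comprehension(df)

t_apply = timeit.timeit(lambda: with_apply(df), number=5)
t_comp = timeit.timeit(lambda: with_comprehension(df), number=5)
print(f"apply: {t_apply:.4f} s, comprehension: {t_comp:.4f} s (5 runs each)")
```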
