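I'm trying to build a conditional dictionary or list that converts the product names in my df into categories. For example, every row matching an item in the "product_milk" list should become "DAIRIES" in my df. My df contains simple item names such as whole milk brand X 1L, and so on.
PS (edited): how can I automate np.where in a smarter way so that it handles multiple lists?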
import numpy as np
import pandas as pd
df = pd.DataFrame({'product_name':
['WHOLE MILK TIGER 1L','LEITE INTEGRAL UHT COM TAMPA', 'LEITE SEMIDESNATADO UHT COM TAMPA', 'LEITE INTEGRAL UHT','BEER WHITE LION 350ML', 'WHISKY RED LABEL 1L']
})
product_milk = ['WHOLE MILK', 'LEITE INTEGRAL', 'LEITE DESNATADO', 'LEITE SEMIDESNATADO']
product_beer = ['WHISKY', 'BEER']
df['a'] = np.where(df['product_name'].str.contains('|'.join(product_milk), na=False), 'DAIRIES', 0)
df['b'] = np.where(df['product_name'].str.contains('|'.join(product_beer), na=False), 'beer', df['a'])
print(df['b'][df['b'] != '0'])
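Obs: I tried nesting np.where, but that still isn't smart, because I have to pass every name and list by hand, e.g. "DAIRIES", "BEER", etc.
Expected result: a single Series with this output.
Note: I already get this output, but the approach I'm using is poor, and that is what I want to change: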
0 DAIRIES
1 DAIRIES
2 DAIRIES
3 DAIRIES
4 beer
5 beer
Name: b, dtype: object
Answer from a uj5u.com user:
First, I would keep all the information in a dictionary
replacements = {
'DAIRIES': ['WHOLE MILK', 'LEITE INTEGRAL', 'LEITE DESNATADO', 'LEITE SEMIDESNATADO'],
'beer': ['WHISKY', 'BEER'],
}
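Then I can use a for-loop to make it simpler.
I first create column b with a default value (an empty string, or the string from product_name) and then use a mask to replace the values in that column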
mask = df['product_name'].str.contains('|'.join(products))
# .loc avoids chained assignment (df['b'][mask] = ...), which pandas warns about
df.loc[mask, 'b'] = new_name
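To see what the mask step does on its own, here is a minimal sketch. The two-row frame and the single `products` list are made up for illustration, and using `.loc` for the assignment is my choice in this sketch rather than something the answer prescribes.

```python
import pandas as pd

# Hypothetical two-row frame, only for illustrating the mask
df = pd.DataFrame({'product_name': ['WHOLE MILK TIGER 1L', 'BEER WHITE LION 350ML']})
products = ['WHOLE MILK', 'LEITE INTEGRAL']

# 'WHOLE MILK|LEITE INTEGRAL' is a regex that matches any of the listed names
pattern = '|'.join(products)

# Boolean Series: True where product_name contains one of the names
mask = df['product_name'].str.contains(pattern, na=False)

df['b'] = ''                   # default value for rows that match nothing
df.loc[mask, 'b'] = 'DAIRIES'  # overwrite only the matching rows

print(mask.tolist())
print(df)
```

Passing `na=False` makes missing product names count as "no match" instead of propagating NaN into the mask.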
Full working example
import pandas as pd
df = pd.DataFrame({
'product_name': ['WHOLE MILK TIGER 1L','LEITE INTEGRAL UHT COM TAMPA', 'LEITE SEMIDESNATADO UHT COM TAMPA', 'LEITE INTEGRAL UHT','BEER WHITE LION 350ML', 'WHISKY RED LABEL 1L']
})
replacements = {
'DAIRIES': ['WHOLE MILK', 'LEITE INTEGRAL', 'LEITE DESNATADO', 'LEITE SEMIDESNATADO'],
'beer': ['WHISKY', 'BEER'],
}
df['b'] = ''
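for new_name, products in replacements.items():
    # rows whose product_name contains any of the listed names
    mask = df['product_name'].str.contains('|'.join(products))
    # .loc avoids chained assignment, which pandas warns about
    df.loc[mask, 'b'] = new_name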
print(df)
Result:
                        product_name        b
0                WHOLE MILK TIGER 1L  DAIRIES
1       LEITE INTEGRAL UHT COM TAMPA  DAIRIES
2  LEITE SEMIDESNATADO UHT COM TAMPA  DAIRIES
3                 LEITE INTEGRAL UHT  DAIRIES
4              BEER WHITE LION 350ML     beer
5                WHISKY RED LABEL 1L     beer
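Since the question asks specifically about generalizing np.where, one possible alternative is `np.select`, which takes a list of conditions and a matching list of values. This is a sketch of that idea, not part of the original answer; the names `conditions` and `choices` are mine, and the empty-string default is an assumption about what unmatched rows should contain.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'product_name': ['WHOLE MILK TIGER 1L', 'LEITE INTEGRAL UHT COM TAMPA',
                     'LEITE SEMIDESNATADO UHT COM TAMPA', 'LEITE INTEGRAL UHT',
                     'BEER WHITE LION 350ML', 'WHISKY RED LABEL 1L']
})

replacements = {
    'DAIRIES': ['WHOLE MILK', 'LEITE INTEGRAL', 'LEITE DESNATADO', 'LEITE SEMIDESNATADO'],
    'beer': ['WHISKY', 'BEER'],
}

# One boolean mask per category, in the same order as the dictionary keys
conditions = [
    df['product_name'].str.contains('|'.join(products), na=False)
    for products in replacements.values()
]
choices = list(replacements.keys())

# For each row, np.select takes the choice of the first condition that is True;
# rows matching no condition get the default (assumed here to be '')
df['b'] = np.select(conditions, choices, default='')

print(df)
```

One difference from the for-loop: if a product name matches two categories, `np.select` keeps the first match, while the loop keeps the last one because later iterations overwrite earlier ones.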
Edit:
Another approach is to use df['product_name'].replace(dictionary, regex=True), but it needs a dictionary of regex patterns like this
dictionary = {
'.*WHOLE MILK.*': 'DAIRIES',
'.*LEITE INTEGRAL.*': 'DAIRIES',
'.*LEITE DESNATADO.*': 'DAIRIES',
'.*LEITE SEMIDESNATADO.*': 'DAIRIES',
'.*WHISKY.*': 'beer',
'.*BEER.*': 'beer',
}
or
dictionary = {
'.*(WHOLE MILK|LEITE INTEGRAL|LEITE DESNATADO|LEITE SEMIDESNATADO).*': 'DAIRIES',
'.*(WHISKY|BEER).*': 'beer',
}
Full working example
import pandas as pd
df = pd.DataFrame({
'product_name': ['WHOLE MILK TIGER 1L','LEITE INTEGRAL UHT COM TAMPA', 'LEITE SEMIDESNATADO UHT COM TAMPA', 'LEITE INTEGRAL UHT','BEER WHITE LION 350ML', 'WHISKY RED LABEL 1L']
})
replacements = {
'DAIRIES': ['WHOLE MILK', 'LEITE INTEGRAL', 'LEITE DESNATADO', 'LEITE SEMIDESNATADO'],
'beer': ['WHISKY', 'BEER'],
}
dictionary = {}
# first type of dictionary
#for new_name, products in replacements.items():
# for p in products:
# dictionary[f'.*{p}.*'] = new_name
# second type of dictionary
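for new_name, products in replacements.items():
    # join all names of one category into a single alternation
    p = '|'.join(products)
    # '.*' on both sides makes the pattern consume the whole product name
    dictionary[f'.*({p}).*'] = new_name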
df['b'] = df['product_name'].replace(dictionary, regex=True)
print(df)
print('--- dictionary ---')
import pprint
pprint.pprint(dictionary)
Result:
                        product_name        b
0                WHOLE MILK TIGER 1L  DAIRIES
1       LEITE INTEGRAL UHT COM TAMPA  DAIRIES
2  LEITE SEMIDESNATADO UHT COM TAMPA  DAIRIES
3                 LEITE INTEGRAL UHT  DAIRIES
4              BEER WHITE LION 350ML     beer
5                WHISKY RED LABEL 1L     beer
--- dictionary ---
{'.*(WHISKY|BEER).*': 'beer',
'.*(WHOLE MILK|LEITE INTEGRAL|LEITE DESNATADO|LEITE SEMIDESNATADO).*': 'DAIRIES'}
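Two details of the regex approach may be worth checking on real data: product names that contain regex metacharacters, and rows that match no category. The sketch below uses made-up names ('MILK (UHT)', 'BREAD') that are not in the original data, and wrapping each name in `re.escape` is my assumption about how one might guard against special characters.

```python
import re
import pandas as pd

# Hypothetical data: one name with parentheses, one row with no category
df = pd.DataFrame({'product_name': ['MILK (UHT) 1L', 'BEER 350ML', 'BREAD']})

replacements = {
    'DAIRIES': ['MILK (UHT)'],
    'beer': ['BEER'],
}

dictionary = {}
for new_name, products in replacements.items():
    # re.escape makes '(' and ')' match literally instead of forming a group
    p = '|'.join(re.escape(name) for name in products)
    dictionary[f'.*({p}).*'] = new_name

df['b'] = df['product_name'].replace(dictionary, regex=True)

print(df)
```

Unlike the mask approach with an empty-string default, `replace` leaves unmatched names as they are, so 'BREAD' stays 'BREAD' in column b.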
