In a Pandas DataFrame, one column, food_column, is of Series type (each row holds a list of strings), and I need to derive an output column from it.
Input: food_column
['bread', 'bread', 'bread']
['meat', 'butter', 'butter']
['meat', 'butter', 'bread', 'meat']
['butter']
['bread', 'meat', 'bread', 'meat']
Output: main_column
['bread']
['butter']
['meat']
['butter']
['bread']
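Conditions:
- If one string element occurs more often than the others, it should be selected as the output element.
- If two or three elements have the same count, one of them should be picked with np.random.choice.
- If a row contains only one element, that element is assigned/mapped to the output column.
- Otherwise, the output column should be marked as "unknown".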
Answer from a uj5u.com user:
import random
from collections import Counter

import numpy as np
import pandas as pd

food_list = [['bread', 'bread', 'bread'],
             ['meat', 'butter', 'butter'],
             ['meat', 'butter', 'bread', 'meat'],
             ['butter'],
             ['bread', 'meat', 'bread', 'meat'],
             ['']]
food_series = pd.Series(food_list)
df = pd.DataFrame({'food_column': food_series})

# Shuffle each list so that ties are broken at random: Counter keeps
# insertion order (guaranteed for dicts since Python 3.7), and max()
# returns the first key that reaches the highest count.
df['random_food_list'] = [random.sample(z, len(z)) for z in df['food_column'].to_list()]
# Count how often each item occurs in a row
df['food_counts'] = df['random_food_list'].apply(lambda x: Counter(x))
# Take the key with the highest count; default='' avoids a ValueError on an empty list
df['main_column'] = df['food_counts'].apply(lambda x: max(x, key=x.get, default=''))
# Replace empty strings with 'unknown'
df['main_column'] = np.where(df['main_column'] == '', 'unknown', df['main_column'])
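The four conditions from the question can also be written as a single function, which may be easier to read and test than the helper columns. What follows is only a minimal sketch: the name pick_main_food and its rng parameter are hypothetical, it uses NumPy's Generator.choice (from np.random.default_rng) in place of the np.random.choice call named in the question so that results can be seeded, and it assumes that empty lists and empty strings should both map to "unknown".

```python
from collections import Counter

import numpy as np
import pandas as pd


def pick_main_food(items, rng=None):
    """Return the most frequent item of a list, breaking ties at random.

    Hypothetical helper: the name and the rng parameter are illustrative.
    """
    if rng is None:
        rng = np.random.default_rng()
    # Anything that is not a list/tuple (e.g. NaN) is treated as unknown.
    if not isinstance(items, (list, tuple)):
        return "unknown"
    # Ignore empty or non-string entries so they can never be selected.
    cleaned = [item for item in items if isinstance(item, str) and item.strip()]
    if not cleaned:
        return "unknown"
    counts = Counter(cleaned)
    top = max(counts.values())
    # All items that share the highest count; a single-item row yields one candidate.
    candidates = sorted(food for food, n in counts.items() if n == top)
    return str(rng.choice(candidates))


df = pd.DataFrame({"food_column": pd.Series([
    ["bread", "bread", "bread"],
    ["meat", "butter", "butter"],
    ["meat", "butter", "bread", "meat"],
    ["butter"],
    ["bread", "meat", "bread", "meat"],
    [""],
])})

rng = np.random.default_rng(0)  # seeded only to make the tie-breaks repeatable
df["main_column"] = df["food_column"].apply(lambda items: pick_main_food(items, rng))
```

The fifth row is a genuine tie, so it can come out as either 'bread' or 'meat'. Drop the seed if a different draw on every run is wanted.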
Please credit the source when reposting. Article link: https://www.uj5u.com/ruanti/409869.html
